Module Overview

AI Agent for LabVIEW currently consists of two core modules:

  • LLM

  • ImageGeneration

These two modules together form a complete capability loop from "task understanding" to "result generation".


Typical Workflow

In a typical workflow:

  1. LLM understands user intent, manages context, calls tools, interprets images, and produces the output

  2. ImageGeneration handles image generation and image-text fusion (it can also be called by LLM as a tool)

  3. Generated results can be passed back to LLM for the next round of task orchestration

This can be understood as:

  • LLM is the "brain" (understanding, decision-making, image understanding, and output orchestration)

  • ImageGeneration is the "image generation executor" (generation and fusion processing)


Module 1: LLM

The LLM module handles large language model-related capabilities and is the core entry point of the Agent.

Main Capabilities

  • Integration and switching across multiple model service providers, with streaming output (see the sketch after this list)

  • Intent understanding and multi-turn context management

  • Tool calling (Function Calling)

  • Image understanding (describing, analyzing, and explaining input images)

  • Structured output, task decomposition, and result output
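
The Agent exposes these capabilities as LabVIEW VIs, but the underlying request pattern is easiest to illustrate in text. Below is a minimal Python sketch of provider switching and streaming against an OpenAI-compatible endpoint; every base URL, model name, and key shown is a placeholder, not part of the toolkit:

    # pip install openai
    from openai import OpenAI

    # Provider switching: the same client works against any OpenAI-compatible
    # service by changing base_url and model. All values are placeholders.
    PROVIDERS = {
        "openai": {"base_url": "https://api.openai.com/v1", "model": "gpt-4o-mini"},
        "doubao": {"base_url": "https://ark.cn-beijing.volces.com/api/v3",
                   "model": "<your-endpoint-id>"},
    }
    cfg = PROVIDERS["doubao"]
    client = OpenAI(base_url=cfg["base_url"], api_key="<your-api-key>")

    # Streaming output: print tokens as they arrive.
    stream = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": "Summarize what a LabVIEW VI is."}],
        temperature=0.7,   # model parameters listed under "Typical Inputs"
        max_tokens=512,
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)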

Typical Inputs

  • User natural language instructions

  • Conversation history (context)

  • Image input (for image understanding tasks)

  • Tool list and tool descriptions

  • Model parameters (temperature, max output length, etc.)

Typical Outputs

  • Text responses

  • Image understanding results (descriptions, key points, analysis conclusions)

  • Tool calling requests with parameters (see the round-trip sketch after this list)

  • Structured results and final output (for subsequent process handling)
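
The tool-call output deserves special attention: the model returns a tool name plus JSON parameters, the caller executes the tool, and the result is fed back for a final answer. A hedged Python sketch of that round trip; the get_vi_info tool is a hypothetical example, not a toolkit API:

    # pip install openai
    import json
    from openai import OpenAI

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>")
    MODEL = "<model-or-endpoint-id>"

    # Hypothetical tool description; in the Agent this would mirror a user VI.
    tools = [{"type": "function", "function": {
        "name": "get_vi_info",
        "description": "Return metadata about a VI in the current project.",
        "parameters": {"type": "object",
                       "properties": {"vi_name": {"type": "string"}},
                       "required": ["vi_name"]},
    }}]

    messages = [{"role": "user", "content": "Describe the VI named main.vi"}]
    response = client.chat.completions.create(model=MODEL, messages=messages,
                                              tools=tools)
    msg = response.choices[0].message
    if msg.tool_calls:                                  # model requested a tool call
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)  # parameters arrive as JSON text
            result = {"vi_name": args["vi_name"],       # stubbed tool execution
                      "connector_pane": "4x2x2x4"}
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
        # Second round: the model turns tool results into the final answer.
        final = client.chat.completions.create(model=MODEL, messages=messages)
        print(final.choices[0].message.content)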

Applicable Scenarios

  • Intelligent Q&A and engineering assistance

  • Document understanding and summarization

  • Image content understanding and explanation

  • LabVIEW project documentation generation

  • Multi-step task orchestration and execution

  • Web search and code generation


Module 2: ImageGeneration

The ImageGeneration module handles image generation and fusion capabilities, supporting text-to-image, image-to-image, and image-text fusion tasks.

Prerequisites (Important)

ImageGeneration is currently based primarily on Doubao model capabilities. Complete the following preparations before use:

  1. Activate Doubao-related services on Volcano Engine and enable the doubao-seedream model

  2. Complete the Doubao API configuration in the Agent (the endpoint ID is used as the API Key)

  3. Install Yiqi Intelligence's AI Vision Toolkit for GPU (download: https://www.virobotics.net/product_AIVT_GPU)

  4. Confirm that the local environment meets the visual toolkit requirements

Main Capabilities

  • Text-to-image: Generate images based on text prompts

  • Image-to-image: Style transfer, redrawing, or variant generation based on input images

  • Image-text fusion: Combine images and text instructions for editing, enhancement, or content completion
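
In API terms, the three capabilities differ mainly in whether an input image accompanies the prompt. The sketch below uses plain HTTP against the Volcano Engine (Ark) service; the endpoint path follows the OpenAI-compatible convention, and the "image" field for image-to-image is an illustrative assumption, so verify the actual contract against the Doubao/seedream API reference:

    # pip install requests
    import base64
    import requests

    BASE = "https://ark.cn-beijing.volces.com/api/v3"   # verify against the Ark docs
    HEADERS = {"Authorization": "Bearer <your-ark-api-key>"}

    def generate(prompt, image_path=None):
        # Field names follow the OpenAI-compatible convention; the "image"
        # field for image-to-image is an illustrative assumption.
        payload = {
            "model": "<seedream-endpoint-id>",
            "prompt": prompt,
            "size": "1024x1024",
            "seed": 42,                      # illustrative; fixes the random seed
        }
        if image_path:                       # image-to-image / image-text fusion
            with open(image_path, "rb") as f:
                payload["image"] = base64.b64encode(f.read()).decode()
        resp = requests.post(f"{BASE}/images/generations",
                             headers=HEADERS, json=payload, timeout=120)
        resp.raise_for_status()
        return resp.json()["data"][0]["url"]

    print(generate("Flowchart of a temperature-logging test bench, flat style"))
    print(generate("Redraw in a unified corporate color scheme", "source.png"))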

Typical Inputs

  • Text prompts (target content, style, constraints)

  • Input images (for image-to-image or image-text fusion)

  • Optional model parameters (size, quality, aspect ratio, random seed, etc.)

Typical Outputs

  • Newly generated or edited image files

  • Generation task status/result information (success, failure, error reasons)

Applicable Scenarios

  • Experimental workflow diagrams and documentation illustrations (text-to-image)

  • Style unification and version iteration of existing images (image-to-image)

  • Collaborative creation of documents, reports, and teaching materials (image-text fusion)


How the Two Modules Work Together

Recommended minimum loop:

  1. The user inputs a task (e.g., "Generate a device flowchart and provide explanations")

  2. LLM analyzes the requirement and writes an image prompt

  3. LLM calls the ImageGeneration tool to generate the image

  4. LLM adds explanations based on the image result and outputs the final response
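
As a hedged Python sketch (in LabVIEW these steps would be wired as VIs on the block diagram; all identifiers below are illustrative, and generate_image stands in for the ImageGeneration module):

    # Minimal "brain + image executor" loop. Not toolkit APIs.
    from openai import OpenAI

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>")
    MODEL = "<model-or-endpoint-id>"

    def ask(messages):
        resp = client.chat.completions.create(model=MODEL, messages=messages)
        return resp.choices[0].message.content

    def generate_image(prompt):
        # Stand-in for the ImageGeneration module described above.
        return "https://example.com/generated.png"

    task = "Generate a device flowchart and provide explanations"

    # Steps 1-2: the LLM analyzes the task and writes an image prompt.
    image_prompt = ask([
        {"role": "system", "content": "Turn the user's task into one concise "
                                      "image-generation prompt. Reply with the prompt only."},
        {"role": "user", "content": task},
    ])

    # Step 3: ImageGeneration produces the image.
    image_url = generate_image(image_prompt)

    # Step 4: the LLM explains the result and writes the final response.
    print(ask([{"role": "user",
                "content": f"The image for '{task}' is at {image_url}. "
                           "Write a short explanation to accompany it."}]))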

This pattern extends naturally to more complex multi-tool workflows (such as integrating VI-specific modules later).


Development Recommendations

  • First verify the LLM conversation pipeline separately, then integrate ImageGeneration

  • Define unified prompt templates for image generation to facilitate team reuse

  • Expose models, parameters, and tool descriptions as configuration items instead of hardcoding them in block diagram logic

  • Add error handling at key nodes (timeout, quota exceeded, invalid parameters)
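
For the last point, the common failure modes map onto distinct exception types in OpenAI-compatible clients. A minimal retry sketch; the retry counts and wait times are arbitrary choices:

    # pip install openai
    import time
    from openai import OpenAI, APITimeoutError, RateLimitError, BadRequestError

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>",
                    timeout=30)  # fail fast instead of hanging

    def chat_with_retry(messages, attempts=3):
        for i in range(attempts):
            try:
                return client.chat.completions.create(
                    model="<model-or-endpoint-id>", messages=messages)
            except APITimeoutError:
                time.sleep(2 ** i)          # timeout: back off and retry
            except RateLimitError:
                time.sleep(10)              # quota exceeded: wait longer
            except BadRequestError as e:    # invalid parameters: don't retry
                raise ValueError(f"Invalid parameters: {e}") from e
        raise RuntimeError("LLM call failed after retries")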


Capability Classification and Detailed Descriptions

The following content is taken from the "Capability Classification and Detailed Descriptions" section of the release documentation and supplements the module definitions above.

A. LLM Capability Classification

1) Interaction and Understanding Capabilities

  • Flexible multi-model integration: unified integration of, and switching between, multiple mainstream model service providers

  • Contextual conversation: multi-turn dialogue and context management

  • Streaming output: real-time word-by-word return to speed up interaction feedback

  • Image understanding: description, analysis, and explanation of input images (see the sketch after this list)
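
The image-understanding capability corresponds to vision-enabled chat messages. A sketch using the widely adopted OpenAI-compatible content format (support varies by provider and model; the file name and prompt are illustrative):

    # pip install openai
    import base64
    from openai import OpenAI

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>")

    with open("front_panel.png", "rb") as f:     # illustrative file name
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="<vision-capable-model>",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this LabVIEW front panel "
                                         "and point out any usability issues."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)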

2) Execution and Orchestration Capabilities

  • Tool calling (Function Calling): invocation of user-written VI tools to complete tasks

  • Task decomposition and structured output: breaking complex requirements into executable steps (see the sketch after this list)

  • Result output orchestration: unified organization and return of "text results + tool results"
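
Task decomposition with structured output is typically done by requesting JSON. A sketch assuming the provider supports the OpenAI-compatible response_format option; the step schema is an illustrative choice:

    # pip install openai
    import json
    from openai import OpenAI

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>")

    resp = client.chat.completions.create(
        model="<model-or-endpoint-id>",
        response_format={"type": "json_object"},  # ask for machine-readable output
        messages=[
            {"role": "system", "content":
                'Decompose the task into steps. Reply as JSON: '
                '{"steps": [{"id": 1, "action": "...", "tool": "..."}]}'},
            {"role": "user", "content":
                "Document the data-acquisition VIs in this project."},
        ],
    )
    plan = json.loads(resp.choices[0].message.content)
    for step in plan["steps"]:                    # hand each step to orchestration
        print(step["id"], step["action"], step.get("tool"))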

3) Extension and Engineering Capabilities

  • Multi-Agent collaboration: multiple Agents processing complex tasks in parallel or cooperatively (see the sketch after this list)

  • Low-code visual development: the LabVIEW graphical development paradigm, which lowers the integration barrier

  • Pre-built examples: rapid verification, from a basic chat to a complete tool-calling flow
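
Conceptually, Multi-Agent collaboration means several role-specialized LLM instances passing results to one another. A deliberately small planner/reviewer sketch; the role prompts and task are illustrative:

    from openai import OpenAI

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>")
    MODEL = "<model-or-endpoint-id>"

    def agent(system_prompt, user_content):
        # One "agent" = one role prompt over the shared model.
        resp = client.chat.completions.create(model=MODEL, messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ])
        return resp.choices[0].message.content

    draft = agent("You are a planner. Produce a numbered test plan.",
                  "Plan regression tests for a motor-control VI.")
    review = agent("You are a reviewer. Point out gaps in the plan.", draft)
    print(review)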

B. ImageGeneration Capability Classification

ImageGeneration is primarily responsible for image generation and image-text fusion processing, and can be called by LLM as a tool.

  • Text-to-image: Generate images based on text prompts

  • Image-to-image: Generate variants, redrawing, or stylized results based on input images

  • Image-text fusion: Perform targeted editing and enhancement combining images and text constraints