Module Overview

AI Agent for LabVIEW currently consists of two core modules:

  • LLM

  • ImageGeneration

These two modules together form a complete capability loop from "task understanding" to "result generation".


Typical Workflow

In a typical workflow:

  1. LLM understands user intent, manages context, calls tools, interprets images, and produces the output

  2. ImageGeneration handles image generation and image-text fusion (it can also be called by LLM as a tool)

  3. Generated results can be passed back to LLM for the next round of task orchestration

This can be understood as:

  • LLM is the "brain" (understanding, decision-making, image understanding, and output orchestration)

  • ImageGeneration is the "image generation executor" (generation and fusion processing)


Module 1: LLM

The LLM module handles large language model-related capabilities and is the core entry point of the Agent.

Main Capabilities

  • Integration and switching across multiple model service providers, with streaming output (see the sketch after this list)

  • Intent understanding and multi-turn context management

  • Tool calling (Function Calling)

  • Image understanding (describing, analyzing, and explaining input images)

  • Structured output, task decomposition, and result output
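
The Agent exposes these capabilities as LabVIEW VIs, but the underlying request pattern is easiest to illustrate in text. Below is a minimal Python sketch of provider switching and streaming against an OpenAI-compatible endpoint; every base URL, model name, and key shown is a placeholder, not part of the toolkit:

    # pip install openai
    from openai import OpenAI

    # Provider switching: the same client works against any OpenAI-compatible
    # service by changing base_url and model. All values are placeholders.
    PROVIDERS = {
        "openai": {"base_url": "https://api.openai.com/v1", "model": "gpt-4o-mini"},
        "doubao": {"base_url": "https://ark.cn-beijing.volces.com/api/v3",
                   "model": "<your-endpoint-id>"},
    }
    cfg = PROVIDERS["doubao"]
    client = OpenAI(base_url=cfg["base_url"], api_key="<your-api-key>")

    # Streaming output: print tokens as they arrive.
    stream = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": "Summarize what a LabVIEW VI is."}],
        temperature=0.7,   # model parameters listed under "Typical Inputs"
        max_tokens=512,
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)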

Typical Inputs

  • User natural language instructions

  • Conversation history (context)

  • Image input (for image understanding tasks)

  • Tool list and tool descriptions

  • Model parameters (temperature, max output length, etc.)

Typical Outputs

  • Text responses

  • Image understanding results (descriptions, key points, analysis conclusions)

  • Tool calling requests with parameters (see the round-trip sketch after this list)

  • Structured results and final output (for subsequent process handling)
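
The tool-call output deserves special attention: the model returns a tool name plus JSON parameters, the caller executes the tool, and the result is fed back for a final answer. A hedged Python sketch of that round trip; the get_vi_info tool is a hypothetical example, not a toolkit API:

    # pip install openai
    import json
    from openai import OpenAI

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>")
    MODEL = "<model-or-endpoint-id>"

    # Hypothetical tool description; in the Agent this would mirror a user VI.
    tools = [{"type": "function", "function": {
        "name": "get_vi_info",
        "description": "Return metadata about a VI in the current project.",
        "parameters": {"type": "object",
                       "properties": {"vi_name": {"type": "string"}},
                       "required": ["vi_name"]},
    }}]

    messages = [{"role": "user", "content": "Describe the VI named main.vi"}]
    response = client.chat.completions.create(model=MODEL, messages=messages,
                                              tools=tools)
    msg = response.choices[0].message
    if msg.tool_calls:                                  # model requested a tool call
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)  # parameters arrive as JSON text
            result = {"vi_name": args["vi_name"],       # stubbed tool execution
                      "connector_pane": "4x2x2x4"}
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
        # Second round: the model turns tool results into the final answer.
        final = client.chat.completions.create(model=MODEL, messages=messages)
        print(final.choices[0].message.content)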

Applicable Scenarios

  • Intelligent Q&A and engineering assistance

  • Document understanding and summarization

  • Image content understanding and explanation

  • LabVIEW project documentation generation

  • Multi-step task orchestration and execution

  • Web search and code generation


Module 2: ImageGeneration

The ImageGeneration module handles image generation and fusion capabilities, supporting text-to-image, image-to-image, and image-text fusion tasks.

Prerequisites (Important)

ImageGeneration is currently based primarily on Doubao model capabilities. Complete the following preparations before use:

  1. Activate Doubao-related services on Volcano Engine and enable the doubao-seedream model

  2. Complete the Doubao API configuration in the Agent (the endpoint ID is used as the API Key)

  3. Install Yiqi Intelligence's AI Vision Toolkit for GPU (download: https://www.virobotics.net/product_AIVT_GPU)

  4. Confirm that the local environment meets the visual toolkit requirements

Main Capabilities

  • Text-to-image: Generate images based on text prompts

  • Image-to-image: Style transfer, redrawing, or variant generation based on input images

  • Image-text fusion: Combine images and text instructions for editing, enhancement, or content completion
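
In API terms, the three capabilities differ mainly in whether an input image accompanies the prompt. The sketch below uses plain HTTP against the Volcano Engine (Ark) service; the endpoint path follows the OpenAI-compatible convention, and the "image" field for image-to-image is an illustrative assumption, so verify the actual contract against the Doubao/seedream API reference:

    # pip install requests
    import base64
    import requests

    BASE = "https://ark.cn-beijing.volces.com/api/v3"   # verify against the Ark docs
    HEADERS = {"Authorization": "Bearer <your-ark-api-key>"}

    def generate(prompt, image_path=None):
        # Field names follow the OpenAI-compatible convention; the "image"
        # field for image-to-image is an illustrative assumption.
        payload = {
            "model": "<seedream-endpoint-id>",
            "prompt": prompt,
            "size": "1024x1024",
            "seed": 42,                      # illustrative; fixes the random seed
        }
        if image_path:                       # image-to-image / image-text fusion
            with open(image_path, "rb") as f:
                payload["image"] = base64.b64encode(f.read()).decode()
        resp = requests.post(f"{BASE}/images/generations",
                             headers=HEADERS, json=payload, timeout=120)
        resp.raise_for_status()
        return resp.json()["data"][0]["url"]

    print(generate("Flowchart of a temperature-logging test bench, flat style"))
    print(generate("Redraw in a unified corporate color scheme", "source.png"))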

Typical Inputs

  • Text prompts (target content, style, constraints)

  • Input images (for image-to-image or image-text fusion)

  • Optional model parameters (size, quality, aspect ratio, random seed, etc.)

Typical Outputs

  • Newly generated or edited image files

  • Generation task status/result information (success, failure, error reasons)

Applicable Scenarios

  • Experimental workflow diagrams and documentation illustrations (text-to-image)

  • Style unification and version iteration of existing images (image-to-image)

  • Collaborative creation of documents, reports, and teaching materials (image-text fusion)


How the Two Modules Work Together

Recommended minimum loop:

  1. The user inputs a task (e.g., "Generate a device flowchart and provide explanations")

  2. LLM analyzes the requirement and writes an image prompt

  3. LLM calls the ImageGeneration tool to generate the image

  4. LLM adds explanations based on the image result and outputs the final response
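
As a hedged Python sketch (in LabVIEW these steps would be wired as VIs on the block diagram; all identifiers below are illustrative, and generate_image stands in for the ImageGeneration module):

    # Minimal "brain + image executor" loop. Not toolkit APIs.
    from openai import OpenAI

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>")
    MODEL = "<model-or-endpoint-id>"

    def ask(messages):
        resp = client.chat.completions.create(model=MODEL, messages=messages)
        return resp.choices[0].message.content

    def generate_image(prompt):
        # Stand-in for the ImageGeneration module described above.
        return "https://example.com/generated.png"

    task = "Generate a device flowchart and provide explanations"

    # Steps 1-2: the LLM analyzes the task and writes an image prompt.
    image_prompt = ask([
        {"role": "system", "content": "Turn the user's task into one concise "
                                      "image-generation prompt. Reply with the prompt only."},
        {"role": "user", "content": task},
    ])

    # Step 3: ImageGeneration produces the image.
    image_url = generate_image(image_prompt)

    # Step 4: the LLM explains the result and writes the final response.
    print(ask([{"role": "user",
                "content": f"The image for '{task}' is at {image_url}. "
                           "Write a short explanation to accompany it."}]))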

This pattern extends naturally to more complex multi-tool workflows (such as integrating VI-specific modules later).


Development Recommendations

  • First verify the LLM conversation pipeline separately, then integrate ImageGeneration

  • Define unified prompt templates for image generation to facilitate team reuse

  • Expose models, parameters, and tool descriptions as configuration items instead of hardcoding them in block diagram logic

  • Add error handling at key nodes (timeout, quota exceeded, invalid parameters)
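
For the last point, the common failure modes map onto distinct exception types in OpenAI-compatible clients. A minimal retry sketch; the retry counts and wait times are arbitrary choices:

    # pip install openai
    import time
    from openai import OpenAI, APITimeoutError, RateLimitError, BadRequestError

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>",
                    timeout=30)  # fail fast instead of hanging

    def chat_with_retry(messages, attempts=3):
        for i in range(attempts):
            try:
                return client.chat.completions.create(
                    model="<model-or-endpoint-id>", messages=messages)
            except APITimeoutError:
                time.sleep(2 ** i)          # timeout: back off and retry
            except RateLimitError:
                time.sleep(10)              # quota exceeded: wait longer
            except BadRequestError as e:    # invalid parameters: don't retry
                raise ValueError(f"Invalid parameters: {e}") from e
        raise RuntimeError("LLM call failed after retries")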


Capability Classification and Detailed Descriptions

The following content is taken from the "Capability Classification and Detailed Descriptions" section of the release documentation and supplements the module definitions above.

A. LLM Capability Classification

1) Interaction and Understanding Capabilities

  • Flexible multi-model integration: unified integration of, and switching between, multiple mainstream model service providers

  • Contextual conversation: multi-turn dialogue and context management

  • Streaming output: real-time word-by-word return to speed up interaction feedback

  • Image understanding: description, analysis, and explanation of input images (see the sketch after this list)
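
The image-understanding capability corresponds to vision-enabled chat messages. A sketch using the widely adopted OpenAI-compatible content format (support varies by provider and model; the file name and prompt are illustrative):

    # pip install openai
    import base64
    from openai import OpenAI

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>")

    with open("front_panel.png", "rb") as f:     # illustrative file name
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="<vision-capable-model>",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this LabVIEW front panel "
                                         "and point out any usability issues."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)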

2) Execution and Orchestration Capabilities

  • Tool calling (Function Calling): invocation of user-written VI tools to complete tasks

  • Task decomposition and structured output: breaking complex requirements into executable steps (see the sketch after this list)

  • Result output orchestration: unified organization and return of "text results + tool results"
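
Task decomposition with structured output is typically done by requesting JSON. A sketch assuming the provider supports the OpenAI-compatible response_format option; the step schema is an illustrative choice:

    # pip install openai
    import json
    from openai import OpenAI

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>")

    resp = client.chat.completions.create(
        model="<model-or-endpoint-id>",
        response_format={"type": "json_object"},  # ask for machine-readable output
        messages=[
            {"role": "system", "content":
                'Decompose the task into steps. Reply as JSON: '
                '{"steps": [{"id": 1, "action": "...", "tool": "..."}]}'},
            {"role": "user", "content":
                "Document the data-acquisition VIs in this project."},
        ],
    )
    plan = json.loads(resp.choices[0].message.content)
    for step in plan["steps"]:                    # hand each step to orchestration
        print(step["id"], step["action"], step.get("tool"))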

3) Extension and Engineering Capabilities

  • Multi-Agent collaboration: multiple Agents processing complex tasks in parallel or cooperatively (see the sketch after this list)

  • Low-code visual development: the LabVIEW graphical development paradigm, which lowers the integration barrier

  • Pre-built examples: rapid verification, from a basic chat to a complete tool-calling flow
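
Conceptually, Multi-Agent collaboration means several role-specialized LLM instances passing results to one another. A deliberately small planner/reviewer sketch; the role prompts and task are illustrative:

    from openai import OpenAI

    client = OpenAI(base_url="<provider-base-url>", api_key="<your-api-key>")
    MODEL = "<model-or-endpoint-id>"

    def agent(system_prompt, user_content):
        # One "agent" = one role prompt over the shared model.
        resp = client.chat.completions.create(model=MODEL, messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ])
        return resp.choices[0].message.content

    draft = agent("You are a planner. Produce a numbered test plan.",
                  "Plan regression tests for a motor-control VI.")
    review = agent("You are a reviewer. Point out gaps in the plan.", draft)
    print(review)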

B. ImageGeneration Capability Classification

ImageGeneration is primarily responsible for image generation and image-text fusion processing, and can be called by LLM as a tool.

  • Text-to-image: Generate images based on text prompts

  • Image-to-image: Generate variants, redrawing, or stylized results based on input images

  • Image-text fusion: Perform targeted editing and enhancement combining images and text constraints