# LLM & Chat Streaming

Stream LLM responses in real time with thinking/answer separation, session history, and multi-turn conversations.

## Overview

LLM (Large Language Model) requests on Wiro work differently from standard model runs:

- Responses are delivered via `debugoutput`, not the `outputs` file array
- Streaming `task_output` messages contain structured `thinking` and `answer` arrays, not plain strings
- Multi-turn conversations are supported via the `session_id` and `user_id` parameters
- `pexit` is the primary success indicator (`outputs` will be empty)

Available LLM models include:

- [openai/gpt-5-2](https://wiro.ai/models/openai/gpt-5-2)
- [openai/gpt-oss-20b](https://wiro.ai/models/openai/gpt-oss-20b)
- [qwen/qwen3-5-27b](https://wiro.ai/models/qwen/qwen3-5-27b)

## Session & Chat History

Wiro maintains conversation history per session. To continue a conversation, send the `session_id` and `user_id` parameters with each request:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `session_id` | string | No | UUID identifying the conversation session. Reuse for follow-up messages. |
| `user_id` | string | No | UUID identifying the user. |
| `prompt` | string | Yes | The user's message or question. |

```json
// First message: start a new session
{
  "prompt": "What is quantum computing?",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "user_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7"
}

// Follow-up: reuse the same session_id
{
  "prompt": "Can you explain qubits in more detail?",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "user_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7"
}
```

> **Tip:** Generate a UUID for `session_id` when starting a new conversation. Pass the same UUID for all follow-up messages to maintain context.
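
The tip above can be sketched in Python as follows. This is a minimal illustration, assuming the request body is plain JSON with only the three documented fields. `new_conversation` and `build_payload` are hypothetical helper names, not part of Wiro's API, and the HTTP call that actually runs the model is omitted because this page does not show it.

```python
import json
import uuid


def new_conversation(user_id=None):
    """Create identifiers for a new conversation (hypothetical helper)."""
    return {
        # A fresh UUID marks the start of a new session.
        "session_id": str(uuid.uuid4()),
        # Reuse a known user UUID if there is one, otherwise generate one.
        "user_id": user_id or str(uuid.uuid4()),
    }


def build_payload(prompt, conversation):
    """Build the body for one turn from the documented fields only."""
    return {"prompt": prompt, **conversation}


conversation = new_conversation()
first = build_payload("What is quantum computing?", conversation)
follow_up = build_payload("Can you explain qubits in more detail?", conversation)

# Both turns carry the same session_id, so the follow-up keeps its context.
print(json.dumps(first, indent=2))
print(json.dumps(follow_up, indent=2))
```

Keeping the identifiers in one dictionary and spreading it into every payload makes it hard to send a follow-up with a different `session_id` by accident.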

## Thinking & Answer Phases

Many LLM models separate their output into two phases:

- **Thinking**: the model's internal reasoning process (chain-of-thought)
- **Answer**: the final response to the user

When streaming via WebSocket, `task_output` messages for LLM models contain a structured object:

```json
{
  "type": "task_output",
  "id": "534574",
  "tasktoken": "eDcCm5yy...",
  "message": {
    "thinking": ["Let me analyze this step by step...", "The key factors are..."],
    "answer": ["Quantum computing uses qubits that can exist in superposition..."]
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `message.thinking` | `string[]` | Array of reasoning/chain-of-thought chunks. May be empty. |
| `message.answer` | `string[]` | Array of response chunks. This is the content to show the user. |
| `message.raw` | `string` | Optional raw output before thinking/answer separation. |

> **Note:** Standard (non-LLM) models send `message` as a plain string. LLM models send it as a `{ thinking, answer }` object. Check the type before parsing.

## Streaming Flow

1. **Run** the model with `prompt`, `session_id`, and `user_id`
2. **Connect** to WebSocket and send `task_info`
3. **Receive** `task_output` messages, each of which contains the growing `thinking` and `answer` arrays
4. **Display** the latest `answer` array content to the user (optionally show `thinking` in a collapsible section)
5. **Complete**: on `task_postprocess_end`, check `pexit` for success

Each `task_output` event contains the **full accumulated** thinking and answer arrays, not just the new chunk. Simply replace your displayed content with the latest arrays.

## Polling Alternative

If you don't need real-time streaming, poll `POST /Task/Detail` instead.
The final response will be in `debugoutput` as merged plain text:

```json
{
  "result": true,
  "tasklist": [{
    "status": "task_postprocess_end",
    "pexit": "0",
    "debugoutput": "Quantum computing uses qubits that can exist in superposition...",
    "outputs": []
  }]
}
```

> **Note:** When polling, `debugoutput` contains the merged text. To access separate `thinking` and `answer` arrays, use WebSocket streaming instead.
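
The two delivery paths described on this page can be read with a pair of small helpers. This is a sketch under stated assumptions, not Wiro's client code. `split_message` and `read_llm_result` are hypothetical names. Joining chunks with no separator and treating `pexit` equal to `"0"` as success are assumptions based on the examples shown here. Transport (WebSocket or HTTP) is left out, so both functions take already-decoded JSON.

```python
def split_message(message):
    """Return (thinking_text, answer_text) for a task_output `message`."""
    if isinstance(message, str):
        # Standard (non-LLM) models send a plain string.
        return "", message
    # LLM models send arrays. Each event carries the full accumulated
    # content, so the caller replaces its display rather than appending.
    # Assumption: chunks concatenate with no separator.
    thinking = "".join(message.get("thinking") or [])
    answer = "".join(message.get("answer") or [])
    return thinking, answer


def read_llm_result(response):
    """Return (finished, succeeded, text) for a Task/Detail-style response."""
    tasks = response.get("tasklist") or []
    if not response.get("result") or not tasks:
        return False, False, ""
    task = tasks[0]
    finished = task.get("status") == "task_postprocess_end"
    # pexit is a string in the example above. Assumption: "0" means success.
    succeeded = finished and str(task.get("pexit")) == "0"
    # LLM text is delivered in debugoutput; outputs stays empty.
    return finished, succeeded, task.get("debugoutput") or ""


sample = {
    "result": True,
    "tasklist": [{
        "status": "task_postprocess_end",
        "pexit": "0",
        "debugoutput": "Quantum computing uses qubits that can exist in superposition...",
        "outputs": [],
    }],
}

finished, succeeded, text = read_llm_result(sample)
if finished and succeeded:
    print(text)
```

A polling loop would call `read_llm_result` on each response until `finished` is true. A WebSocket handler would call `split_message` on every `task_output` event and redraw the answer from the returned text.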