• Home
  • Dashboard
  • Models
  • Wiro AppsApps
  • Pricing
  • Blog
  • Sign In
  • Sign Up
HomeDashboardModelsWiro AppsAppsPricing
Blog
Documentation
Sign In
Sign Up

Task History

  • Runnings
  • Models
  • Trains
Select project...
The list is empty
No results

You don't have task yet.

Go to Models
  • Models
  • wiro/ask-video
Models
Task History

wiro/
ask-video
Copy Prompt for LLM

View as Markdown
View as Markdown (Full)

ask-video

Convert the video data into textual descriptions.

222runs
0Comments

API Sample: wiro/ask-video

📚 For LLM Integration:

For complete parameter details and examples, please also review the markdown documentation at:
/models/wiro/ask-video/llms.txt
/models/wiro/ask-video/llms-full.txt

You don't have any projects yet. To be able to use our api service effectively, please sign in/up and create a project.

Get your api key
  • curl
  • nodejs
  • csharp
  • php
  • swift
  • dart
  • kotlin
  • go
  • python

Prepare Authentication (Signature)

                            //Sign up Wiro dashboard and create project
export YOUR_API_KEY="YOUR_WIRO_API_KEY";
export YOUR_API_SECRET="XXXXXXXXX";

//unix time or any random integer value
export NONCE=$(date +%s);

//hmac-SHA256 (YOUR_API_SECRET+Nonce) with YOUR_API_KEY
export SIGNATURE="$(echo -n "${YOUR_API_SECRET}${NONCE}" | openssl dgst -sha256 -hmac "${YOUR_API_KEY}")";
    
                        

Create a New Folder - Make HTTP Post Request

Create a New Folder - Response

Upload a File to the Folder - Make HTTP Post Request

Upload a File to the Folder - Response

Run Command - Make HTTP Post Request (Multipart)

                          
# ⚠️ IMPORTANT: Remove all commented lines (starting with #) before running
# Bash doesn't support comments in command continuation (lines ending with \)

curl -X POST "https://api.wiro.ai/v1/Run/wiro/ask-video"  \
-H "x-api-key: ${YOUR_API_KEY}" \
-H "x-nonce: ${NONCE}" \
-H "x-signature: ${SIGNATURE}" \
  // ⚠️ IMPORTANT:
  // - inputVideo: 1 file or URL (send either file or URL, not both)

  // Option 1: Send inputVideo as FILE
  -F "inputVideo=@path/to/video.mp4" \
  -F "inputVideoUrl=" \

  // Option 2: Send inputVideo as URL
  // -F "inputVideo=" \
  // -F "inputVideoUrl=https://cdn.wiro.ai/uploads/sampleinputs/short-video-1.mp4" \
  -F "prompt=Please describe this video in detail." \
  -F "numBeams=1" \
  -F "temperature=0.1" \
  -F "callbackUrl=Optional: Webhook URL for task completion notifications";

    
                        

Run Command - Response

                          
//response body
{
    "errors": [],
    "taskid": "2221",
    "socketaccesstoken": "eDcCm5yyUfIvMFspTwww49OUfgXkQt",
    "result": true
}
    
                        

Get Task Detail - Make HTTP Post Request with Task Token

                          
curl -X POST "https://api.wiro.ai/v1/Task/Detail"  \
-H "Content-Type: application/json" \
-H "x-api-key: ${YOUR_API_KEY}" \
-H "x-nonce: ${NONCE}" \
-H "x-signature: ${SIGNATURE}" \
-d '{
  "tasktoken": "eDcCm5yyUfIvMFspTwww49OUfgXkQt"
}';

    
                        

Get Task Detail - Response

                          
//response body
{
  "total": "1",
  "errors": [],
  "tasklist": [
      {
          "id": "534574",
          "uuid": "15bce51f-442f-4f44-a71d-13c6374a62bd",
          "name": "",
          "socketaccesstoken": "eDcCm5yyUfIvMFspTwww49OUfgXkQt",
          "parameters": {
              "inputImage": "https://api.wiro.ai/v1/File/mCmUXgZLG1FNjjjwmbtPFr2LVJA112/inputImage-6060136.png"
          },
          "debugoutput": "",
          "debugerror": "",
          "starttime": "1734513809",
          "endtime": "1734513813",
          "elapsedseconds": "6.0000",
          "status": "task_postprocess_end",
          "cps": "0.000585000000",
          "totalcost": "0.003510000000",
          "guestid": null,
          "projectid": "699",
          "modelid": "598",
          "description": "",
          "basemodelid": "0",
          "runtype": "model",
          "modelfolderid": "",
          "modelfileid": "",
          "callbackurl": "",
          "marketplaceid": null,
          "createtime": "1734513807",
          "canceltime": "0",
          "assigntime": "1734513807",
          "accepttime": "1734513807",
          "preprocessstarttime": "1734513807",
          "preprocessendtime": "1734513807",
          "postprocessstarttime": "1734513813",
          "postprocessendtime": "1734513814",
          "pexit": "0",
          "categories": "["tool","image-to-image","quick-showcase","compare-landscape"]",
          "outputs": [
              {
                  "id": "6bc392c93856dfce3a7d1b4261e15af3",
                  "name": "0.png",
                  "contenttype": "image/png",
                  "parentid": "6c1833f39da71e6175bf292b18779baf",
                  "uuid": "15bce51f-442f-4f44-a71d-13c6374a62bd",
                  "size": "202472",
                  "addedtime": "1734513812",
                  "modifiedtime": "1734513812",
                  "accesskey": "dFKlMApaSgMeHKsJyaDeKrefcHahUK",
                  "foldercount": "0",
                  "filecount": "0",
                  "ispublic": 0,
                  "expiretime": null,
                  "url": "https://cdn1.wiro.ai/6a6af820-c5050aee-40bd7b83-a2e186c6-7f61f7da-3894e49c-fc0eeb66-9b500fe2/0.png"
              }
          ],
          "size": "202472"
      }
  ],
  "result": true
}
    
                        

Kill Task - Make HTTP Post Request with Task ID

                          
curl -X POST "https://api.wiro.ai/v1/Task/Kill"  \
-H "Content-Type: application/json" \
-H "x-api-key: ${YOUR_API_KEY}" \
-H "x-nonce: ${NONCE}" \
-H "x-signature: ${SIGNATURE}" \
-d '{
  "taskid": "534574"
}';

    
                        

Kill Task - Response

                          
//response body
{
  "errors": [],
  "tasklist": [
      {
          "id": "534574",
          "uuid": "15bce51f-442f-4f44-a71d-13c6374a62bd",
          "name": "",
          "socketaccesstoken": "ZpYote30on42O4jjHXNiKmrWAZqbRE",
          "parameters": {
              "inputImage": "https://api.wiro.ai/v1/File/mCmUXgZLG1FNjjjwmbtPFr2LVJA112/inputImage-6060136.png"
          },
          "debugoutput": "",
          "debugerror": "",
          "starttime": "1734513809",
          "endtime": "1734513813",
          "elapsedseconds": "6.0000",
          "status": "task_cancel",
          "cps": "0.000585000000",
          "totalcost": "0.003510000000",
          "guestid": null,
          "projectid": "699",
          "modelid": "598",
          "description": "",
          "basemodelid": "0",
          "runtype": "model",
          "modelfolderid": "",
          "modelfileid": "",
          "callbackurl": "",
          "marketplaceid": null,
          "createtime": "1734513807",
          "canceltime": "0",
          "assigntime": "1734513807",
          "accepttime": "1734513807",
          "preprocessstarttime": "1734513807",
          "preprocessendtime": "1734513807",
          "postprocessstarttime": "1734513813",
          "postprocessendtime": "1734513814",
          "pexit": "0",
          "categories": "["tool","image-to-image","quick-showcase","compare-landscape"]",
          "outputs": [
              {
                  "id": "6bc392c93856dfce3a7d1b4261e15af3",
                  "name": "0.png",
                  "contenttype": "image/png",
                  "parentid": "6c1833f39da71e6175bf292b18779baf",
                  "uuid": "15bce51f-442f-4f44-a71d-13c6374a62bd",
                  "size": "202472",
                  "addedtime": "1734513812",
                  "modifiedtime": "1734513812",
                  "accesskey": "dFKlMApaSgMeHKsJyaDeKrefcHahUK",
                  "foldercount": "0",
                  "filecount": "0",
                  "ispublic": 0,
                  "expiretime": null,
                  "url": "https://cdn1.wiro.ai/6a6af820-c5050aee-40bd7b83-a2e186c6-7f61f7da-3894e49c-fc0eeb66-9b500fe2/0.png"
              }
          ],
          "size": "202472"
      }
  ],
  "result": true
}
    
                        

Cancel Task - Make HTTP Post Request (For tasks on queue)

                          
curl -X POST "https://api.wiro.ai/v1/Task/Cancel"  \
-H "Content-Type: application/json" \
-H "x-api-key: ${YOUR_API_KEY}" \
-H "x-nonce: ${NONCE}" \
-H "x-signature: ${SIGNATURE}" \
-d '{
  "taskid": "634574"
}';

    
                        

Cancel Task - Response

                          
//response body
{
  "errors": [],
  "tasklist": [
      {
          "id": "634574",
          "uuid": "15bce51f-442f-4f44-a71d-13c6374a62bd",
          "name": "",
          "socketaccesstoken": "ZpYote30on42O4jjHXNiKmrWAZqbRE",
          "parameters": {
              "inputImage": "https://api.wiro.ai/v1/File/mCmUXgZLG1FNjjjwmbtPFr2LVJA112/inputImage-6060136.png"
          },
          "debugoutput": "",
          "debugerror": "",
          "starttime": "1734513809",
          "endtime": "1734513813",
          "elapsedseconds": "6.0000",
          "status": "task_cancel",
          "cps": "0.000585000000",
          "totalcost": "0.003510000000",
          "guestid": null,
          "projectid": "699",
          "modelid": "598",
          "description": "",
          "basemodelid": "0",
          "runtype": "model",
          "modelfolderid": "",
          "modelfileid": "",
          "callbackurl": "",
          "marketplaceid": null,
          "createtime": "1734513807",
          "canceltime": "0",
          "assigntime": "1734513807",
          "accepttime": "1734513807",
          "preprocessstarttime": "1734513807",
          "preprocessendtime": "1734513807",
          "postprocessstarttime": "1734513813",
          "postprocessendtime": "1734513814",
          "pexit": "0",
          "categories": "["tool","image-to-image","quick-showcase","compare-landscape"]",
          "outputs": [
              {
                  "id": "6bc392c93856dfce3a7d1b4261e15af3",
                  "name": "0.png",
                  "contenttype": "image/png",
                  "parentid": "6c1833f39da71e6175bf292b18779baf",
                  "uuid": "15bce51f-442f-4f44-a71d-13c6374a62bd",
                  "size": "202472",
                  "addedtime": "1734513812",
                  "modifiedtime": "1734513812",
                  "accesskey": "dFKlMApaSgMeHKsJyaDeKrefcHahUK",
                  "foldercount": "0",
                  "filecount": "0",
                  "ispublic": 0,
                  "expiretime": null,
                  "url": "https://cdn1.wiro.ai/6a6af820-c5050aee-40bd7b83-a2e186c6-7f61f7da-3894e49c-fc0eeb66-9b500fe2/0.png"
              }
          ],
          "size": "202472"
      }
  ],
  "result": true
}
    
                        

Get Task Process Information and Results with Socket Connection

                          
<script type="text/javascript">
  window.addEventListener('load',function() {
    //Get socketAccessToken from task run response
    var SocketAccessToken = 'eDcCm5yyUfIvMFspTwww49OUfgXkQt';
    WebSocketConnect(SocketAccessToken);
  });

  //Connect socket with connection id and register task socket token
  async function WebSocketConnect(accessTokenFromAPI) {
    if ("WebSocket" in window) {
        var ws = new WebSocket("wss://socket.wiro.ai/v1");
        ws.onopen = function() {
          //Register task socket token which has been obtained from task run API response
          ws.send('{"type": "task_info", "tasktoken": "' + accessTokenFromAPI + '"}');
        };

        ws.onmessage = function (evt) {
          var msg = evt.data;

          try {
              var debugHtml = document.getElementById('debug');
              debugHtml.innerHTML = debugHtml.innerHTML + "\n" + msg;

              var msgJSON = JSON.parse(msg);
              console.log('msgJSON: ', msgJSON);

              if(msgJSON.type != undefined)
              {
                console.log('msgJSON.target: ',msgJSON.target);
                switch(msgJSON.type) {
                    case 'task_queue':
                      console.log('Your task has been waiting in the queue.');
                    break;
                    case 'task_accept':
                      console.log('Your task has been accepted by the worker.');
                    break;
                    case 'task_preprocess_start':
                      console.log('Your task preprocess has been started.');
                    break;
                    case 'task_preprocess_end':
                      console.log('Your task preprocess has been ended.');
                    break;
                    case 'task_assign':
                      console.log('Your task has been assigned GPU and waiting in the queue.');
                    break;
                    case 'task_start':
                      console.log('Your task has been started.');
                    break;
                    case 'task_output':
                      console.log('Your task has been started and printing output log.');
                      console.log('Log: ', msgJSON.message);
                    break;
                    case 'task_error':
                      console.log('Your task has been started and printing error log.');
                      console.log('Log: ', msgJSON.message);
                    break;
                   case 'task_output_full':
                      console.log('Your task has been completed and printing full output log.');
                    break;
                    case 'task_error_full':
                      console.log('Your task has been completed and printing full error log.');
                    break;
                    case 'task_end':
                      console.log('Your task has been completed.');
                    break;
                    case 'task_postprocess_start':
                      console.log('Your task postprocess has been started.');
                    break;
                    case 'task_postprocess_end':
                      console.log('Your task postprocess has been completed.');
                      console.log('Outputs: ', msgJSON.message);
                      //output files will add ui
                      msgJSON.message.forEach(function(currentValue, index, arr){
                          console.log(currentValue);
                          var filesHtml = document.getElementById('files');
                          filesHtml.innerHTML = filesHtml.innerHTML + '<img src="' + currentValue.url + '" style="height:300px;">'
                      });
                    break;
                }
              }
          } catch (e) {
            console.log('e: ', e);
            console.log('msg: ', msg);
          }
        };

        ws.onclose = function() {
          alert("Connection is closed...");
        };
    } else {
        alert("WebSocket NOT supported by your Browser!");
    }
  }
</script>
    
                        

Prepare UI Elements Inside Body Tag

                          
  <div id="files"></div>
  <pre id="debug"></pre>
    
                        

Choose a video that will re-generate

input-video-url-help

Tell us about any details you want to generate

THUDM-cogvlm2-llama3-caption-sample-1.mp4
chat-with-video-sample-1.txt
THUDM-cogvlm2-llama3-caption-sample-3.mp4
chat-with-video-sample-2.txt
1732172821 Report This Model
中文阅读





CogVLM2-Llama3-Caption







Code | 🤗 Hugging Face | 🤖 ModelScope
Typically, most video data does not come with corresponding descriptive text, so it is necessary to convert the video
data into textual descriptions to provide the essential training data for text-to-video models.
CogVLM2-Caption is a video captioning model used to generate training data for the CogVideoX model.









Usage


import io

import argparse
import numpy as np
import torch
from decord import cpu, VideoReader, bridge
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "THUDM/cogvlm2-llama3-caption"

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[
0] >= 8 else torch.float16

parser = argparse.ArgumentParser(description="CogVLM2-Video CLI Demo")
parser.add_argument('--quant', type=int, choices=[4, 8], help='Enable 4-bit or 8-bit precision loading', default=0)
args = parser.parse_args([])


def load_video(video_data, strategy='chat'):
bridge.set_bridge('torch')
mp4_stream = video_data
num_frames = 24
decord_vr = VideoReader(io.BytesIO(mp4_stream), ctx=cpu(0))

frame_id_list = None
total_frames = len(decord_vr)
if strategy == 'base':
clip_end_sec = 60
clip_start_sec = 0
start_frame = int(clip_start_sec * decord_vr.get_avg_fps())
end_frame = min(total_frames,
int(clip_end_sec * decord_vr.get_avg_fps())) if clip_end_sec is not None else total_frames
frame_id_list = np.linspace(start_frame, end_frame - 1, num_frames, dtype=int)
elif strategy == 'chat':
timestamps = decord_vr.get_frame_timestamp(np.arange(total_frames))
timestamps = [i[0] for i in timestamps]
max_second = round(max(timestamps)) + 1
frame_id_list = []
for second in range(max_second):
closest_num = min(timestamps, key=lambda x: abs(x - second))
index = timestamps.index(closest_num)
frame_id_list.append(index)
if len(frame_id_list) >= num_frames:
break

video_data = decord_vr.get_batch(frame_id_list)
video_data = video_data.permute(3, 0, 1, 2)
return video_data


tokenizer = AutoTokenizer.from_pretrained(
MODEL_PATH,
trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
MODEL_PATH,
torch_dtype=TORCH_TYPE,
trust_remote_code=True
).eval().to(DEVICE)


def predict(prompt, video_data, temperature):
strategy = 'chat'

video = load_video(video_data, strategy=strategy)

history = []
query = prompt
inputs = model.build_conversation_input_ids(
tokenizer=tokenizer,
query=query,
images=[video],
history=history,
template_version=strategy
)
inputs = {
'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
'images': [[inputs['images'][0].to('cuda').to(TORCH_TYPE)]],
}
gen_kwargs = {
"max_new_tokens": 2048,
"pad_token_id": 128002,
"top_k": 1,
"do_sample": False,
"top_p": 0.1,
"temperature": temperature,
}
with torch.no_grad():
outputs = model.generate(**inputs, **gen_kwargs)
outputs = outputs[:, inputs['input_ids'].shape[1]:]
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
return response


def test():
prompt = "Please describe this video in detail."
temperature = 0.1
video_data = open('test.mp4', 'rb').read()
response = predict(prompt, video_data, temperature)
print(response)


if __name__ == '__main__':
test()






License


This model is released under the
CogVLM2 LICENSE.
For models built with Meta Llama 3, please also adhere to
the LLAMA3_LICENSE.





Citation


🌟 If you find our work helpful, please leave us a star and cite our paper.
@article{yang2024cogvideox,
title={CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer},
author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
journal={arXiv preprint arXiv:2408.06072},
year={2024}
}

Models

View All

We couldn't find any matching results.

Image to Image

reve/Edit

Modify existing images using text instructions. Upload an image and describe the changes you want to make.
1
Image to Image

reve/Remix-Fast

Combine text prompts with reference images to create new variations. Blend styles, concepts, and visual elements.
1
Text to Image

reve/Generate

Generate images from text descriptions. Perfect for creating original artwork, illustrations, and visual content from your imagination.
1
Image to Image

reve/Remix

Combine text prompts with reference images to create new variations. Blend styles, concepts, and visual elements.
1
Fast Inference

google/nano-banana-pro

Google's Gemini 3 Pro Image Preview, also known as Nano Banana, model for text-to-image and image-to-image generation.
10
Image to Image

reve/Edit-Fast

Modify existing images using text instructions. Upload an image and describe the changes you want to make.
1
Social Media & Viral

wiro/Instagram Pose

Generate stylish Instagram-style poses with trendy angles, natural expressions, and modern aesthetic.
6
Ecommerce

wiro/camera-angle-editor

wiro/camera-angle-editor is an advanced AI tool that instantly changes the camera perspective and angle of any existing image. Leveraging sophisticated spatial reconstruction, it eliminates the need for reshoots by synthesizing photorealistic new viewpoints, making it the fastest way for creators to maximize the versatility of their visual content.
1
Logo of nvidia programLogo of nvidia program
Wiro AI brings machine learning easily accessible to all in the cloud.
  • WIRO
  • About
  • Blog
  • Careers
  • Contact
  • Light Mode
  • Product
  • Models
  • Pricing
  • Status
  • Documentation
  • Introduction
  • Start Your First Project
  • Example Projects

2025 © Wiro.ai | Terms of Service & Privacy Policy