Guest



Sign inSignup
  • Home
  • Dashboard
  • Tools
  • Store
  • Pricing

Welcome

HomeDashboardToolsStore
Use cases
Human Resources
Retail & E-commerce
Interior Design
Fashion AI
Creative Content Solutions
Sports & Fitness
GenAI Video Tools
PricingDocumentation
Guest



Sign inSignup

Task History

  • Runnings
  • Models
  • Trains

You don't have task yet.

Go to Tools

Welcome

  • Tools
  • THUDM/CogVideoX-2b
Tools
Task History

THUDM/ CogVideoX-2b

1577runs
0Comments

API Sample: THUDM/CogVideoX-2b

You don't have any projects yet. To be able to use our api service effectively, please sign in/up and create a project.

Get your api key
  • curl
  • nodejs
  • csharp
  • php
  • swift
  • dart
  • kotlin
  • go
  • python

Prepare Authentication Signature

                          
  //Sign up Wiro dashboard and create project
  export YOUR_API_KEY="{{useSelectedProjectAPIKey}}"; 
  export YOUR_API_SECRET="XXXXXXXXX"; 

   //unix time or any random integer value
  export NONCE=$(date +%s);

  //hmac-SHA256 (YOUR_API_SECRET+Nonce) with YOUR_API_KEY
  export SIGNATURE="$(echo -n "${YOUR_API_SECRET}${NONCE}" | openssl dgst -sha256 -hmac "${YOUR_API_KEY}")";
      
                        

Create a New Folder - Make HTTP Post Request

Create a New Folder - Response

Upload a File to the Folder - Make HTTP Post Request

Upload a File to the Folder - Response

Run Command - Make HTTP Post Request

                          
  curl -X POST "{{apiUrl}}/Run/{{toolSlugOwner}}/{{toolSlugProject}}"  \
  -H "Content-Type: {{contentType}}" \
  -H "x-api-key: ${YOUR_API_KEY}" \
  -H "x-nonce: ${NONCE}" \
  -H "x-signature: ${SIGNATURE}" \
  -d '{{toolParameters}}';

      
                        

Run Command - Response

                          
  //response body
  {
      "errors": [],
      "taskid": "2221",
      "socketaccesstoken": "eDcCm5yyUfIvMFspTwww49OUfgXkQt",
      "result": true
  }
      
                        

Get Task Detail - Make HTTP Post Request

                          
  curl -X POST "{{apiUrl}}/Task/Detail"  \
  -H "Content-Type: {{contentType}}" \
  -H "x-api-key: ${YOUR_API_KEY}" \
  -H "x-nonce: ${NONCE}" \
  -H "x-signature: ${SIGNATURE}" \
  -d '{
    "tasktoken": 'eDcCm5yyUfIvMFspTwww49OUfgXkQt',
  }';

      
                        

Get Task Detail - Response

                          
  //response body
  {
    "total": "1",
    "errors": [],
    "tasklist": [
        {
            "id": "2221",
            "uuid": "15bce51f-442f-4f44-a71d-13c6374a62bd",
            "name": "",
            "socketaccesstoken": "eDcCm5yyUfIvMFspTwww49OUfgXkQt",
            "parameters": {
                "inputImage": "https://api.wiro.ai/v1/File/mCmUXgZLG1FNjjjwmbtPFr2LVJA112/inputImage-6060136.png"
            },
            "debugoutput": "",
            "debugerror": "",
            "starttime": "1734513809",
            "endtime": "1734513813",
            "elapsedseconds": "6.0000",
            "status": "task_postprocess_end",
            "cps": "0.000585000000",
            "totalcost": "0.003510000000",
            "guestid": null,
            "projectid": "699",
            "modelid": "598",
            "description": "",
            "basemodelid": "0",
            "runtype": "model",
            "modelfolderid": "",
            "modelfileid": "",
            "callbackurl": "",
            "marketplaceid": null,
            "createtime": "1734513807",
            "canceltime": "0",
            "assigntime": "1734513807",
            "accepttime": "1734513807",
            "preprocessstarttime": "1734513807",
            "preprocessendtime": "1734513807",
            "postprocessstarttime": "1734513813",
            "postprocessendtime": "1734513814",
            "pexit": "0",
            "categories": "["tool","image-to-image","quick-showcase","compare-landscape"]",
            "outputs": [
                {
                    "id": "6bc392c93856dfce3a7d1b4261e15af3",
                    "name": "0.png",
                    "contenttype": "image/png",
                    "parentid": "6c1833f39da71e6175bf292b18779baf",
                    "uuid": "15bce51f-442f-4f44-a71d-13c6374a62bd",
                    "size": "202472",
                    "addedtime": "1734513812",
                    "modifiedtime": "1734513812",
                    "accesskey": "dFKlMApaSgMeHKsJyaDeKrefcHahUK",
                    "foldercount": "0",
                    "filecount": "0",
                    "ispublic": 0,
                    "expiretime": null,
                    "url": "https://cdn1.wiro.ai/6a6af820-c5050aee-40bd7b83-a2e186c6-7f61f7da-3894e49c-fc0eeb66-9b500fe2/0.png"
                }
            ],
            "size": "202472"
        }
    ],
    "result": true
  }
      
                        

Get Task Process Information and Results with Socket Connection

                          
<script type="text/javascript">
  window.addEventListener('load',function() {
    //Get socketAccessToken from task run response
    var SocketAccessToken = 'eDcCm5yyUfIvMFspTwww49OUfgXkQt';
    WebSocketConnect(SocketAccessToken);
  });

  //Connect socket with connection id and register task socket token
  async function WebSocketConnect(accessTokenFromAPI) {
    if ("WebSocket" in window) {
        var ws = new WebSocket("wss://socket.wiro.ai/v1");
        ws.onopen = function() {  
          //Register task socket token which has been obtained from task run API response
          ws.send('{"type": "task_info", "tasktoken": "' + accessTokenFromAPI + '"}'); 
        };

        ws.onmessage = function (evt) { 
          var msg = evt.data;

          try {
              var debugHtml = document.getElementById('debug');
              debugHtml.innerHTML = debugHtml.innerHTML + "\n" + msg;

              var msgJSON = JSON.parse(msg);
              console.log('msgJSON: ', msgJSON);

              if(msgJSON.type != undefined)
              {
                console.log('msgJSON.target: ',msgJSON.target);
                switch(msgJSON.type) {
                    case 'task_queue':
                      console.log('Your task has been waiting in the queue.');
                    break;
                    case 'task_accept':
                      console.log('Your task has been accepted by the worker.');
                    break;
                    case 'task_preprocess_start':
                      console.log('Your task preprocess has been started.');
                    break;
                    case 'task_preprocess_end':
                      console.log('Your task preprocess has been ended.');
                    break;
                    case 'task_assign':
                      console.log('Your task has been assigned GPU and waiting in the queue.');
                    break;
                    case 'task_start':
                      console.log('Your task has been started.');
                    break;
                    case 'task_output':
                      console.log('Your task has been started and printing output log.');
                      console.log('Log: ', msgJSON.message);
                    break;
                    case 'task_error':
                      console.log('Your task has been started and printing error log.');
                      console.log('Log: ', msgJSON.message);
                    break;
                   case 'task_output_full':
                      console.log('Your task has been completed and printing full output log.');
                    break;
                    case 'task_error_full':
                      console.log('Your task has been completed and printing full error log.');
                    break;
                    case 'task_end':
                      console.log('Your task has been completed.');
                    break;
                    case 'task_postprocess_start':
                      console.log('Your task postprocess has been started.');
                    break;
                    case 'task_postprocess_end':
                      console.log('Your task postprocess has been completed.');
                      console.log('Outputs: ', msgJSON.message);
                      //output files will add ui
                      msgJSON.message.forEach(function(currentValue, index, arr){
                          console.log(currentValue);
                          var filesHtml = document.getElementById('files');
                          filesHtml.innerHTML = filesHtml.innerHTML + '<img src="' + currentValue.url + '" style="height:300px;">'
                      });
                    break;
                }
              }
          } catch (e) {
            console.log('e: ', e);
            console.log('msg: ', msg);
          }
        };

        ws.onclose = function() { 
          alert("Connection is closed..."); 
        };
    } else {              
        alert("WebSocket NOT supported by your Browser!");
    }
  }
</script>
      
                        

Prepare UI Elements Inside Body Tag

                          
  <div id="files"></div>
  <pre id="debug"></pre>
      
                        

Tell us about any details you want to generate

Your request will cost $0.0006 per second.

(Total cost varies depending on the request’s execution time.)
CogVideoX-2b-sample-1.mp4
CogVideoX-2b-sample-2.mp4
1724927572 Report This Model






CogVideoX-2B







📄 中文阅读 |
🤗 Huggingface Space |
🌐 Github |
📜 arxiv







Demo Show







Video Gallery with Captions








A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting.





The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.





A street artist, clad in a worn-out denim jacket and a colorful bandana, stands before a vast concrete wall in the heart, holding a can of spray paint, spray-painting a colorful bird on a mottled wall.





In the haunting backdrop of a war-torn city, where ruins and crumbled walls tell a story of devastation, a poignant close-up frames a young girl. Her face is smudged with ash, a silent testament to the chaos around her. Her eyes glistening with a mix of sorrow and resilience, capturing the raw emotion of a world that has lost its innocence to the ravages of conflict.










Model Introduction


CogVideoX is an open-source version of the video generation model originating from QingYing. The table below displays the list of video generation models we currently offer, along with their foundational information.


Model Name
CogVideoX-2B (This Repository)
CogVideoX-5B


Model Description
Entry-level model, balancing compatibility. Low cost for running and secondary development.
Larger model with higher video generation quality and better visual effects.


Inference Precision
FP16* (Recommended), BF16, FP32, FP8*, INT8, no support for INT4
BF16 (Recommended), FP16, FP32, FP8*, INT8, no support for INT4


Single GPU VRAM Consumption
FP16: 18GB using SAT / 12.5GB* using diffusersINT8: 7.8GB* using diffusers with torchao
BF16: 26GB using SAT / 20.7GB* using diffusersINT8: 11.4GB* using diffusers with torchao


Multi-GPU Inference VRAM Consumption
FP16: 10GB* using diffusers
BF16: 15GB* using diffusers


Inference Speed(Step = 50, FP/BF16)
Single A100: ~90 secondsSingle H100: ~45 seconds
Single A100: ~180 secondsSingle H100: ~90 seconds


Fine-tuning Precision
FP16
BF16


Fine-tuning VRAM Consumption (per GPU)
47 GB (bs=1, LORA) 61 GB (bs=2, LORA) 62GB (bs=1, SFT)
63 GB (bs=1, LORA) 80 GB (bs=2, LORA) 75GB (bs=1, SFT)


Prompt Language
English*


Prompt Length Limit
226 Tokens


Video Length
6 Seconds


Frame Rate
8 Frames per Second


Video Resolution
720 x 480, no support for other resolutions (including fine-tuning)


Positional Encoding
3d_sincos_pos_embed
3d_rope_pos_embed



Data Explanation

When testing with the diffusers library, the enable_model_cpu_offload() option and pipe.vae.enable_tiling() optimization were enabled. This solution has not been tested on devices other than NVIDIA A100 / H100. Typically, this solution is adaptable to all devices above the NVIDIA Ampere architecture. If the optimization is disabled, memory usage will increase significantly, with peak memory being about 3 times the table value.
The CogVideoX-2B model was trained using FP16 precision, so it is recommended to use FP16 for inference.
For multi-GPU inference, the enable_model_cpu_offload() optimization needs to be disabled.
Using the INT8 model will lead to reduced inference speed. This is done to allow low-memory GPUs to perform inference while maintaining minimal video quality loss, though the inference speed will be significantly reduced.
Inference speed tests also used the memory optimization mentioned above. Without memory optimization, inference speed increases by approximately 10%. Only the diffusers version of the model supports quantization.
The model only supports English input; other languages can be translated to English for refinement by large models.

Note

Using SAT for inference and fine-tuning of SAT version
models. Feel free to visit our GitHub for more information.






Quick Start 🤗


This model supports deployment using the huggingface diffusers library. You can deploy it by following these steps.
We recommend that you visit our GitHub and check out the relevant prompt
optimizations and conversions to get a better experience.

Install the required dependencies

# diffusers>=0.30.1
# transformers>=0.44.0
# accelerate>=0.33.0 (suggest install from source)
# imageio-ffmpeg>=0.5.1
pip install --upgrade transformers accelerate diffusers imageio-ffmpeg


Run the code

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."

pipe = CogVideoXPipeline.from_pretrained(
"THUDM/CogVideoX-2b",
torch_dtype=torch.float16
)

pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

video = pipe(
prompt=prompt,
num_videos_per_prompt=1,
num_inference_steps=50,
num_frames=49,
guidance_scale=6,
generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)






Quantized Inference


PytorchAO and Optimum-quanto can be used to quantize the Text Encoder, Transformer and VAE modules to lower the memory requirement of CogVideoX. This makes it possible to run the model on free-tier T4 Colab or smaller VRAM GPUs as well! It is also worth noting that TorchAO quantization is fully compatible with torch.compile, which allows for much faster inference speed.
# To get started, PytorchAO needs to be installed from the GitHub source and PyTorch Nightly.
# Source and nightly installation is only required until next release.

import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXPipeline
from diffusers.utils import export_to_video
+ from transformers import T5EncoderModel
+ from torchao.quantization import quantize_, int8_weight_only, int8_dynamic_activation_int8_weight

+ quantization = int8_weight_only

+ text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.bfloat16)
+ quantize_(text_encoder, quantization())

+ transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16)
+ quantize_(transformer, quantization())

+ vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-2b", subfolder="vae", torch_dtype=torch.bfloat16)
+ quantize_(vae, quantization())

# Create pipeline and run inference
pipe = CogVideoXPipeline.from_pretrained(
"THUDM/CogVideoX-2b",
+ text_encoder=text_encoder,
+ transformer=transformer,
+ vae=vae,
torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."

video = pipe(
prompt=prompt,
num_videos_per_prompt=1,
num_inference_steps=50,
num_frames=49,
guidance_scale=6,
generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)

Additionally, the models can be serialized and stored in a quantized datatype to save disk space when using PytorchAO. Find examples and benchmarks at these links:

torchao
quanto






Explore the Model


Welcome to our github, where you will find:

More detailed technical details and code explanation.
Optimization and conversion of prompt words.
Reasoning and fine-tuning of SAT version models, and even pre-release.
Project update log dynamics, more interactive opportunities.
CogVideoX toolchain to help you better use the model.
INT8 model inference code support.






Model License


The CogVideoX-2B model (including its corresponding Transformers module and VAE module) is released under the Apache 2.0 License.
The CogVideoX-5B model (Transformers module) is released under the CogVideoX LICENSE.





Citation


@article{yang2024cogvideox,
title={CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer},
author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
journal={arXiv preprint arXiv:2408.06072},
year={2024}
}

.hf-sanitized.hf-sanitized-x4LfeZ5AGWxbgEF4FnjMa .video-container { display: flex; flex-wrap: wrap; justify-content: space-around; }
.hf-sanitized.hf-sanitized-x4LfeZ5AGWxbgEF4FnjMa .video-item { width: 45%; margin-bottom: 20px; transition: transform 0.3s; }
.hf-sanitized.hf-sanitized-x4LfeZ5AGWxbgEF4FnjMa .video-item:hover { transform: scale(1.1); }
.hf-sanitized.hf-sanitized-x4LfeZ5AGWxbgEF4FnjMa .caption { text-align: center; margin-top: 10px; font-size: 11px; }

Tools

View All

We couldn't find any matching results.

THUDM/CogVideoX-2b

Entry-level model, balancing compatibility. Low cost for running and secondary development.
Run time: 1 second
2627 runs
0

THUDM/CogVideoX-5b

Larger model with higher video generation quality and better visual effects.
Run time: 1 second
1180 runs
0

Select Language

Logo of nvidia programLogo of nvidia program
Wiro AI brings machine learning easily accessible to all in the cloud.
  • WIRO
  • About
  • Careers
  • Contact
  • Language Language
  • Product
  • Tools
  • Pricing
  • Roadmap
  • Changelog
  • Status
  • Documentation
  • Introduction
  • Start Your First Project
  • Example Projects

2023 © Wiro.ai | Terms of Service & Privacy Policy