{"id":1811,"date":"2026-04-08T09:18:59","date_gmt":"2026-04-08T09:18:59","guid":{"rendered":"https:\/\/wiro.ai\/blog\/?p=1811"},"modified":"2026-03-20T19:13:24","modified_gmt":"2026-03-20T19:13:24","slug":"nemotron-vs-whisper-large-v3-5-audio-transcription-tests","status":"publish","type":"post","link":"https:\/\/wiro.ai\/blog\/nemotron-vs-whisper-large-v3-5-audio-transcription-tests\/","title":{"rendered":"Nemotron vs Whisper Large V3: 5 Audio Transcription Tests"},"content":{"rendered":"<h2>Nemotron vs Whisper: two very different ASR approaches<\/h2>\n<p>NVIDIA Nemotron-Speech-Streaming-En-0.6b targets low-latency streaming transcription (chunked audio) with punctuation and capitalization support. OpenAI Whisper Large V3 is a general-purpose speech recognition model trained at large scale and widely used for offline transcription.<\/p>\n<p>This post runs a small 5-audio test set and compares the raw text outputs side by side.<\/p>\n<figure>\n  <img decoding=\"async\" src=\"https:\/\/wiro.ai\/blog\/wp-content\/uploads\/2026\/03\/nemotron-vs-whisper-hero.jpeg\" alt=\"Microphone and audio waveform illustration\"\/><figcaption>Prompt: Cinematic studio photo of a matte black microphone on a desk with a soft glowing audio waveform light trail in the background, dark gradient backdrop, shallow depth of field, clean minimal tech aesthetic, high contrast, no text, no logos, no watermark<\/figcaption><\/figure>\n<h2>Model links<\/h2>\n<ul>\n<li><a href=\"https:\/\/wiro.ai\/models\/nvidia\/nemotron\">nvidia\/nemotron<\/a><\/li>\n<li><a href=\"https:\/\/wiro.ai\/models\/openai\/whisper-large-v3\">openai\/whisper-large-v3<\/a><\/li>\n<\/ul>\n<h2>What was tested<\/h2>\n<ul>\n<li>Clean narration (basic accuracy)<\/li>\n<li>Punctuation and capitalization behavior<\/li>\n<li>Names and uncommon words (Vera, game-cock)<\/li>\n<li>Long sentence handling<\/li>\n<li>Customer support style numbers and phrasing<\/li>\n<\/ul>\n<h2>Inputs used<\/h2>\n<ul>\n<li>Nemotron: 
<code>inputAudio<\/code> (audio URL)<\/li>\n<li>Whisper Large V3: <code>inputAudioUrl<\/code>, <code>language=auto<\/code>, <code>maxNewTokens=256<\/code>, <code>chunkLength=30<\/code>, <code>batchSize=8<\/code>, <code>numSpeakers=1<\/code> (Whisper output includes timestamps\/segments)<\/li>\n<\/ul>\n<h2>Run-time snapshot (elapsed seconds)<\/h2>\n<table>\n<thead>\n<tr>\n<th>Test<\/th>\n<th>Nemotron<\/th>\n<th>Whisper Large V3<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>01<\/td>\n<td>54<\/td>\n<td>21<\/td>\n<\/tr>\n<tr>\n<td>02<\/td>\n<td>3<\/td>\n<td>5<\/td>\n<\/tr>\n<tr>\n<td>03<\/td>\n<td>32<\/td>\n<td>26<\/td>\n<\/tr>\n<tr>\n<td>04<\/td>\n<td>3<\/td>\n<td>4<\/td>\n<\/tr>\n<tr>\n<td>05<\/td>\n<td>5<\/td>\n<td>4<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Results: 5 audio clips with transcripts<\/h2>\n<table>\n<thead>\n<tr>\n<th>Test audio<\/th>\n<th>Nemotron output<\/th>\n<th>Whisper Large V3 output<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><a href=\"https:\/\/cdn.wiro.ai\/uploads\/models\/nvidia-nemotron-sample-1.mp3\">Test 01 audio<\/a><\/td>\n<td><code>persons who knows that they will not be able to rest along the way when they took a path will never get tired<\/code><\/td>\n<td><code>00:00.2 - 00:05.9 \/ Persons who knows that they will not be able to rest along the way when they took a path will never get tired.<\/code><\/td>\n<\/tr>\n<tr>\n<td><a href=\"https:\/\/cdn.wiro.ai\/uploads\/models\/nvidia-nemotron-sample-3.mp3\">Test 02 audio<\/a><\/td>\n<td><code>going along slushy country roads and speaking to damp audiences in drafty schoolrooms day after day for a fortnight he'll have to put in an appearance at some place of worship on sunday morning and he can come to us immediately afterwards<\/code><\/td>\n<td><code>00:00.0 - 00:06.6 \/ going along slushy country roads and speaking to damp audiences in draughty schoolrooms day after day for a fortnight<br \/>00:07.3 - 00:13.5 \/ He'll have to put in an appearance at some place of worship on Sunday 
morning, and he can come to us immediately afterwards.<\/code><\/td>\n<\/tr>\n<tr>\n<td><a href=\"https:\/\/cdn.wiro.ai\/uploads\/models\/openai-whisper-large-v3-sample-1.mp3\">Test 03 audio<\/a><\/td>\n<td><code>before he had time to answer a much encumbered vera burst into the room with the question i say can i leave these here these were a small black pig and a lusty specimen of black red gamecock<\/code><\/td>\n<td><code>00:00.5 - 00:07.6 \/ before he had time to answer, a much-encumbered Vera burst into the room with the question, \u00ecI say, can I leave these here?\u00ee<br \/>00:08.5 - 00:13.7 \/ These were a small black pig and a lusty specimen of black-red game-cock,<\/code><\/td>\n<\/tr>\n<tr>\n<td><a href=\"https:\/\/cdn.wiro.ai\/uploads\/models\/openbmb-voxcpm-sample-1.mp3\">Test 04 audio<\/a><\/td>\n<td><code>i received a birthday gift from a friend who sent it from afar that unexpected surprise and deep blessing filled my heart with sweet happiness and my smile bloomed like a flower<\/code><\/td>\n<td><code>00:00.0 - 00:03.3 \/ I received a birthday gift from a friend who sent it from afar.<br \/>00:03.9 - 00:05.7 \/ that unexpected surprise<br \/>00:06.0 - 00:09.0 \/ and deep blessing filled my heart with sweet happiness.<br \/>00:09.5 - 00:11.4 \/ and my smile bloomed like a flower.<\/code><\/td>\n<\/tr>\n<tr>\n<td><a href=\"https:\/\/cdn.wiro.ai\/uploads\/models\/openbmb-voxcpm-sample-2.mp3\">Test 05 audio<\/a><\/td>\n<td><code>i completely understand the frustration you're experiencing technical issues are never convenient to help me resolve this for you immediately could you please confirm the last four digits of your account number<\/code><\/td>\n<td><code>00:00.3 - 00:03.2 \/ I completely understand the frustration you're experiencing.<br \/>00:03.6 - 00:05.6 \/ technical issues are never convenient.<br \/>00:06.0 - 00:08.0 \/ to help me resolve this for you immediately.<br \/>00:08.5 - 00:11.5 \/ Could you please confirm the last four digits 
of your account number?<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Honest take<\/h2>\n<ul>\n<li>Nemotron returned clean, single-line text for all five clips, but it lowercased everything and dropped punctuation, even though the model is described as supporting punctuation and capitalization.<\/li>\n<li>Whisper Large V3 returned segmented output with timestamps and kept most punctuation and capitalization. It was not fully consistent: several segments start in lowercase, some segment breaks in tests 04 and 05 add a period mid-sentence, and test 03 shows odd quote characters (\u00ec and \u00ee) that look like mis-encoded curly quotation marks.<\/li>\n<li>If you need streaming-first ASR behavior, Nemotron is built for that use case. If you want timestamped segments out of the box, Whisper is convenient.<\/li>\n<\/ul>\n<h2>Try it<\/h2>\n<ul>\n<li><a href=\"https:\/\/wiro.ai\/models\/nvidia\/nemotron\">Run Nemotron on Wiro<\/a><\/li>\n<li><a href=\"https:\/\/wiro.ai\/models\/openai\/whisper-large-v3\">Run Whisper Large V3 on Wiro<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Nemotron vs Whisper: two very different ASR approaches NVIDIA Nemotron-Speech-Streaming-En-0.6b targets low-latency streaming transcription (chunked audio) with punctuation and capitalization support. 
OpenAI&hellip;<\/p>\n","protected":false},"author":4,"featured_media":1809,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[51],"tags":[101,94,73,154,153,92,63,155],"class_list":["post-1811","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-model-comparison","tag-asr","tag-audio","tag-comparison","tag-nemotron","tag-nvidia","tag-openai","tag-speech-to-text","tag-whisper"],"_links":{"self":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts\/1811","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/comments?post=1811"}],"version-history":[{"count":1,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts\/1811\/revisions"}],"predecessor-version":[{"id":1812,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/posts\/1811\/revisions\/1812"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/media\/1809"}],"wp:attachment":[{"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/media?parent=1811"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/categories?post=1811"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wiro.ai\/blog\/wp-json\/wp\/v2\/tags?post=1811"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}