How to Download YouTube Subtitles — SRT, VTT, and Auto-Translations

Save YouTube captions as SRT or VTT, grab auto-generated transcripts, and pull translated subtitle tracks. The differences between formats, what auto-captions get wrong, and the exact yt-dlp flags for power users.

Subtitles are one of the most useful things you can pull from a YouTube video, and one of the least well-documented. They are how you turn a video into a searchable, quotable, translatable text file. They are how you make your own clips accessible. They are how you get a transcript without paying for a transcription service. And they are usually free for the taking, because YouTube already produced them.

This guide covers the full picture: what the formats actually are (SRT, VTT, auto-generated), how to download them in a browser or from the command line, and the specific edge cases — translations, missing tracks, broken timing — that cause people to give up and re-type captions by hand.

Just need the subtitles right now? Paste your YouTube URL into TubePull, set the format dropdown to Subtitles (SRT) or Subtitles (VTT), pick a language, and download. Works on auto-generated and manual tracks. The rest of this guide explains which format to pick and how to fix the captions YouTube got wrong.

SRT vs. VTT — what's the actual difference?

Both formats do the same job: they pair lines of text with start and end timecodes so a video player can show captions on screen. The differences matter when you start moving files between tools.

SRT (SubRip Subtitle) is the older, simpler format. Every cue is a number, a start-and-end timestamp, and one or more lines of plain text:

1
00:00:01,200 --> 00:00:04,500
Welcome back to the channel.

2
00:00:04,800 --> 00:00:07,000
Today we're testing the new robot arm.

No styling, no positioning, no metadata. Just text and time. That simplicity is why SRT is the universal currency of subtitles — Premiere Pro, Final Cut, DaVinci Resolve, VLC, MX Player, OBS, and basically every editing tool reads it without complaint (Ditto Transcripts).
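
That structure is simple enough to parse in a few lines. The Python sketch below is illustrative only: the names `Cue`, `to_ms`, and `parse_srt` are hypothetical, not from any library, and it assumes well-formed input with blank-line-separated blocks, a numeric cue index, and HH:MM:SS,mmm timestamps.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,200 --> 00:00:04,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert captured timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(data: str) -> list[Cue]:
    """Parse SRT text into cues, skipping any block that is not well formed."""
    cues = []
    # Normalize line endings and drop a leading byte-order mark.
    data = data.replace("\r\n", "\n").lstrip("\ufeff").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", data):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMING.match(lines[1])
        if match is None:
            continue
        fields = match.groups()
        cues.append(
            Cue(
                index=int(lines[0]),
                start_ms=to_ms(*fields[:4]),
                end_ms=to_ms(*fields[4:]),
                text="\n".join(lines[2:]),
            )
        )
    return cues
```

Run on the two-cue example above, it yields cues starting at 1200 ms and 4800 ms. Everything a player needs is in those four fields, which is exactly why so many tools can read SRT.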

VTT (WebVTT, "Web Video Text Tracks") is the W3C-standardized format designed for the HTML5 <track> element. The structure looks similar, but there are three important differences, all visible in this example:

WEBVTT

00:00:01.200 --> 00:00:04.500
Welcome back to the channel.

00:00:04.800 --> 00:00:07.000 line:90% align:center
Today we're testing the new robot arm.

  1. The file must start with WEBVTT on the first line.
  2. The millisecond separator is a period (.), not a comma, per the W3C WebVTT spec and MDN.
  3. VTT supports cue settings (line:, align:, position:), inline styling (<b>, <i>, <c.classname>), speaker labels (<v Name>), regions, and chapter tracks.
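
The cue settings in point 3 are whitespace-separated name:value pairs that follow the end timestamp. A minimal Python sketch of splitting one timing line, assuming the input is a single valid WebVTT timing line (the helper names are hypothetical):

```python
import re

# WebVTT timestamp: hours are optional, and a period precedes the milliseconds.
TS = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
VTT_TIMING = re.compile(TS + r"[ \t]+-->[ \t]+" + TS + r"(.*)")


def ts_to_ms(h, m, s, ms) -> int:
    """Convert captured fields to milliseconds; a missing hour counts as 0."""
    return ((int(h or 0) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_vtt_timing(line: str):
    """Split a VTT timing line into (start_ms, end_ms, cue_settings)."""
    match = VTT_TIMING.match(line.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timing line: {line!r}")
    g = match.groups()
    start, end = ts_to_ms(*g[0:4]), ts_to_ms(*g[4:8])
    # Settings are whitespace-separated name:value pairs after the end time.
    settings = dict(item.split(":", 1) for item in g[8].split() if ":" in item)
    return start, end, settings
```

For the second cue in the example, this returns 4800, 7000, and {"line": "90%", "align": "center"}. SRT has nowhere to put that third value, which is what gets lost when converting VTT down to SRT.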

Practically, the choice is simple:

  • Editing the captions into a video, importing into a video editor, sharing with someone who'll open them in VLC: use SRT.
  • Embedding into a web page with <video><track src="captions.vtt">: use VTT.
  • Re-uploading to YouTube, Vimeo, or another platform: either works, but SRT is more universally accepted on upload.

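Converting SRT to VTT is mostly a find-and-replace job: in the timestamp lines, change the comma before the milliseconds to a period, add WEBVTT as the first line followed by a blank line, and you're done. Going the other way also means numbering the cues and dropping any cue settings, which SRT has no place for. The W3C wiki documents the exact steps, and yt-dlp can do the conversion automatically (more on that below).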
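
A sketch of the SRT-to-VTT direction in Python, assuming simple SRT input with no styling. It rewrites only the timing lines, so commas inside caption text survive, and it keeps SRT's cue numbers, which WebVTT accepts as optional cue identifiers. The function name is hypothetical.

```python
import re

# SRT timing lines use a comma before the milliseconds; WebVTT requires a period.
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3})[ \t]*-->[ \t]*(\d{2}:\d{2}:\d{2}),(\d{3})",
    re.MULTILINE,
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT, touching only the timing lines."""
    body = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    body = SRT_TIMING.sub(r"\1.\2 --> \3.\4", body)
    # The WEBVTT header must be the first line, followed by a blank line.
    return "WEBVTT\n\n" + body + "\n"
```

A blind global comma-to-period replacement would also mangle punctuation in the captions themselves ("Welcome back, everyone." becoming "Welcome back. everyone."), which is why the substitution is anchored to timing lines.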

Manual captions vs. auto-generated — and why it matters

YouTube serves two distinct kinds of caption tracks, and a downloader has to ask for each separately.

Manual captions are written or reviewed by a human — usually the creator or their team. They have proper punctuation, capitalization, speaker labels, and they're broken into readable cues. If a video has [CC] shown next to its quality settings and the caption track in the YouTube player is not labeled "(auto-generated)", it's a manual track.

Auto-generated captions are produced by YouTube's speech recognition. They're available on most videos with clear speech, and they show up in the caption menu labeled "English (auto-generated)" or similar. Independent testing puts their accuracy at 85–95% in studio conditions and 78–82% on noisy outdoor recordings (Notelm test results, 2026). They typically have no punctuation, words appear in bursts as the recognizer commits to them, and they handle proper nouns and technical jargon poorly.

The reason this distinction matters for downloading: manual captions and auto-captions are stored in different tracks on YouTube's servers. Most downloaders default to manual only. If a video has auto-generated captions but no manual ones, the downloader will report "no subtitles available" unless you explicitly request the auto track.
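
You can see this split in the metadata yt-dlp prints with `yt-dlp -J URL`: manual tracks sit under a `subtitles` key and speech-recognition tracks under `automatic_captions`, each keyed by language code. The sketch below assumes that shape and shows the manual-first, auto-fallback choice a downloader has to make; `pick_track` is a hypothetical helper, not part of yt-dlp.

```python
def pick_track(info: dict, lang: str, allow_auto: bool = True):
    """Return (kind, formats) for `lang`, preferring a manual track.

    `info` is assumed to have the shape of `yt-dlp -J` output: manual tracks
    under "subtitles", speech-recognition tracks under "automatic_captions",
    each a dict of language code -> list of available formats.
    """
    manual = info.get("subtitles") or {}
    if lang in manual:
        return "manual", manual[lang]
    auto = info.get("automatic_captions") or {}
    if allow_auto and lang in auto:
        return "auto", auto[lang]
    # Neither kind exists for this language: "no subtitles available".
    return None, []
```

With `allow_auto=False`, this reproduces the manual-only default: a video whose only English track is auto-generated comes back empty.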

Downloading subtitles with TubePull

The web flow is the same as for video and audio.

  1. Paste the YouTube URL into the input on tubepull.com.
  2. In the format dropdown, choose Subtitles (SRT) or Subtitles (VTT).
  3. Pick the language. The list shows manual tracks first, then auto-generated ones, then auto-translated tracks. Auto tracks are flagged so you know what you're getting.
  4. Download. The file lands in your default download folder, named with the video's title and the language code — for example, team-17792-state-championship.en.srt.

If a video only has auto-generated captions, TubePull will surface them with the "auto" label so the choice is explicit. If a video has neither — for example, a music video with no spoken word and the creator never enabled auto-captions — there is nothing to download, and we say so instead of returning an empty file.

For translation: YouTube can auto-translate a manual or auto track into roughly 130 languages on the fly. TubePull surfaces the most common translation targets directly. For less common languages, the cleaner workflow is to download the source-language file and run it through a translator that handles SRT or VTT structure (DeepL, Google Translate's document upload, or a CLI like subtranslator) so the timing stays intact.
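
The key idea is to send only the caption text to the translator and copy every structural line through unchanged. A Python sketch, where `translate` stands in for whatever service you use (hypothetical here) and timestamps are assumed to include an hours field:

```python
import re
from typing import Callable

# Timing lines in SRT (comma) or VTT (period); assumes an hours field is present.
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->")


def translate_subtitles(text: str, translate: Callable[[str], str]) -> str:
    """Translate caption text while copying headers, cue numbers, and timing verbatim."""
    out = []
    for line in text.replace("\r\n", "\n").split("\n"):
        stripped = line.strip()
        is_structure = (
            not stripped
            or stripped.isdigit()  # SRT cue number (a digits-only caption is skipped too)
            or stripped.startswith("WEBVTT")
            or TIMING_LINE.match(stripped) is not None
        )
        out.append(line if is_structure else translate(line))
    return "\n".join(out)
```

Calling `translate_subtitles(srt, str.upper)` is a quick way to check that only caption lines change. Translating line by line does lose context when one sentence spans two lines of a cue; a real pipeline may prefer to send whole cues.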

Downloading subtitles with yt-dlp (power users)

If you live on the command line, yt-dlp is the reference tool, and its subtitle flags are worth memorizing. Every flag below is from the official yt-dlp documentation.

See what's available

Before downloading, list the tracks the video actually has:

yt-dlp --list-subs "https://www.youtube.com/watch?v=VIDEO_ID"

This prints two tables, one for manual subtitles and one for auto-generated captions, with language codes (en, es, fr, pt-BR, etc.) and the formats available for each track (for YouTube, usually vtt, ttml, srv1, srv2, srv3, and json3; SRT is produced by conversion, covered below).

Download manual captions only

yt-dlp --write-subs --sub-langs en --skip-download \
  "https://www.youtube.com/watch?v=VIDEO_ID"

--write-subs enables subtitle writing. --sub-langs en requests English. --skip-download is the important one — without it, you'll also pull the full video file. With it, you get subtitles only.

Include auto-generated captions

yt-dlp --write-subs --write-auto-subs --sub-langs en --skip-download \
  "https://www.youtube.com/watch?v=VIDEO_ID"

Adding --write-auto-subs lets yt-dlp fall back to the auto-generated track: for each requested language, a manual track is used if one exists, and the auto-generated track fills in if it doesn't. This is the flag people miss, and forgetting it is one of the most common reasons for a "no subtitles found" result on a video that clearly has captions.
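
yt-dlp can also be driven from Python. The sketch below maps the flags from this section onto options of yt-dlp's `YoutubeDL` class (`writesubtitles`, `writeautomaticsub`, `subtitleslangs`, `skip_download`). The two wrapper functions are hypothetical, and `VIDEO_ID` is a placeholder.

```python
def build_subtitle_options(langs: list[str], include_auto: bool = True) -> dict:
    """YoutubeDL options matching the command-line flags above."""
    return {
        "writesubtitles": True,  # --write-subs
        "writeautomaticsub": include_auto,  # --write-auto-subs
        "subtitleslangs": langs,  # --sub-langs (a list, not a comma string)
        "skip_download": True,  # --skip-download
    }


def download_subtitles(url: str, langs: list[str]) -> None:
    """Fetch subtitle files only; requires `pip install yt-dlp` and network access."""
    # Imported here so build_subtitle_options works without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(build_subtitle_options(langs)) as ydl:
        ydl.download([url])
```

Usage: `download_subtitles("https://www.youtube.com/watch?v=VIDEO_ID", ["en"])` behaves like the command above with both subtitle flags set.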

Multiple languages at once

yt-dlp --write-subs --sub-langs "en,es,fr,pt-BR" --skip-download URL

Or every available language:

yt-dlp --write-subs --sub-langs all --skip-download URL

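--sub-langs also accepts regular expressions (en.* for any English variant) and exclusions with a - prefix, for example --sub-langs "all,-live_chat". Quote the value so your shell doesn't try to expand the *.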
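
The selection rules are easier to reason about as code. This is an approximation of the documented behavior, not yt-dlp's actual implementation: entries are processed left to right, each is `all` or a regex that must match the whole language code, and a leading `-` removes matches instead of adding them. The function name and the sample language list are hypothetical.

```python
import re


def select_langs(patterns: str, available: list[str]) -> list[str]:
    """Approximate --sub-langs selection over a list of available language codes."""
    selected: list[str] = []
    for entry in patterns.split(","):
        discard = entry.startswith("-")
        pattern = entry[1:] if discard else entry
        if pattern == "all":
            matches = list(available)
        else:
            # The regex must match the entire code, so "en" does not match "en-GB".
            matches = [code for code in available if re.fullmatch(pattern, code)]
        if discard:
            selected = [code for code in selected if code not in matches]
        else:
            selected += [code for code in matches if code not in selected]
    return selected
```

Because entries apply in order, an exclusion only removes what earlier entries added: "en.*,-en-GB" keeps plain en, while "-en-GB,en.*" would add en-GB back.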

Force SRT output

By default, yt-dlp downloads YouTube's native VTT. For auto-generated tracks, that is a YouTube-flavored VTT with per-word timing tags that clutter the cue text. To get a clean SRT instead, convert during download:

yt-dlp --write-subs --sub-langs en --convert-subs srt --skip-download URL

--convert-subs accepts srt, vtt, ass, or lrc. The conversion runs through ffmpeg, so ffmpeg needs to be installed and on your PATH.

Embed into the video file

If you do want the video and want the captions embedded as a soft-subtitle track (a separate stream that VLC, mpv, and most TVs offer as a toggle, rather than text burned into the picture):

yt-dlp --write-subs --sub-langs en --embed-subs URL

This works for mp4, webm, and mkv outputs and, like --convert-subs, requires ffmpeg.

Auto-translations from the command line

yt-dlp doesn't expose a separate "translate" flag; translations show up as additional language codes on the auto track. For example, a video with only English speech may list dozens of auto-translated targets in --list-subs. Request them by language code:

yt-dlp --write-auto-subs --sub-langs ja --skip-download URL

Quality drops noticeably on translated auto tracks — they're machine-translating an already-imperfect speech recognition transcript, so errors compound. Use them as a starting point, not a finished translation.

Common edge cases

"No subtitles available" on a video that obviously has captions

Three usual culprits: you forgot --write-auto-subs; the captions are community-contributed (a feature YouTube discontinued in 2020, so old contributed tracks may still exist but behave unreliably); or the video is age-restricted and you need to authenticate. For the third case, --cookies-from-browser passes your logged-in session to yt-dlp, for example --cookies-from-browser firefox.

Caption timing is off after editing the video

If you trim the source video and reuse the original SRT, every cue after the cut will show up late by the length of the removed section. The fix is to shift the timecodes: ffmpeg can do this with -itsoffset, and tools like Subtitle Edit handle it visually. Don't try to nudge SRT timestamps by hand for anything longer than a 30-second clip; the math gets ugly.
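
For a single cut, the arithmetic is simple enough to script. If you removed the section from 0:30 to 1:00, every cue that starts at or after the 1:00 mark needs to move 30 seconds earlier. A Python sketch under those assumptions (SRT input, one removed section, hypothetical function names); cues that fell inside the removed section are left in place and still need deleting:

```python
import re

# An SRT timing line; group 3 keeps anything after the end time.
TIMING_LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})(.*)$"
)


def to_ms(stamp: str) -> int:
    """Convert "HH:MM:SS,mmm" to milliseconds."""
    hms, ms = stamp.split(",")
    h, m, s = map(int, hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)


def to_stamp(total_ms: int) -> str:
    """Convert milliseconds back to "HH:MM:SS,mmm", clamping at zero."""
    total_ms = max(total_ms, 0)
    h, rest = divmod(total_ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(srt_text: str, offset_ms: int, after_ms: int = 0) -> str:
    """Move every cue that starts at or after `after_ms` by `offset_ms`."""
    out = []
    for line in srt_text.replace("\r\n", "\n").split("\n"):
        match = TIMING_LINE.match(line)
        if match:
            start, end = to_ms(match.group(1)), to_ms(match.group(2))
            if start >= after_ms:
                line = (
                    f"{to_stamp(start + offset_ms)} --> "
                    f"{to_stamp(end + offset_ms)}{match.group(3)}"
                )
        out.append(line)
    return "\n".join(out)
```

For the example cut, `shift_srt(text, -30_000, after_ms=60_000)` moves a cue at 00:01:10,000 to 00:00:40,000 and leaves cues before 0:30 untouched.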

Garbled text or weird tags in auto-generated VTT

YouTube's auto-generated VTT includes per-word timing tags (<00:00:01.500>) so individual words can highlight as they're spoken. Most editors don't expect those and will display them as literal text. The cleanest fix is --convert-subs srt, which strips the inline timing tags during conversion.
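
If you already have an auto-generated VTT file and want to clean it without re-downloading, the tags can also be stripped with two regular expressions. This Python sketch assumes the per-word `<00:00:01.500>` timestamps described above plus `<c>` wrapper spans around words; the exact markup YouTube emits is an assumption here, and the names are hypothetical.

```python
import re

# Per-word timestamps such as <00:00:01.500>.
INLINE_TIMESTAMP = re.compile(r"<\d{2}:\d{2}:\d{2}\.\d{3}>")
# <c>, </c>, and class variants such as <c.colorE5E5E5> (assumed wrapper tags).
CLASS_TAG = re.compile(r"</?c(?:\.[\w.-]+)?>")


def strip_inline_tags(cue_text: str) -> str:
    """Remove karaoke-style timing tags and <c> spans, keeping the words."""
    without_times = INLINE_TIMESTAMP.sub("", cue_text)
    return CLASS_TAG.sub("", without_times)
```

Apply it to cue text only, not to timing lines, since the timing-line timestamps have no angle brackets and would be left alone anyway.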

Right-to-left languages (Arabic, Hebrew, Farsi)

The VTT and SRT specs handle RTL fine, but some players don't. If the captions display reversed or garbled, check that the file is saved as UTF-8 (ideally with a byte-order mark) and that the player honors Unicode bidi markers. VLC and mpv handle this correctly out of the box; some embedded TV players don't.
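
If you control the file, one common workaround is to re-save it as UTF-8 with a BOM and prefix each caption line with a Unicode RIGHT-TO-LEFT MARK (U+200F), which helps players that place punctuation on the wrong side. A Python sketch, assuming SRT input (the function name is hypothetical); it won't help a player that ignores bidi marks entirely:

```python
import re

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK
SRT_TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->")


def prepare_rtl_srt(srt_text: str) -> bytes:
    """Prefix each caption line with an RLM and encode as UTF-8 with a BOM."""
    out = []
    for line in srt_text.replace("\r\n", "\n").lstrip("\ufeff").split("\n"):
        stripped = line.strip()
        is_caption = (
            bool(stripped)
            and not stripped.isdigit()
            and not SRT_TIMING.match(stripped)
        )
        if is_caption and not line.startswith(RLM):
            line = RLM + line
        out.append(line)
    # The "utf-8-sig" codec writes the EF BB BF byte-order mark first.
    return "\n".join(out).encode("utf-8-sig")
```

Write the returned bytes in binary mode (for example, to a hypothetical captions.ar.srt) so nothing re-encodes them.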

Music videos and lyric tracks

A small number of music videos have lyrics as a manual subtitle track. They're rare, but worth checking with --list-subs before assuming captions don't exist for music content.

The legal side

Subtitles are part of the copyrighted work, under the same rules as the video itself. Downloading them for personal use (search, accessibility, language learning, study) is on solid ground in most jurisdictions. Re-publishing them, embedding them into a re-uploaded copy of someone else's video, or using them as the basis for an unauthorized translation gets murky fast.

For the long version, see our plain-English legal guide to downloading YouTube content. The short version: if you're using captions to make a video accessible to yourself, to quote in a review, or to study a language, you're fine. If you're rebroadcasting someone else's work, you need their permission.

Quick reference cheat sheet

| Task | TubePull | yt-dlp |
|---|---|---|
| Download manual English captions | Subtitles (SRT) → English | --write-subs --sub-langs en --skip-download |
| Download auto-generated captions | Subtitles → English (auto) | --write-auto-subs --sub-langs en --skip-download |
| Get all available languages | One language at a time | --sub-langs all |
| Force SRT output | Pick "Subtitles (SRT)" | --convert-subs srt |
| Embed captions into MP4 | Not yet (coming soon) | --embed-subs |
| List what's available | Visible in the language dropdown | --list-subs |