Scority

YouTube API Captions and Subtitles

Understand YouTube captions, subtitles and transcript access. Learn how Scority returns caption text and timestamped transcript segments.

Direct answer

Captions are the source material for most transcript APIs

For many YouTube transcript workflows, the useful data comes from caption or subtitle tracks. A transcript API turns those tracks into normalized text and timestamped segments that server-side apps can use.

  • Scority returns full transcript text for reading, search and summarization.
  • Scority also returns segments with start and duration values.
  • Language selection depends on caption tracks available for the video.

Definitions

Captions vs subtitles vs transcripts

The terms overlap in everyday use, but they matter when you build an API integration.

  • Captions are timed text tracks associated with a video.
  • Subtitles often refer to translated or language-specific caption tracks.
  • A transcript is the readable text form of speech, optionally with segment timing.
  • For AI workflows, transcript text and segment timing are usually more useful than raw caption files.

Official API

What the official YouTube API does and does not provide

The official YouTube APIs are the right starting point for many platform workflows, but transcript access is a narrower problem. Official caption endpoints and their access rules change over time, so check the current Google documentation before relying on specific behavior, and record the date you checked.

  • Use official Google APIs when you need official platform workflows.
  • Use a transcript-specific API when you need text and segments for public video content.
  • Do not assume that a publicly visible YouTube video has a transcript available through an API.

Availability

Public accessible captions

Scority works with many public YouTube videos that have accessible captions. The current API does not guarantee transcripts for every public video.

  • Captions may be missing, disabled, unavailable for a requested language or blocked by upstream behavior.
  • Caption availability can differ by language.
  • transcript_not_available means the API could not find usable captions for that request.
  • upstream_transcript_failed means fetching the upstream transcript path failed.
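
The two availability errors above usually call for different handling in the calling app. A minimal sketch, assuming the app already has the error code as a string: the `next_step` helper, the action names and the decision to retry `upstream_transcript_failed` are illustrative assumptions, not documented Scority behavior.

```python
# Error codes are the ones named in this guide. The actions and the retry
# policy are assumptions made for illustration.
ACTIONS = {
    # No usable captions for this request: repeating the same request is
    # unlikely to help, so skip the video or try another language.
    "transcript_not_available": "skip_or_try_other_language",
    # The upstream fetch failed: assumed here to be worth retrying later.
    "upstream_transcript_failed": "retry_later",
}


def next_step(error_code: str) -> str:
    """Return a coarse action for an error code, or 'report' if unknown."""
    return ACTIONS.get(error_code, "report")
```

Keeping the mapping in one table makes it easy to adjust the policy once you have observed how often each error occurs for your own video set.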

Language

Language selection

Send either the language or the lang query parameter to request a caption language. Do not send both. The response language reflects the selected caption track when available.

  • Use values such as en, en-US, ru or ru-RU.
  • invalid_language means the language value is malformed.
  • ambiguous_language means both language and lang were provided.
  • If captions are not available in the requested language, handle the returned error explicitly, or check the language in the response to see which track was selected.
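
The language rules above can be checked on the client before a request is sent. A minimal sketch, assuming a hypothetical `language_params` helper; the regular expression is a loose local check for values shaped like `en` or `en-US` and is an assumption, since the API's own validation rules may differ.

```python
import re
from typing import Optional

# Loose client-side pattern for tags such as "en", "ru" or "en-US".
# This is an assumption; the API may accept or reject other forms.
_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{2})?")


def language_params(language: Optional[str] = None, lang: Optional[str] = None) -> dict:
    """Return query parameters carrying at most one language selector."""
    if language is not None and lang is not None:
        # The API reports this case as ambiguous_language.
        raise ValueError("send language or lang, not both")
    key, value = ("language", language) if language is not None else ("lang", lang)
    if value is None:
        return {}  # no language requested
    if not _TAG.fullmatch(value):
        # The API reports malformed values as invalid_language.
        raise ValueError(f"malformed language value: {value!r}")
    return {key: value}
```

For example, `language_params(language="en-US")` yields `{"language": "en-US"}`, while passing both arguments raises before any network call is made.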

Segments

Timed segments

Segments let AI and search workflows connect text back to time ranges in the video.

  • Each segment includes text, start and duration.
  • Segment timing is useful for citations, chapter summaries and playback links.
  • The full text field is useful for summarization and indexing.
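
To illustrate how segment timing supports citations and playback links, here is a minimal sketch. It assumes each segment is a dictionary with `text`, `start` and `duration` keys and that `start` and `duration` are numbers in seconds, which this guide does not state; the helper names are hypothetical. The link uses YouTube's `t=<seconds>s` watch-URL parameter.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as m:ss, dropping any fractional part."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def playback_link(video_id: str, start: float) -> str:
    """Build a watch URL that starts playback at the segment start."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start)}s"


def cite(video_id: str, segment: dict) -> str:
    """Format one segment as a citation with its time range and link."""
    # Assumption: start and duration are seconds, so the end is their sum.
    end = segment["start"] + segment["duration"]
    span = f"{format_timestamp(segment['start'])}-{format_timestamp(end)}"
    return f"[{span}] {segment['text']} ({playback_link(video_id, segment['start'])})"
```

A summarizer can attach the output of `cite` to each claim so that readers can jump to the matching moment in the video.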

Example API request

This request asks for the English caption track. When a transcript is available, the response includes text, segments, source and language.

curl "https://api.scority.ai/v1/youtube/transcript?video_id=dQw4w9WgXcQ&language=en" \
  -H "x-api-key: YOUR_API_KEY"
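
For server-side apps, the same request can be made from Python's standard library. This is a sketch under stated assumptions: the endpoint, query parameters and header come from the curl example above, while the helper names, the 30-second timeout and the assumption that the body is JSON are illustrative.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.scority.ai/v1/youtube/transcript"


def build_request(video_id: str, api_key: str, language: str = "en") -> urllib.request.Request:
    """Build the same GET request as the curl example, without sending it."""
    query = urllib.parse.urlencode({"video_id": video_id, "language": language})
    return urllib.request.Request(f"{BASE_URL}?{query}", headers={"x-api-key": api_key})


def fetch_transcript(video_id: str, api_key: str, language: str = "en") -> dict:
    """Send the request and decode the body, assumed here to be JSON.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses,
    so callers should catch it and inspect the error body.
    """
    request = build_request(video_id, api_key, language)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Separating `build_request` from `fetch_transcript` lets you inspect or log the exact URL and headers without making a network call.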

AI workflows

How AI workflows use captions

Caption-derived transcripts are useful when an AI workflow needs source text from video without storing or processing the media file.

  • RAG systems can index transcript text and segment timestamps.
  • Agents can inspect a video before answering a question.
  • Summarizers can turn segments into notes, chapters or briefings.
  • Search tools can match spoken content across a video library.
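
As one illustration of the RAG case, segments can be grouped into larger chunks that keep their time range for indexing. A minimal sketch, assuming segments are dictionaries with `text`, `start` and `duration` (taken to be seconds); the `chunk_segments` name, the output keys and the character budget are hypothetical choices, not part of the API.

```python
def chunk_segments(segments: list, max_chars: int = 500) -> list:
    """Group consecutive segments into chunks of at most max_chars characters.

    Each chunk keeps the start of its first segment and the end of its last,
    so a search hit can be mapped back to a time range in the video.
    A single segment longer than max_chars becomes its own chunk.
    """
    chunks, texts, start, end = [], [], None, 0.0
    for seg in segments:
        # Flush the current chunk if adding this segment would exceed the budget.
        if texts and len(" ".join(texts + [seg["text"]])) > max_chars:
            chunks.append({"text": " ".join(texts), "start": start, "end": end})
            texts, start = [], None
        if start is None:
            start = seg["start"]
        texts.append(seg["text"])
        end = seg["start"] + seg["duration"]
    if texts:
        chunks.append({"text": " ".join(texts), "start": start, "end": end})
    return chunks
```

Each resulting chunk can be embedded and stored alongside its `start` and `end` values, which is enough to build a timestamped citation later.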

Reference

API reference

See language parameters, response fields and current endpoint behavior.


Errors

Error codes

Understand transcript_not_available, upstream_transcript_failed and language errors.


Guide

YouTube API transcript

Compare transcript-specific API access with the broader official YouTube APIs.
