Overview
The audio transcriber node turns audio into text. The Worker loads node configuration from Postgres, then either:
- AI transcription: Uses the Vercel AI SDK experimental_transcribe with OpenAI or Groq Whisper models, reading the audio from a URL in data.inputPreview.
- YouTube transcript: When data.type is ytTranscript, fetches captions via SearchAPI (no speech-to-text model).
audioTranscriberActivity delegates to AudioTranscriberService.process.
Unlike the text generator, the processSingleNode workflow does not validate model access or charge tokens for this node type. Token or billing behavior for transcription is handled outside this workflow path, if applicable.
When it runs
audioTranscriberActivity is invoked from the processSingleNode workflow when the node type is audio transcriber (NodeType.AUDIO_TRANSCRIBER).
Activity signature
Returns AudioTranscriberResponse: { text }; the same string is merged into flows_nodes.data (see Persisted node data).
Routing by data.type
| data.type | Behavior |
|---|---|
| ytTranscript (or missing with YouTube-only setup) | Requires youtubeUrl. Uses SearchAPI youtube_transcripts engine. |
| audio (default when type is omitted) | Requires inputPreview (audio URL) and aiModel. Uses AI transcription. |
If type === 'ytTranscript', youtubeUrl must be present or the service throws. Otherwise, the audio path requires inputPreview and a supported aiModel.
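The routing rule can be sketched as a small pure function. This is an illustration under assumptions, not the service's code: the TranscriberNodeData interface, the Route type, the routeTranscription name, and the error messages are hypothetical. Only the field names and the ytTranscript check come from this section.

```typescript
// Hypothetical shape of the node data fields named in this section.
interface TranscriberNodeData {
  type?: string;
  youtubeUrl?: string;
  inputPreview?: string;
  aiModel?: string;
}

// Hypothetical result type: which path to run, with the fields it needs.
type Route =
  | { kind: "youtube"; youtubeUrl: string }
  | { kind: "audio"; inputPreview: string; aiModel: string };

// Decide which path to take and fail fast on missing fields.
function routeTranscription(data: TranscriberNodeData): Route {
  if (data.type === "ytTranscript") {
    if (!data.youtubeUrl) {
      throw new Error("youtubeUrl is required for ytTranscript"); // assumed message
    }
    return { kind: "youtube", youtubeUrl: data.youtubeUrl };
  }
  // Any other (or omitted) type falls through to the audio path.
  if (!data.inputPreview) {
    throw new Error("inputPreview is required"); // assumed message
  }
  if (!data.aiModel) {
    throw new Error("aiModel is required"); // assumed message
  }
  return { kind: "audio", inputPreview: data.inputPreview, aiModel: data.aiModel };
}
```

Returning a tagged union keeps the required fields non-optional on each branch, so the caller does not need to re-check them.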
AI transcription flow
- Load node: SELECT from flows_nodes by nodeId; read data.
- Resolve model: resolveTranscriptionModel(aiModel) maps the stored key to a transcription model instance from AiProviderService (see Models).
- Discover size: HEAD request to inputPreview; read content-length and content-type (MIME).
- Small file: If size ≤ 24 MB (CHUNK_SIZE_LIMIT), call transcribe({ model, audio: new URL(url) }) so the SDK fetches the URL directly.
- Large file: If size > 24 MB:
  - GET the full file into a buffer.
  - Segment with fluent-ffmpeg: stream copy (-c copy), segment format, 600 seconds per segment (-segment_time 600), output pattern audio-chunk-<token>-%03d.<ext>.
  - For each segment file (sorted by name), call transcribe({ model, audio: chunkBuffer }).
  - Concatenate segment texts with newline (join('\n')).
- Persist: UPDATE flows_nodes with merged data (see Persisted node data).
- Return: { text }.
Context.current().heartbeat() is called before/after heavy steps (download, per-chunk transcription, YouTube fetch) so long runs stay healthy.
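The size check and the final concatenation in this flow can be sketched as two small helpers. This is a minimal sketch under assumptions: it treats 24 MB as 24 * 1024 * 1024 bytes (the service may define CHUNK_SIZE_LIMIT differently), and the names chooseStrategy and joinSegmentTexts are hypothetical. The SDK and FFmpeg calls are deliberately left out.

```typescript
// Assumption: "24 MB" means 24 * 1024 * 1024 bytes.
const CHUNK_SIZE_LIMIT = 24 * 1024 * 1024;

type Strategy = "direct-url" | "chunked";

// Files at or below the limit are passed to the SDK as a URL;
// larger files are downloaded and segmented first.
function chooseStrategy(contentLengthBytes: number): Strategy {
  return contentLengthBytes <= CHUNK_SIZE_LIMIT ? "direct-url" : "chunked";
}

// Order segment results by file name, then join their texts with a newline.
function joinSegmentTexts(segments: { file: string; text: string }[]): string {
  return [...segments]
    .sort((a, b) => (a.file < b.file ? -1 : a.file > b.file ? 1 : 0))
    .map((segment) => segment.text)
    .join("\n");
}
```

Sorting by name works because the %03d counter in the segment pattern is zero-padded, so lexical order matches playback order for up to 1000 segments.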
Chunking details
- Threshold: 24 MB. Files at or below this size use a single transcribe call with the remote URL.
- MIME → extension: Unknown MIME defaults to .mp3 for temp files; known types include audio/mpeg, audio/wav, audio/ogg, audio/webm, audio/flac, audio/aac, audio/mp4, audio/x-m4a, etc. (see MIME_TO_EXT in the service).
- FFmpeg: Uses ffmpeg-static when available (ffmpeg.setFfmpegPath). Segments are written under the OS temp directory and deleted after reading.
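A sketch of the MIME lookup and the segment naming follows. The MIME types are the ones listed above, but the extension chosen for each is an assumption, as are the helper names extensionForMime, segmentOutputOptions, and segmentPattern. The actual table is MIME_TO_EXT in the service and may differ.

```typescript
// Assumed mapping: MIME types from this section, extensions are illustrative.
const MIME_TO_EXT = new Map<string, string>([
  ["audio/mpeg", ".mp3"],
  ["audio/wav", ".wav"],
  ["audio/ogg", ".ogg"],
  ["audio/webm", ".webm"],
  ["audio/flac", ".flac"],
  ["audio/aac", ".aac"],
  ["audio/mp4", ".m4a"],
  ["audio/x-m4a", ".m4a"],
]);

// Resolve a temp-file extension from a content-type header value.
function extensionForMime(contentType: string | null | undefined): string {
  // Drop parameters such as "; charset=..." and normalize case.
  const [first = ""] = (contentType ?? "").split(";");
  const mime = first.trim().toLowerCase();
  return MIME_TO_EXT.get(mime) ?? ".mp3"; // unknown MIME defaults to .mp3
}

// FFmpeg output options described above: stream copy, segment muxer, 600 s pieces.
function segmentOutputOptions(): string[] {
  return ["-c", "copy", "-f", "segment", "-segment_time", "600"];
}

// Output pattern audio-chunk-<token>-%03d.<ext>; ext includes the leading dot.
function segmentPattern(token: string, ext: string): string {
  return `audio-chunk-${token}-%03d${ext}`;
}
```

Because -c copy avoids re-encoding, FFmpeg can only cut at packet boundaries, so segments may run slightly over or under 600 seconds.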
Models
aiModel must match one of these keys; anything else throws Unsupported model.
| aiModel value | Provider API model |
|---|---|
| openAi | whisper-1 |
| groq | whisper-large-v3 |
| groq_v3_turbo | whisper-large-v3-turbo |
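The key-to-model mapping can be pictured as a lookup that throws on unknown keys. The sketch below returns a plain descriptor instead of the SDK model instance that resolveTranscriptionModel produces. The lookupTranscriptionModel name, the descriptor shape, and the exact error text after "Unsupported model" are assumptions.

```typescript
// Hypothetical descriptor; the real service returns an SDK model instance.
interface TranscriptionModelInfo {
  provider: "openai" | "groq";
  modelId: string;
}

// Keys and model ids come from the table in this section.
const TRANSCRIPTION_MODELS = new Map<string, TranscriptionModelInfo>([
  ["openAi", { provider: "openai", modelId: "whisper-1" }],
  ["groq", { provider: "groq", modelId: "whisper-large-v3" }],
  ["groq_v3_turbo", { provider: "groq", modelId: "whisper-large-v3-turbo" }],
]);

// Look up the stored aiModel key; reject anything outside the table.
function lookupTranscriptionModel(aiModel: string): TranscriptionModelInfo {
  const info = TRANSCRIPTION_MODELS.get(aiModel);
  if (!info) {
    throw new Error(`Unsupported model: ${aiModel}`); // message format assumed
  }
  return info;
}
```

Keys are matched case-sensitively here, so openai (lowercase) would be rejected even though openAi is valid.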
YouTube transcript flow (ytTranscript)
- Parse video id from youtubeUrl with a regex for youtu.be/<id> or youtube.com/watch?v=<id>.
- Call SearchAPI: GET https://www.searchapi.io/api/v1/search?engine=youtube_transcripts&video_id=<id>&api_key=<SEARCH_API_KEY> with a 150 s abort timeout.
- Read transcripts from the JSON body; each item contributes text. Join all with spaces.
- If no transcripts, throw No transcript available for this video.
The API key is read from process.env.SEARCH_API_KEY at runtime.
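The parsing, URL construction, and joining steps above can be sketched without any network call. The regex is an assumption (the service's actual pattern is not shown here), as are the helper names and the invalid-URL message. The endpoint, query parameters, and the empty-transcript error text come from this section.

```typescript
// Assumed pattern covering youtu.be/<id> and youtube.com/watch?v=<id>.
const YOUTUBE_ID_PATTERN = /(?:youtu\.be\/|youtube\.com\/watch\?v=)([A-Za-z0-9_-]+)/;

// Extract the video id or throw when the URL matches neither form.
function parseVideoId(youtubeUrl: string): string {
  const id = YOUTUBE_ID_PATTERN.exec(youtubeUrl)?.[1];
  if (!id) {
    throw new Error("Invalid YouTube URL"); // assumed message
  }
  return id;
}

// Build the SearchAPI request URL; the key is passed in, not read from env here.
function buildSearchApiUrl(videoId: string, apiKey: string): string {
  const query = [
    "engine=youtube_transcripts",
    `video_id=${encodeURIComponent(videoId)}`,
    `api_key=${encodeURIComponent(apiKey)}`,
  ].join("&");
  return `https://www.searchapi.io/api/v1/search?${query}`;
}

// Join every transcript item's text with a single space.
function joinTranscripts(body: { transcripts?: { text: string }[] }): string {
  const items = body.transcripts ?? [];
  if (items.length === 0) {
    throw new Error("No transcript available for this video");
  }
  return items.map((item) => item.text).join(" ");
}
```

Passing the key as an argument makes a missing SEARCH_API_KEY easy to detect before the request is sent, instead of letting undefined end up in the query string.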
Persisted node data
On success, the service sets:
| Field | Role |
|---|---|
| text | Full transcript (AI or YouTube). |
| executionStatus | "COMPLETED". |
| previewResponses | Previous previews plus the new text appended. |
Other data fields are preserved via spread (...nodeData).
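The merge can be sketched as follows. It assumes previewResponses is an array of strings, which this section does not state. The NodeData interface and the mergeTranscript name are hypothetical.

```typescript
// Hypothetical node data shape; previewResponses as string[] is an assumption.
interface NodeData {
  previewResponses?: string[];
  [key: string]: unknown;
}

// Build the data object to persist: keep existing fields, set the new ones.
function mergeTranscript(nodeData: NodeData, text: string): NodeData {
  return {
    ...nodeData, // preserve every existing field
    text,
    executionStatus: "COMPLETED",
    // Append without mutating the previous array.
    previewResponses: [...(nodeData.previewResponses ?? []), text],
  };
}
```

Placing the spread first means the three new fields override any stale text, executionStatus, or previewResponses values already stored on the node.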
Errors
- Missing node row → Node not found.
- AI path: bad or missing URL, non-OK HEAD/GET, FFmpeg failure, or provider/SDK errors; all are logged and rethrown.
- YouTube path: invalid URL, SearchAPI non-2xx, empty transcripts, or missing SEARCH_API_KEY (undefined in URL).