Audio transcriber

Overview

The audio transcriber node turns audio into text. The Worker loads node configuration from Postgres, then either:

AI transcription — Uses the Vercel AI SDK experimental_transcribe with OpenAI or Groq Whisper models, reading the audio from a URL in data.inputPreview.
YouTube transcript — When data.type is ytTranscript, fetches captions via SearchAPI (no speech-to-text model).

audioTranscriberActivity delegates to AudioTranscriberService.process.

Unlike the text generator, the processSingleNode workflow does not run model-access validation or token charging for this node type. If transcription is billed at all, that happens outside this workflow path.

When it runs

audioTranscriberActivity is invoked from the processSingleNode workflow when the node type is audio transcriber (NodeType.AUDIO_TRANSCRIBER).

Activity signature

async audioTranscriberActivity({
  nodeId,
  sessionId,
  userId,
}: {
  nodeId: string;
  sessionId: string;
  userId: string;
}): Promise<AudioTranscriberResponse>

AudioTranscriberResponse:

{
  text: string;
}

The activity returns { text }; the same string is merged into flows_nodes.data (see Persisted node data).

Routing by `data.type`

`data.type`	Behavior
`ytTranscript`	Requires `youtubeUrl`. Uses SearchAPI `youtube_transcripts` engine.
`audio` (default when `type` is omitted)	Requires `inputPreview` (audio URL) and `aiModel`. Uses AI transcription.

If type === 'ytTranscript', youtubeUrl must be present or the service throws. Any other value, including a missing type, takes the audio path, which requires inputPreview and a supported aiModel.
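The routing rule can be sketched as a small pure function. This is a minimal illustration, assuming `data` is a plain object with optional string fields `type`, `youtubeUrl`, `inputPreview`, and `aiModel`; the `TranscriberNodeData` type, the `TranscriptionRoute` union, `routeTranscription`, and the error messages are hypothetical. Whether `aiModel` is actually supported is checked later, when the model is resolved.

```typescript
// Hypothetical shape of the fields the router reads from flows_nodes.data.
interface TranscriberNodeData {
  type?: string;
  youtubeUrl?: string;
  inputPreview?: string;
  aiModel?: string;
}

// Hypothetical result: which path to take, plus the validated inputs it needs.
type TranscriptionRoute =
  | { kind: "youtube"; youtubeUrl: string }
  | { kind: "audio"; audioUrl: string; aiModel: string };

function routeTranscription(data: TranscriberNodeData): TranscriptionRoute {
  // Only an explicit 'ytTranscript' selects the YouTube path.
  if (data.type === "ytTranscript") {
    if (!data.youtubeUrl) {
      throw new Error("youtubeUrl is required for ytTranscript");
    }
    return { kind: "youtube", youtubeUrl: data.youtubeUrl };
  }
  // Every other value, including a missing type, is treated as audio.
  if (!data.inputPreview || !data.aiModel) {
    throw new Error("inputPreview and aiModel are required for audio transcription");
  }
  return { kind: "audio", audioUrl: data.inputPreview, aiModel: data.aiModel };
}
```

Returning a discriminated union keeps the validated inputs attached to the chosen path, so callers cannot reach the audio branch without an audio URL.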

AI transcription flow

Load node — SELECT from flows_nodes by nodeId; read data.
Resolve model — resolveTranscriptionModel(aiModel) maps the stored key to a transcription model instance from AiProviderService (see Models).
Discover size — HEAD request to inputPreview; read content-length and content-type (MIME).
Small file — If size ≤ 24 MB (CHUNK_SIZE_LIMIT), call transcribe({ model, audio: new URL(url) }) so the SDK fetches the URL directly.
Large file — If size > 24 MB:
- GET the full file into a buffer.
- Segment with fluent-ffmpeg: stream copy (-c copy), segment format, 600 seconds per segment (-segment_time 600), output pattern audio-chunk-<token>-%03d.<ext>.
- For each segment file (sorted by name), call transcribe({ model, audio: chunkBuffer }).
- Concatenate segment texts with newline (join('\n')).
Persist — UPDATE flows_nodes with merged data (see below).
Return — { text }.
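The size-based branching in the flow above can be sketched with the I/O injected, so the decision logic stands on its own. This is a minimal sketch, not the service's code: the `TranscribeDeps` interface and its `headContentLength`, `transcribeUrl`, `downloadAndSegment`, and `transcribeBuffer` members are hypothetical stand-ins for the HEAD request, `transcribe({ model, audio: new URL(url) })`, the download plus FFmpeg segmentation, and `transcribe({ model, audio: chunkBuffer })`. It also assumes that 24 MB means 24 × 1024 × 1024 bytes and that a missing `content-length` takes the single-call path; check `CHUNK_SIZE_LIMIT` in the service for the real behavior.

```typescript
// Assumption: CHUNK_SIZE_LIMIT is 24 MiB expressed in bytes.
const CHUNK_SIZE_LIMIT = 24 * 1024 * 1024;

// Hypothetical injected operations standing in for HTTP, FFmpeg, and the AI SDK.
interface TranscribeDeps {
  headContentLength(url: string): Promise<number | null>; // HEAD content-length, null if absent
  transcribeUrl(url: string): Promise<string>; // single call; the SDK fetches the URL itself
  downloadAndSegment(url: string): Promise<Uint8Array[]>; // GET, then split into ~600 s chunks
  transcribeBuffer(chunk: Uint8Array): Promise<string>; // one call per chunk
}

async function transcribeAudio(url: string, deps: TranscribeDeps): Promise<string> {
  const size = await deps.headContentLength(url);
  // Files at or below the limit (or of unknown size, by assumption): one call on the URL.
  if (size === null || size <= CHUNK_SIZE_LIMIT) {
    return deps.transcribeUrl(url);
  }
  // Larger files: transcribe segments one at a time, in order, then join with newlines.
  const chunks = await deps.downloadAndSegment(url);
  const texts: string[] = [];
  for (const chunk of chunks) {
    texts.push(await deps.transcribeBuffer(chunk));
  }
  return texts.join("\n");
}
```

Transcribing chunks sequentially preserves their order and leaves a natural point to heartbeat between provider calls; a parallel version would need to reorder results before joining.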

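Temporal Context.current().heartbeat() is called before and after heavy steps (download, per-chunk transcription, YouTube fetch) so that long-running activities keep reporting progress and are not failed by Temporal's heartbeat timeout, when one is configured.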
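The before/after heartbeat pattern can be sketched as a small wrapper. In the Worker the callback would be something like `(d) => Context.current().heartbeat(d)` from `@temporalio/activity`; here it is injected as a plain function so the sketch runs without Temporal. `withHeartbeat` and the `label:start`/`label:done` detail strings are hypothetical.

```typescript
// Hypothetical helper: emit a heartbeat before and after a long-running step.
// In a Temporal activity, `heartbeat` could be (d) => Context.current().heartbeat(d).
async function withHeartbeat<T>(
  heartbeat: (details?: unknown) => void,
  label: string,
  step: () => Promise<T>,
): Promise<T> {
  heartbeat(`${label}:start`);
  const result = await step();
  heartbeat(`${label}:done`);
  return result;
}
```

A call such as `await withHeartbeat(hb, "chunk-3", () => transcribeChunk(buf))` (with `transcribeChunk` hypothetical) keeps the heartbeat calls next to the step they guard. Heartbeats only bracket each step, so a single step that runs longer than the heartbeat timeout can still time out.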

Chunking details

Threshold: 24 MB — files at or below this size use a single transcribe call with the remote URL.
MIME → extension: Temp files use the extension mapped from the MIME type, and unknown MIME types default to .mp3. Known types include audio/mpeg, audio/wav, audio/ogg, audio/webm, audio/flac, audio/aac, audio/mp4, and audio/x-m4a (see MIME_TO_EXT in the service for the full list).
FFmpeg: Uses ffmpeg-static when available (ffmpeg.setFfmpegPath). Segments are written under the OS temp directory and deleted after reading.
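The MIME lookup and FFmpeg segmenting options can be sketched as plain data. This is an illustration only: this `MIME_TO_EXT` lists just the types named above (the service's map may hold more), `.m4a` for `audio/mp4` and `audio/x-m4a` is an assumption, stripping MIME parameters such as `; codecs=...` is an added assumption, and `extForMime`, `segmentPattern`, and `sortSegmentFiles` are hypothetical helpers. The options array is the kind of value that could be passed to fluent-ffmpeg's `outputOptions()`.

```typescript
// Subset of MIME types named in the docs; the .m4a mappings are assumptions.
const MIME_TO_EXT: Record<string, string> = {
  "audio/mpeg": "mp3",
  "audio/wav": "wav",
  "audio/ogg": "ogg",
  "audio/webm": "webm",
  "audio/flac": "flac",
  "audio/aac": "aac",
  "audio/mp4": "m4a",
  "audio/x-m4a": "m4a",
};

// Unknown or missing MIME types fall back to mp3, as documented.
// Dropping parameters after ";" is an added assumption.
function extForMime(contentType: string | null): string {
  const mime = (contentType ?? "").split(";")[0].trim().toLowerCase();
  return MIME_TO_EXT[mime] ?? "mp3";
}

// FFmpeg segment muxer, 600-second segments, stream copy (no re-encode).
const SEGMENT_OUTPUT_OPTIONS = ["-f", "segment", "-segment_time", "600", "-c", "copy"];

// Mirrors the documented output pattern audio-chunk-<token>-%03d.<ext>.
function segmentPattern(token: string, ext: string): string {
  return `audio-chunk-${token}-%03d.${ext}`;
}

// %03d zero-pads the index, so a lexicographic sort restores segment order
// (for up to 1000 segments).
function sortSegmentFiles(names: string[]): string[] {
  return [...names].sort();
}
```

Because `-c copy` avoids re-encoding, segmentation is fast, but cuts land on packet boundaries, so segment lengths are approximately rather than exactly 600 seconds.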

Models

aiModel must match one of these keys; anything else throws Unsupported model.

`aiModel` value	Provider API model
`openAi`	`whisper-1`
`groq`	`whisper-large-v3`
`groq_v3_turbo`	`whisper-large-v3-turbo`
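The key-to-model lookup can be sketched as a table plus a guard. The real `resolveTranscriptionModel` returns a model instance from `AiProviderService`; this hypothetical `resolveTranscriptionModelId` returns only the provider and API model ID from the table above, to show the mapping and the failure for unknown keys. The exact error text is an assumption. With the AI SDK, such an ID would be turned into a model by the provider package, for example `openai.transcription('whisper-1')`.

```typescript
type TranscriptionProvider = "openai" | "groq";

// Stored aiModel key -> provider and API model ID, as listed in the table above.
const TRANSCRIPTION_MODELS: Record<string, { provider: TranscriptionProvider; modelId: string }> = {
  openAi: { provider: "openai", modelId: "whisper-1" },
  groq: { provider: "groq", modelId: "whisper-large-v3" },
  groq_v3_turbo: { provider: "groq", modelId: "whisper-large-v3-turbo" },
};

// Hypothetical resolver: any key not in the table is rejected.
function resolveTranscriptionModelId(aiModel: string) {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  const entry = Object.prototype.hasOwnProperty.call(TRANSCRIPTION_MODELS, aiModel)
    ? TRANSCRIPTION_MODELS[aiModel]
    : undefined;
  if (!entry) {
    throw new Error(`Unsupported model: ${aiModel}`);
  }
  return entry;
}
```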

YouTube transcript flow (`ytTranscript`)

Parse video id from youtubeUrl with a regex for youtu.be/<id> or youtube.com/watch?v=<id>.
Call SearchAPI: GET https://www.searchapi.io/api/v1/search?engine=youtube_transcripts&video_id=<id>&api_key=<SEARCH_API_KEY> with a 150 s abort timeout.
Read the transcripts array from the JSON body and join each item's text with spaces.
If no transcripts, throw No transcript available for this video.

Requires process.env.SEARCH_API_KEY at runtime.
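The YouTube steps can be sketched as follows, assuming Node 18+ for the global `fetch` and `AbortSignal.timeout`. The regex, the `{ transcripts: [{ text }] }` response shape, and the names `extractVideoId`, `buildTranscriptUrl`, `joinTranscripts`, and `fetchYoutubeTranscript` are illustrative; the service's regex and parsing may differ. Unlike the service as described, this sketch rejects a missing API key up front instead of sending `api_key=undefined`, and the caller passes `process.env.SEARCH_API_KEY` in explicitly.

```typescript
const SEARCHAPI_ENDPOINT = "https://www.searchapi.io/api/v1/search";
const SEARCHAPI_TIMEOUT_MS = 150_000; // 150 s, as documented

// Matches youtu.be/<id> and youtube.com/watch?v=<id>; returns null otherwise.
function extractVideoId(youtubeUrl: string): string | null {
  const match = youtubeUrl.match(/(?:youtu\.be\/|youtube\.com\/watch\?v=)([\w-]+)/);
  return match ? match[1] : null;
}

function buildTranscriptUrl(videoId: string, apiKey: string): string {
  const url = new URL(SEARCHAPI_ENDPOINT);
  url.searchParams.set("engine", "youtube_transcripts");
  url.searchParams.set("video_id", videoId);
  url.searchParams.set("api_key", apiKey); // URL-encodes the key
  return url.toString();
}

// Assumed response shape: { transcripts?: Array<{ text: string }> }.
function joinTranscripts(body: { transcripts?: Array<{ text: string }> }): string {
  const items = body.transcripts ?? [];
  if (items.length === 0) {
    throw new Error("No transcript available for this video");
  }
  return items.map((item) => item.text).join(" ");
}

async function fetchYoutubeTranscript(youtubeUrl: string, apiKey: string | undefined): Promise<string> {
  if (!apiKey) throw new Error("SEARCH_API_KEY is not set");
  const videoId = extractVideoId(youtubeUrl);
  if (!videoId) throw new Error(`Invalid YouTube URL: ${youtubeUrl}`);
  // Abort the request if SearchAPI has not answered within 150 s.
  const res = await fetch(buildTranscriptUrl(videoId, apiKey), {
    signal: AbortSignal.timeout(SEARCHAPI_TIMEOUT_MS),
  });
  if (!res.ok) throw new Error(`SearchAPI request failed with status ${res.status}`);
  return joinTranscripts(await res.json());
}
```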

Persisted node `data`

On success, the service sets:

Field	Role
`text`	Full transcript (AI or YouTube).
`executionStatus`	`"COMPLETED"`.
`previewResponses`	Existing previews with the new `text` appended.

Other data fields are preserved via spread (...nodeData).
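The merge can be sketched as a pure function that builds the object written back in the UPDATE. This assumes `previewResponses` is an array of strings (possibly absent); `NodeData` and `buildCompletedNodeData` are hypothetical names.

```typescript
// Hypothetical shape; other node fields are carried through untouched.
type NodeData = Record<string, unknown> & { previewResponses?: string[] };

// Builds the object written back to flows_nodes.data on success.
function buildCompletedNodeData(nodeData: NodeData, text: string): NodeData {
  return {
    ...nodeData, // spread first so the fields below override stale values
    text,
    executionStatus: "COMPLETED",
    previewResponses: [...(nodeData.previewResponses ?? []), text],
  };
}
```

Building a new object instead of mutating `nodeData` keeps the loaded row unchanged if the UPDATE fails.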

Errors

Missing node row → Node not found.
AI path: bad or missing URL, non-OK HEAD/GET, FFmpeg failure, or provider/SDK errors — logged and rethrown.
YouTube path: invalid URL, SearchAPI non-2xx response, empty transcripts, or a missing SEARCH_API_KEY (the request is sent with api_key=undefined, so the failure typically surfaces as a SearchAPI error).
