shrp api for speech, tts and transcripts

use shrp from your scripts, automations, and ai agents. build with a speech-to-text api, audio transcription api, video transcription api, text-to-speech api, and youtube transcript api using bearer token auth and predictable json responses.

generate your api key to get started

free accounts get 2 keys. authenticated requests only — no anonymous access.

get api key →

// authentication

all requests require a bearer token in the Authorization header. generate your key at /dashboard/api-keys. keys begin with shrp_live_ and are shown only once — store them securely.

auth header
Authorization: Bearer shrp_live_<your-key>
200: request succeeded
401: missing or invalid api key
429: rate limit or quota reached
400: bad request, check field names and values
404: resource not found (e.g. no transcript available)
502 / 504: upstream provider error or timeout
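
a minimal sketch of how a client might branch on the status codes above. the function name `classify_status` and the action labels are hypothetical, not part of the shrp api, and treating 502 / 504 as worth retrying is an assumption based on their description as upstream errors or timeouts.

```python
def classify_status(status_code: int) -> str:
    """Map a documented status code to a suggested client action.

    The returned labels are illustrative names chosen for this sketch.
    """
    if status_code == 200:
        return "ok"
    if status_code == 401:
        return "check_api_key"   # missing or invalid api key
    if status_code == 429:
        return "wait_for_reset"  # rate limit or quota reached
    if status_code == 400:
        return "fix_request"     # check field names and values
    if status_code == 404:
        return "not_found"       # e.g. no transcript available
    if status_code in (502, 504):
        return "retry_later"     # upstream provider error or timeout (assumed retryable)
    return "unexpected"          # anything not listed in the table
```

with `requests`, this could be called as `classify_status(resp.status_code)` before reading the json body.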

// endpoints

POST /api/v1/youtube-transcript (live)

youtube transcript api for retrieving the available transcript from a video url. returns readable text or srt subtitle format. no credits charged — rate limited to 20 requests per key per day.

POST /api/v1/tts (live)

text-to-speech api for generating natural voice audio with google neural voices. returns base64-encoded mp3 audio and uses your existing tts character quota.

POST /api/v1/transcribe (live)

speech-to-text api for audio transcription and video transcription via assemblyai. returns transcript text, duration, word count, confidence, and optional speaker labels.

// youtube transcript api

Need the short version for this endpoint? Read the focused YouTube Transcript API overview →

RATE LIMIT
20 requests / key / day
CREDITS
none charged
FORMATS
text (default), srt
METHOD
POST

REQUEST BODY

json
{
  "url": "https://youtube.com/watch?v=...",  // required — full URL or 11-char video ID
  "format": "text"                           // optional — "text" (default) or "srt"
}

EXAMPLES

curl
curl -X POST https://shrp.app/api/v1/youtube-transcript \
  -H "Authorization: Bearer shrp_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://youtube.com/watch?v=dQw4w9WgXcQ"}'
javascript
const res = await fetch('https://shrp.app/api/v1/youtube-transcript', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer shrp_live_...',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ url: 'https://youtube.com/watch?v=dQw4w9WgXcQ' }),
})
const data = await res.json()
// data.transcript — plain text
// data.language   — detected language code
python
import requests

resp = requests.post(
    'https://shrp.app/api/v1/youtube-transcript',
    headers={'Authorization': 'Bearer shrp_live_...'},
    json={'url': 'https://youtube.com/watch?v=dQw4w9WgXcQ'},
)
data = resp.json()
print(data['transcript'])

RESPONSE

json
{
  "success": true,
  "videoId": "dQw4w9WgXcQ",
  "language": "en",
  "transcript": "We're no strangers to love...",
  "format": "text",
  "wordCount": 312
}
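
the examples above all use the default text format. the sketch below shows one way to build the request body for either format and to read the result defensively. the helper names are hypothetical, and two things are assumptions rather than documented behaviour: that an srt request also returns its content in the `transcript` field, and that failed responses carry an `error` field outside the 429 case.

```python
import json

ALLOWED_FORMATS = ("text", "srt")  # the two formats listed for this endpoint


def build_transcript_payload(url_or_id: str, fmt: str = "text") -> str:
    """Return the json request body for /api/v1/youtube-transcript."""
    if not url_or_id:
        raise ValueError("url is required (full URL or 11-char video ID)")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    return json.dumps({"url": url_or_id, "format": fmt})


def transcript_from_response(data: dict) -> str:
    """Return the transcript string, or raise if the call did not succeed."""
    if not data.get("success"):
        # "error" as the failure field is an assumption for non-429 responses.
        raise RuntimeError(data.get("error", "transcript request failed"))
    return data["transcript"]
```

the payload string can be sent with `requests.post(..., data=payload, headers={..., 'Content-Type': 'application/json'})`.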

// text-to-speech api

VOICES
2,000+ google voices
OUTPUT
base64 mp3
QUOTA
shared with web ui
METHOD
POST

REQUEST BODY

json
{
  "text": "Hello from SHRP.",           // required — the text to synthesize
  "voiceName": "en-US-Neural2-J",       // required — see /api/tts/voices for options
  "languageCode": "en-US",              // required — BCP-47 language code
  "speed": 1.0                          // optional — 0.25 to 4.0, default 1.0
}

to list available voices and their language codes, call GET /api/tts/voices. unlike the /api/v1 endpoints, it requires no auth. quota is shared with your web ui usage: the same monthly character allowance applies.

EXAMPLES

curl
curl -X POST https://shrp.app/api/v1/tts \
  -H "Authorization: Bearer shrp_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from SHRP.",
    "voiceName": "en-US-Neural2-J",
    "languageCode": "en-US",
    "speed": 1.0
  }'
javascript
const res = await fetch('https://shrp.app/api/v1/tts', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer shrp_live_...',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    text: 'Hello from SHRP.',
    voiceName: 'en-US-Neural2-J',
    languageCode: 'en-US',
    speed: 1.0,
  }),
})
const data = await res.json()
// data.audioBase64 — MP3 as base64 string
python
import requests, base64

resp = requests.post(
    'https://shrp.app/api/v1/tts',
    headers={'Authorization': 'Bearer shrp_live_...'},
    json={
        'text': 'Hello from SHRP.',
        'voiceName': 'en-US-Neural2-J',
        'languageCode': 'en-US',
        'speed': 1.0,
    },
)
data = resp.json()
audio = base64.b64decode(data['audioBase64'])
with open('output.mp3', 'wb') as f:
    f.write(audio)

RESPONSE

json
{
  "success": true,
  "audioBase64": "//NExAA...",
  "format": "mp3",
  "charactersUsed": 16,
  "charsUsed": 1234,
  "charsLimit": 10000,
  "charsRemaining": 8766,
  "voiceName": "en-US-Neural2-J",
  "languageCode": "en-US"
}
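
a small sketch of using the quota fields in this response before sending the next request. the helper names are hypothetical. reading `charactersUsed` as the cost of this request and `charsRemaining` as what is left of the monthly allowance is an interpretation of the sample values (10000 - 1234 = 8766), and counting characters with `len()` is an assumption about how the quota is measured.

```python
import base64


def decode_tts_response(data: dict) -> bytes:
    """Return the raw mp3 bytes from a tts response dict."""
    if not data.get("success"):
        # "error" as the failure field is an assumption.
        raise RuntimeError(data.get("error", "tts request failed"))
    return base64.b64decode(data["audioBase64"])


def fits_remaining_quota(data: dict, next_text: str) -> bool:
    """Check whether next_text fits in the quota reported by the last response."""
    # Assumes the quota counts characters the same way len() does.
    return len(next_text) <= data["charsRemaining"]
```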

// speech-to-text api

INPUT
multipart file upload
MAX FILE
100MB direct upload
PROVIDER
assemblyai
METHOD
POST

REQUEST BODY

multipart/form-data
file=@meeting.mp3            // required — audio or video file
language=en                  // optional — BCP-47 language code
speaker_labels=true          // optional — Starter/Pro only

this audio transcription api returns transcript json directly for uploaded audio and video files. it does not store your uploaded file or create a dashboard history item. use the web upload flow if you need saved transcription history.

EXAMPLES

curl
curl -X POST https://shrp.app/api/v1/transcribe \
  -H "Authorization: Bearer shrp_live_..." \
  -F "file=@meeting.mp3" \
  -F "speaker_labels=true"
javascript
const form = new FormData()
form.append('file', fileInput.files[0])
form.append('speaker_labels', 'true')

const res = await fetch('https://shrp.app/api/v1/transcribe', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer shrp_live_...',
  },
  body: form,
})
const data = await res.json()
// data.text — transcript text
// data.utterances — speaker-labelled segments, when available
python
import requests

with open('meeting.mp3', 'rb') as f:
    resp = requests.post(
        'https://shrp.app/api/v1/transcribe',
        headers={'Authorization': 'Bearer shrp_live_...'},
        files={'file': f},
        data={'speaker_labels': 'true'},
    )

data = resp.json()
print(data['text'])

RESPONSE

json
{
  "success": true,
  "filename": "meeting.mp3",
  "text": "Thanks everyone for joining...",
  "language": "en",
  "confidence": 0.94,
  "durationSeconds": 184,
  "wordCount": 512,
  "speakerLabels": true,
  "words": [],
  "utterances": [],
  "usage": {
    "tier": "starter",
    "durationMinutes": 4,
    "monthlyMinutesUsed": 18,
    "monthlyMinutesLimit": 200
  }
}
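
the `usage` block can be used to check how much of the monthly allowance is left before uploading another file. this is a sketch with hypothetical helper names. rounding duration up to whole minutes is an assumption inferred from the sample (184 seconds reported as 4 minutes), and whether `monthlyMinutesUsed` already includes the current request is not stated, so treat the result as an estimate.

```python
import math


def remaining_minutes(data: dict) -> int:
    """Minutes left in the month according to the response's usage block."""
    usage = data["usage"]
    return usage["monthlyMinutesLimit"] - usage["monthlyMinutesUsed"]


def estimated_minutes(duration_seconds: float) -> int:
    """Estimate the minutes a file would consume.

    Rounding up is an assumption based on the sample response.
    """
    return math.ceil(duration_seconds / 60)


def fits_plan(data: dict, duration_seconds: float) -> bool:
    """Check whether a file of the given length fits the remaining minutes."""
    return estimated_minutes(duration_seconds) <= remaining_minutes(data)
```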

// rate limits & quotas

/api/v1/youtube-transcript
limit: 20 requests per key per day
credits: no credits charged
reset: midnight utc
/api/v1/tts
limit: free 500 chars/request · paid 2,000 chars/request
credits: deducted from monthly character quota
reset: monthly quota resets on the 1st of the month
/api/v1/transcribe
limit: cloud 60 min/mo · starter 200 min/mo · pro 1,500 min/mo
credits: uses transcription plan minutes or pay-as-you-go credits
reset: daily requests reset at midnight utc; minutes reset monthly

when a limit is hit, the api returns 429 with an error field describing what was exceeded. credits and quotas are tied to the user account associated with the api key — not to the key itself.
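
because /api/v1/tts caps the characters per request, longer text has to be split client-side. the sketch below is one way to do that, breaking at whitespace. `chunk_text` is a hypothetical helper, the default of 500 matches the free-tier limit listed above, and counting with `len()` is an assumption about how the api measures characters.

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at whitespace.

    Runs of whitespace are collapsed to single spaces. A single word longer
    than max_chars is hard-split.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # Hard-split any word that cannot fit in one chunk on its own.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The word does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

each chunk can then be sent as its own tts request and the decoded mp3 segments stored in order.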

// why build with shrp

no infra to manage

google cloud tts, assemblyai, and youtube transcript retrieval handled for you. one api, one key.

works in any language

60+ languages for tts, 100+ transcription languages, and youtube transcripts in whatever language the video was recorded in.

designed for agents

all responses are clean json. consistent error codes. predictable schema. easy to chain into llm pipelines and automations.

same quota as the ui

api usage and web ui usage share the same character balance. no double billing, no separate api quota to track.

// mcp — for ai agents

if you are building with claude, cursor, or windsurf, use the mcp server instead of the rest api. your ai agent calls shrp tools natively — no fetch calls, no response parsing, no glue code.

claude_desktop_config.json
{
  "mcpServers": {
    "shrp": {
      "url": "https://shrp.app/api/mcp",
      "headers": {
        "Authorization": "Bearer shrp_live_..."
      }
    }
  }
}
youtube_transcript (live)

url, format? → plain text transcript

text_to_speech (live)

text, voiceName, languageCode, speed? → base64 mp3

mcp uses the same api key and same quota as the rest api — not separate auth, not separate billing. full mcp docs →
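
if you already have other mcp servers configured, the shrp entry needs to be merged in rather than pasted over the file. a minimal sketch, assuming the config has been loaded into a dict; `add_shrp_server` is a hypothetical helper, and the entry it writes simply mirrors the json shown above.

```python
def add_shrp_server(config: dict, api_key: str) -> dict:
    """Return a copy of an mcp config dict with the shrp server entry added.

    Existing servers are preserved; the input dict is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["shrp"] = {
        "url": "https://shrp.app/api/mcp",
        "headers": {"Authorization": f"Bearer {api_key}"},
    }
    merged["mcpServers"] = servers
    return merged
```

in practice you would `json.load` the existing config file, pass the result through this function, and `json.dump` it back.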

// mcp server for ai agents

mcp server

use shrp directly inside claude, cursor, and windsurf via the model context protocol, with no fetch calls and no glue code. your ai agent calls youtube_transcript and text_to_speech as native tools. speech-to-text is available today through the rest api; the mcp transcribe tool will be added separately once file upload semantics are finalized.

learn more about the mcp server →

// api faq

Does SHRP have a speech-to-text API?

Yes. POST /api/v1/transcribe accepts multipart audio or video uploads and returns transcript text, confidence, duration, word count, and optional speaker-labelled utterances.

Does SHRP have a text-to-speech API?

Yes. POST /api/v1/tts converts text to speech and returns base64 MP3 audio using the same SHRP TTS quota as the web app.

Can I extract YouTube transcripts with the API?

Yes. POST /api/v1/youtube-transcript retrieves the available transcript for a YouTube video as plain text or SRT.

Is the API designed for AI agents?

Yes. SHRP returns predictable JSON responses and uses bearer token authentication, so it works well in scripts, automations, and AI agent workflows.

api usage draws from your existing shrp plan and credits — no separate api subscription required.

ready to automate?

generate api key → · view pricing

questions? reach out at hello@shrp.app