Transcription Python Packages

funasr

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

422K 19K 2K

speechmatics-rt

Python SDKs for Speechmatics APIs

211K 18 9

soniox

Python SDK for the Soniox API: realtime/async STT and TTS.

207K 9 5

speechmatics-voice

Python SDKs for Speechmatics APIs

157K 18 9

speechmatics-python

Python library and CLI for Speechmatics

55K 75 23

speechmatics-batch

Python SDKs for Speechmatics APIs

53K 18 9

speach

🐍🍑 Python 3 library for managing, annotating, and converting natural language corpora using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)

17K 21 6

whispergram

Local, offline transcriber for Telegram & Instagram chat exports — voice/video notes via Whisper (faster-whisper), screenshots via OCR, photos/stickers/GIFs via a local vision model. Interactive menu or CLI; merges everything into one chronological, LLM-ready Markdown file. No cloud, no API key.

10K 1 0

funasr-onnx

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

9K 19K 2K

pvcheetah

On-device streaming speech-to-text engine powered by deep learning

6K 667 77

plaud-tools

Python CLI and MCP server for Plaud recordings — bring transcripts, summaries, and audio into your AI assistant.

6K 1 2

diart

A Python package to build AI-powered real-time audio applications

5K 2K 164

youty-mcp

Turn YouTube, Instagram, and TikTok videos into a local, AI-readable knowledge base — 100% on-device. macOS app + CLI + MCP server.

5K 0 0

lyrics-transcriber

Automatically create synchronised lyrics files in ASS and LRC formats with word-level timestamps, using Whisper and lyrics from online sources, with anchor sequences and LLMs to auto-correct the transcription

4K 93 17

pvleopard

On-device speech-to-text engine powered by deep learning

4K 482 29

transkun

A simple yet effective Audio-to-MIDI Automatic Piano Transcription system

4K 368 35

whisper-run

Faster Whisper with Speaker Diarization

3K 9 1

whisperflow

Whisper-Flow is a framework designed to enable real-time transcription of audio content using OpenAI’s Whisper model. Rather than processing entire files after upload (“batch mode”), Whisper-Flow accepts a continuous stream of audio chunks and produces incremental transcripts immediately.

3K 800 115

tubescrape

A fast, lightweight Python toolkit for scraping YouTube without an API key. Search videos with filters, browse channel content (videos, shorts, playlists), fetch and translate transcripts, and scrape full playlists.

3K 12 1

whisper-smith

CLI and Python library for transcribing audio with OpenAI Whisper — supports txt, json, srt, vtt output and optional speaker diarization.

2K 0 0

whisper-local

Free, open-source, 100% offline AI dictation for Windows & macOS. Wispr Flow / Dragon alternative. Push-to-talk hotkey, voice commands, transforms, sub-second latency. Powered by Whisper. No cloud, no subscription, no telemetry.

2K 5 2

deepctl

Official Deepgram CLI — speech-to-text, text-to-speech, and audio intelligence from your terminal

2K 8 2

fow-cli

Personal CLI note-taker for turning meeting audio into cleaned Markdown notes.

2K 1 0

hermes-chat-recorder

Hermes Agent plugin: record every gateway chat message (any platform) to an Obsidian-style Markdown vault — voice transcripts and image descriptions included

2K 0 0