ASR Python Packages

youtube-transcript-api

A Python API that retrieves the transcript or subtitles for a given YouTube video. It also works with automatically generated subtitles and requires neither an API key nor a headless browser, unlike Selenium-based solutions.

39.6M 8K 796

deepgram-sdk

Official Python SDK for Deepgram.

3.2M 448 136

whisperx

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

1.1M 23K 2K

nemo-toolkit

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

1.1M 18K 3K

sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, and a websocket server/client, with bindings for 12 programming languages.

689K 13K 2K

vosk

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

661K 15K 2K

sherpa-onnx-core

587K 13K 2K

whisper-normalizer

A Python package for the Whisper normalizer

539K 79 17

funasr

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

422K 19K 2K

cn2an

📦 Quickly convert between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more)

273K 763 84
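To make the conversion concrete, here is a minimal pure-Python sketch of turning a Chinese numeral string into an integer. It is not cn2an's implementation or API; the function name `chinese_to_int` and the lookup tables are hypothetical, and the sketch assumes well-formed positive integers only (no fractions, dates, or temperatures).

```python
# Hypothetical lookup tables for this sketch (not taken from cn2an).
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}


def chinese_to_int(text: str) -> int:
    """Convert a well-formed Chinese numeral string to an int."""
    total = 0    # value of completed 万/亿 groups
    section = 0  # value accumulated inside the current group (< 10000)
    digit = 0    # most recent digit not yet attached to a unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as "十" means 1 x 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch == "万":
            total += (section + digit) * 10_000
            section = digit = 0
        elif ch == "亿":
            # 亿 scales everything read so far, e.g. 一万亿 = 10**12.
            total = (total + section + digit) * 100_000_000
            section = digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + section + digit


print(chinese_to_int("一百二十三"))
print(chinese_to_int("一亿二千万"))
```

The sketch does not validate malformed input such as repeated units; a real library has to handle those cases and the reverse direction as well.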

speechmatics-rt

Python SDKs for Speechmatics APIs

211K 18 9

soniox

Python SDK for the Soniox API: realtime/async STT and TTS.

207K 9 5

speechmatics-voice

Python SDKs for Speechmatics APIs

157K 18 9

onnx-asr

A lightweight Python package for Automatic Speech Recognition using ONNX models

108K 337 31

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

106K 3K 210

speechmatics-batch

Python SDKs for Speechmatics APIs

53K 18 9

voice-mode

Natural voice conversations with Claude Code

37K 1K 176

sherpa-onnx-bin

30K 13K 2K

sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.

24K 2K 216

whisper-s2t

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

15K 574 76

achatbot

An open-source chatbot architecture for voice, vision, and multimodal assistants that can run locally (CPU/GPU bound) or remotely (I/O bound).

14K 89 18

transcribe-cpp-native

ggml speech-to-text inference for 16+ model families

9K 95 5

funasr-onnx

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

9K 19K 2K

werpy

🐍📦 Ultra-fast Python package for calculating and analyzing the Word Error Rate (WER). Built for the scalable evaluation of speech and transcription accuracy.

9K 28 6
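For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal pure-Python sketch of that definition; it is not werpy's implementation or API, the function name `word_error_rate` is hypothetical, and it assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.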