Speech Processing Python Packages

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

1.2M 10K 796

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

106K 3K 210

torchscale

Foundation Architecture for (M)LLMs

72K 3K 225

indic-num2words

Python library for converting numbers to words for all Indian Languages.

31K 38 14

pysptk

A Python wrapper for the Speech Signal Processing Toolkit (SPTK).

14K 451 80

spafe

spafe: Simplified Python Audio Features Extraction

13K 485 77

swift-f0

Fast and accurate fundamental frequency (F0) detector using convolutional neural networks

10K 168 21

resemble-enhance

AI powered speech denoising and enhancement

6K 2K 288

nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.

6K 399 71

voicefixer

General Speech Restoration

5K 1K 160

silero-vad-lite

Lightweight wrapper for Silero VAD that uses an internal ONNX Runtime and has no Python package dependencies

4K 17 1

diarize

Speaker diarization for Python — "who spoke when?" CPU-only, no API keys, Apache 2.0. ~10.8% DER on VoxConverse, 8x faster than real-time.

3K 91 8

stark-engine

S.T.A.R.K. - Speech And Text Algorithmic Recognition Kit

3K 62 4

bournemouth-forced-aligner

Extract phoneme-level timestamps from speech audio.

3K 150 15

attenlabs-saa

Addressee detection for voice agents: device-directed speech detection that runs before STT, so background speech, side conversations, and the agent's own TTS echo never trigger it. No wake word, model-agnostic, drop-in for LiveKit, Pipecat, ElevenLabs, Twilio, and OpenAI. The layer your VAD and turn detection are missing.

2K 170 117

saa-livekit-client

2K 170 117

deepaudio-x

A Python library for training deep neural networks on various audio tasks using self-supervised backbones.

2K 32 1

everyvoice

The EveryVoice TTS Toolkit - Text To Speech for your language

2K 43 4

rutextnorm

Fast Russian Text normalization for TTS using only RegEx.

1K 33 3

saa-pipecat-client

1K 170 117

russian-tts-normalization

Fast Russian Text normalization for TTS using only RegEx.

1K 33 3

scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

1K 116 8

vak

A neural network framework for researchers studying acoustic communication

1K 91 16

voicetut-tts

VoiceTut-TTS is an Egyptian-Arabic text-to-speech system fine-tuned from OmniVoice on ~380 hours of Egyptian podcast speech. It produces natural Egyptian speech with seamless Arabic ↔ English code-switching, ships 15 built-in studio voices, supports zero-shot voice cloning

1K 10 1