Speech Python Packages

datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

132.7M 22K 3K

torchaudio

Data manipulation and transformation for audio signal processing, powered by PyTorch

12.3M 3K 783

modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

5.5M 9K 956

gtts

Python library and CLI tool to interface with Google Translate's text-to-speech API

3.8M 3K 385

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

1.2M 10K 796

whisperx

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

1.1M 23K 2K

silero

Silero Models: pre-trained text-to-speech models made embarrassingly simple

198K 6K 366

tts

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

137K 46K 6K

monotonic-alignment-search

Monotonically align text and speech

114K 3 1
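The one-line description leaves "monotonic alignment" implicit. Assuming this package implements the monotonic alignment search (MAS) dynamic program introduced with Glow-TTS, the pure-Python sketch below illustrates the idea: every speech frame is assigned to exactly one text token, tokens are visited in order, and none is skipped. It is not the package's API; the function name `monotonic_alignment_search` and the token-by-frame score-matrix layout are illustrative assumptions.

```python
import math


def monotonic_alignment_search(log_p):
    """Hypothetical helper: return a 0/1 alignment for a token-by-frame score matrix.

    log_p[i][j] is the (log-)score of text token i emitting speech frame j.
    Each frame goes to exactly one token, tokens advance monotonically,
    and every token receives at least one frame.
    """
    n_text, n_frames = len(log_p), len(log_p[0])
    if n_text > n_frames:
        raise ValueError("need at least one frame per text token")
    neg_inf = -math.inf
    # q[i][j]: best cumulative score of a path that puts frame j on token i
    q = [[neg_inf] * n_frames for _ in range(n_text)]
    for j in range(n_frames):
        # token i is feasible at frame j only if earlier tokens each got a frame
        # (i <= j) and enough frames remain for the later tokens
        lo = max(0, n_text - (n_frames - j))
        hi = min(n_text - 1, j)
        for i in range(lo, hi + 1):
            if j == 0:
                best_prev = 0.0  # only cell (0, 0) can start a path
            else:
                stay = q[i][j - 1]  # previous frame was on the same token
                advance = q[i - 1][j - 1] if i > 0 else neg_inf  # moved to next token
                best_prev = max(stay, advance)
            q[i][j] = best_prev + log_p[i][j]

    # Backtrack from the last token at the last frame.
    path = [[0] * n_frames for _ in range(n_text)]
    i = n_text - 1
    for j in range(n_frames - 1, -1, -1):
        path[i][j] = 1
        # step back a token when staying is impossible (i == j) or scores worse
        if i > 0 and (i == j or q[i][j - 1] < q[i - 1][j - 1]):
            i -= 1
    return path


scores = [[0, 0, -5, -5, -5],
          [-5, -5, 0, 0, 0]]
print(monotonic_alignment_search(scores))  # [[1, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
```

In the toy matrix, token 0 scores well on the first two frames and token 1 on the last three, so the search assigns the frames in that order. Real TTS training code runs this over large batches, which is why optimized implementations exist.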

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

106K 3K 210

voxcpm

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

100K 32K 4K

hume

Python client for Hume AI

79K 177 46

deepfilternet

Noise suppression using deep filtering

66K 4K 482

supertonic

Lightning-Fast, On-Device TTS — running natively via ONNX.

40K 82 17

penn

Pitch Estimating Neural Networks (PENN)

21K 276 26

pysptk

A Python wrapper for the Speech Signal Processing Toolkit (SPTK).

14K 451 80

achatbot

An open-source chatbot architecture for voice, vision, and multimodal assistants that can run locally (CPU/GPU-bound) or remotely (I/O-bound).

14K 89 18

allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

11K 735 100

inaspeechsegmenter

CNN-based audio segmentation toolkit that detects speech, music, noise, and speaker gender. It was designed for large-scale gender-equality studies based on speech time per gender.

10K 900 152

nkululeko

Machine learning for speaker characteristics

9K 46 12

clearvoice

An AI-powered speech processing toolkit with open-source, state-of-the-art pretrained models, supporting speech enhancement, speech separation, target speaker extraction, and more.

7K 4K 351

nlp

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

7K 22K 3K

africanwhisper

🚀 Framework for seamless fine-tuning of the Whisper model on a multilingual dataset and deployment to production.

7K 38 6

aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

7K 3K 276
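Forced alignment of this kind is commonly computed with dynamic time warping (DTW) between two feature sequences, for example features of the recorded audio and features of a synthesized rendering of the text. The sketch below is a generic DTW over 1-D sequences with an absolute-difference cost. It is not aeneas's API; the function name `dtw_path` and the scalar toy features are illustrative assumptions, since real aligners compare multi-dimensional frame features such as MFCCs.

```python
def dtw_path(a, b):
    """Hypothetical helper: align sequences a and b with classic DTW.

    Returns (total_cost, path), where path lists matched (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between the two frames
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b frame repeats
                                 cost[i][j - 1],      # b advances, a frame repeats
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)  # ties prefer the diagonal step
    path.reverse()
    return cost[n][m], path


total, path = dtw_path([0, 1, 2], [0, 0, 1, 2, 2])
print(total, path)  # 0.0 [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4)]
```

Because the second sequence is a time-stretched copy of the first, the warping path absorbs the repeated frames at zero cost. In an audio-text aligner, the matched indices would then be mapped back to timestamps for each text fragment.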