PyRank

Speech Python Packages

Python packages with the GitHub topic speech. Sorted by relevance, with stars and monthly downloads.
huggingface
datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

127.8M 22K 3K
pytorch
torchaudio

Data manipulation and transformation for audio signal processing, powered by PyTorch

13.4M 3K 776
modelscope
modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

4.5M 9K 943
pndurette
gtts

Python library and CLI tool to interface with Google Translate's text-to-speech API

1.5M 3K 384
m-bain
whisperx

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

1M 22K 2K
snakers4
silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

825K 9K 771
coqui-ai
tts

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

183K 45K 6K
eginhard
monotonic-alignment-search

Monotonically align text and speech

176K 4 1
snakers4
silero

Silero Models: pre-trained text-to-speech models made embarrassingly simple

175K 6K 366
OpenBMB
voxcpm

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

116K 19K 2K
linto-ai
whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

96K 3K 211
HumeAI
hume

Python client for Hume AI

95K 174 44
Rikorose
deepfilternet

Noise suppression using deep filtering

68K 4K 459
supertone-inc
supertonic

Lightning-Fast, On-Device TTS, running natively via ONNX.

30K 51 8
interactiveaudiolab
penn

Pitch Estimating Neural Networks (PENN)

26K 273 26
ai-bot-pro
achatbot

An open source chat bot architecture for voice/vision (and multimodal) assistants that runs locally (CPU/GPU bound) or remotely (I/O bound).

18K 89 18
r9y9
pysptk

A Python wrapper for Speech Signal Processing Toolkit (SPTK).

15K 451 80
sensein
senselab

senselab is a Python package that simplifies building pipelines for biometric (e.g. speech, voice, video) analysis.

14K 38 9
ina-foss
inaspeechsegmenter

CNN-based audio segmentation toolkit. Detects speech, music, noise and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

8K 888 150
modelscope
clearvoice

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

8K 4K 341
readbeyond
aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

8K 3K 275
xinjli
allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

8K 730 101
huggingface
nlp

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

8K 22K 3K
Sinapsis-AI
sinapsis

Modular and Universal AI platform

8K 40 11
    • Data from PyPI, GitHub, ClickHouse, and BigQuery