A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.
An open-source NLP-as-a-service project focused on providing state-of-the-art systems with ease. Training and inference with simple Docker commands.
🚀 Framework for seamless fine-tuning of Whisper model on a multi-lingual dataset and deployment to prod.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Voice to Text Model using OpenAI's Whisper
Port of OpenAI's Whisper model in C/C++
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Live speech transcription and translation in your browser
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
Freeswitch ASR module for working with whisper_cpp
Demo using PhoWhisper models of VinAI built with Transformers.js + Next.js
Official Python SDK for Deepgram's automated speech recognition APIs.
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Go SDK for Deepgram's automated speech recognition APIs.
OBS plugin for local speech recognition and captioning using AI
Local ML voice chat using high-end models.
This repository accompanies my MSc Thesis for the degree Voice Technology, storing all referenced data and other relevant resources.