News
Kopernica uses multimodal inputs, such as real-time audio and video, in combination with behavioral intelligence to allow AI ...
Speech and language impairments affect over a million children every year, and identifying and treating these conditions early is key to helping these children overcome them.
ManaTTS is the largest open Persian speech dataset, with 100+ hours of transcribed audio. It includes a data collection pipeline and tools, and is suitable for training Persian text-to-speech models.
Parakeet TDT 0.6B is a 600-million-parameter automatic speech recognition model. It can transcribe 60 minutes of audio per ...
An attractive proposition for commercial enterprises and indie developers looking to build speech recognition and transcription ...
Cross-Dataset Representation Learning for Unsupervised Deep Clustering in Human Activity Recognition
Abstract: This study introduces a novel representation learning method to enhance unsupervised deep clustering in Human Activity ... not only advances recognition accuracy in HAR but also demonstrates ...
In a defining moment for Arabic-language artificial intelligence, CNTXT AI has unveiled Munsit, a next-generation Arabic speech recognition model that is not only the most accurate ever created for ...
Built in the UAE, Munsit sets a new global standard for Arabic speech ... recognition systems — including OpenAI’s Whisper and GPT-4o Transcribe, Meta’s SeamlessM4T, ElevenLabs’ Scribe, and Microsoft ...
The participants tried out different designs for AI content moderation systems, which varied in how they labeled and presented speech that is ... 2025 CHI Conference on Human Factors in Computing ...