Reachy Mini goes fully local
Reachy Mini can now run its conversation app entirely locally without cloud dependency, enhancing privacy and reducing latency. This is achieved by running the full stack, including speech-to-speech and the LLM, on the user's machine.
Reachy Mini, a conversational robot, has been updated to run its entire conversation application locally. This eliminates the previous requirement of sending audio to a server, significantly enhancing privacy and reducing latency. The new setup allows users to operate the robot without an internet connection or reliance on external APIs.
The system is powered by a speech-to-speech pipeline, which includes Voice Activity Detection (VAD), Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS). Users can serve the LLM locally using tools like Hugging Face's llama.cpp, and the speech-to-speech library handles the rest of the audio processing.
This local operation provides greater flexibility and control. Users can customize components within the cascaded architecture, swapping in different models for VAD, STT, LLM, and TTS to optimize for specific needs like multilingual support or single-language performance.
Serving your own speech-to-speech server also offers significant advantages. It provides a single command-line interface for booting a WebSocket server, and it decouples the LLM from the voice loop, allowing for more efficient inference and reduced latency. This setup supports various LLM serving options, including local execution with llama.cpp or vLLM, or utilizing cloud-managed services like Hugging Face Inference Endpoints.
Related articles
Build real agentic apps using CUGA: two dozen working examples on a lightweight harness
CUGA, IBM's open-source Agent Harness, simplifies building agentic applications by handling infrastructure, allowing developers to focus on tools and prompts. It offers pre-assembled components for planning, execution, and state management, significantly reducing development time. CUGA has topped agent benchmarks like AppWorld and WebArena.
OpenAI launches new initiative to help find and patch open source bugs
OpenAI has launched "Patch the Planet," a new initiative in partnership with cybersecurity firm Trail of Bits, to enhance the security of open-source projects. This program aims to assist maintainers in identifying and patching bugs, utilizing OpenAI's AI-powered security tools while reducing the burden on project teams.
PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters
Baidu has released PP-OCRv6, an advanced optical character recognition (OCR) model supporting 50 languages. Available on Hugging Face, this version significantly improves accuracy and efficiency across various parameter sizes, from 1.5 million to 34.5 million, marking a substantial leap in multilingual OCR technology.
