Browse latest
Tools & PlatformsHugging Face - Blog · May 27, 2026

Reachy Mini goes fully local

Reachy Mini can now run its conversation app entirely locally without cloud dependency, enhancing privacy and reducing latency. This is achieved by running the full stack, including speech-to-speech and the LLM, on the user's machine.

Author: Morein.ai Editorial

Reachy Mini, a conversational robot, has been updated to run its entire conversation application locally. This eliminates the previous requirement of sending audio to a server, significantly enhancing privacy and reducing latency. The new setup allows users to operate the robot without an internet connection or reliance on external APIs.

The system is powered by a speech-to-speech pipeline, which includes Voice Activity Detection (VAD), Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS). Users can serve the LLM locally using tools like Hugging Face's llama.cpp, and the speech-to-speech library handles the rest of the audio processing.

This local operation provides greater flexibility and control. Users can customize components within the cascaded architecture, swapping in different models for VAD, STT, LLM, and TTS to optimize for specific needs like multilingual support or single-language performance.

Serving your own speech-to-speech server also offers significant advantages. It provides a single command-line interface for booting a WebSocket server, and it decouples the LLM from the voice loop, allowing for more efficient inference and reduced latency. This setup supports various LLM serving options, including local execution with llama.cpp or vLLM, or utilizing cloud-managed services like Hugging Face Inference Endpoints.

Read original source

Related articles