We tried Google’s AI glasses and they’re almost there
Google showcased its new AI-powered glasses at the I/O developer conference, offering a glimpse into the future of augmented reality. These prototype glasses feature an in-lens display for real-time information and AI integration, with a focus on core functionality. While still in early development, the glasses demonstrate promising applications like live translation and AI-enhanced photography.
Google unveiled a prototype of its AI-powered glasses at the I/O developer conference, offering a preview of its ambitious augmented reality vision. These glasses, distinct from the upcoming audio-only version, feature an in-lens display that overlays helpful information onto the real world. This includes widgets for weather, directions, and live translation, along with user-designed AI functionalities. The glasses are designed to pair with both iOS and Android phones.
The prototype, developed in partnership with companies like Warby Parker and Samsung, prioritizes experimental display technology and battery life over cosmetic refinement. This early version lets Google test the core "insides" of the device within a basic, comfortable frame. Google acknowledges that the final shipping product will differ significantly in aesthetics and in features such as automatic head detection.
Users interact with the glasses primarily through Gemini, activated by a two-second press on the frame. On the prototype, the camera activates at the same time as Gemini; the production version will let users configure this behavior. Initial tests demonstrated music playback and photo capture, with the latter allowing for AI manipulation of images, such as transforming a person into an anime character, though this process took around 45 seconds due to heavy Wi-Fi load at the venue.
The in-lens display presented a home screen with preloaded widgets. The prototype's single display, positioned over the right eye, had some focus issues that may have been related to the reviewer's prescription contacts; the platform supports both single and dual displays. A standout feature was live translation: spoken Spanish was automatically rendered as English text on the display and as spoken English audio, a highly practical application for travelers. Navigation through Google Maps was also demonstrated, highlighting the potential for seamless integration into daily activities.
