Probably raises $9M to build a more reliable kind of AI
Probably, a new AI company, has secured $9 million in seed funding to address the persistent issue of hallucinations in large language models. The company aims to achieve 99.99% accuracy by developing a rigorous error-detection system, potentially allowing AI to run on smaller, more cost-effective models.
Despite advancements in large language models (LLMs), errors and "hallucinations" remain a significant challenge. These inaccuracies appear even in the most sophisticated models, and the industry is actively seeking effective solutions to mitigate them.
Probably, a new company backed by $9 million in seed funding from Andreessen Horowitz, is tackling this problem directly. Its goal is to keep factual errors and hallucinations from ever reaching users, with a target accuracy of 99.99%, a standard common in deterministic systems but notoriously difficult to achieve with AI.
Probably's initial product is a data science tool designed for rapid analysis of complex datasets. Each result includes a citation and a detailed audit trail, reflecting a growing trend in AI transparency. To maintain accuracy, the system employs an "elaborate harness" where an LLM's preliminary answers are validated against a deterministic system that flags inconsistencies.
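The article does not describe how the harness is built, so the following is only a minimal sketch of the general idea: a preliminary answer (here a claimed column mean, standing in for an LLM's output) is recomputed by deterministic code, and any mismatch is flagged and recorded in an audit trail. The names `CheckedAnswer` and `validate_mean_claim`, the tolerance, and the sample data are all hypothetical and are not taken from Probably's product.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class CheckedAnswer:
    """Result of checking one preliminary answer (hypothetical structure)."""
    claimed: float
    recomputed: float
    accepted: bool
    audit_trail: list = field(default_factory=list)


def validate_mean_claim(rows, column, claimed, tolerance=1e-6):
    """Recompute a column mean deterministically and compare it to the claim.

    `claimed` stands in for an LLM's preliminary answer; no model is called here.
    """
    values = [row[column] for row in rows]
    recomputed = mean(values)
    accepted = abs(recomputed - claimed) <= tolerance
    trail = [
        f"source: column '{column}', {len(values)} rows",
        f"claimed mean: {claimed}",
        f"recomputed mean: {recomputed}",
        "status: accepted" if accepted else "status: flagged as inconsistent",
    ]
    return CheckedAnswer(claimed, recomputed, accepted, trail)


# Illustrative data only.
rows = [{"revenue": 100.0}, {"revenue": 200.0}, {"revenue": 300.0}]

good = validate_mean_claim(rows, "revenue", 200.0)  # matches the data
bad = validate_mean_claim(rows, "revenue", 250.0)   # simulated hallucination

for result in (good, bad):
    print("\n".join(result.audit_trail))
    print()
```

In a sketch like this, only accepted answers would be shown to the user, while flagged ones could be retried or escalated. How a production system handles flagged answers is not stated in the article.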
This validation approach lets Probably's data science tool run on significantly smaller AI models. Founder Peter Elias notes that the company's current model is "four classes weaker than the frontier models," which allows it to run on local hardware and substantially reduces token costs. The company plans to extend the technology to other precision-sensitive applications such as accounting and medical services.
