Browse latest
Tools & PlatformsAI News & Artificial Intelligence | TechCrunch · June 2, 2026

New Microsoft tool lets devs spin up AI behavior tests using text descriptions

Microsoft has introduced ASSERT, an open-source framework designed to simplify the testing of AI systems. It allows developers to create specific behavioral tests for their AI applications using natural language descriptions, ensuring the AI performs as intended for a particular product or service.

Author: Morein.ai Editorial

Microsoft has launched ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), an open-source framework that helps developers ensure their AI systems behave as intended for specific products or services. This initiative addresses a growing need in the industry to move beyond general AI evaluations towards application-specific testing.

ASSERT allows developers to use natural language descriptions of desired AI goals, policies, or behaviors. The tool then translates these descriptions into structured sets of acceptable and unacceptable behaviors, generates test cases, and evaluates the AI system against them. It also records the AI’s actions for detailed inspection of any failures.

Developers can customize these evaluations further by providing system context, tools, and constraints. For example, a developer could specify that a document research AI agent should not send emails outside the company or should limit confidential information sharing. ASSERT will then generate tests to continuously verify adherence to these rules.

According to Sarah Bird, chief product officer of Responsible AI at Microsoft, ASSERT fills a crucial gap where broader evaluations fall short. She emphasized that understanding an AI system's behavior through application-specific testing is critical for making sound decisions and ensuring a trustworthy system.

This release aligns with a broader industry trend focusing on repeatable testing and regression checks for increasingly capable AI models. ASSERT can be utilized during the AI system's development, after deployment, and for ongoing monitoring, offering a comprehensive solution for AI behavior evaluation.

Read original source

Related articles