🥋Sensei

The open-source qualification engine for AI agents

Test. Evaluate. Certify.
Before you hire an agent, ask the Sensei.


Three-Layer Evaluation

Every agent is tested across three dimensions. No shortcuts.

Task Execution

Can the agent do the job?

Measure real performance against domain-specific KPIs. Each task is scored on concrete, quantifiable metrics — not vibes.

Reasoning

Can it explain its decisions?

Probe the agent’s thought process. Great execution means nothing if the agent can’t articulate why it made a choice.

Self-Improvement

Can it learn from feedback?

Give the agent feedback and watch it adapt. The best agents don’t just perform — they evolve.
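
One way the three layers could roll up into a single number is a weighted average. The sketch below is illustrative only: the `LayerScores` shape, the `overallScore` helper, and the 5/3/2 weighting are assumptions made for this example, not Sensei's documented scoring formula.

```typescript
// Hypothetical shape: one 0-100 score per evaluation layer.
interface LayerScores {
  taskExecution: number;
  reasoning: number;
  selfImprovement: number;
}

// Assumed relative weights, for illustration only.
const ASSUMED_WEIGHTS: LayerScores = {
  taskExecution: 5,
  reasoning: 3,
  selfImprovement: 2,
};

// Weighted average of the three layers, rounded to one decimal place.
function overallScore(
  scores: LayerScores,
  weights: LayerScores = ASSUMED_WEIGHTS,
): number {
  const totalWeight =
    weights.taskExecution + weights.reasoning + weights.selfImprovement;
  const weighted =
    scores.taskExecution * weights.taskExecution +
    scores.reasoning * weights.reasoning +
    scores.selfImprovement * weights.selfImprovement;
  return Math.round((weighted / totalWeight) * 10) / 10;
}

// Example: strong execution, weaker adaptation.
console.log(overallScore({ taskExecution: 90, reasoning: 80, selfImprovement: 70 })); // 83
```

Weighting task execution highest reflects one possible priority; a team that cares more about explainability could pass its own weights as the second argument.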

See It In Action

Watch Sensei evaluate an agent in real time. Pick a suite and see how the three layers unfold.


How It Works

A simple, structured pipeline from agent to verdict.

🤖 Your Agent (HTTP / Stdio)

→ 🥋 Sensei Engine: 🎯 Task, 🧠 Reasoning, 📈 Growth

→ 📊 Score (0–100)

→ 🏅 Badge (Decision)

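
The last step of the pipeline, turning a 0–100 score into a badge, can be sketched as a threshold lookup. The only badge name that appears on this page is "silver"; the other badge names, the cut-off values, and the `badgeFor` helper below are hypothetical and chosen only for illustration.

```typescript
type Badge = "gold" | "silver" | "bronze" | "none";

// Assumed thresholds, ordered from highest to lowest. Not Sensei's real cut-offs.
const ASSUMED_THRESHOLDS: Array<{ badge: Badge; minScore: number }> = [
  { badge: "gold", minScore: 90 },
  { badge: "silver", minScore: 75 },
  { badge: "bronze", minScore: 60 },
];

// Returns the first badge whose minimum the score meets, or "none".
function badgeFor(score: number): Badge {
  // The negated range check also rejects NaN.
  if (!(score >= 0 && score <= 100)) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  for (const threshold of ASSUMED_THRESHOLDS) {
    if (score >= threshold.minScore) {
      return threshold.badge;
    }
  }
  return "none";
}

console.log(badgeFor(87.3)); // "silver" under the assumed thresholds
```

Keeping the thresholds in a single ordered table makes the decision rule easy to audit and to change without touching the lookup logic.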

Built-In Test Suites

Battle-tested evaluation suites for professional and fun agent roles. Create your own in minutes.

📞 SDR (5 scenarios)

Cold outreach, email personalization, discovery call analysis, and pipeline qualification

🎧 Customer Support (5 scenarios)

Ticket resolution, de-escalation, multi-issue handling, and reasoning about approach

✍️ Content Writer (5 scenarios)

Blog posts, LinkedIn threads, product launch emails, and editorial adaptation

🍸 Bartender (5 scenarios, 🎮 Fun)

Cocktail crafting, drunk customer handling, chaotic group orders, and allergy awareness

🎲 Dungeon Master (5 scenarios, 🎮 Fun)

Tavern scenes, creative combat, player management, and chaotic party handling

🐱 Cat Interview (3 scenarios, 🎮 Fun)

Job interview for Senior Napping Engineer at MeowCorp. The agent must stay in character.

Three Lines to Qualify

Load a suite, create an adapter, run. That's it.

qualify.ts

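import { SuiteLoader, Runner, createAdapter } from '@mondaycom/sensei-engine';

// Assumption: SuiteLoader can be constructed without arguments.
const loader = new SuiteLoader();

// Load the suite definition, build an adapter for the agent it describes, and run.
const suite = await loader.loadFile('./suites/sdr-qualification/suite.yaml');
const adapter = createAdapter(suite.agent);
const result = await new Runner(adapter).run(suite);

// result.scores.overall → 87.3
// result.badge → "silver"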
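
Once a run finishes, the result can drive a simple pass/fail gate, for example in a CI job. The sketch below assumes only the two fields shown above (`scores.overall` and `badge`); the `QualificationResult` interface and the `qualifies` and `summarize` helpers are hypothetical names, not part of the Sensei API.

```typescript
// Hypothetical minimal shape, based only on the two fields shown in the snippet above.
interface QualificationResult {
  scores: { overall: number };
  badge: string;
}

// True when the overall score meets the required minimum.
function qualifies(result: QualificationResult, minOverall: number): boolean {
  return result.scores.overall >= minOverall;
}

// One-line verdict suitable for a CI log.
function summarize(result: QualificationResult, minOverall: number): string {
  const verdict = qualifies(result, minOverall) ? "PASS" : "FAIL";
  const overall = result.scores.overall.toFixed(1);
  return `${verdict}: overall ${overall} (badge: ${result.badge}), required ${minOverall}`;
}

// Stand-in for a real run result, using the values from the example above.
const sample: QualificationResult = { scores: { overall: 87.3 }, badge: "silver" };
console.log(summarize(sample, 80)); // PASS: overall 87.3 (badge: silver), required 80
```

In a pipeline, a failing verdict could be turned into a non-zero exit code so that an under-qualified agent blocks the deploy.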

Suite Marketplace

Discover 70+ professional evaluation suites built by the community. From SDR and engineering to marketing and product — find the right benchmark for your agent, or publish your own.

70+ Suites

Professional benchmarks across engineering, sales, marketing, design, and more

Belt Rankings

9-tier karate-inspired belt system from White to Red — earn your rank

Publish Yours

Create and publish your own evaluation suites for the community

Browse the Marketplace →