Multi Random Retrieval Benchmark

Chronos-1 was evaluated on the multi-random retrieval (MRR) benchmark: 5,000 debugging scenarios designed to mimic real-world conditions, in which bug context is scattered across dozens of files and months of code history.

A public sample of 500 scenarios is available; the full benchmark is scheduled for release in Q1 2026.

Key Results

Debug Success Rate: 67.3% ± 2.1%
Root Cause Accuracy: 89%
Retrieval Precision: 92%
Retrieval Recall: 85%
Average Fix Iterations: 7.8
40% faster than manual debugging
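
The source does not spell out how the retrieval precision and recall figures above are computed. As a minimal sketch, assuming the standard set-based definitions (precision is the share of retrieved items that are relevant, recall is the share of relevant items that were retrieved), one scenario could be scored as below. The function name `retrieval_precision_recall` and the file paths are hypothetical and are not taken from the benchmark.

```python
def retrieval_precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single scenario.

    Assumes the standard set-based definitions; the benchmark's own
    scoring rules are not stated in the source and may differ.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant items that were retrieved

    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical scenario: file names are illustrative only.
retrieved = ["auth/session.py", "auth/token.py", "db/pool.py", "utils/log.py"]
relevant = [
    "auth/session.py",
    "auth/token.py",
    "db/pool.py",
    "config/limits.py",
    "api/routes.py",
]

precision, recall = retrieval_precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.75 recall=0.60
```

In this illustration three of the four retrieved files are relevant (precision 0.75), and three of the five relevant files were found (recall 0.60). Aggregate figures such as those reported above would then typically be averages over all scenarios, although the averaging method is also an assumption here.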

Compared to baseline models:

Baselines achieve a 13.8–15% debug success rate
Chronos-1 is 5–7× stronger at root-cause attribution
Retrieval recall is 2–3× higher
Retrieval precision is 18–25% higher

These improvements reflect the strength of Chronos’ Adaptive Graph-Guided Retrieval and multi-hop dependency understanding.
