Multi-Random Retrieval Benchmark
Chronos’ performance on complex debugging scenarios with dispersed context.
Chronos-1 was evaluated on 5,000 multi-random retrieval (MRR) debugging scenarios, a benchmark designed to mimic real conditions where bug context is scattered across dozens of files and months of code history.
A public sample of 500 scenarios is available now; the full benchmark is scheduled for release in Q1 2026.
Key Results
- Debug Success Rate: 67.3% ± 2.1%
- Root Cause Accuracy: 89%
- Retrieval Precision: 92%
- Retrieval Recall: 85%
- Average Fix Iterations: 7.8
- Debugging Speed: 40% faster than manual debugging
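The benchmark summary does not spell out how retrieval precision and recall are scored. A minimal sketch, assuming the standard set-based definitions (precision is the share of retrieved items that are relevant, recall is the share of relevant items that were retrieved), might look like the following. The function name and the file paths are hypothetical and are not taken from the benchmark itself.

```python
def retrieval_precision_recall(retrieved, relevant):
    """Return (precision, recall) for one debugging scenario.

    Assumes the standard set-based definitions; the benchmark's
    actual scoring procedure may differ.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set

    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical scenario: the bug context is spread across four files.
relevant_files = {
    "auth/session.py",
    "auth/token.py",
    "db/migrations/0042.py",
    "config/settings.py",
}

# Hypothetical retrieval result: five files returned, three of them relevant.
retrieved_files = [
    "auth/session.py",
    "auth/token.py",
    "db/migrations/0042.py",
    "utils/log.py",
    "README.md",
]

precision, recall = retrieval_precision_recall(retrieved_files, relevant_files)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of five retrieved files are relevant (precision 0.60) and three of four relevant files are found (recall 0.75). Benchmark-level figures would presumably be averages of such per-scenario values, though the source does not state the aggregation method.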
Compared to baseline models:
- Baselines achieve a debug success rate of only 13.8–15%
- Chronos-1 is 5–7× more accurate at root-cause attribution
- Retrieval recall is 2–3× higher
- Retrieval precision is 18–25% higher
These improvements reflect the strength of Chronos’ Adaptive Graph-Guided Retrieval and multi-hop dependency understanding.