Multi Random Retrieval Benchmark

Chronos’ performance on complex debugging scenarios with dispersed context.

Chronos-1 was evaluated on 5,000 multi-random retrieval (MRR) debugging scenarios. The benchmark is designed to mimic real-world conditions in which the context needed to fix a bug is scattered across dozens of files and months of code history.

A public sample of 500 scenarios is available; the full benchmark is scheduled for release in Q1 2026.

Key Results

  • Debug Success Rate: 67.3% ± 2.1%
  • Root Cause Accuracy: 89%
  • Retrieval Precision: 92%
  • Retrieval Recall: 85%
  • Average Fix Iterations: 7.8
  • Debugging Speed: 40% faster than manual debugging
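
For readers unfamiliar with the two retrieval metrics above, the following is a minimal sketch of how precision and recall are conventionally computed for a single scenario. It assumes each scenario comes with a known set of relevant files; the function name, the file paths, and the set-based scoring are illustrative assumptions, not the benchmark's actual scoring code.

```python
def retrieval_precision_recall(retrieved, relevant):
    """Return (precision, recall) for one scenario.

    precision = share of retrieved files that are relevant
    recall    = share of relevant files that were retrieved
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical scenario: file names are made up for illustration.
retrieved = ["auth/session.py", "auth/token.py", "db/pool.py", "utils/log.py"]
relevant = [
    "auth/session.py",
    "auth/token.py",
    "db/pool.py",
    "config/settings.py",
    "api/routes.py",
]

precision, recall = retrieval_precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.60
```

In this toy case three of the four retrieved files are relevant (precision 0.75), and three of the five relevant files were found (recall 0.60). How the benchmark aggregates per-scenario scores into the reported figures is not specified here.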

Compared to baseline models:

  • Baselines achieve 13.8–15% success
  • Chronos-1 is 5–7× stronger at root-cause attribution
  • Retrieval recall is 2–3× higher
  • Retrieval precision is 18–25% higher
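
As a quick sanity check on the comparison above, the ratio of debug success rates can be worked out directly from the reported figures. This sketch uses only the numbers on this page (67.3% for Chronos-1, 13.8–15% for baselines); the variable names are illustrative.

```python
# Figures taken from the results reported on this page.
chronos_success = 67.3                    # percent
baseline_low, baseline_high = 13.8, 15.0  # percent

# The smallest ratio comes from the strongest baseline, and vice versa.
ratio_min = chronos_success / baseline_high
ratio_max = chronos_success / baseline_low

print(f"{ratio_min:.1f}x to {ratio_max:.1f}x")  # 4.5x to 4.9x
```

By this arithmetic, the overall debug success rate is roughly 4.5–4.9× that of the baselines.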

These improvements reflect the strength of Chronos’ Adaptive Graph-Guided Retrieval and multi-hop dependency understanding.
