Performance Overview
Summary of Chronos’ real-world debugging performance across benchmarks, repositories, and languages.
Chronos-1 delivers state-of-the-art results on real-world debugging tasks, consistently outperforming both general-purpose language models and agent-based systems. These results reflect a system built specifically for repository-scale debugging rather than code generation.
Highlights
- 80.33% on SWE-bench Lite, the highest score reported to date
- 67.3% fix accuracy on real-world, large multi-file debugging scenarios
- 89% developer preference in blind evaluations
- 40% faster debugging cycles compared to manual workflows
- Strong performance across complex ecosystems, including SymPy, Django, and Sphinx
- 4–5× improvement over general LLMs on multi-file debugging tasks
Chronos-1 is the first model designed entirely around debugging workflows, which enables accuracy that general code-completion models do not reach.