Performance Overview

Summary of Chronos’ real-world debugging performance across benchmarks, repositories, and languages.

Chronos-1 delivers state-of-the-art results on real-world debugging tasks, consistently outperforming both general-purpose language models and agent-based systems. Its performance reflects a design built specifically for repository-scale debugging rather than code generation.

Highlights

80.33% on SWE-Bench Lite, the highest recorded score to date
67.3% real-world fix accuracy across large multi-file scenarios
89% developer preference in blind evaluations
40% faster debugging cycles compared to manual workflows
Strong performance across complex ecosystems, including SymPy, Django, and Sphinx
4–5× improvement over general LLMs on multi-file debugging tasks

Chronos-1 is the first model designed entirely around debugging workflows, enabling accuracy levels that general code-completion models cannot achieve.
