Performance Overview

Summary of Chronos’ real-world debugging performance across benchmarks, repositories, and languages.

Chronos-1 delivers state-of-the-art results on real-world debugging tasks, consistently outperforming both general-purpose language models and agent-based systems. These results reflect a system built specifically for repository-scale debugging rather than for code generation.

Highlights

  • 80.33% on SWE-Bench Lite, the highest recorded score to date
  • 67.3% real-world fix accuracy across large multi-file scenarios
  • 89% developer preference in blind evaluations
  • 40% faster debugging cycles than manual workflows
  • Strong performance across complex ecosystems, including SymPy, Django, and Sphinx
  • 4–5× improvement over general LLMs on multi-file debugging tasks

Chronos-1 is the first model designed entirely around debugging workflows, enabling accuracy levels that general code-completion models cannot achieve.
