SWE Bench Leaderboard
Compare models by their scores on the SWE Bench benchmark.
Rank | Model | Score (%) | Organization | License |
---|---|---|---|---|
1 | o3 | 69.1 | OpenAI | Proprietary |
2 | o3 | 69.1 | OpenAI | Proprietary |
3 | o4-mini | 68.1 | OpenAI | Proprietary |
4 | Gemini 2.5 Pro Preview | 63.8 | Google | Proprietary |
5 | Claude 3.7 Sonnet | 62.3 | Anthropic | Proprietary |
6 | GPT-4o mini | 61 | OpenAI | Proprietary |
7 | o3-mini | 61 | OpenAI | Proprietary |
8 | GPT-4.1 | 55 | OpenAI | Proprietary |
9 | Gemini 2.0 Flash | 51.8 | Google | Proprietary |
10 | Gemini 2.0 Flash | 51.8 | Google | Proprietary |
11 | Claude 3.5 Sonnet | 49 | Anthropic | Proprietary |
12 | o1 | 48.9 | OpenAI | Proprietary |
13 | Claude 3.5 Haiku | 40.6 | Anthropic | Proprietary |
14 | Claude 3 Haiku | 40.6 | Anthropic | Proprietary |
15 | GPT-4o 2024-05-13 | 31 | OpenAI | Proprietary |
16 | GPT-4o | 31 | OpenAI | Proprietary |
17 | GPT-4o | 31 | OpenAI | Proprietary |
18 | GPT-4o | 31 | OpenAI | Proprietary |
19 | GPT-4.1 mini | 23.6 | OpenAI | Proprietary |