LLM rankings and prices

Ranking

Model	Score
GPT-4-Turbo-2024-04-09	1258
Claude 3 Opus	1253
Gemini 1.5 Pro API-0409-Preview	1249
Meta Llama 3 70b Instruct	1213
Claude 3 Sonnet	1201
Command R+	1192
Claude 3 Haiku	1181
Mistral-Large-2402	1158
Qwen1.5-72B-Chat	1153
Command R	1150
Mistral Medium	1147
Meta Llama 3 8b Instruct	1147
Mixtral-8x22b-Instruct-v0.1	1145
Qwen1.5-32B-Chat	1134
GPT-3.5-Turbo-0613	1119
Qwen1.5-14B-Chat	1119
Mixtral-8x7b-Instruct-v0.1	1114
Yi-34B-Chat	1109
WizardLM-70B-v1.0	1108

Evaluation criteria

baichuan 13B-Form

RAG score

https://mp.weixin.qq.com/s/EdoA5fcyzgTw3LarMMe00g

LMSYS

🏆 LMSYS Chatbot Arena Leaderboard

Tier	Score	Representative models
First tier	Above 200 points	GPT-4, Claude 3 Sonnet and Opus (the "medium cup" and "large cup")
Second tier	Above 150 points	Mistral Medium and Large, Claude 3 Haiku (the "small cup"), Qwen 72B
Third tier	110 points or more	GPT-3.5

GPT-4 scores about 12% higher than GPT-3.5.

Claude 3 Haiku (the "small cup") scores about 6% higher than GPT-3.5.
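
The two percentages can be reproduced from the ranking table at the top of the page. The sketch below is illustrative only: it assumes that "GPT-4" refers to the GPT-4-Turbo-2024-04-09 row and "GPT-3.5" to the GPT-3.5-Turbo-0613 row, and that the improvement is a plain ratio of the two scores. The names `SCORES` and `relative_gain` are hypothetical helpers, not part of any leaderboard tooling.

```python
# Scores copied from the ranking table on this page.
SCORES = {
    "GPT-4-Turbo-2024-04-09": 1258,
    "Claude 3 Haiku": 1181,
    "GPT-3.5-Turbo-0613": 1119,
}

# Assumption: this is the row meant by "GPT-3.5" in the text.
BASELINE = "GPT-3.5-Turbo-0613"


def relative_gain(model: str, baseline: str = BASELINE) -> float:
    """Return how much higher `model` scores than `baseline`, in percent."""
    return (SCORES[model] / SCORES[baseline] - 1) * 100


for name in ("GPT-4-Turbo-2024-04-09", "Claude 3 Haiku"):
    print(f"{name}: {relative_gain(name):.1f}% above {BASELINE}")
# GPT-4-Turbo-2024-04-09: 12.4% above GPT-3.5-Turbo-0613
# Claude 3 Haiku: 5.5% above GPT-3.5-Turbo-0613
```

Rounded to whole numbers, these give the 12% and 6% figures quoted above.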

CLUE evaluation

CLUE (Chinese Language Understanding Evaluation) benchmark baseline

https://mp.weixin.qq.com/s/cI92Fp2ic13_BKaRSgZw4g

Price

Gemini: currently only the 1.0 Pro price is available.