Claude 3.7 Sonnet and Claude Code

cyrano@lemmy.dbzer0.com · 2 days ago

simple@lemm.ee · 2 days ago

in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs.

I was just about to say how useless these benchmarks are. Plenty of LLMs claim to be better than Claude and GPT-4, but in real-world use Claude and GPT-4 have always been more reliable. Claude especially. Good to hear they're not just chasing scores.