Those charts are hilarious: wow, it gives the right answer 62.5% of the time and only makes up completely false answers 37.1% of the time! It’s like Russian roulette, but worse!
If you play Russian roulette with two bullets like a real man, then this model gives you about the same odds!
Surely, people won’t use the slop generator in applications where being correct is important, right?
In their human choice benchmarks it was only chosen 59% of the time compared to 4o. That’s a 15-20x cost increase for a 9-point difference over a 50/50 split.