• cygnus@lemmy.ca · 18 upvotes, 1 downvote · 12 hours ago

    Those charts are hilarious: wow, it gives the right answer 62.5% of the time and only makes up completely false answers 37.1% of the time! It’s like Russian roulette, but worse!

    • olympicyes@lemmy.world · 8 upvotes · 11 hours ago

      If you play Russian roulette with two bullets like a real man, the odds are about the same as with this model!
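A quick sanity check of the comparison. This is a minimal sketch that assumes a standard six-chamber revolver, one spin of the cylinder and a single trigger pull (none of which the thread specifies). Under those assumptions two rounds give a 1-in-3 chance of firing, which is slightly better odds than the 37.1% false-answer rate quoted in the parent comment.

```python
from fractions import Fraction

# Assumption: six-chamber revolver, two rounds loaded,
# cylinder spun once, trigger pulled once.
CHAMBERS = 6
ROUNDS = 2

p_fire = Fraction(ROUNDS, CHAMBERS)  # chance the shot goes off
p_survive = 1 - p_fire               # chance of walking away

# Rates quoted in the parent comment.
model_correct = 0.625
model_false = 0.371

print(f"roulette: {float(p_survive):.1%} survive, {float(p_fire):.1%} fire")
print(f"model:    {model_correct:.1%} correct, {model_false:.1%} false")
```

With these numbers the roulette line reads 66.7% / 33.3% against the model's 62.5% / 37.1%, so "about the same, but a bit worse" holds up.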

    • regrub@lemmy.world · 4 upvotes · 11 hours ago

      Surely, people won’t use the slop generator in applications where being correct is important, right?

  • BetaDoggo_@lemmy.world · 3 upvotes · 11 hours ago

    In their human-preference benchmarks, it was chosen over 4o only 59% of the time. That's a 15-20x cost increase for a 9-point edge over a coin flip.
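A rough sketch of that arithmetic. It assumes the 9% figure is the 59% win rate measured against a 50% no-preference (coin-flip) baseline, and it takes the quoted 15-20x cost range at face value; neither the baseline nor the pricing is sourced beyond the comment itself.

```python
# Head-to-head preference rate quoted in the comment; 50% would mean
# raters showed no preference between the two models (assumed baseline).
win_rate = 0.59
PARITY = 0.50

# Percentage points above a coin flip.
margin_points = (win_rate - PARITY) * 100

# Cost multiplier range quoted in the comment (15-20x relative to 4o).
for cost_multiplier in (15, 20):
    print(f"{cost_multiplier}x cost for +{margin_points:.0f} points of preference")
```

Measuring against 50% rather than 0% is the design choice that matters here: a 59% head-to-head win rate means raters picked the newer model only slightly more often than chance.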