
  • Lying requires intent. Currently popular LLMs build responses one token at a time: when a model starts writing a sentence, it doesn’t know how that sentence will end, and therefore can’t have an opinion about its truth value. (I’d go further and claim it can’t really “have an opinion” about anything, but even if it can, it can neither lie nor tell the truth on purpose.) It can consider its own output, and therefore potentially form an opinion about whether it is true or false, only after that output has already been generated, while it is producing the next token (see the decoding-loop sketch below).

    “Admitting” that it’s lying only proves that it has been exposed to “admission” as a pattern in its training data.
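
    To make “one token at a time” concrete, here’s a minimal greedy-decoding sketch using the Hugging Face transformers API. The model name gpt2, the prompt, and the 20-token limit are just placeholders for illustration, not anything specific to the models discussed above; the point is only that each step conditions on the prefix produced so far.

    ```python
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in causal LM; any autoregressive model works the same way
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    prompt = "The capital of France is"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    # Greedy decoding, one token per step: at each step the model only sees the
    # prefix generated so far. It has no representation of how the sentence will
    # end until those later tokens have actually been produced and fed back in.
    with torch.no_grad():
        for _ in range(20):
            logits = model(input_ids).logits           # scores for the next token only
            next_id = logits[:, -1, :].argmax(dim=-1)  # pick the single most likely token
            input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

    print(tokenizer.decode(input_ids[0]))
    ```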