Piero Bosio Social Web Site Personale (Fediverso)

A social forum federated with the rest of the world. It is not the instances that count, it is the people.
Your LLM Won’t Stop Lying Any Time Soon

  • Cybersecurity & cyberwarfare (a user from outside this forum) wrote:

    Your LLM Won’t Stop Lying Any Time Soon

    Researchers call it “hallucination”; you might more accurately refer to it as confabulation, hornswaggle, hogwash, or just plain BS. Anyone who has used an LLM has encountered it; some people seem to find it behind every prompt, while others dismiss it as an occasional annoyance, but nobody claims it doesn’t happen. A recent paper by researchers at OpenAI (PDF) tries to drill down a bit deeper into just why it happens, and whether anything can be done about it.

    Spoiler alert: not really. Not unless we completely re-think the way we’re training these models, anyway. The analogy used in the conclusion is to an undergraduate in an exam room. Every right answer earns a point, but wrong answers aren’t penalized, so why the heck not guess? You might not pass an exam that way going in blind, but if you have studied (i.e., sucked up the entire internet without permission for training data) then you might get a few extra points. For an LLM’s training, like a student’s final grade, every point scored on the exam is a good point.
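The exam analogy comes down to a one-line expected-value calculation. The sketch below is only an illustration of that arithmetic, not anything taken from the paper: the function names and the 1-point/0-point grading values are assumptions chosen to match the analogy.

```python
# Illustrative sketch of the exam analogy (names and point values are assumptions).
# Binary grading: 1 point for a right answer, 0 for a wrong answer, 0 for "I don't know".

POINTS_RIGHT = 1.0
POINTS_WRONG = 0.0
POINTS_ABSTAIN = 0.0


def expected_score_if_guessing(p_correct: float) -> float:
    """Expected points when the guess is right with probability p_correct."""
    return p_correct * POINTS_RIGHT + (1.0 - p_correct) * POINTS_WRONG


def guessing_beats_abstaining(p_correct: float) -> bool:
    """True whenever guessing has a higher expected score than abstaining."""
    return expected_score_if_guessing(p_correct) > POINTS_ABSTAIN


if __name__ == "__main__":
    for p in (0.01, 0.25, 0.9):
        print(p, expected_score_if_guessing(p), guessing_beats_abstaining(p))
```

Under this grading, any nonzero chance of being right makes guessing the better strategy, which is the incentive the analogy describes.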

    The problem is that if you reward “I don’t know” in training, you may eventually produce a degenerate model that responds to every prompt with “IDK”. Technically, that answer is true: the model is a stochastic mechanism; it doesn’t “know” anything. It’s also completely useless. Unlike some other studies, however, the authors do not conclude that so-called hallucinations are an inevitable result of the stochastic nature of LLMs.

    While hallucinations may be inevitable in that sense, they point out that this is only the case for “base models”, that is, pure LLMs. If you wrap the LLM with a “dumb” program able to parse information into a calculator, for example, suddenly the blasted thing can pretend to count. (That’s how undergrads do it these days, too.) You can also provide the LLM with a cheat-sheet of facts to reference instead of hallucinating; it sounds like what’s being proposed is a hybrid between an LLM and the sort of expert system you used to use Wolfram Alpha to access. (A combo we’ve covered before.)
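A minimal sketch of that kind of wrapper might look like the following. Everything here is hypothetical: `fake_llm` is a stand-in stub rather than a real model API, `CHEAT_SHEET` is a made-up one-entry fact table, and the routing logic is just one simple way to put a "dumb" program in front of a model.

```python
# Hypothetical wrapper: try a calculator, then a cheat-sheet, then fall back to the model.
import ast
import operator

# Only these arithmetic operators are allowed in the calculator.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expr: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""
    def ev(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("not plain arithmetic")
    return ev(ast.parse(expr, mode="eval").body)


# Made-up fact table standing in for a curated knowledge source.
CHEAT_SHEET = {"boiling point of water at sea level": "100 °C"}


def fake_llm(prompt: str) -> str:
    """Stub for a real model call (assumption: no real API is used here)."""
    return "(model-generated text, may be wrong)"


def answer(prompt: str) -> str:
    # 1. If the prompt is pure arithmetic, let the calculator handle it.
    try:
        return str(calculate(prompt))
    except (ValueError, SyntaxError, ZeroDivisionError):
        pass
    # 2. If the cheat-sheet has the fact, quote it instead of generating.
    fact = CHEAT_SHEET.get(prompt.strip().lower().rstrip("?"))
    if fact is not None:
        return fact
    # 3. Otherwise fall back to the (possibly confabulating) model.
    return fake_llm(prompt)
```

For example, `answer("2 + 2 * 3")` is handled by the calculator and `answer("Boiling point of water at sea level?")` by the cheat-sheet; anything else still reaches the model, which is where the skeptics' question in the next paragraph comes in.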

    In that case, however, some skeptics might wonder why bother with the LLM at all, if the knowledge in the expert system is “good enough.” (Having seen one AI boom before, we can say with the judgement of history that the knowledge in an expert system isn’t good enough often enough to make many viable products.)

    Unfortunately, that “easy” solution runs back into the issue of grading: if you want your model to do well on the scoreboards and beat ChatGPT or DeepSeek at popular benchmarks, there’s a certain amount of “teaching to the test” involved, and a model that occasionally makes stuff up will apparently do better on the benchmarks than one that refuses to guess. The obvious solution, as the authors propose, is changing the benchmarks.
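To see why a different grading rule changes the incentive, consider one possible penalized scheme, sketched below. The rule is an assumption chosen for illustration (1 point for a right answer, 0 for abstaining, and a penalty of t/(1-t) for a wrong answer, given a target confidence threshold t), and all names are hypothetical. With that penalty, answering only pays off when the model's chance of being right exceeds t.

```python
# Illustrative penalized grading (the rule and names are assumptions for this sketch).
# Right answer: +1. "I don't know": 0. Wrong answer: -t / (1 - t) for a threshold t in [0, 1).


def wrong_penalty(threshold: float) -> float:
    """Points deducted for a wrong answer at confidence threshold t."""
    return threshold / (1.0 - threshold)


def expected_score_if_answering(p_correct: float, threshold: float) -> float:
    """Expected points for answering when right with probability p_correct."""
    return p_correct - (1.0 - p_correct) * wrong_penalty(threshold)


def best_action(p_correct: float, threshold: float) -> str:
    """Answer only if doing so beats the 0 points earned by abstaining."""
    if expected_score_if_answering(p_correct, threshold) > 0.0:
        return "answer"
    return "abstain"


if __name__ == "__main__":
    for p in (0.5, 0.9):
        print(p, best_action(p, threshold=0.75))
```

Setting the threshold to 0 removes the penalty and recovers the guess-at-everything grading described earlier, so the threshold acts as a dial between rewarding bluffing and rewarding caution.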

    If you’re interested in AI (and who isn’t, these days?), the paper makes for an interesting read. Interesting, if perhaps disheartening, if you were hoping the LLMs would graduate from their eternal internship any time soon.

    Via ComputerWorld, by way of whereisyouredat.


    hackaday.com/2025/10/10/your-l…
