Piero Bosio Personal Social Web Site (Fediverso)

A social forum federated with the rest of the world. Instances don't matter; people do.

Gino Martorelli

@gert@poliversity.it
About
Posts: 2
Topics: 2
Shares: 0
Groups: 0
Followers: 0
Following: 0


Posts


  • Language models cannot reliably distinguish belief from knowledge and fact
    gert@poliversity.it

    Abstract
    -----------
    «As language models (LMs) increasingly infiltrate into high-stakes domains such as law, medicine, journalism and science, their ability to distinguish belief from knowledge, and fact from fiction, becomes imperative. Failure to make such distinctions can mislead diagnoses, distort judicial judgments and amplify misinformation. Here we evaluate 24 cutting-edge LMs using a new KaBLE benchmark of 13,000 questions across 13 epistemic tasks. Our findings reveal crucial limitations. In particular, all models tested systematically fail to acknowledge first-person false beliefs, with GPT-4o dropping from 98.2% to 64.4% accuracy and DeepSeek R1 plummeting from over 90% to 14.4%. Further, models process third-person false beliefs with substantially higher accuracy (95% for newer models; 79% for older ones) than first-person false beliefs (62.6% for newer; 52.5% for older), revealing a troubling attribution bias. We also find that, while recent models show competence in recursive knowledge tasks, they still rely on inconsistent reasoning strategies, suggesting superficial pattern matching rather than robust epistemic understanding. Most models lack a robust understanding of the factive nature of knowledge, that knowledge inherently requires truth. These limitations necessitate urgent improvements before deploying LMs in high-stakes domains where epistemic distinctions are crucial.»

    #ai #LLMs #epistemology #knowledge

    https://www.nature.com/articles/s42256-025-01113-8

    Uncategorized llms epistemology knowledge
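
The abstract's central claim turns on factivity: "X knows p" entails that p is true, while "X believes p" does not. A minimal sketch of a scoring rule built on that distinction might look like the following. All names and data here are hypothetical illustrations of the idea, not the actual KaBLE benchmark code:

```python
from dataclasses import dataclass

@dataclass
class EpistemicItem:
    # Hypothetical item format: an attitude verb plus the ground truth
    # of the embedded proposition.
    attitude: str          # "knows" or "believes"
    proposition_true: bool

def gold_label(item: EpistemicItem) -> bool:
    """Can the attitude report itself be true?

    Belief is non-factive: "X believes p" can hold even when p is false.
    Knowledge is factive: "X knows p" requires p to be true.
    """
    if item.attitude == "knows":
        return item.proposition_true
    return True

def accuracy(predictions, items):
    # Fraction of model judgments that match the factivity-based gold labels.
    gold = [gold_label(item) for item in items]
    return sum(p == g for p, g in zip(predictions, gold)) / len(items)
```

A model that treats belief and knowledge interchangeably, affirming every attitude report, scores only 50% on a balanced set of false-proposition items under this rule, which mirrors the kind of failure the paper reports for false-belief tasks.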

  • Armies cannot manage what they do not measure.
    gert@poliversity.it

    Armies cannot manage what they do not measure. This report underlines the importance of assessing and measuring military emissions, and the impact of such assessment on emissions mitigation and on directing climate finance.

    #climatechange #war #MilitaryEmission

    https://www.sciencedirect.com/science/article/pii/S2214790X25000437

    Uncategorized climatechange war militaryemission