Why a 94% Citation Hallucination in Grok-3 Forced a Rethink of Factuality Benchmarks

https://wool-wiki.win/index.php/What_%22Low_Vectara_%2B_High_AA-Omniscience%22_Teaches_Us_About_Summarization_vs_Factuality

Grok-3 hit a 94% citation hallucination rate while the FACTS benchmark reported a score of 68.8: hard numbers that changed production risk estimates. The data suggests the situation was worse than the vendor materials implied.

Submitted on 2026-03-05 21:29:29