faithfulness
Geometric LLM grounding verification — deterministic, auditable, no second LLM. Python library for measuring how faithfully model outputs reflect their sources.
RAG Benchmarking: a framework-agnostic evaluation harness for RAG and agentic AI. Covers faithfulness, agentic metrics, and EU AI Act Article 15 accuracy evidence. Apache 2.0.
Verify LLM output against your source documents. Catch hallucinations in RAG pipelines and agentic workflows before they reach users.
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models