Turn Chaos Into Structure. A Type-Safe AI Agent that extracts valid JSON from unstructured data using PydanticAI, FastHTML, and Gemini 2.5.
Updated Jan 10, 2026 - Python
Interactive Phoenix LiveView demonstrations of the Crucible Framework, showcasing ensemble voting, request hedging, statistical analysis, and more, all running against mock LLMs
Reference implementation of CAAF — three-pillar agent framework with monotonic convergence.
Reliability and audit-evidence testing for LLM agents - wrap any agent, assert behavior, measure determinism, check grounding, emit an audit-grade report.
Official implementation of CHARM: Cascading Hallucination Aware Resolution and Mitigation for multi-step agentic RAG pipelines.
Map where your bolted-on AI feature breaks before customers do. A free Claude Code tool: fragility map, reliability score across six dimensions, ranked gaps, and a 30-day plan. Built by a threat-intel practitioner.
CrucibleFramework: A scientific platform for LLM reliability research on the BEAM
Reliability and hallucination mitigation research for tool-augmented legal AI agents using QC-Sentinel verification architecture.
Collection of LLM failure modes used on failmodes.com
Public artifact bundle for the preprint 'Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents'
Production-style LLM evaluation harness for structured clinical extraction — compares prompt strategies across accuracy, cost, and hallucination.
AI agent metacognition skills distilled from David Dunning's Self-Insight (2005): calibration, Dunning–Kruger awareness, the outside view, and feedback discipline for Claude Code, Codex, and compatible AI agents.
Preprint paper package — Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents (Zenodo DOI 10.5281/zenodo.20034550)