Production client engagements, described in general terms, alongside public reference implementations of the patterns behind them — and a few personal projects.
Client engagements
Construction-litigation document analysis (confidential client)
Event-driven pipeline producing citation-grounded analysis over large litigation document sets.
Multi-step LLM orchestration with structured outputs at each boundary, model-tier routing (small models for extraction/classification, a larger model for synthesis), and retrieval over a client knowledge base with explicit chains of reasoning.
Event-driven ingestion · multi-step orchestration · model-tier routing · citation-grounded RAG · AWS Bedrock · Anthropic
Representative implementation: a retrieval-with-citations agent →
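The model-tier routing described above can be sketched roughly as follows. This is a minimal illustration, not the client implementation: the tier names, the task kinds, and the `call_model` callable are all hypothetical stand-ins, and no real provider SDK is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical tier names; the engagement's real model IDs are not stated.
SMALL_TIER = "small-model"
LARGE_TIER = "large-model"

# Cheap extraction/classification goes to the small tier, synthesis to the large one.
TIER_BY_TASK: Dict[str, str] = {
    "extract": SMALL_TIER,
    "classify": SMALL_TIER,
    "synthesize": LARGE_TIER,
}


@dataclass(frozen=True)
class StepResult:
    """Structured output handed across a step boundary."""
    task: str
    model: str
    output: str


def route(task: str) -> str:
    """Return the tier for a task kind, failing loudly on unknown kinds."""
    try:
        return TIER_BY_TASK[task]
    except KeyError:
        raise ValueError(f"no tier configured for task {task!r}") from None


def run_step(task: str, prompt: str, call_model: Callable[[str, str], str]) -> StepResult:
    """Route the task, call the chosen model, and wrap the reply in a typed result."""
    model = route(task)
    return StepResult(task=task, model=model, output=call_model(model, prompt))


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real model client; simply echoes which tier was used."""
    return f"[{model}] {prompt}"
```

Keeping the routing table as plain data means a tier change is a one-line edit, and an unknown task kind raises instead of silently falling through to the expensive model.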
AI fraud detection for a certification body (confidential, advisory)
Advisory engagement on applying AI to detect fraud in certification submittals.
Covers the detection approach, model and data strategy, and how to evaluate and triage flags for manipulated or fabricated submissions.
Advisory · fraud detection · evaluation & triage strategy · Azure AI Foundry · OpenAI
Open-source reference implementations
US data-center map
Interactive map of operational, under-construction, and planned US data centers — refreshed daily.
Pulls current data from OpenStreetMap plus a curated layer of announced AI megacampuses; Claude classifies each facility by purpose, AI-vs-general compute, and operator type, and adds editorial notes. Served as static JSON with no backend.
Vite · TypeScript · MapLibre GL JS · Python (OSM Overpass) · Claude classification · GitHub Pages
live map → · repo →
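As a rough sketch of the "static JSON, no backend" step, the snippet below turns an Overpass-style JSON response into a GeoJSON FeatureCollection that a map client could load directly. It assumes the query was run with `out center` so that ways and relations carry a `center` object; the property names (`osm_id`, `name`, `operator`) are illustrative choices, not taken from the repo.

```python
import json
from typing import Any, Dict, Optional


def element_to_feature(el: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Convert one Overpass element to a GeoJSON point feature, or None if it has no position."""
    if el.get("type") == "node":
        lat, lon = el.get("lat"), el.get("lon")
    else:
        # Ways and relations only have a position when the query used `out center`.
        center = el.get("center") or {}
        lat, lon = center.get("lat"), center.get("lon")
    if lat is None or lon is None:
        return None
    tags = el.get("tags", {})
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "osm_id": f'{el["type"]}/{el["id"]}',
            "name": tags.get("name"),
            "operator": tags.get("operator"),
        },
    }


def to_feature_collection(overpass: Dict[str, Any]) -> Dict[str, Any]:
    """Build a FeatureCollection, skipping elements that cannot be placed on the map."""
    features = []
    for el in overpass.get("elements", []):
        feature = element_to_feature(el)
        if feature is not None:
            features.append(feature)
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = {"elements": [{"type": "node", "id": 1, "lat": 39.0, "lon": -77.5,
                            "tags": {"name": "Example DC"}}]}
    print(json.dumps(to_feature_collection(sample), indent=2))
```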
Agentic legacy modernization
An LLM as primary developer modernizing a money-critical legacy Java calculator.
Wrapped in a golden-master characterization harness proving behavioral equivalence (552 tests across a 512-input grid), plus a trust report documenting where the model could not be trusted.
Java 17 · jqwik property tests · golden-master harness · GitHub Actions CI
github.com/stephenpadgett1/agentic-legacy-modernization →
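The golden-master idea can be sketched in a few lines. The repo itself is Java; this is a Python illustration only, and the calculator, its parameters, and the 8 × 8 × 8 grid values are hypothetical stand-ins chosen so the grid happens to contain 512 inputs.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Inputs = Tuple[int, int, int]  # (principal_cents, rate_bps, months), all hypothetical
Calc = Callable[[int, int, int], int]


def legacy_calc(principal_cents: int, rate_bps: int, months: int) -> int:
    """Stand-in legacy routine: simple interest in whole cents, truncated."""
    return principal_cents * rate_bps * months // (10_000 * 12)


def modern_calc(principal_cents: int, rate_bps: int, months: int) -> int:
    """Stand-in rewrite that must match the legacy output exactly."""
    annual_basis = 120_000  # 10,000 basis points times 12 months
    return (principal_cents * rate_bps * months) // annual_basis


def build_grid() -> List[Inputs]:
    """Cartesian product of boundary-heavy values: 8 * 8 * 8 = 512 inputs."""
    principals = [0, 1, 99, 100, 10_000, 123_456, 1_000_000, 99_999_999]
    rates = [0, 1, 25, 100, 499, 500, 1_000, 10_000]
    months = [0, 1, 2, 6, 11, 12, 24, 360]
    return list(product(principals, rates, months))


def record_golden(fn: Calc, grid: List[Inputs]) -> Dict[Inputs, int]:
    """Capture the legacy behavior as the golden master."""
    return {inputs: fn(*inputs) for inputs in grid}


def diff_against_golden(fn: Calc, golden: Dict[Inputs, int]) -> List[Inputs]:
    """Return every input where the candidate disagrees with the golden master."""
    return [inputs for inputs, expected in golden.items() if fn(*inputs) != expected]


if __name__ == "__main__":
    golden = record_golden(legacy_calc, build_grid())
    mismatches = diff_against_golden(modern_calc, golden)
    print(f"{len(golden)} inputs checked, {len(mismatches)} mismatches")
```

The harness asserts nothing about whether the legacy output is correct, only that the rewrite reproduces it; that is the property a characterization test is meant to pin down.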
Compliance RAG agent (SOC 2 / NIST 800-53)
Retrieval-augmented compliance Q&A with citations back to specific criteria.
Includes a 30-case eval harness measuring faithfulness and citation accuracy, with prompt caching applied where the retrieval context benefits most.
LangGraph · Claude (agent + LLM-as-judge) · Chroma · eval harness
github.com/stephenpadgett1/compliance-rag-agent →
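One plausible way to score citation accuracy in such a harness is set overlap between the criteria an answer cites and the criteria a reviewer expected. The sketch below assumes that definition; the `EvalCase` shape and the metric names are illustrative, and faithfulness (which the stack above scores with an LLM judge) is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class EvalCase:
    """One eval row: the question, the criteria expected, and the criteria the agent cited."""
    question: str
    expected: FrozenSet[str]
    cited: FrozenSet[str]


def citation_precision(case: EvalCase) -> float:
    """Share of cited criteria that were expected; an answer with no citations scores 0."""
    if not case.cited:
        return 0.0
    return len(case.cited & case.expected) / len(case.cited)


def citation_recall(case: EvalCase) -> float:
    """Share of expected criteria that were cited; nothing expected counts as full recall."""
    if not case.expected:
        return 1.0
    return len(case.cited & case.expected) / len(case.expected)


def mean(values: Iterable[float]) -> float:
    """Arithmetic mean that returns 0.0 for an empty run instead of raising."""
    items: List[float] = list(values)
    return sum(items) / len(items) if items else 0.0


if __name__ == "__main__":
    cases = [
        EvalCase("Who approves access?", frozenset({"CC6.1", "CC6.2"}), frozenset({"CC6.1"})),
        EvalCase("How are accounts managed?", frozenset({"AC-2"}), frozenset({"AC-2", "AC-3"})),
    ]
    print("precision", mean(citation_precision(c) for c in cases))
    print("recall", mean(citation_recall(c) for c in cases))
```

Reporting precision and recall separately distinguishes an agent that cites too much from one that cites too little, which a single accuracy number would hide.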
Event-driven multi-step AI workflow
Webhook → durable queue → three-step workflow producing a structured recommendation with citations.
Multi-step AI with model-tier routing and failure isolation, end to end.
Mastra · webhook + queue · structured outputs
github.com/stephenpadgett1/event-driven-ai-workflow →
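The queue-plus-steps shape, with failure isolation per job, can be sketched as below. This is not the Mastra implementation: an in-memory `queue.Queue` stands in for the durable queue, and the three step functions and their rules are hypothetical placeholders for model calls.

```python
import queue
from typing import Any, Callable, Dict, List, Tuple

Job = Dict[str, Any]
Step = Callable[[Job], Job]


def extract(job: Job) -> Job:
    """Step 1 (placeholder): split the payload text into candidate facts."""
    if "text" not in job:
        raise ValueError("missing text")
    return {**job, "facts": job["text"].split(". ")}


def classify(job: Job) -> Job:
    """Step 2 (placeholder): a keyword rule standing in for a small-model call."""
    category = "urgent" if "outage" in job["text"].lower() else "routine"
    return {**job, "category": category}


def recommend(job: Job) -> Job:
    """Step 3 (placeholder): build a structured recommendation that cites a fact."""
    action = "escalate" if job["category"] == "urgent" else "log"
    return {**job, "recommendation": {"action": action, "citations": job["facts"][:1]}}


STEPS: List[Tuple[str, Step]] = [
    ("extract", extract),
    ("classify", classify),
    ("recommend", recommend),
]


def drain(q: "queue.Queue[Job]") -> Tuple[List[Job], List[Dict[str, Any]]]:
    """Process every queued job; a failing job is dead-lettered and the rest continue."""
    done: List[Job] = []
    dead: List[Dict[str, Any]] = []
    while True:
        try:
            original = q.get_nowait()
        except queue.Empty:
            break
        job, current = original, ""
        try:
            for current, step in STEPS:
                job = step(job)
            done.append(job)
        except Exception as exc:  # isolate the failure to this one job
            dead.append({"job": original, "failed_step": current, "error": str(exc)})
    return done, dead
```

Recording the failed step alongside the untouched original payload is what makes a dead-lettered job replayable once the underlying problem is fixed.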
Video-understanding eval harness
Side-by-side eval of video-understanding models: a commercial stack vs. an open-source baseline.
LLM judge scoring relevance, faithfulness, and specificity; cost-aware per-call estimates; an audio-modality ablation. A reference for evaluating models honestly against a real use case.
Twelve Labs (Marengo / Pegasus) · CLIP + Claude baseline · LLM-as-judge
github.com/stephenpadgett1/video-understanding-eval-harness →
Mini-TMS (transportation management)
End-to-end transportation-management demo seeded with real carriers and freight lanes.
A GraphQL API, Kafka status events projected into an event store, a React timeline UI, and an LLM endpoint that narrates what happened on a load.
GraphQL (Apollo) · Kafka · Node / TypeScript · React · Claude
github.com/stephenpadgett1/mini-tms →
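Projecting status events into an event store that backs a timeline might look like the following sketch. No Kafka client is involved here; the `StatusEvent` fields, the per-load sequence number, and the deduplication rule are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class StatusEvent:
    """One status change on a load, as it might arrive from a topic (hypothetical shape)."""
    load_id: str
    seq: int      # per-load sequence number assigned by the producer
    status: str
    at: str       # ISO 8601 timestamp


class EventStore:
    """Append-only store keyed by load, tolerant of redelivery and reordering."""

    def __init__(self) -> None:
        self._by_load: DefaultDict[str, List[StatusEvent]] = defaultdict(list)
        self._seen: Set[Tuple[str, int]] = set()

    def append(self, event: StatusEvent) -> bool:
        """Store the event once; return False when it is a redelivered duplicate."""
        key = (event.load_id, event.seq)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._by_load[event.load_id].append(event)
        return True

    def timeline(self, load_id: str) -> List[StatusEvent]:
        """Events for one load in sequence order, regardless of arrival order."""
        return sorted(self._by_load.get(load_id, []), key=lambda e: e.seq)

    def current_status(self, load_id: str) -> Optional[str]:
        """Latest status for a load, or None if nothing has been recorded."""
        events = self.timeline(load_id)
        return events[-1].status if events else None
```

The ordered timeline is also a natural input for the narration endpoint, since it gives the model a complete, de-duplicated sequence of events to summarize.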