Selected work

Production client engagements, described in general terms, alongside public reference implementations of the patterns behind them — and a few personal projects.

Client engagements

Construction-litigation document analysis (Confidential client)

Event-driven pipeline producing citation-grounded analysis over large litigation document sets.

Multi-step LLM orchestration with structured outputs at each boundary, model-tier routing (small models for extraction/classification, a larger model for synthesis), and retrieval over a client knowledge base with explicit chains of reasoning.

Event-driven ingestion · multi-step orchestration · model-tier routing · citation-grounded RAG · AWS Bedrock · Anthropic
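
As a rough illustration of the model-tier routing described above, a minimal sketch might map each step's kind to a tier before the call is made. The tier names, step names, and `route` helper are hypothetical and not taken from the client system.

```python
from dataclasses import dataclass

# Hypothetical tier labels; the engagement's real model IDs are not public.
SMALL_TIER = "small-model"
LARGE_TIER = "large-model"

# Extraction and classification go to the small tier, synthesis to the
# large tier, mirroring the split described in the entry above.
ROUTES = {
    "extraction": SMALL_TIER,
    "classification": SMALL_TIER,
    "synthesis": LARGE_TIER,
}


@dataclass(frozen=True)
class Step:
    name: str  # hypothetical step name
    kind: str  # one of the keys in ROUTES


def route(step: Step) -> str:
    """Return the model tier for a step, failing loudly on unknown kinds."""
    try:
        return ROUTES[step.kind]
    except KeyError:
        raise ValueError(f"no route for step kind {step.kind!r}") from None


pipeline = [
    Step("pull_clauses", "extraction"),
    Step("tag_document", "classification"),
    Step("draft_analysis", "synthesis"),
]
plan = [(s.name, route(s)) for s in pipeline]
```

Keeping the routing table as plain data makes the cost and quality trade-off reviewable in one place, and an unknown step kind raises an error instead of silently defaulting to the expensive tier.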

AI fraud detection for a certification body Confidential · advisory

Advisory engagement on applying AI to detect fraud in certification submittals.

Covers the detection approach, model and data strategy, and how to evaluate and triage flags for manipulated or fabricated submissions.

Advisory · fraud detection · evaluation & triage strategy · Azure AI Foundry · OpenAI

Open-source reference implementations

US data-center map

Interactive map of operational, under-construction, and planned US data centers — refreshed daily.

Pulls live from OpenStreetMap plus a curated layer of announced AI megacampuses; Claude classifies each facility — purpose, AI-vs-general compute, operator type, and editorial notes. Static JSON, no backend.

Vite · TypeScript · MapLibre GL JS · Python (OSM Overpass) · Claude classification · GitHub Pages

Agentic legacy modernization

An LLM as primary developer modernizing a money-critical legacy Java calculator.

Wrapped in a golden-master characterization harness proving behavioral equivalence (552 tests across a 512-input grid), plus a trust report documenting where the model could not be trusted.

Java 17 · jqwik property tests · golden-master harness · GitHub Actions CI
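
The golden-master idea can be sketched in a few lines: record the legacy outputs over a fixed input grid, then require the modernized code to reproduce them exactly. The sketch below is in Python rather than the project's Java. Both calculators, the simple-interest rule, and the 8 x 8 x 8 grid shape are assumptions made for illustration; only the 512-input grid size comes from the entry above.

```python
import itertools
import json


def legacy_calc(principal_cents, rate_bp, months):
    """Stand-in for the legacy calculator (hypothetical simple-interest rule)."""
    return principal_cents + principal_cents * rate_bp * months // (10_000 * 12)


def modern_calc(principal_cents, rate_bp, months):
    """Stand-in for the modernized code under test."""
    interest = principal_cents * rate_bp * months // 120_000
    return principal_cents + interest


# Assumed grid shape: 8 values per axis gives 8 * 8 * 8 = 512 inputs.
PRINCIPALS = [0, 1, 99, 100, 10_000, 123_456, 1_000_000, 99_999_999]
RATES_BP = [0, 1, 25, 100, 499, 500, 1_250, 10_000]
MONTHS = [0, 1, 2, 6, 12, 13, 60, 360]
GRID = list(itertools.product(PRINCIPALS, RATES_BP, MONTHS))


def record_golden_master(fn):
    """Capture fn's output for every grid point, keyed by JSON-encoded args."""
    return {json.dumps(args): fn(*args) for args in GRID}


def diff_against_master(master, fn):
    """Return (args, expected, actual) for every input where fn diverges."""
    diffs = []
    for key, expected in master.items():
        actual = fn(*json.loads(key))
        if actual != expected:
            diffs.append((key, expected, actual))
    return diffs


master = record_golden_master(legacy_calc)
mismatches = diff_against_master(master, modern_calc)
```

An empty `mismatches` list shows equivalence on the recorded grid only. Inputs outside the grid are not covered by this check, and property-based tests are one way to probe that gap.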

Compliance RAG agent (SOC 2 / NIST 800-53)

Retrieval-augmented compliance Q&A with citations back to specific criteria.

Includes a 30-case eval harness measuring faithfulness and citation accuracy, with prompt caching applied where the retrieval context benefits most.

LangGraph · Claude (agent + LLM-as-judge) · Chroma · eval harness
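
One way to score citation accuracy in a harness like this is precision over cited criteria: the share of criteria the agent cited that a reviewer marked as correct for the question. This definition, the `EvalCase` shape, and the sample questions are assumptions for illustration, and the repository's own metric may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    question: str                  # hypothetical eval question
    expected_citations: frozenset  # criteria a reviewer marked as correct
    cited: frozenset               # criteria the agent actually cited


def citation_precision(case):
    """Share of cited criteria that are in the expected set (0.0 if none cited)."""
    if not case.cited:
        return 0.0
    return len(case.cited & case.expected_citations) / len(case.cited)


def mean_citation_precision(cases):
    """Unweighted mean of per-case precision across the eval set."""
    return sum(citation_precision(c) for c in cases) / len(cases)


cases = [
    EvalCase(
        question="Which criterion covers logical access controls?",
        expected_citations=frozenset({"CC6.1"}),
        cited=frozenset({"CC6.1", "CC6.2"}),
    ),
    EvalCase(
        question="Which control covers account management?",
        expected_citations=frozenset({"AC-2"}),
        cited=frozenset({"AC-2"}),
    ),
]
score = mean_citation_precision(cases)  # (0.5 + 1.0) / 2 = 0.75
```

Precision alone rewards an agent that cites very little, so a fuller harness would likely pair it with recall over the expected set and with a separate faithfulness check on the answer text.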

Event-driven multi-step AI workflow

Webhook → durable queue → three-step workflow producing a structured recommendation with citations.

Demonstrates a multi-step AI workflow with model-tier routing and failure isolation, end to end.

Mastra · webhook + queue · structured outputs
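
The webhook, queue, and three-step shape can be sketched without any framework. In the sketch below an in-memory `queue.Queue` stands in for the durable queue, and the three steps (`classify`, `retrieve`, `recommend`) are hypothetical placeholders. The real project uses Mastra, whose API is not shown here.

```python
import queue

jobs = queue.Queue()  # in-memory stand-in for a durable queue
results = []          # structured recommendations
dead_letters = []     # failed jobs, kept apart from healthy ones


def handle_webhook(payload):
    """Validate minimally and enqueue; the slow work happens in the worker."""
    if "id" not in payload:
        raise ValueError("payload missing 'id'")
    jobs.put(payload)


def classify(job):
    if not job.get("text"):
        raise ValueError("empty text")
    category = "contract" if "contract" in job["text"].lower() else "other"
    return {**job, "category": category}


def retrieve(job):
    # Placeholder retrieval: a real step would query a knowledge base.
    return {**job, "sources": [f"doc-{job['category']}-1"]}


def recommend(job):
    return {
        "id": job["id"],
        "recommendation": f"review as {job['category']}",
        "citations": job["sources"],
    }


STEPS = [classify, retrieve, recommend]


def drain():
    """Process queued jobs; one failing job never blocks the others."""
    while not jobs.empty():
        job = jobs.get()
        state = job
        try:
            for step in STEPS:
                state = step(state)
        except Exception as exc:
            dead_letters.append(
                {"id": job["id"], "failed_step": step.__name__, "error": str(exc)}
            )
        else:
            results.append(state)
        finally:
            jobs.task_done()


handle_webhook({"id": "a", "text": "Contract dispute over delays"})
handle_webhook({"id": "b", "text": ""})
handle_webhook({"id": "c", "text": "Invoice query"})
drain()
```

Job `b` fails in the first step and lands in `dead_letters` with the step name attached, while `a` and `c` complete normally. That per-job isolation is the property this sketch is meant to show.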

Video-understanding eval harness

Side-by-side eval of video-understanding models: a commercial stack vs. an open-source baseline.

LLM judge scoring relevance, faithfulness, and specificity; cost-aware per-call estimates; an audio-modality ablation. A reference for evaluating models honestly against a real use case.

Twelve Labs (Marengo / Pegasus) · CLIP + Claude baseline · LLM-as-judge

Mini-TMS (transportation management)

End-to-end transportation-management demo seeded with real carriers and freight lanes.

A GraphQL API, Kafka status events projected into an event store, a React timeline UI, and an LLM endpoint that narrates what happened on a load.

GraphQL (Apollo) · Kafka · Node / TypeScript · React · Claude
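
Projecting status events into an event store and reading a timeline back can be sketched with plain data structures. No Kafka client is used here: events are plain dicts, the field names are hypothetical, and deduplicating on (partition, offset) is an assumed way of tolerating redelivery, not a description of the demo's code.

```python
from collections import defaultdict

event_store = defaultdict(list)  # load_id -> list of status events
seen = set()                     # (partition, offset) pairs already projected


def project(event):
    """Append an event to its load's stream; skip redelivered duplicates."""
    key = (event["partition"], event["offset"])
    if key in seen:
        return False
    seen.add(key)
    event_store[event["load_id"]].append(event)
    return True


def timeline(load_id):
    """Return (timestamp, status) pairs for one load, oldest first."""
    events = sorted(event_store.get(load_id, []), key=lambda e: e["ts"])
    return [(e["ts"], e["status"]) for e in events]


incoming = [
    {"partition": 0, "offset": 1, "load_id": "L-1", "ts": "2024-01-01T12:00:00Z", "status": "IN_TRANSIT"},
    {"partition": 0, "offset": 0, "load_id": "L-1", "ts": "2024-01-01T08:00:00Z", "status": "PICKED_UP"},
    {"partition": 0, "offset": 1, "load_id": "L-1", "ts": "2024-01-01T12:00:00Z", "status": "IN_TRANSIT"},
    {"partition": 1, "offset": 0, "load_id": "L-2", "ts": "2024-01-01T09:30:00Z", "status": "PICKED_UP"},
]
applied = [project(e) for e in incoming]
```

The ordered list from `timeline("L-1")` is the kind of input a timeline UI or a narration prompt could consume. Sorting by timestamp on read keeps the projection itself append-only.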

Personal

AI video & image production

Personal R&D in generative video and image work.

Automated text-to-video and image-synthesis pipelines built around current generative models, with a repeatable brief-to-render workflow.

Generative video · image synthesis · pipeline automation