This page is the public companion to the system: curated knowledge, hybrid retrieval on Supabase, structured answers with citations, and guardrails you would expect in a serious AI product — not a demo prompt taped to a chat widget.
The Q&A Assistant is a conversational layer on top of an explicitly governed knowledge base. Sources are curated in the repo (markdown and structured data), modeled as records and chunks, synced to Supabase, and retrieved with a hybrid pipeline (vectors + full-text + aliases + intent-aware scoring). Responses are generated with Anthropic Claude, with structured output, citations, and deterministic fallbacks when parsing or the API fails.
Each user message travels through a single server pipeline (runGroundedChat) with timing breakdowns for profiling (session, intent, history, retrieval, LLM, normalize, logging); a simplified sketch follows the steps below.
Session & validation
Ensures a stable session id (client-supplied or generated), loads recent history for follow-up questions, and validates input size with Zod.
Intent, language, entity hint
Rule-based intent classification, language detection, and optional entity resolution (canonical keys from titles, slugs, aliases).
Safety gates
Requests pointing at excluded internal files get a refusal and a blocked-event log; handoff rules decide when to suggest contacting you.
Hybrid retrieval
Expanded query → embedding → Supabase RPC qa_assistant_hybrid_candidates → composite scoring → dedupe by canonical key → chunk budget per response mode.
LLM + structure
Claude receives grounding prompts and evidence; output is normalized to a structured payload (lead, bullets, sections, citations).
Persistence
Messages and query logs stored; optional feedback and lead handoff endpoints rate-limited separately.
Response to client
JSON includes rendered content, citations, structured UI hints, and timing metadata for debugging.
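To make the flow concrete, here is a minimal sketch of the stage-timing loop; aside from the stage names, every function and type below is illustrative, not the actual runGroundedChat implementation.

```ts
// Hypothetical sketch of per-stage timing in a runGroundedChat-style pipeline.
type Timings = Record<string, number>;

async function timed<T>(timings: Timings, stage: string, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    timings[stage] = Math.round(performance.now() - start);
  }
}

// Stubbed stages so the sketch runs standalone; the real pipeline resolves
// sessions, classifies intent, runs hybrid retrieval, and calls Claude.
const resolveSession = async (id?: string) => ({ id: id ?? crypto.randomUUID() });
const classifyIntent = async (q: string) => (q.includes("project") ? "project" : "general");
const retrieveChunks = async (_q: string, _intent: string) => [{ summary: "stub summary" }];
const generateAnswer = async (_q: string, chunks: { summary: string }[]) => ({
  lead: chunks[0]?.summary ?? "",
  bullets: [] as string[],
  citations: [] as string[],
});

export async function runGroundedChatSketch(message: string, sessionId?: string) {
  const timings: Timings = {};
  const session = await timed(timings, "session", () => resolveSession(sessionId));
  const intent = await timed(timings, "intent", () => classifyIntent(message));
  const chunks = await timed(timings, "retrieval", () => retrieveChunks(message, intent));
  const answer = await timed(timings, "llm", () => generateAnswer(message, chunks));
  return { sessionId: session.id, answer, timings }; // timings feed the response metadata
}
```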
The corpus is built in curated-records.ts: approved paths (e.g. about-me/*.md, project slices) and an explicit exclusion list for private or internal interview analyses. Each fact becomes a record with entity_type, canonical_key, visibility, assistant_summary, evidence text, tags, and query_aliases for retrieval.
Chunks split long material for embedding and full-text search; each chunk carries enriched embed_text (contextualized text rather than raw dumps), metadata for narrative ordering and recruiter evidence, and answer_roles so retrieval can prefer chunks suited to the current intent.
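As a rough illustration of these shapes, the record and chunk fields named above could be typed like this; anything beyond the fields mentioned in the text is an assumption.

```ts
// Illustrative record/chunk shapes. Field names from the text are real
// concepts; exact types and the inline comments are assumptions.
type Visibility = "public" | "internal";

interface CuratedRecord {
  entity_type: string;        // e.g. "project", "experience"
  canonical_key: string;      // stable key, used later for deduplication
  visibility: Visibility;     // excluded/internal material never ships
  assistant_summary: string;  // short summary the assistant can quote
  evidence: string;           // longer grounding text
  tags: string[];
  query_aliases: string[];    // alternate phrasings that should still match
}

interface Chunk {
  record_key: string;         // canonical_key of the parent record
  embed_text: string;         // enriched text used for embeddings and FTS
  order: number;              // narrative ordering within the record
  answer_roles: string[];     // which intents this chunk answers well
}
```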
Query and chunk vectors use the OpenAI Embeddings API (configurable model, default text-embedding-3-small, 1536 dimensions). Embeddings are version-tagged in storage; an in-memory cache reduces duplicate calls. If the API key is missing, vector similarity degrades gracefully while lexical and alias signals remain.
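A minimal sketch of that behavior, assuming the official openai package and an in-process Map as the cache:

```ts
import OpenAI from "openai";

// Cached embedding helper that degrades gracefully: with no API key it
// returns null and retrieval falls back to lexical and alias signals.
const cache = new Map<string, number[]>();
const model = process.env.QA_ASSISTANT_EMBEDDING_MODEL ?? "text-embedding-3-small";

export async function embed(text: string): Promise<number[] | null> {
  if (!process.env.OPENAI_API_KEY) return null; // skip vector similarity
  const cached = cache.get(text);
  if (cached) return cached;
  const client = new OpenAI(); // reads OPENAI_API_KEY from the environment
  const res = await client.embeddings.create({ model, input: text });
  const vector = res.data[0].embedding; // 1536 dims for text-embedding-3-small
  cache.set(text, vector);
  return vector;
}
```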
Tables include qa_assistant_records, qa_assistant_chunks, qa_assistant_sessions, qa_assistant_messages, qa_assistant_query_logs, qa_assistant_feedback, qa_assistant_handoffs, qa_assistant_blocked_events, qa_assistant_eval_runs, and admin override state. The RPC qa_assistant_hybrid_candidates returns vector, FTS, alias, and title-match scores for candidate chunks filtered by visibility and entity type.
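Calling the RPC from the server might look like the sketch below; the RPC name comes from the schema above, while its parameter names, the env-var spelling for the service-role key, and the returned columns are assumptions.

```ts
import { createClient } from "@supabase/supabase-js";

// Server-side only: the service-role key must never reach the browser.
const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // assumed variable name
);

export async function hybridCandidates(queryText: string, queryEmbedding: number[] | null) {
  const { data, error } = await supabase.rpc("qa_assistant_hybrid_candidates", {
    query_text: queryText,           // hypothetical parameter names
    query_embedding: queryEmbedding,
    visibility: "public",
  });
  if (error) throw error;
  return data; // per-candidate vector, FTS, alias, and title-match scores
}
```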
Retrieval combines: (1) Supabase hybrid candidates when available, (2) local cosine similarity against stored chunk embeddings, (3) composite scoreChunk weighting titles, summaries, content, tags, intent alignment, leadership / AI-tool boosts, and penalties for common confusions (e.g. spoken languages vs programming languages). Results are deduplicated by canonical entity, optionally forced to include an entity anchor, then trimmed via applyChunkBudgetAndOrdering for modes like narrative, recruiter fit, or project focus.
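In spirit, the composite scoring and dedupe step reduces to something like this; the weights, boost factor, and signal list are illustrative, not the real scoreChunk values.

```ts
// Illustrative composite scoring + canonical-key dedupe.
interface Candidate {
  canonical_key: string;
  vectorScore: number;  // from embeddings (0 when vectors are unavailable)
  ftsScore: number;     // full-text search rank
  aliasScore: number;   // query_aliases match
  titleScore: number;   // title match
  intentMatch: boolean; // answer_roles aligned with the classified intent
}

function scoreChunk(c: Candidate): number {
  let score =
    0.45 * c.vectorScore + 0.25 * c.ftsScore + 0.15 * c.aliasScore + 0.15 * c.titleScore;
  if (c.intentMatch) score *= 1.2; // boost chunks suited to the current intent
  return score;
}

function dedupeByCanonicalKey(candidates: Candidate[]): Candidate[] {
  const best = new Map<string, Candidate>();
  for (const c of candidates) {
    const prev = best.get(c.canonical_key);
    if (!prev || scoreChunk(c) > scoreChunk(prev)) best.set(c.canonical_key, c);
  }
  return [...best.values()].sort((a, b) => scoreChunk(b) - scoreChunk(a));
}
```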
Anthropic models are selected via environment variables (defaults: a Sonnet-class main model, with a Haiku-class model on fast paths where used). System prompts enforce grounding to the provided evidence. The client receives structured payloads for rich rendering (bullets, sections, project/fit cards). If the provider call or JSON parsing fails, a deterministic fallback builds an answer from chunk summaries so the UI never silently hallucinates a full narrative.
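The parse-or-fallback behavior, sketched with hypothetical payload fields:

```ts
// If structured output cannot be parsed, build a deterministic answer from
// chunk summaries rather than rendering nothing (or an ungrounded narrative).
interface StructuredAnswer {
  lead: string;
  bullets: string[];
  citations: string[];
}

export function normalizeOrFallback(
  raw: string,
  chunks: { canonical_key: string; summary: string }[]
): StructuredAnswer {
  try {
    const parsed = JSON.parse(raw) as Partial<StructuredAnswer>;
    if (typeof parsed.lead === "string") {
      return { lead: parsed.lead, bullets: parsed.bullets ?? [], citations: parsed.citations ?? [] };
    }
  } catch {
    // fall through to the deterministic path
  }
  return {
    lead: "Here is what the knowledge base says:",
    bullets: chunks.map((c) => c.summary),
    citations: chunks.map((c) => c.canonical_key),
  };
}
```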
POST /api/qa-assistant/chat is rate limited (per IP + session). Handoff submissions use stricter limits, honeypot fields, and Cloudflare Turnstile when configured. Admin routes are protected via Supabase auth and an allowlist of admin emails. Service-role keys never reach the browser.
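A fixed-window limiter keyed by IP + session is the simplest version of this idea; the window, limit, and in-memory store below are illustrative (a multi-instance deployment would need shared storage).

```ts
// Minimal fixed-window rate limiter keyed by IP + session.
const WINDOW_MS = 60_000;   // illustrative window
const MAX_REQUESTS = 20;    // illustrative per-window limit
const hits = new Map<string, { count: number; windowStart: number }>();

export function allowRequest(ip: string, sessionId: string): boolean {
  const key = `${ip}:${sessionId}`;
  const now = Date.now();
  const entry = hits.get(key);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(key, { count: 1, windowStart: now });
    return true;
  }
  entry.count += 1;
  return entry.count <= MAX_REQUESTS;
}
```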
Full-page chat at /qa-assistant and a floating widget share the same API, differentiated by an entrypoint flag for analytics. Citations, follow-ups, structured layouts, feedback thumbs, and a lead handoff flow are integrated in the UI layer.
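Both surfaces can call the endpoint the same way, differing only in the flag; the body field names here are assumptions.

```ts
// Shared client call for the full page and the floating widget.
export async function sendMessage(message: string, entrypoint: "page" | "widget") {
  const res = await fetch("/api/qa-assistant/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message, entrypoint }), // entrypoint drives analytics
  });
  if (!res.ok) throw new Error(`chat request failed: ${res.status}`);
  return res.json(); // rendered content, citations, UI hints, timings
}
```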
Seed scenarios live in docs; scripts under scripts/ run factual and product audits, with optional persistence to qa_assistant_eval_runs. This supports regression checks when the corpus or prompts change.
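A scenario check can be as small as the sketch below; the shape and assertion style are assumptions, not the actual audit scripts.

```ts
// Tiny factual-audit sketch: assert a grounded answer mentions key facts.
interface EvalScenario {
  question: string;
  mustMention: string[]; // substrings expected in the answer
}

export function runScenario(scenario: EvalScenario, answerText: string) {
  const missing = scenario.mustMention.filter(
    (needle) => !answerText.toLowerCase().includes(needle.toLowerCase())
  );
  return { passed: missing.length === 0, missing };
}
```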
Typical production variables (names only — set values in your host):
| Variable | Purpose |
|---|---|
| ANTHROPIC_API_KEY | Claude API access |
| OPENAI_API_KEY | Embeddings API |
| NEXT_PUBLIC_SUPABASE_URL / ANON / SERVICE_ROLE keys | Database + server sync |
| QA_ASSISTANT_ADMIN_EMAILS | Admin UI allowlist |
| QA_ASSISTANT_TURNSTILE_SECRET_KEY | Handoff captcha verification |
| QA_ASSISTANT_EMBEDDING_MODEL / ANTHROPIC_MODEL | Optional model overrides |
| NEXT_PUBLIC_APP_URL | Absolute URLs for emails and callbacks |
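Since Zod is already in the stack for input validation, the same approach can verify these at boot; the optionality choices and exact key names below are assumptions.

```ts
import { z } from "zod";

// Fail fast on misconfiguration instead of at the first request.
const envSchema = z.object({
  ANTHROPIC_API_KEY: z.string().min(1),
  OPENAI_API_KEY: z.string().min(1).optional(),     // vectors degrade without it
  NEXT_PUBLIC_SUPABASE_URL: z.string().url(),
  QA_ASSISTANT_ADMIN_EMAILS: z.string().optional(), // comma-separated allowlist
  QA_ASSISTANT_EMBEDDING_MODEL: z.string().optional(),
  NEXT_PUBLIC_APP_URL: z.string().url(),
});

export const env = envSchema.parse(process.env);
```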
The best proof is interactive: ask about background, projects, stack, or availability — and inspect how citations line up with the architecture above.
Go to Q&A Assistant