Why This Playbook?

RAG is often described as "retrieve, then generate," but production reliability requires explicit grounding policies and measurable abstention behavior. This playbook focuses on those engineering contracts.

Concept Primer: What Is RAG?

RAG combines retrieval and generation:

  1. Retrieve relevant chunks from local knowledge.
  2. Generate an answer constrained by retrieved context.

This lowers hallucination risk compared to prompt-only generation.
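The two steps above can be sketched in a few lines. The TypeScript below is illustrative only (toy vectors, a stand-in prompt template), not the demo app's actual code:

```typescript
// Step 1: retrieve relevant chunks by similarity to the query embedding.
// Step 2: build a prompt that constrains generation to that context.
type Chunk = { id: string; text: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(query: number[], index: Chunk[], k: number): Chunk[] {
  // Rank chunks by cosine similarity, highest first, and keep the top k.
  return [...index]
    .sort((a, b) => cosine(query, b.vector) - cosine(query, a.vector))
    .slice(0, k);
}

function buildPrompt(question: string, chunks: Chunk[]): string {
  // The generation step sees only the retrieved context.
  const context = chunks.map(c => `[${c.id}] ${c.text}`).join("\n");
  return `Answer ONLY from the context below. Cite chunk IDs.\n${context}\n\nQ: ${question}`;
}
```

In a real pipeline the query vector comes from an embedding model; here it is supplied directly.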

Broader RAG Use Cases

  • Enterprise policy/compliance assistants.
  • Internal engineering/support knowledge assistants.
  • Product/API documentation Q&A.
  • Customer support copilots grounded on approved KB content.

Demo scope in this repo:

  • Policy-style Q&A focused on grounding, abstention, and citation validation.

Concept Comparison (GenAI vs Agentic vs RAG)

User Need
   |
   +--> Fast content draft from prompt/context
   |      -> Choose GENERATIVE AI
   |
   +--> Multi-step planning + tool orchestration
   |      -> Choose AGENTIC AI
   |
   +--> Answers grounded in source documents with citations
          -> Choose RAG

What It Demonstrates

  • Retrieval + ranking over local docs.
  • Grounding guardrails (MIN_RETRIEVAL_SCORE, MIN_QUESTION_COVERAGE).
  • Citation validation against retrieved chunk IDs.
  • Blocked reasons for unsupported/unsafe answers.
  • Scenario-based anti-hallucination evaluation.

Flow

  1. User question is embedded and matched against indexed chunks.
  2. Retriever returns top-k ranked chunks.
  3. Score and coverage gates are evaluated.
  4. If gates fail, the system abstains and reports a blocked reason.
  5. If gates pass, generation uses only retrieved context.
  6. Citations are validated before final response.
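Step 6 can be sketched as a check that every cited chunk ID belongs to the retrieved set. The `[chunk-id]` citation syntax below is an assumption for illustration:

```typescript
// Reject any answer that cites a chunk ID the retriever never returned.
function validateCitations(answer: string, retrievedIds: Set<string>):
  { valid: boolean; invalid: string[] } {
  const cited = [...answer.matchAll(/\[([a-z0-9_-]+)\]/gi)].map(m => m[1]);
  const invalid = cited.filter(id => !retrievedIds.has(id));
  return { valid: invalid.length === 0, invalid };
}
```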

ASCII Diagram

User Question
    |
    v
Embed Query -> Retrieve Top-K -> Score/Sort
                              |
                              v
                     Gate Checks
               (score + coverage)
                              |
                +-------------+-------------+
                |                           |
                v                           v
          Block/Abstain             Generate from Context
                |                     + Validate Citations
                +-------------+-------------+
                              v
                      Final Response + UI Status

Provider Support

  • OpenAI is integrated out of the box.
  • You can connect Gemini, Claude, and others by adding provider adapters in demo-app/src/providers/ and extending provider-selection logic in services.
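One hypothetical shape such an adapter might take; the actual interface in demo-app/src/providers/ and the services' selection logic may differ:

```typescript
// Assumed adapter contract: a name plus a completion call.
interface ProviderAdapter {
  name: string;
  complete(prompt: string, opts: { temperature: number }): Promise<string>;
}

const adapters = new Map<string, ProviderAdapter>();

// New providers (Gemini, Claude, ...) register themselves here.
function registerProvider(a: ProviderAdapter): void {
  adapters.set(a.name, a);
}

// Provider-selection logic in services would look adapters up by name.
function selectProvider(name: string): ProviderAdapter {
  const a = adapters.get(name);
  if (!a) throw new Error(`Unknown provider: ${name}`);
  return a;
}
```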

Quickstart

git clone https://github.com/amiya-pattnaik/rag-engineering-playbook.git
cd rag-engineering-playbook/demo-app
cp .env.example .env
npm install
npm run dev
# open http://localhost:3000

Use OpenAI provider mode:

  • Set OPENAI_API_KEY in .env.
  • Keep OPENAI_TEMPERATURE=0 for deterministic outputs.
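Putting those two settings together, the relevant lines of .env might look like this (the key is a placeholder, not a real value):

```shell
# OpenAI provider mode
OPENAI_API_KEY=your-key-here
OPENAI_TEMPERATURE=0
```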

Run and Evaluate

npm run demo:scenarios
npm run demo:anti-hallucination

Suite checks:

  • answerable: grounded + expected facts
  • unanswerable: abstain without grounded factual claims
  • partial: answer known parts, abstain on unknown parts
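The three scenario classes can be expressed as simple predicates. The sketch below is illustrative, with assumed field names (`abstained`, `facts`), not the suite's real schema:

```typescript
type ScenarioKind = "answerable" | "unanswerable" | "partial";
type Result = { abstained: boolean; facts: string[] };

function passes(kind: ScenarioKind, r: Result, expectedFacts: string[]): boolean {
  switch (kind) {
    case "answerable":
      // Must answer and state every expected fact.
      return !r.abstained && expectedFacts.every(f => r.facts.includes(f));
    case "unanswerable":
      // Must abstain without asserting any grounded fact.
      return r.abstained && r.facts.length === 0;
    case "partial":
      // Every stated fact must be among the known (expected) facts.
      return r.facts.every(f => expectedFacts.includes(f));
  }
}
```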

Closing Thought

Reliable RAG is an engineering discipline built on thresholds, coverage checks, citation constraints, and repeatable evaluation, not just prompt design.