RAG & Model Context Protocol (MCP) Prompts

Ground your AI in truth and personal data.

This library contains 110+ prompts for retrieval-augmented generation, knowledge management, context engineering, and MCP-based persistent systems.

Core RAG Mastery

1. Perfect RAG Query

Using only the provided context, answer accurately. If the answer is not in the context, say "I don't have enough information" and suggest what additional context would help. Cite your sources using [Source: X]. Be precise and do not state anything the context does not support.

2. Context Compression

Summarize the following documents into a dense, query-optimized knowledge base for future retrieval. Preserve key facts, relationships, and nuance while reducing token count by 80%. Structure for efficient embedding and retrieval.

3. Personal Knowledge Vault

You have access to my entire life archive (emails, notes, photo metadata, reading history, journal entries). Answer as if you are me, with perfect recall and my personal style. Reference specific memories when relevant. Maintain my voice and perspective completely.

4. Multi-Source Synthesis

Synthesize answers from multiple retrieved sources: 1. Identify relevant information from each source 2. Note contradictions between sources 3. Reconcile conflicting information with confidence levels 4. Combine complementary insights 5. Cite all sources appropriately 6. Flag when sources are insufficient

5. Query Expansion for Retrieval

Given user query: [query], generate 10 semantically similar queries that might retrieve additional relevant documents. Include synonyms, rephrasings, and related concepts. Use all variations to broaden search coverage.

6. Hypothetical Document Embedding (HyDE)

Given query: [query], first generate a hypothetical ideal document that would answer this query perfectly. Then embed this hypothetical document and use its embedding, instead of the raw query embedding, to retrieve actual similar documents from the corpus.

7. Step-Back Prompting for RAG

Original question: [specific question]. First, ask a broader, more general question to retrieve relevant background context. Then answer the specific question using both the broad context and specific retrieved chunks.

8. Chain-of-Verification RAG

Generate initial answer from retrieved context. Then: 1. Identify claims made in the answer 2. Verify each claim against retrieved sources 3. Flag any claims not supported by context 4. Revise answer, removing unsupported claims 5. Note what additional information would be needed

Chunking & Document Processing

9. Semantic Chunking Strategy

Process this document using semantic chunking: 1. Identify natural boundaries (paragraphs, sections) 2. Maintain context across chunks (overlap each chunk with the previous one) 3. Preserve semantic coherence (don't split mid-thought) 4. Optimize chunk size for embedding model (target 512 tokens) 5. Add metadata (source, position, section) 6. Generate summary for each chunk

10. Hierarchical Chunking

Create multi-level chunks for [document]:
- Level 1: Sections/chapters (coarse)
- Level 2: Paragraphs (medium)
- Level 3: Sentences (fine)

Enable retrieval at appropriate granularity based on query type.

11. Document Enrichment

Enrich documents before embedding: 1. Extract entities (people, places, organizations, concepts) 2. Identify relationships between entities 3. Generate hypothetical questions this doc answers 4. Create keyword tags 5. Summarize in 3 sentences 6. Add temporal metadata if relevant

12. Parent-Child Chunking

Implement parent-child chunking: 1. Small child chunks for precise retrieval (128-256 tokens) 2. Parent chunks containing broader context (1024 tokens) 3. Retrieve children, return parent for context 4. Balance precision with context richness
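
The parent-child steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Chunk` class, the function names, and the use of a plain token list in place of a real tokenizer are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    chunk_id: str
    tokens: List[str]
    parent_id: Optional[str] = None  # None for parent chunks


def parent_child_chunks(tokens, parent_size=1024, child_size=256):
    """Split tokens into parent chunks, then split each parent into children."""
    parents, children = {}, []
    for p_idx, p_start in enumerate(range(0, len(tokens), parent_size)):
        parent = Chunk(f"p{p_idx}", tokens[p_start:p_start + parent_size])
        parents[parent.chunk_id] = parent
        for c_idx, c_start in enumerate(range(0, len(parent.tokens), child_size)):
            children.append(
                Chunk(
                    f"p{p_idx}-c{c_idx}",
                    parent.tokens[c_start:c_start + child_size],
                    parent_id=parent.chunk_id,
                )
            )
    return parents, children


def expand_to_parents(hit_children, parents):
    """Map retrieved children to their parents, de-duplicated, in hit order."""
    seen, result = set(), []
    for child in hit_children:
        if child.parent_id not in seen:
            seen.add(child.parent_id)
            result.append(parents[child.parent_id])
    return result


# Placeholder tokens stand in for the output of a real tokenizer.
tokens = [f"tok{i}" for i in range(2500)]
parents, children = parent_child_chunks(tokens)
context = expand_to_parents([children[0], children[1], children[4]], parents)
```

Only the children would be embedded and searched; `expand_to_parents` swaps each hit for its larger parent before the context is handed to the model.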

13. Sliding Window Chunking

Process [long document] with sliding window: 1. Window size: 512 tokens 2. Step size: 256 tokens (50% overlap) 3. Maintain continuity across boundaries 4. Tag with position and surrounding context
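
A sliding window with the sizes above could look like the sketch below. The function name and the choice to return `(start_index, window)` pairs are assumptions for illustration; the start index doubles as the position tag mentioned in step 4.

```python
def sliding_windows(tokens, window_size=512, step_size=256):
    """Return (start_index, window) pairs.

    Consecutive windows overlap by window_size - step_size tokens.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    if not tokens:
        return []
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + window_size]))
        if start + window_size >= len(tokens):
            break  # this window already reaches the end of the document
        start += step_size
    return windows


tokens = list(range(1200))  # integers stand in for real tokens
windows = sliding_windows(tokens)
starts = [start for start, _ in windows]
```

With 1200 tokens the windows start at 0, 256, 512, and 768, and the last one is shorter than 512 because it runs to the end of the document.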

14. Structured Data Extraction

Extract structured information from [document]: 1. Tables → JSON/CSV 2. Key-value pairs 3. Named entities with types 4. Dates and temporal information 5. Relationships and references 6. Store structured alongside raw text

Embedding & Retrieval Optimization

15. Hybrid Search Query

Execute hybrid retrieval: 1. Dense retrieval: Embedding similarity for semantic match 2. Sparse retrieval: BM25/TF-IDF for keyword match 3. Fusion: Normalize both score sets and combine them as α·dense + (1−α)·sparse (α=0.5 is a common starting point; tune per corpus) 4. Re-ranking: Cross-encoder over the fused candidates for final ranking 5. Return top-k with scores and source types
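
The score-fusion step can be sketched as a weighted sum of normalized scores. This is one possible approach under stated assumptions: min-max normalization is chosen here because raw BM25 and cosine scores live on different scales, and the document IDs and scores are made up for the example.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} map to [0, 1] so score sets are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(dense, sparse, alpha=0.5, top_k=3):
    """Weighted fusion: alpha * dense + (1 - alpha) * sparse.

    A document missing from one result list contributes 0 for that list.
    """
    dense_n, sparse_n = min_max_normalize(dense), min_max_normalize(sparse)
    docs = set(dense_n) | set(sparse_n)
    fused = {
        d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
        for d in docs
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Illustrative scores: cosine similarities (dense) and BM25 scores (sparse).
dense_scores = {"a": 0.9, "b": 0.7, "c": 0.5}
sparse_scores = {"b": 12.0, "c": 3.0, "d": 9.0}
ranked = fuse(dense_scores, sparse_scores)
```

Document "b" ranks first because it scores well in both lists. Rank-based schemes such as reciprocal rank fusion are a common alternative when score scales are hard to normalize.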

16. Query Embedding Optimization

Optimize queries for embedding retrieval: 1. Remove stop words unless meaningful 2. Preserve key entities and nouns 3. Consider query expansion with hyponyms/hypernyms 4. Use natural language (not keyword soup) 5. Include context if query is ambiguous 6. Test with hypothetical matches

17. Multi-Vector Representation

For [document], create multiple embeddings: 1. Dense passage embedding: Overall meaning 2. ColBERT-style token embeddings: Fine-grained matching 3. Sentence embeddings: Key sentence representations 4. Summary embedding: High-level concept 5. Use appropriate representation per query type

18. Embedding Model Selection

Recommend embedding model for [use case]: 1. OpenAI: text-embedding-3-large (general purpose) 2. Cohere: embed-multilingual-v3.0 (multilingual) 3. Open source: BGE, GTE, E5 models (cost/performance) 4. Domain-specific: Fine-tuned for legal/medical/technical 5. Consider: dimensionality, context length, cost, latency

19. Index Optimization

Optimize vector index for [corpus characteristics]: 1. Flat (brute force): Small corpus (<10k), maximum accuracy 2. IVF: Medium corpus, balanced speed/accuracy 3. HNSW: Large corpus, approximate search 4. ScaNN: Very large corpus, high performance 5. Tune parameters for recall@k requirements

20. Re-ranking Strategy

Apply re-ranking to initial retrieval results: 1. Retrieve top-100 with fast embedding search 2. Apply cross-encoder re-ranking (more precise) 3. Return top-10 after re-ranking 4. Consider latency vs. quality trade-off

Context Management & Prompt Engineering

21. Context Window Prioritization

Given [large context] and limited window, prioritize: 1. Most semantically similar chunks to query 2. Most recent information (if temporal) 3. Most authoritative sources 4. Diverse perspectives (avoid echo chamber) 5. Combine these signals into a single recency-weighted relevance score and fill the window in score order

22. Dynamic Context Injection

Structure RAG prompt dynamically:

System: You are a helpful assistant with access to reference materials.

Context: [Retrieved documents with citation markers]

Guidelines:
- Answer based primarily on provided context
- Cite sources using [1], [2], etc.
- Say "I don't know" if context insufficient
- Synthesize multiple sources when helpful

User Query: [Question]

23. Contextual Compression

Compress retrieved context before inclusion: 1. Remove redundant information 2. Extract key sentences only 3. Summarize verbose sections 4. Preserve unique facts from each source 5. Maintain citations for provenance

24. Provenance Tracking

Ensure every statement can be traced: 1. Tag each context chunk with source ID 2. Require model to cite [Source: X] for facts 3. Validate citations exist in provided context 4. Surface sources to user for verification 5. Enable click-through to original documents
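
Step 3 (validating that citations exist in the provided context) can be checked mechanically. The sketch below assumes the `[Source: X]` citation format used in this library and a hypothetical `context_chunks` dict keyed by source ID; the example IDs are invented.

```python
import re

# Matches citations of the form [Source: some-id]
CITATION_PATTERN = re.compile(r"\[Source:\s*([^\]]+)\]")


def validate_citations(answer, context_chunks):
    """Split the source IDs cited in `answer` into (valid, unknown).

    `context_chunks` maps source ID -> chunk text that was shown to the model.
    """
    cited = [match.strip() for match in CITATION_PATTERN.findall(answer)]
    valid = [c for c in cited if c in context_chunks]
    unknown = [c for c in cited if c not in context_chunks]
    return valid, unknown


context = {"doc-1": "Revenue grew 12% year over year.", "doc-2": "Headcount was flat."}
answer = "Revenue grew 12% [Source: doc-1]. Margins fell [Source: doc-9]."
valid, unknown = validate_citations(answer, context)
```

Here `doc-9` ends up in `unknown` because it was never provided. This only checks that the cited source exists, not that the cited text actually supports the claim.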

25. Contextual Few-Shot Examples

Include relevant examples in context: 1. Retrieve similar past Q&A pairs 2. Show examples matching query type 3. Demonstrate desired output format 4. Include edge case handling examples 5. Update examples based on feedback

Advanced Retrieval Techniques

26. Sub-Question Decomposition

For complex query [question]: 1. Decompose into sub-questions 2. Retrieve context for each sub-question 3. Answer sub-questions separately 4. Synthesize sub-answers into final response 5. Show reasoning chain

27. Self-RAG (Reflective Retrieval)

Augment RAG with self-reflection: 1. Initial retrieval and answer generation 2. Self-evaluation: Is answer complete/supported? 3. If not satisfied, generate follow-up queries 4. Retrieve additional context 5. Revise answer with new information 6. Repeat until satisfied or max iterations

28. Corrective RAG (CRAG)

Implement corrective retrieval: 1. Retrieve initial documents 2. Evaluate relevance of each document 3. If low relevance: trigger web search or knowledge base query 4. Combine retrieved + supplementary sources 5. Generate answer from augmented context

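29. Fusion-in-Decoder (FiD)

Fuse evidence from parallel retrieval paths at generation time: 1. Dense retrieval path 2. Sparse retrieval path 3. Knowledge graph path 4. Encode each retrieved passage together with the question, independently of the others 5. Let the decoder attend over all encoded passages jointly so it can weight the most relevant source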

30. Speculative RAG

Draft-then-verify approach: 1. Generate draft answer without retrieval (speculative) 2. Identify claims in draft needing verification 3. Retrieve context specifically for claims 4. Verify and correct draft 5. Return verified final answer

Knowledge Graph & Structured RAG

31. Knowledge Graph Construction

Build knowledge graph from [documents]: 1. Extract entities (nodes) 2. Identify relationships (edges) 3. Assign properties to nodes/edges 4. Resolve entity references (coreference) 5. Store in graph database 6. Enable graph traversal queries

32. GraphRAG Query

Answer using knowledge graph: 1. Parse question for entities and relations 2. Execute graph query (Cypher/GQL) 3. Traverse relevant paths 4. Retrieve connected subgraph 5. Generate answer from graph structure 6. Show reasoning path through graph

33. Entity Linking

Link mentions to canonical entities: 1. Identify entity mentions in text 2. Compare to knowledge base entities 3. Disambiguate (context helps) 4. Link to canonical identifier 5. Enrich with KB properties

34. Multi-Hop Reasoning

Answer multi-hop questions: 1. Identify starting entity in query 2. First hop: retrieve related entities 3. Second hop: from related to target 4. Connect findings across hops 5. Synthesize multi-hop answer

Example: "What company founded by X's co-founder acquired Y?"
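
The example question can be traced hop by hop over a small graph. Everything in this sketch is hypothetical: the entity names, the relation labels, and the dict-based graph are placeholders for whatever a real knowledge graph would hold.

```python
# Toy graph: (entity, relation) -> list of related entities. All names are made up.
graph = {
    ("X", "co_founder"): ["P"],
    ("P", "founded"): ["AlphaCo", "BetaCo"],
    ("AlphaCo", "acquired"): ["Z"],
    ("BetaCo", "acquired"): ["Y"],
}


def neighbors(entity, relation):
    """Entities reachable from `entity` via `relation` (empty list if none)."""
    return graph.get((entity, relation), [])


def companies_by_cofounder_that_acquired(start, target):
    """Return every reasoning path [start, person, company, target]."""
    paths = []
    for person in neighbors(start, "co_founder"):        # hop 1: X -> co-founder
        for company in neighbors(person, "founded"):     # hop 2: co-founder -> company
            if target in neighbors(company, "acquired"):  # hop 3: company -> acquisition
                paths.append([start, person, company, target])
    return paths


paths = companies_by_cofounder_that_acquired("X", "Y")
```

Returning whole paths rather than just the final entity keeps the reasoning chain available for display alongside the answer.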

35. Temporal Knowledge Graph

Handle time-aware knowledge: 1. Timestamp facts with validity periods 2. Query: "What was true in 2010?" 3. Handle changing relationships over time 4. Reason about temporal order 5. Identify contradictions across time
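
A minimal way to model validity periods is to store a start and end year on each fact. The sketch below assumes year granularity and that a relation such as `works_at` holds one value at a time; the `Fact` class and all data are invented for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: int           # first year the fact holds
    valid_to: Optional[int]   # last year it holds; None means still true


def facts_as_of(facts, year):
    """Facts whose validity period contains `year`."""
    return [
        f for f in facts
        if f.valid_from <= year and (f.valid_to is None or year <= f.valid_to)
    ]


def overlaps(a, b):
    """True if the validity periods of two facts share at least one year."""
    a_end = a.valid_to if a.valid_to is not None else float("inf")
    b_end = b.valid_to if b.valid_to is not None else float("inf")
    return a.valid_from <= b_end and b.valid_from <= a_end


def contradictions(facts):
    """Pairs with the same subject and relation, different objects, overlapping in time."""
    return [
        (a, b) for a, b in combinations(facts, 2)
        if (a.subject, a.relation) == (b.subject, b.relation)
        and a.obj != b.obj
        and overlaps(a, b)
    ]


facts = [
    Fact("Ada", "works_at", "Acme", 2005, 2012),
    Fact("Ada", "works_at", "Globex", 2013, None),
    Fact("Ada", "works_at", "Initech", 2011, 2011),
]
in_2010 = facts_as_of(facts, 2010)
conflicts = contradictions(facts)
```

The query for 2010 returns only the Acme fact, and the Acme and Initech facts are flagged because both claim to hold in 2011.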

MCP (Model Context Protocol)

36. MCP Server Definition

Define MCP server for [capability]:

{
  "name": "knowledge-base",
  "version": "1.0.0",
  "tools": [
    {
      "name": "search_documents",
      "description": "Search knowledge base",
      "inputSchema": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "top_k": { "type": "integer" }
        },
        "required": ["query"]
      }
    }
  ],
  "resourceTemplates": [
    {
      "name": "document",
      "description": "Full document content",
      "uriTemplate": "doc://{id}"
    }
  ]
}
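
MCP messages are JSON-RPC 2.0, and a client invokes a tool with a `tools/call` request. The sketch below builds such a request for the `search_documents` tool and runs a very small argument check first. The schema constant, the helper names, and the sample query are assumptions for illustration, and the check covers only required keys and primitive types, not full JSON Schema.

```python
import json

# Assumed JSON Schema for the search_documents tool's arguments.
SEARCH_INPUT_SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string"}, "top_k": {"type": "integer"}},
    "required": ["query"],
}

_JSON_TYPES = {"string": str, "integer": int}


def check_arguments(arguments, schema):
    """Tiny subset of JSON Schema validation: required keys and primitive types."""
    errors = [f"missing: {key}" for key in schema.get("required", []) if key not in arguments]
    for key, value in arguments.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unexpected: {key}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            errors.append(f"wrong type: {key}")
    return errors


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


arguments = {"query": "refund policy", "top_k": 5}
errors = check_arguments(arguments, SEARCH_INPUT_SCHEMA)
request = make_tool_call(1, "search_documents", arguments)
wire = json.dumps(request)  # what would be sent over the transport
```

In practice an MCP SDK would handle the transport and validation; the point here is only the shape of the message.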

37. MCP Tool Implementation

Implement MCP tool for [function]: 1. Define input schema (types, required/optional) 2. Implement core logic 3. Handle errors gracefully 4. Return structured output 5. Document side effects 6. Rate limit if needed

38. MCP Resource Provider

Provide MCP resources: 1. Define resource URI pattern 2. Implement fetch logic 3. Handle authentication if needed 4. Return appropriate MIME type 5. Support pagination for large resources 6. Cache if appropriate

39. MCP Client Prompt

Use MCP server in conversation:

You have access to MCP servers:
- filesystem: Read/write files
- git: Repository operations
- fetch: HTTP requests
- database: SQL queries

When needed, invoke appropriate tools.
Show tool calls and their results.
Integrate tool outputs into responses.

40. Persistent Memory with MCP

Maintain persistent context: 1. Store conversation summaries in MCP resource 2. Retrieve previous session context on startup 3. Update memory after each interaction 4. Compress old memories into higher-level summaries 5. Enable long-term personalization

Domain-Specific RAG

41. Legal Document RAG

Process legal documents: 1. Citation extraction: Case law, statutes, regulations 2. Argument mapping: Identify claims and support 3. Precedent analysis: Similar past cases 4. Risk assessment: Potential legal issues 5. Compliance checking: Against regulations 6. Maintain confidentiality and privilege awareness

42. Medical Literature RAG

Query medical literature: 1. Evidence grading: systematic review/meta-analysis > RCT > observational > case report 2. Conflict of interest: Flag industry funding 3. Recency weighting: Recent research priority 4. Population matching: Match patient demographics 5. Intervention comparison: Head-to-head when possible 6. Always include disclaimer: Not medical advice

43. Financial Document RAG

Analyze financial documents: 1. 10-K/10-Q extraction: Key metrics, risks, MD&A 2. Earnings call transcript processing 3. SEC filing monitoring 4. Analyst report synthesis 5. Trend identification: QoQ, YoY comparisons 6. Risk factor aggregation

44. Technical Documentation RAG

Navigate technical docs: 1. API endpoint discovery 2. Code example retrieval 3. Version-specific answers 4. Error message lookup 5. Configuration guidance 6. Troubleshooting pathfinding

45. Academic Research RAG

Synthesize research: 1. Paper retrieval: Semantic + keyword 2. Citation network traversal 3. Methodology extraction 4. Finding comparison across studies 5. Research gap identification 6. Future work suggestions

Evaluation & Quality Assurance

46. RAG Evaluation Framework

Evaluate RAG system on: 1. Context relevance: Retrieved docs relevant to query? 2. Context recall: Did we get all needed information? 3. Faithfulness: Is answer supported by context? 4. Answer relevance: Does it address the question? 5. Citation accuracy: Are citations correct? 6. Hallucination rate: Unsupported claims?

47. Golden Dataset Creation

Create evaluation dataset: 1. Collect representative queries 2. Manually identify relevant documents 3. Write ideal answers 4. Include edge cases and adversarial examples 5. Version and maintain dataset 6. Use for regression testing

48. A/B Testing for RAG

Compare RAG configurations: 1. Vary chunking strategies 2. Test embedding models 3. Adjust retrieval parameters 4. Measure on held-out queries 5. Statistical significance testing 6. Deploy winner, document learnings

49. Human Evaluation Protocol

Conduct human evaluation: 1. Recruit domain expert judges 2. Define rating rubric (1-5 scale) 3. Blind comparison of systems 4. Measure inter-annotator agreement 5. Collect qualitative feedback 6. Iterate based on findings

50. Continuous Monitoring

Monitor RAG in production: 1. Track query distribution 2. Measure latency percentiles 3. Monitor retrieval accuracy 4. Detect drift in query patterns 5. Alert on quality degradation 6. Feedback loop for improvement

Advanced Techniques

51. Fine-Tuned Embedding for Domain

Fine-tune embeddings for [domain]: 1. Collect domain-specific corpus 2. Create training pairs (query, relevant doc) 3. Use contrastive learning 4. Evaluate on domain benchmark 5. Deploy fine-tuned model 6. Monitor vs. general embedding performance

52. ColBERT Late Interaction

Implement ColBERT: 1. Token-level embeddings for docs 2. Token-level embeddings for queries 3. MaxSim operator for matching 4. Efficient pruning for speed 5. High accuracy with manageable cost
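
The MaxSim operator in step 3 scores a document by taking, for each query token, its best-matching document token and summing those maxima. The sketch below uses tiny hand-written 2-D unit vectors in place of real token embeddings; the vectors and names are purely illustrative.

```python
def dot(u, v):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: sum over query tokens of the max similarity
    to any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Two query tokens and two candidate documents, each a list of token vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.0, 1.0], [0.0, 1.0]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

`doc_a` scores higher because it has a good match for both query tokens, while `doc_b` only matches the second one.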

53. SPLADE Sparse Retrieval

Use learned sparse retrieval: 1. Expand documents to term importance vectors 2. Expand queries similarly 3. Efficient inverted index lookup 4. Combine with dense for hybrid 5. Often handles vocabulary mismatch better than BM25, because expansion adds related terms the text never uses

54. Cross-Modal Retrieval

Retrieve across modalities: 1. Image-to-text: Describe images, embed description 2. Text-to-image: Retrieve relevant images 3. Unified embedding space: CLIP-style 4. Cross-modal queries: "Find images like this description" 5. Fusion: Combine visual + text evidence

55. Conversational RAG

Maintain context across turns: 1. Track conversation history 2. Resolve anaphora ("it", "that") 3. Clarify ambiguous references 4. Retrieve based on full context, not just last turn 5. Handle topic shifts gracefully

System Design & Architecture

56. RAG Pipeline Architecture

Design RAG system architecture: 1. Ingestion pipeline: Document → Chunks → Embeddings → Index 2. Query pipeline: Query → Embedding → Retrieval → LLM → Answer 3. Storage: Vector DB + Metadata store 4. Caching: Query cache, result cache 5. Monitoring: Logs, metrics, tracing 6. Scaling: Horizontal scaling strategy

57. Real-Time RAG Updates

Handle live document updates: 1. Detect new/updated/deleted documents 2. Incremental embedding computation 3. Index updates without full rebuild 4. Version management for documents 5. Consistency guarantees

58. Multi-Tenant RAG

Support multiple tenants: 1. Data isolation per tenant 2. Tenant-specific embeddings 3. Query routing to appropriate index 4. Resource allocation/fairness 5. Tenant-aware caching

59. Federated RAG

Query across distributed sources: 1. Query multiple RAG systems 2. Aggregate results 3. Handle different schemas 4. Maintain source attribution 5. De-duplicate across sources

60. Edge RAG Deployment

Deploy RAG at edge: 1. Lightweight embedding models 2. Compressed indices 3. On-device inference 4. Sync with central knowledge base 5. Handle offline scenarios

Personal & Life RAG

61. Second Brain RAG

Query personal knowledge base: 1. Notes, highlights, bookmarks 2. Conversations and ideas 3. Reading and watch lists 4. Projects and goals 5. Retrieve with personal context

62. Journal RAG

Query journal entries: 1. Temporal queries: "What was I thinking in March?" 2. Thematic: "When have I felt this way before?" 3. Pattern: "What do I often struggle with?" 4. Growth: "How has my perspective changed?" 5. Privacy: Local-first, encrypted storage

63. Email Archive RAG

Search email history: 1. Semantic search across years of email 2. Contact knowledge: "What did John say about X?" 3. Project tracking: Find all emails about [project] 4. Attachment indexing: Search document contents 5. Thread reconstruction: Follow conversation flow

64. Health & Fitness RAG

Query health data: 1. Workout history and patterns 2. Nutrition logs 3. Sleep data 4. Symptom tracking 5. Trends and correlations 6. With appropriate privacy safeguards

65. Family Knowledge RAG

Family information system: 1. Medical histories 2. Important dates and preferences 3. Recipe collection 4. Home maintenance records 5. Travel history and recommendations

Enterprise RAG

66. Enterprise Search RAG

Company-wide knowledge: 1. Documents: Policies, procedures, templates 2. People: Skills, projects, org chart 3. Systems: Documentation, APIs 4. Conversations: Slack, Teams, meetings 5. Access control: Respect permissions

67. Customer Support RAG

Support knowledge base: 1. Past tickets and resolutions 2. Product documentation 3. Known issues and workarounds 4. Escalation paths 5. Customer-specific context

68. HR Policy RAG

Employee self-service: 1. Benefits information 2. Time-off policies 3. Expense guidelines 4. Career development paths 5. Compliance requirements

69. IT Helpdesk RAG

Technical support: 1. Troubleshooting guides 2. Common error resolutions 3. System status and maintenance 4. Software documentation 5. Asset information

70. Sales Enablement RAG

Sales knowledge: 1. Product information 2. Competitive positioning 3. Case studies and testimonials 4. Pricing and contracts 5. Objection handling

Specialized RAG Patterns

71. Summary RAG

Retrieve pre-generated summaries: 1. Chunk-level summaries 2. Document-level summaries 3. Collection-level overviews 4. Hierarchical summarization 5. Answer from appropriate summary level

72. Table RAG

Query tabular data: 1. Table extraction from documents 2. Structured embedding of rows 3. SQL-like querying over tables 4. Aggregation and computation 5. Visual rendering of results

73. Code RAG

Code-aware retrieval: 1. Function/method embeddings 2. AST-aware chunking 3. Import/dependency graph 4. Code example retrieval 5. Documentation lookup

74. Multi-Lingual RAG

Cross-lingual retrieval: 1. Unified embedding space 2. Query in language A, retrieve from language B 3. Answer in query language 4. Handle code-switching 5. Translation quality considerations

75. Time-Aware RAG

Temporal reasoning: 1. Document timestamps 2. Query time ranges: "What happened in Q2?" 3. Recency weighting 4. Temporal ordering of facts 5. Handle outdated information

Prompt Engineering for RAG

76. RAG System Prompt

You are a helpful assistant with access to a knowledge base.

Instructions:
1. Base your answers on the provided context
2. Cite sources using [1], [2], etc.
3. If context is insufficient, say so and suggest what would help
4. Synthesize multiple sources when relevant
5. Be precise - don't add information not in context

Context:
{retrieved_documents}

User: {query}

77. Citation Format Prompt

When using information from context:
1. Include citation immediately after claim: [1]
2. Multiple sources: [1][2][3]
3. Specific parts: [1, Section A]
4. List sources at end if many citations
5. Never cite sources not in provided context

78. Uncertainty Expression Prompt

When answering from retrieved context:
- High confidence (directly stated): State clearly
- Medium confidence (inferred): "This suggests..."
- Low confidence (unclear): "The documents mention... but it's unclear if..."
- No information: "The provided context doesn't contain information about..."

79. Multi-Step RAG Prompt

For complex questions requiring multiple lookups:
1. Break question into sub-questions
2. For each sub-question:
   - Identify what needs to be retrieved
   - Retrieve relevant context
   - Answer sub-question
3. Synthesize sub-answers into final response
4. Show your reasoning chain

80. Constrained Generation Prompt

Generate answer with constraints:
1. Use ONLY provided context
2. Maximum length: 200 words
3. Include at least 2 citations
4. No speculation beyond context
5. If multiple perspectives in context, present all

Integration & Tools

81. LangChain RAG Chain

Build RAG chain:

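# Import paths assume the split langchain-openai / langchain-chroma packages;
# older releases exposed these classes under langchain.* submodules
from langchain.chains import RetrievalQA
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings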

# Components
embeddings = OpenAIEmbeddings()
vectorstore = Chroma(embedding_function=embeddings)
llm = ChatOpenAI()

# Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)

82. LlamaIndex RAG Setup

Initialize RAG with LlamaIndex:

# llama-index 0.10+ exposes these classes from llama_index.core
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load
documents = SimpleDirectoryReader("data").load_data()

# Index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("Your question")

83. Haystack Pipeline

Build Haystack pipeline:

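from haystack import Pipeline
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore

# The retriever needs a document store to search (Haystack 2.x)
document_store = InMemoryDocumentStore()

# Define pipeline
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
pipeline.add_component("prompt_builder", PromptBuilder(template=...))
pipeline.add_component("llm", OpenAIGenerator())

# Wire outputs to inputs; assumes the template uses a `documents` variable
pipeline.connect("retriever", "prompt_builder.documents")
pipeline.connect("prompt_builder", "llm")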

84. Vercel AI SDK RAG

Implement RAG with Vercel AI SDK:

import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Retrieve context (retrieveContext is your own retrieval helper)
const context = await retrieveContext(query);

// Stream response: the @ai-sdk/openai provider is passed to streamText as the
// model; it is not called like the OpenAI client. The await is harmless on
// SDK versions where streamText returns synchronously.
const result = await streamText({
  model: openai('gpt-4'),
  system: `Context: ${context}`,
  messages: [{ role: 'user', content: query }],
});

Security & Privacy

85. Data Sanitization

Sanitize documents before indexing: 1. Detect and redact PII (names, SSNs, emails) 2. Remove confidential information 3. Apply data classification labels 4. Role-based access to chunks 5. Audit access to sensitive documents
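
Pattern-based redaction covers the structured part of step 1. The sketch below handles only email addresses and US-style SSNs with simple regular expressions, which are assumptions chosen for the example; names and other free-form PII generally need an NER model rather than a regex.

```python
import re

# Deliberately simple patterns; real deployments usually need broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each PII match with a [LABEL] placeholder.

    Returns the redacted text and a per-label count of replacements.
    """
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


clean, counts = redact("Contact jane.doe@example.com, SSN 123-45-6789.")
```

Keeping the counts makes it possible to log how much was redacted per document without logging the sensitive values themselves.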

86. Query Sanitization

Sanitize user queries: 1. Detect prompt injection attempts 2. Filter out malicious patterns 3. Rate limiting per user 4. Query logging for audit 5. Anomaly detection

87. Access Control in RAG

Enforce permissions: 1. Index with ACL metadata 2. Filter retrieval by user permissions 3. Respect document-level security 4. Audit trail for sensitive queries 5. Automatic redaction of unauthorized content
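
Steps 1 and 2 amount to storing an access list with each chunk and filtering on it at query time. In this sketch the chunk layout (`id`, `acl`, `text`) and the group names are hypothetical.

```python
def filter_by_acl(chunks, user_groups):
    """Keep only chunks whose ACL shares at least one group with the user."""
    allowed = set(user_groups)
    return [chunk for chunk in chunks if allowed & set(chunk["acl"])]


# Hypothetical indexed chunks with ACL metadata attached at ingestion time.
chunks = [
    {"id": "c1", "acl": ["everyone"], "text": "Office hours are 9 to 5."},
    {"id": "c2", "acl": ["finance"], "text": "Q3 forecast details."},
    {"id": "c3", "acl": ["finance", "legal"], "text": "Pending contract terms."},
]

visible_to_legal = filter_by_acl(chunks, ["everyone", "legal"])
visible_to_guest = filter_by_acl(chunks, ["everyone"])
```

Applying the filter before ranking and top-k selection, ideally inside the vector store query itself, avoids both wasted result slots and the risk of unauthorized text reaching the prompt.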

88. Privacy-Preserving RAG

Protect privacy: 1. Differential privacy for queries 2. Homomorphic encryption options 3. Federated retrieval (query local index) 4. On-device embedding computation 5. Secure multi-party computation

Performance Optimization

89. Caching Strategy

Implement multi-level caching: 1. Query cache: Exact match → instant response 2. Embedding cache: Avoid recomputing embeddings 3. Retrieval cache: Cache frequent retrievals 4. LLM cache: Cache common responses 5. Cache invalidation: Handle document updates
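
The embedding cache layer is the simplest of these to sketch. The class below keys entries by a hash of the text, so edited text naturally misses the cache; `fake_embed` is a stand-in for a real embedding call, and all names are assumptions for illustration.

```python
import hashlib


class EmbeddingCache:
    """In-memory cache that avoids recomputing embeddings for identical text."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text):
        key = self._key(text)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]

    def invalidate(self, text):
        """Drop the entry for `text`, for example when its document is deleted."""
        self._store.pop(self._key(text), None)


def fake_embed(text):
    # Stand-in for a real embedding model call.
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed)
first = cache.get("hello world")
second = cache.get("hello world")  # served from the cache
```

The same pattern extends to the retrieval and LLM layers, with the cache key widened to include whatever else affects the result, such as the model name or the index version.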

90. Latency Optimization

Reduce response time: 1. Async embedding computation 2. Parallel retrieval (multiple indices) 3. Streaming responses (first token fast) 4. Approximate retrieval (HNSW) 5. Pre-computed common queries

91. Cost Optimization

Reduce RAG costs: 1. Smaller embedding models for retrieval 2. Filter before embedding (keyword pre-filter) 3. Caching at all layers 4. Batching embedding requests 5. Tiered retrieval (cheap first, expensive if needed) 6. Model distillation

92. Scalability Patterns

Scale RAG systems: 1. Horizontal index sharding 2. Read replicas for query serving 3. Async document processing 4. Queue-based ingestion 5. Auto-scaling based on load 6. CDN for static embeddings

Evaluation Metrics Deep Dive

93. MRR (Mean Reciprocal Rank)

MRR = (1/rank of first relevant doc) averaged across queries

Example:
Query 1: Relevant doc at position 2 → 1/2 = 0.5
Query 2: Relevant doc at position 1 → 1/1 = 1.0
Query 3: No relevant docs → 0
MRR = (0.5 + 1.0 + 0) / 3 = 0.5
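
The worked example above can be reproduced with a short function. The input format (one 1-based rank per query, `None` when no relevant document was retrieved) is an assumption made for this sketch.

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Mean of 1/rank over queries.

    Each entry is the 1-based rank of the first relevant document for one
    query, or None if no relevant document was retrieved (contributes 0).
    """
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / rank if rank else 0.0 for rank in first_relevant_ranks)
    return total / len(first_relevant_ranks)


# Query 1: rank 2, Query 2: rank 1, Query 3: no relevant docs
mrr = mean_reciprocal_rank([2, 1, None])
```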

94. NDCG (Normalized Discounted Cumulative Gain)

DCG = sum(relevance_i / log2(i + 1)) for i = 1 to n
IDCG = ideal DCG (perfect ranking)
NDCG = DCG / IDCG

Accounts for position and graded relevance
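
A direct translation of the formulas above, as a sketch: `relevances` is assumed to be the list of graded relevance labels in ranked order, and the ideal ranking is taken to be those same labels sorted in descending order.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), with i starting at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1])   # already in ideal order
reversed_ = ndcg([1, 2, 3])  # best document ranked last
```

A perfect ordering scores 1.0, and any ordering that pushes highly relevant documents down the list scores lower.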

95. Answer Faithfulness Metrics

Measure answer grounding: 1. Claim extraction: Break answer into atomic claims 2. Evidence matching: Match claims to context 3. Hallucination detection: Claims without support 4. Contradiction detection: Claims contradicting context 5. Coverage: Which claims in the context were left unused

96. RAGAS Metrics

Automated RAG evaluation:
- Faithfulness: Answer grounded in context?
- Answer Relevance: Answer addresses question?
- Context Precision: Retrieved context relevant?
- Context Recall: All relevant context retrieved?
- Context Entity Recall: Entities covered?

97. LLM-as-Judge for RAG

Prompt for evaluation:

Evaluate this RAG output:
Question: {query}
Context: {retrieved_context}
Answer: {generated_answer}

Score 1-5 on:
1. Accuracy (vs. context)
2. Completeness
3. Relevance to question
4. Citations correct
5. No hallucinations

Explain each score.

Future Directions

98. Agentic RAG

RAG as agent capability: 1. Agent decides when to retrieve 2. Formulates own queries 3. Evaluates if retrieved info is sufficient 4. Iterates with follow-up retrievals 5. Maintains retrieval history

99. Multi-Agent RAG

Specialized retrieval agents: 1. Query understanding agent 2. Retrieval strategy agent 3. Re-ranking agent 4. Synthesis agent 5. Fact-checking agent 6. Orchestrator coordinates

100. Adaptive RAG

Self-improving retrieval: 1. Monitor which retrievals led to good answers 2. Learn query → optimal strategy mapping 3. Adjust chunking based on query types 4. Fine-tune embeddings on feedback 5. Continuous improvement loop

101. Multimodal RAG

Beyond text: 1. Video retrieval (scene understanding) 2. Audio retrieval (transcript + semantic) 3. 3D model retrieval 4. Code + documentation unified 5. Sensory data (IoT, biometrics)

102. Real-Time RAG

Live knowledge: 1. Stream processing of new documents 2. Incremental index updates 3. Sub-second latency from ingestion event to the document being retrievable 4. Event-driven notifications 5. Temporal reasoning over streams

103. Personalized RAG

User-aware retrieval: 1. User interest modeling 2. Retrieval ranking by user relevance 3. Explanation style adaptation 4. Privacy-preserving personalization 5. Collaborative filtering for documents

104. Explainable RAG

Transparent retrieval: 1. Show why documents were retrieved 2. Highlight matching sections 3. Explain re-ranking decisions 4. Visualize embedding similarity 5. Suggest alternative queries

105. Interactive RAG

Collaborative retrieval: 1. Clarify ambiguous queries 2. Present retrieved docs for user selection 3. Incorporate user feedback on relevance 4. Iterative refinement 5. Explain trade-offs in strategies

106. Robust RAG

Handle adversarial scenarios: 1. Detect poisoned documents 2. Handle contradictory sources 3. Graceful degradation when retrieval fails 4. Conflicting evidence resolution 5. Confidence calibration

107. Green RAG

Sustainable retrieval: 1. Energy-efficient embedding models 2. Compressed representations 3. Selective retrieval (don't always query) 4. Caching to reduce computation 5. Carbon-aware scheduling

108. Edge-Cloud Hybrid RAG

Distribute intelligence: 1. Edge handles simple queries locally 2. Cloud for complex reasoning 3. Sync knowledge bases efficiently 4. Privacy-sensitive data stays edge 5. Dynamic offloading decisions

109. Federated RAG

Privacy-preserving collaboration: 1. Multiple organizations, shared queries 2. Retrieved docs stay local 3. Secure aggregation of answers 4. Differential privacy guarantees 5. No central data repository

110. Quantum RAG

Future possibilities: 1. Quantum embedding algorithms 2. Grover's algorithm for a quadratic speedup in unstructured search 3. Quantum machine learning for retrieval 4. Quantum-safe encryption for private RAG 5. Still largely theoretical

Total: 110+ prompts for building trustworthy, grounded AI systems with RAG and MCP.