rag-implementation
wshobson/agents
Build RAG systems with vector databases and semantic search to ground LLMs in external knowledge.
What is rag-implementation?
Retrieval-Augmented Generation (RAG) enables LLM applications to provide accurate, factual responses by retrieving relevant documents from external knowledge sources before generating answers. Use this skill when building Q&A systems, documentation assistants, chatbots with current information, or any application where reducing hallucinations and grounding responses in real data is critical.
- Store and retrieve document embeddings efficiently using vector databases (Pinecone, Weaviate, Milvus, Chroma, Qdrant, pgvector)
- Convert text to numerical vectors using embedding models optimized for different use cases (Voyage, OpenAI, open-source options)
- Implement retrieval strategies including dense retrieval, sparse retrieval, hybrid search, multi-query, and HyDE approaches
- Rerank retrieval results using cross-encoders, API-based reranking, MMR, or LLM-based scoring to improve quality
- Build complete RAG pipelines with LangGraph that retrieve context and generate grounded answers
How to install rag-implementation
npx skills add https://github.com/wshobson/agents --skill rag-implementation
- Vector database account or local setup (Pinecone, Weaviate, Chroma, etc.)
- Embedding model API key (Voyage AI, OpenAI, or local model)
- LangChain and LangGraph libraries installed
- Document collection or knowledge base to index
How to use rag-implementation
1. Choose and set up a vector database (managed, like Pinecone, or local, like Chroma)
2. Select an embedding model appropriate for your use case (voyage-3-large is recommended for Claude apps)
3. Prepare and chunk your documents using RecursiveCharacterTextSplitter or a similar splitter
4. Generate embeddings and store the documents in the vector database
5. Implement a retriever using your vector store
6. Build a RAG graph with retrieve and generate nodes using LangGraph
7. Connect the retriever to an LLM with a prompt template that includes the retrieved context
8. Test with sample questions and iterate on retrieval strategies if needed
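The chunking step (step 3) can be illustrated without any library. The sketch below is a simplified stand-in for RecursiveCharacterTextSplitter, not its actual algorithm: it cuts fixed-size character windows that overlap, so a sentence split at a boundary still appears whole in one neighbouring chunk. The function name `chunk_text` and its default sizes are assumptions made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; real splitters also try to
    break on paragraph and sentence boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "RAG retrieves relevant documents before the model generates an answer. " * 5
for i, chunk in enumerate(chunk_text(sample, chunk_size=120, overlap=30)):
    print(i, len(chunk))
```

Larger overlaps reduce the chance of losing context at chunk boundaries, at the cost of storing and embedding more text.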
Use cases
- Building Q&A systems over proprietary documents and knowledge bases
- Creating chatbots that provide current, factual information with source citations
- Implementing semantic search with natural language queries across large document collections
- Reducing hallucinations by grounding LLM responses in retrieved context
- Building documentation assistants and research tools with domain-specific knowledge access
- Backend engineers building knowledge-grounded AI applications
- Full-stack developers creating chatbots and Q&A systems
- Data scientists implementing semantic search and information retrieval
- Teams needing to integrate LLMs with proprietary or real-time data sources
rag-implementation FAQ
Which vector database should I use?
Pinecone for managed/serverless, Chroma for lightweight local development, Weaviate for hybrid search, pgvector for SQL integration, or Qdrant for high performance. The choice depends on scale, infrastructure, and feature needs.
Which embedding model should I choose?
Use voyage-3-large for Claude applications (Anthropic-recommended), text-embedding-3-large for OpenAI apps with high accuracy, text-embedding-3-small for cost-effectiveness, or bge-large-en-v1.5 for open-source local deployment.
How do I reduce hallucinations in a RAG system?
Ensure high-quality retrieval by using hybrid search and reranking results, and use a prompt that instructs the LLM to answer only from the provided context and to say 'I don't know' when the context is insufficient.
What is the difference between dense and sparse retrieval?
Dense retrieval uses semantic embeddings for meaning-based matching, while sparse retrieval uses keyword matching (BM25). Hybrid search combines both for better coverage of semantic and keyword-based queries.
Should I use reranking?
Yes, reranking improves quality by filtering and reordering initial retrieval results. Use cross-encoders for accuracy, MMR for diversity, or LLM-based scoring when you need semantic understanding of relevance.
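The grounding advice in this FAQ (answer only from the provided context, and say "I don't know" otherwise) can be sketched as a small prompt builder. The helper name `build_grounded_prompt`, the exact instruction wording, and the numbered-passage layout are assumptions for illustration. They are not taken from the skill itself.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Build a prompt that restricts the model to the retrieved passages.

    Hypothetical helper: passages are numbered so an answer can cite them.
    """
    numbered = "\n\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, reply exactly: I don't know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "Which database suits local development?",
    ["Chroma is lightweight and easy to run locally.",
     "Pinecone is a managed, serverless service."],
)
print(prompt)
```

The resulting string can be sent to any chat model. Numbering the passages makes it easier to ask the model for source citations later.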
Full instructions (SKILL.md)
Source of truth, from wshobson/agents.
name: rag-implementation
description: Build Retrieval-Augmented Generation (RAG) systems for LLM applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.
RAG Implementation
Master Retrieval-Augmented Generation (RAG) to build LLM applications that provide accurate, grounded responses using external knowledge sources.
When to Use This Skill
- Building Q&A systems over proprietary documents
- Creating chatbots with current, factual information
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded responses
- Enabling LLMs to access domain-specific knowledge
- Building documentation assistants
- Creating research tools with source citation
Core Components
1. Vector Databases
Purpose: Store and retrieve document embeddings efficiently
Options:
- Pinecone: Managed, scalable, serverless
- Weaviate: Open-source, hybrid search, GraphQL
- Milvus: High performance, on-premise
- Chroma: Lightweight, easy to use, local development
- Qdrant: Fast, filtered search, Rust-based
- pgvector: PostgreSQL extension, SQL integration
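To make the purpose of a vector database concrete, here is a toy in-memory version of the two core operations: storing vectors and returning the nearest ones by cosine similarity. `ToyVectorStore` is a hypothetical class written for this sketch. The databases listed above add approximate nearest-neighbour indexes, persistence, and metadata filtering, none of which are modelled here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Brute-force store: every query is compared against every vector."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._items.append((doc_id, vector))

    def search(self, query: list[float], k: int = 4) -> list[tuple[str, float]]:
        scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore()
store.add("a", [1.0, 0.0])
store.add("b", [0.0, 1.0])
store.add("c", [1.0, 1.0])
print(store.search([1.0, 0.0], k=2))
```

Brute-force search scales linearly with collection size, which is the main reason dedicated vector databases exist.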
2. Embeddings
Purpose: Convert text to numerical vectors for similarity search
Models (2026):
| Model | Dimensions | Best For |
|---|---|---|
| voyage-3-large | 1024 | Claude apps (Anthropic recommended) |
| voyage-code-3 | 1024 | Code search |
| text-embedding-3-large | 3072 | OpenAI apps, high accuracy |
| text-embedding-3-small | 1536 | OpenAI apps, cost-effective |
| bge-large-en-v1.5 | 1024 | Open source, local deployment |
| multilingual-e5-large | 1024 | Multi-language support |
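The Dimensions column has a direct cost implication. A rough way to compare models is the raw storage for the vectors alone, sketched below. It assumes 4-byte float32 values and a collection of one million chunks. Both figures are assumptions chosen for illustration, and real indexes add overhead for metadata and search structures.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats


def raw_index_size_gb(num_vectors: int, dimensions: int) -> float:
    """Raw vector storage in decimal gigabytes, ignoring index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9


# Dimensions taken from the table above; the collection size is hypothetical.
for model, dims in [
    ("voyage-3-large", 1024),
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
]:
    print(f"{model}: {raw_index_size_gb(1_000_000, dims):.3f} GB")
```

Under these assumptions a 1024-dimension model needs about 4.1 GB per million vectors, and a 3072-dimension model about 12.3 GB, three times as much.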
3. Retrieval Strategies
Approaches:
- Dense Retrieval: Semantic similarity via embeddings
- Sparse Retrieval: Keyword matching (BM25, TF-IDF)
- Hybrid Search: Combine dense + sparse with weighted fusion
- Multi-Query: Generate multiple query variations
- HyDE: Generate hypothetical documents for better retrieval
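Hybrid search with weighted fusion can be sketched as follows. Dense scores (for example cosine similarities) and sparse scores (for example BM25) live on different scales, so this sketch min-max normalizes each set before mixing them with a weight `alpha`. The function names, the normalization choice, and the sample scores are assumptions for illustration. Other fusion schemes, such as reciprocal rank fusion, are also common.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(
    dense: dict[str, float],
    sparse: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Rank documents by alpha * dense + (1 - alpha) * sparse.

    A document missing from one result set contributes 0 for that side.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    d, s = min_max_normalize(dense), min_max_normalize(sparse)
    fused = {
        doc_id: alpha * d.get(doc_id, 0.0) + (1 - alpha) * s.get(doc_id, 0.0)
        for doc_id in set(d) | set(s)
    }
    # Highest fused score first; ties are broken by document id.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


dense_scores = {"a": 0.9, "b": 0.5, "c": 0.1}     # hypothetical cosine scores
sparse_scores = {"b": 12.0, "c": 3.0, "d": 8.0}   # hypothetical BM25 scores
for doc_id, score in hybrid_fuse(dense_scores, sparse_scores, alpha=0.5):
    print(doc_id, round(score, 3))
```

Raising `alpha` favours semantic matches. Lowering it favours exact keyword matches, which tends to help with identifiers and rare terms.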
4. Reranking
Purpose: Improve retrieval quality by reordering results
Methods:
- Cross-Encoders: BERT-based reranking (ms-marco-MiniLM)
- Cohere Rerank: API-based reranking
- Maximal Marginal Relevance (MMR): Diversity + relevance
- LLM-based: Use LLM to score relevance
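Of the methods above, Maximal Marginal Relevance is simple enough to sketch in plain Python. Each round picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the highest similarity to anything already selected. The function names, the toy vectors, and the use of cosine similarity for both terms are assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],
    k: int = 3,
    lambda_mult: float = 0.5,
) -> list[str]:
    """Select up to k document ids, trading relevance against redundancy."""
    candidates = dict(doc_vecs)
    selected: list[str] = []
    while candidates and len(selected) < k:
        best_id, best_score = None, -math.inf
        for doc_id, vec in candidates.items():
            relevance = cosine(query_vec, vec)
            # Similarity to the closest already-selected document (0 if none).
            redundancy = max(
                (cosine(vec, doc_vecs[s]) for s in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_id, best_score = doc_id, score
        selected.append(best_id)
        del candidates[best_id]
    return selected


docs = {
    "a": [1.0, 0.1],
    "a_dup": [1.0, 0.12],  # near-duplicate of "a"
    "b": [0.6, 0.8],       # less relevant but different
}
print(mmr([1.0, 0.0], docs, k=2, lambda_mult=0.3))
print(mmr([1.0, 0.0], docs, k=2, lambda_mult=1.0))
```

With `lambda_mult=1.0` the ranking is pure relevance, so the near-duplicate is kept. With a lower value the second slot goes to the more distinct document instead.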
Quick Start with LangGraph
import asyncio
from typing import TypedDict, Annotated

from langgraph.graph import StateGraph, START, END
from langchain_anthropic import ChatAnthropic
from langchain_voyageai import VoyageAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter


class RAGState(TypedDict):
    question: str
    context: list[Document]
    answer: str


# Initialize components
llm = ChatAnthropic(model="claude-sonnet-4-6")
embeddings = VoyageAIEmbeddings(model="voyage-3-large")
vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# RAG prompt
rag_prompt = ChatPromptTemplate.from_template(
    """Answer based on the context below. If you cannot answer, say so.

    Context:
    {context}

    Question: {question}

    Answer:"""
)


async def retrieve(state: RAGState) -> dict:
    """Retrieve relevant documents (returns a partial state update)."""
    docs = await retriever.ainvoke(state["question"])
    return {"context": docs}


async def generate(state: RAGState) -> dict:
    """Generate answer from context (returns a partial state update)."""
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    messages = rag_prompt.format_messages(
        context=context_text,
        question=state["question"]
    )
    response = await llm.ainvoke(messages)
    return {"answer": response.content}


# Build RAG graph
builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)
rag_chain = builder.compile()


# Use: await is only valid inside a coroutine, so wrap the call
async def main() -> None:
    result = await rag_chain.ainvoke({"question": "What are the main features?"})
    print(result["answer"])


asyncio.run(main())
Detailed patterns and worked examples
Detailed pattern documentation lives in references/details.md. Read that file when the overview above does not cover your case.