Skill

Pass

Audit score 90

rag-implementation

wshobson/agents

Build RAG systems with vector databases and semantic search to ground LLMs in external knowledge.

What is rag-implementation?

Retrieval-Augmented Generation (RAG) helps LLM applications give more accurate, factual responses by retrieving relevant documents from external knowledge sources before generating an answer. Use this skill when building Q&A systems, documentation assistants, chatbots that need current information, or any application where reducing hallucinations and grounding responses in real data is critical.

Store and retrieve document embeddings efficiently using vector databases (Pinecone, Weaviate, Milvus, Chroma, Qdrant, pgvector)
Convert text to numerical vectors using embedding models optimized for different use cases (Voyage, OpenAI, open-source options)
Implement retrieval strategies including dense retrieval, sparse retrieval, hybrid search, multi-query, and HyDE approaches
Rerank retrieval results using cross-encoders, API-based reranking, MMR, or LLM-based scoring to improve quality
Build complete RAG pipelines with LangGraph that retrieve context and generate grounded answers

How to install rag-implementation

npx skills add https://github.com/wshobson/agents --skill rag-implementation

Prerequisites

Vector database account or local setup (Pinecone, Weaviate, Chroma, etc.)
Embedding model API key (Voyage AI, OpenAI, or local model)
LangChain and LangGraph libraries installed
Document collection or knowledge base to index

How to use rag-implementation

1. Choose and set up a vector database (managed, like Pinecone, or local, like Chroma)
2. Select an embedding model appropriate for your use case (voyage-3-large is recommended for Claude apps)
3. Prepare and chunk your documents using RecursiveCharacterTextSplitter or similar
4. Generate embeddings and store the documents in the vector database
5. Implement a retriever on top of your vector store
6. Build a RAG graph with retrieve and generate nodes using LangGraph
7. Connect the retriever to an LLM with a prompt template that includes the retrieved context
8. Test with sample questions and iterate on retrieval strategies if needed
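Step 3 largely decides what the retriever can find. Below is a simplified, pure-Python sketch of recursive character splitting, the idea behind LangChain's RecursiveCharacterTextSplitter: try paragraph breaks first, then line breaks, then spaces, and hard-cut only as a last resort. The name `recursive_split` and its defaults are illustrative assumptions; unlike the LangChain class, this sketch packs pieces greedily and omits chunk overlap.

```python
def recursive_split(
    text: str,
    chunk_size: int = 200,
    separators: tuple[str, ...] = ("\n\n", "\n", " ", ""),
) -> list[str]:
    """Hypothetical helper: split text into chunks of at most chunk_size characters,
    preferring coarse separators (paragraphs) over fine ones (single characters)."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: hard cut into fixed-size windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

For example, with `chunk_size=40`, two short paragraphs followed by a 50-character run with no spaces come back as the two intact paragraphs plus a 40- and a 10-character slice of the run.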

Use cases

Good for

Building Q&A systems over proprietary documents and knowledge bases
Creating chatbots that provide current, factual information with source citations
Implementing semantic search with natural language queries across large document collections
Reducing hallucinations by grounding LLM responses in retrieved context
Building documentation assistants and research tools with domain-specific knowledge access

Who it's for

Backend engineers building knowledge-grounded AI applications
Full-stack developers creating chatbots and Q&A systems
Data scientists implementing semantic search and information retrieval
Teams needing to integrate LLMs with proprietary or real-time data sources

rag-implementation FAQ

Which vector database should I use?

Pinecone for managed/serverless deployments, Chroma for lightweight local development, Weaviate for hybrid search, pgvector for SQL integration, or Qdrant for fast filtered search. The right choice depends on scale, existing infrastructure, and required features.

What embedding model should I choose?

Use voyage-3-large for Claude applications (Anthropic-recommended), text-embedding-3-large for OpenAI apps with high accuracy, text-embedding-3-small for cost-effectiveness, or bge-large-en-v1.5 for open-source local deployment.

How do I reduce hallucinations in RAG?

Ensure high-quality retrieval by using hybrid search, reranking results, and implementing a prompt that instructs the LLM to only answer based on provided context and say 'I don't know' when context is insufficient.
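A prompt along those lines might look like the following sketch. It numbers each retrieved passage so the model can cite sources, and it gives the model an explicit abstention phrase. The function name `build_grounded_prompt` and the exact wording are assumptions for illustration, not a prompt the skill prescribes.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Hypothetical helper: number passages for citation and instruct the model to abstain."""
    if passages:
        context = "\n\n".join(f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1))
    else:
        # Make the empty-retrieval case explicit instead of sending a blank context.
        context = "(no relevant documents were retrieved)"
    return (
        "Answer the question using ONLY the numbered context passages below.\n"
        "Cite the passages you rely on as [n].\n"
        "If the context does not contain the answer, reply exactly: I don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Making the no-results case explicit matters: with an empty context, the abstention instruction is the only thing standing between the model and an ungrounded answer.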

What's the difference between dense and sparse retrieval?

Dense retrieval uses semantic embeddings for meaning-based matching, while sparse retrieval uses keyword matching (BM25). Hybrid search combines both for better coverage of semantic and keyword-based queries.
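To make the sparse side concrete, here is a minimal BM25 scorer in plain Python. It uses the usual `k1`/`b` parameters and the IDF variant with +1 inside the logarithm (as in Lucene), which keeps scores non-negative. Tokenization is naive lowercase whitespace splitting, an assumption a real system would replace with a proper analyzer (stemming, punctuation handling).

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with BM25 (naive whitespace tokenization)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = (sum(len(t) for t in tokenized) / n) or 1.0  # guard against all-empty docs
    df = Counter()  # document frequency: in how many docs each term appears
    for toks in tokenized:
        df.update(set(toks))

    q_terms = query.lower().split()
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue  # exact keyword match only: "cat" does not match "cats"
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
            score += idf * norm_tf
        scores.append(score)
    return scores
```

On the documents `["the cat sat on the mat", "dogs chase cats", "error code E1234 in the parser"]`, the query `"E1234 parser"` scores only the third document, which shows why sparse retrieval handles exact identifiers well. The query `"cat"` misses `"dogs chase cats"` entirely, which is the gap dense retrieval covers.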

Should I use reranking?

Usually, yes: reranking improves quality by filtering and reordering the initial retrieval results. Use cross-encoders for accuracy, MMR for diversity, or LLM-based scoring when you need a richer judgment of relevance.
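Reranking is a two-stage pattern: a fast retriever fetches a wide candidate list, then a slower pairwise scorer reorders it and keeps the top few. In the sketch below the scorer is pluggable. `overlap_score` is a hypothetical stand-in used only so the example runs without a model; in practice you would plug in a cross-encoder (for example sentence-transformers' `CrossEncoder` with an ms-marco-MiniLM checkpoint) or an LLM-based relevance judge.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> list[str]:
    """Second stage: rescore first-stage candidates with a pairwise (query, doc) scorer."""
    scored = [(score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep first-stage order
    return [doc for _, doc in scored[:top_k]]


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query terms present in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0
```

Because the scorer sees the query and document together, a real cross-encoder can judge relevance more precisely than comparing two independently computed embeddings. The cost is one model call per candidate, which is why it runs only on the short first-stage list.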

Full instructions (SKILL.md)

Source of truth, from wshobson/agents.

name: rag-implementation
description: Build Retrieval-Augmented Generation (RAG) systems for LLM applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.

RAG Implementation

Master Retrieval-Augmented Generation (RAG) to build LLM applications that provide accurate, grounded responses using external knowledge sources.

When to Use This Skill

Building Q&A systems over proprietary documents
Creating chatbots with current, factual information
Implementing semantic search with natural language queries
Reducing hallucinations with grounded responses
Enabling LLMs to access domain-specific knowledge
Building documentation assistants
Creating research tools with source citation

Core Components

1. Vector Databases

Purpose: Store and retrieve document embeddings efficiently

Options:

Pinecone: Managed, scalable, serverless
Weaviate: Open-source, hybrid search, GraphQL
Milvus: High performance, on-premise
Chroma: Lightweight, easy to use, local development
Qdrant: Fast, filtered search, Rust-based
pgvector: PostgreSQL extension, SQL integration

2. Embeddings

Purpose: Convert text to numerical vectors for similarity search

Models (2026):

Model	Dimensions	Best For
voyage-3-large	1024	Claude apps (Anthropic recommended)
voyage-code-3	1024	Code search
text-embedding-3-large	3072	OpenAI apps, high accuracy
text-embedding-3-small	1536	OpenAI apps, cost-effective
bge-large-en-v1.5	1024	Open source, local deployment
multilingual-e5-large	1024	Multi-language support
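Whichever model you pick, the vector database also needs a similarity metric, typically cosine, dot product, or Euclidean distance. For unit-length vectors all three produce the same ranking, because the squared Euclidean distance equals 2 minus 2 times the cosine. For unnormalized vectors, raw dot product also rewards magnitude. OpenAI documents its embeddings as normalized to length 1; for other models, check the documentation or normalize yourself. The sketch below, with made-up 2-D vectors, demonstrates the effect.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query, docs, metric, higher_is_better=True) -> list[int]:
    """Return document indices ordered best-first under the given metric."""
    return sorted(range(len(docs)), key=lambda i: metric(query, docs[i]), reverse=higher_is_better)


query = [1.0, 0.2]
docs = [[3.0, 1.0], [1.0, 3.0], [5.0, 5.0]]  # doc 2 is long but points in a different direction

print(rank(query, docs, dot))     # raw dot product favors the long vector
print(rank(query, docs, cosine))  # cosine ignores length
```

After normalizing the query and documents, dot product, cosine, and Euclidean distance (smaller is better) all agree on the order `[0, 2, 1]`. Unnormalized dot product instead ranks the long vector first.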

3. Retrieval Strategies

Approaches:

Dense Retrieval: Semantic similarity via embeddings
Sparse Retrieval: Keyword matching (BM25, TF-IDF)
Hybrid Search: Combine dense + sparse with weighted fusion
Multi-Query: Generate multiple query variations
HyDE: Generate hypothetical documents for better retrieval

4. Reranking

Purpose: Improve retrieval quality by reordering results

Methods:

Cross-Encoders: BERT-based reranking (ms-marco-MiniLM)
Cohere Rerank: API-based reranking
Maximal Marginal Relevance (MMR): Diversity + relevance
LLM-based: Use LLM to score relevance
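MMR selects results greedily. At each step it picks the candidate with the best trade-off between relevance to the query and similarity to results already chosen, controlled by a weight lambda (1.0 means pure relevance). The sketch below is a plain-Python version over embedding vectors. The parameter name `lambda_mult` mirrors LangChain's MMR search option; for real use, LangChain vector stores already expose this through `as_retriever(search_type="mmr")`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3,
        lambda_mult: float = 0.5) -> list[int]:
    """Greedy Maximal Marginal Relevance; returns indices into doc_vecs."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected: list[int] = []
    remaining = list(range(len(doc_vecs)))

    def marginal(i: int) -> float:
        # Penalize similarity to the closest already-selected document.
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.05], [1.0, 0.06], [0.7, 0.7]]  # docs 0 and 1 are near-duplicates
print(mmr(query, docs, k=2, lambda_mult=1.0))  # pure relevance keeps the duplicate
print(mmr(query, docs, k=2, lambda_mult=0.3))  # lower lambda swaps it for a diverse doc
```

With `lambda_mult=1.0` the result is `[0, 1]`, both near-duplicates. With `lambda_mult=0.3` the second slot goes to the less relevant but distinct document, giving `[0, 2]`.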

Quick Start with LangGraph

import asyncio
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from langchain_anthropic import ChatAnthropic
from langchain_voyageai import VoyageAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate

class RAGState(TypedDict):
    question: str
    context: list[Document]
    answer: str

# Initialize components (expects ANTHROPIC_API_KEY, VOYAGE_API_KEY, and
# PINECONE_API_KEY in the environment, plus an existing Pinecone index "docs")
llm = ChatAnthropic(model="claude-sonnet-4-6")
embeddings = VoyageAIEmbeddings(model="voyage-3-large")
vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# RAG prompt
rag_prompt = ChatPromptTemplate.from_template(
    """Answer based on the context below. If you cannot answer, say so.

    Context:
    {context}

    Question: {question}

    Answer:"""
)

async def retrieve(state: RAGState) -> dict:
    """Retrieve relevant documents for the question."""
    docs = await retriever.ainvoke(state["question"])
    # Nodes return partial state updates; LangGraph merges them into RAGState.
    return {"context": docs}

async def generate(state: RAGState) -> dict:
    """Generate an answer grounded in the retrieved context."""
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    messages = rag_prompt.format_messages(
        context=context_text,
        question=state["question"]
    )
    response = await llm.ainvoke(messages)
    return {"answer": response.content}

# Build RAG graph
builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

rag_chain = builder.compile()

# Use: top-level await works only in notebooks/async REPLs; scripts need asyncio.run
async def main() -> None:
    result = await rag_chain.ainvoke({"question": "What are the main features?"})
    print(result["answer"])

asyncio.run(main())

Detailed patterns and worked examples

Detailed pattern documentation lives in references/details.md. Read that file when the overview above does not cover your use case.
