PluginBench
Skill
Pass
Audit score 90

rag-implementation

wshobson/agents

Build RAG systems with vector databases and semantic search to ground LLMs in external knowledge.

What is rag-implementation?

Retrieval-Augmented Generation (RAG) enables LLM applications to provide accurate, factual responses by retrieving relevant documents from external knowledge sources before generating answers. Use this skill when building Q&A systems, documentation assistants, chatbots with current information, or any application where reducing hallucinations and grounding responses in real data is critical.

  • Store and retrieve document embeddings efficiently using vector databases (Pinecone, Weaviate, Milvus, Chroma, Qdrant, pgvector)
  • Convert text to numerical vectors using embedding models optimized for different use cases (Voyage, OpenAI, open-source options)
  • Implement retrieval strategies including dense retrieval, sparse retrieval, hybrid search, multi-query, and HyDE approaches
  • Rerank retrieval results using cross-encoders, API-based reranking, MMR, or LLM-based scoring to improve quality
  • Build complete RAG pipelines with LangGraph that retrieve context and generate grounded answers

How to install rag-implementation

npx skills add https://github.com/wshobson/agents --skill rag-implementation
Prerequisites
  • Vector database account or local setup (Pinecone, Weaviate, Chroma, etc.)
  • Embedding model API key (Voyage AI, OpenAI, or local model)
  • LangChain and LangGraph libraries installed
  • Document collection or knowledge base to index
Claude Code
Cursor
Windsurf
Cline

How to use rag-implementation

  1. Choose and set up a vector database (managed like Pinecone or local like Chroma)
  2. Select an embedding model appropriate for your use case (Voyage-3-large recommended for Claude apps)
  3. Prepare and chunk your documents using RecursiveCharacterTextSplitter or similar
  4. Generate embeddings and store documents in the vector database
  5. Implement a retriever using your vector store
  6. Build a RAG graph with retrieve and generate nodes using LangGraph
  7. Connect the retriever to an LLM with a prompt template that includes context
  8. Test with sample questions and iterate on retrieval strategies if needed
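
Step 3 (chunking) can be sketched without any library. The function below is a simplified stand-in for a splitter, not the RecursiveCharacterTextSplitter implementation: it cuts fixed-size character windows that overlap, whereas the real splitter also tries to break on separators such as paragraphs. The names `chunk_text`, `chunk_size`, and `overlap` are illustrative assumptions.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.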

Use cases

Good for
  • Building Q&A systems over proprietary documents and knowledge bases
  • Creating chatbots that provide current, factual information with source citations
  • Implementing semantic search with natural language queries across large document collections
  • Reducing hallucinations by grounding LLM responses in retrieved context
  • Building documentation assistants and research tools with domain-specific knowledge access
Who it's for
  • Backend engineers building knowledge-grounded AI applications
  • Full-stack developers creating chatbots and Q&A systems
  • Data scientists implementing semantic search and information retrieval
  • Teams needing to integrate LLMs with proprietary or real-time data sources

rag-implementation FAQ

Which vector database should I use?

Use Pinecone for a managed, serverless service; Chroma for lightweight local development; Weaviate for hybrid search; pgvector for SQL integration; or Qdrant for high performance. The right choice depends on your scale, infrastructure, and feature needs.

What embedding model should I choose?

Use voyage-3-large for Claude applications (Anthropic-recommended), text-embedding-3-large for OpenAI apps with high accuracy, text-embedding-3-small for cost-effectiveness, or bge-large-en-v1.5 for open-source local deployment.

How do I reduce hallucinations in RAG?

Ensure high-quality retrieval by using hybrid search, reranking results, and implementing a prompt that instructs the LLM to only answer based on provided context and say 'I don't know' when context is insufficient.
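
One way to write such a prompt is sketched below. This is an illustration, not the skill's own template: `build_grounded_prompt` and the exact wording are assumptions. It numbers the retrieved passages so the model can cite them and spells out the fallback answer.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that restricts the model to the numbered passages."""
    if passages:
        # Number passages from 1 so answers can cite them as [1], [2], ...
        context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    else:
        context = "(no passages retrieved)"
    return (
        "Answer using only the context below. "
        "Cite passages by their bracketed number. "
        "If the context is insufficient, reply exactly: I don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )
```

Making the empty-retrieval case explicit gives the model a clear signal to fall back to "I don't know" instead of answering from its own memory.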

What's the difference between dense and sparse retrieval?

Dense retrieval uses semantic embeddings for meaning-based matching, while sparse retrieval uses keyword matching (BM25). Hybrid search combines both for better coverage of semantic and keyword-based queries.
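
To make the sparse side concrete, here is a minimal BM25 sketch in plain Python. It uses one common variant of the formula (the non-negative IDF form) with whitespace tokenization; `bm25_scores` and the defaults `k1=1.5`, `b=0.75` are illustrative assumptions, not values taken from the skill.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with a simple BM25 variant."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df: Counter[str] = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores: list[float] = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # keyword matching: absent terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores
```

For the query "password reset", a document phrased as "how to recover account credentials" scores zero here because it shares no keywords, even though it is about the same topic. That gap is what dense retrieval is meant to cover.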

Should I use reranking?

Yes, reranking improves quality by filtering and reordering initial retrieval results. Use cross-encoders for accuracy, MMR for diversity, or LLM-based scoring when you need semantic understanding of relevance.

Full instructions (SKILL.md)

Source of truth, from wshobson/agents.


name: rag-implementation
description: Build Retrieval-Augmented Generation (RAG) systems for LLM applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.

RAG Implementation

Master Retrieval-Augmented Generation (RAG) to build LLM applications that provide accurate, grounded responses using external knowledge sources.

When to Use This Skill

  • Building Q&A systems over proprietary documents
  • Creating chatbots with current, factual information
  • Implementing semantic search with natural language queries
  • Reducing hallucinations with grounded responses
  • Enabling LLMs to access domain-specific knowledge
  • Building documentation assistants
  • Creating research tools with source citation

Core Components

1. Vector Databases

Purpose: Store and retrieve document embeddings efficiently

Options:

  • Pinecone: Managed, scalable, serverless
  • Weaviate: Open-source, hybrid search, GraphQL
  • Milvus: High performance, on-premise
  • Chroma: Lightweight, easy to use, local development
  • Qdrant: Fast, filtered search, Rust-based
  • pgvector: PostgreSQL extension, SQL integration
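
As a mental model for what these databases do, the sketch below is a toy in-memory store with brute-force cosine search. It is not the API of any product listed above; `InMemoryVectorStore` and its methods are hypothetical, and production systems typically use approximate nearest-neighbour indexes instead of scanning every vector.

```python
import math


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy vector store: keeps (id, vector) pairs and scans them all on search."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vector: list[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self._items.append((doc_id, vector))

    def search(self, query: list[float], k: int = 4) -> list[tuple[str, float]]:
        """Return the k most similar (id, score) pairs, best first."""
        if len(query) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(query)}")
        scored = [(doc_id, _cosine(query, vec)) for doc_id, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

The dimension check mirrors a constraint real stores also enforce: an index is created for one embedding size, so switching embedding models usually means re-indexing.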

2. Embeddings

Purpose: Convert text to numerical vectors for similarity search

Models (2026):

Model                    Dimensions  Best For
voyage-3-large           1024        Claude apps (Anthropic recommended)
voyage-code-3            1024        Code search
text-embedding-3-large   3072        OpenAI apps, high accuracy
text-embedding-3-small   1536        OpenAI apps, cost-effective
bge-large-en-v1.5        1024        Open source, local deployment
multilingual-e5-large    1024        Multi-language support
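
The dimension count matters for cost as well as accuracy. The rough estimate below assumes vectors are stored as 32-bit floats (4 bytes per value) and ignores index overhead and metadata; the one-million-chunk corpus is a made-up figure and `index_size_bytes` is a hypothetical helper.

```python
def index_size_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (assumes float32 by default)."""
    return num_vectors * dimensions * bytes_per_value


# Compare three models from the table for a hypothetical 1,000,000-chunk corpus.
for model, dims in [
    ("voyage-3-large", 1024),
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
]:
    gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{model}: {gib:.2f} GiB")
```

Under these assumptions a 3072-dimension model needs three times the raw vector storage of a 1024-dimension one for the same corpus.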

3. Retrieval Strategies

Approaches:

  • Dense Retrieval: Semantic similarity via embeddings
  • Sparse Retrieval: Keyword matching (BM25, TF-IDF)
  • Hybrid Search: Combine dense + sparse with weighted fusion
  • Multi-Query: Generate multiple query variations
  • HyDE: Generate hypothetical documents for better retrieval
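
Weighted fusion for hybrid search can be sketched as follows. Dense and sparse scores live on different scales, so this version min-max normalizes each list before blending. The names `min_max`, `hybrid_fuse`, and the weight `alpha` are illustrative assumptions; other fusion schemes exist, and this is only one simple option.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(
    dense: dict[str, float],
    sparse: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Blend normalized scores: alpha weights dense, (1 - alpha) weights sparse."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    d, s = min_max(dense), min_max(sparse)
    fused = {
        # A document missing from one list contributes 0 for that side.
        doc_id: alpha * d.get(doc_id, 0.0) + (1 - alpha) * s.get(doc_id, 0.0)
        for doc_id in d.keys() | s.keys()
    }
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

Setting `alpha=1.0` reduces this to pure dense retrieval and `alpha=0.0` to pure sparse retrieval, which makes the weight easy to tune against a set of test questions.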

4. Reranking

Purpose: Improve retrieval quality by reordering results

Methods:

  • Cross-Encoders: BERT-based reranking (ms-marco-MiniLM)
  • Cohere Rerank: API-based reranking
  • Maximal Marginal Relevance (MMR): Diversity + relevance
  • LLM-based: Use LLM to score relevance
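
Of these methods, MMR is the easiest to show without a model. The sketch below greedily picks the document that is most relevant to the query while being least similar to anything already selected. The function names and the `lambda_mult` parameter are assumptions made for illustration, not a specific library's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],
    k: int = 3,
    lambda_mult: float = 0.5,
) -> list[str]:
    """Greedy Maximal Marginal Relevance: returns up to k document ids."""
    selected: list[str] = []
    candidates = dict(doc_vecs)
    while candidates and len(selected) < k:

        def mmr_score(doc_id: str) -> float:
            relevance = cosine(query_vec, candidates[doc_id])
            # Redundancy is the similarity to the closest already-selected document.
            redundancy = max(
                (cosine(candidates[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        del candidates[best]
    return selected
```

With `lambda_mult=1.0` the redundancy term vanishes and the result is a plain relevance ranking; lower values push near-duplicates out in favour of documents that add something new.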

Quick Start with LangGraph

from langgraph.graph import StateGraph, START, END
from langchain_anthropic import ChatAnthropic
from langchain_voyageai import VoyageAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing import TypedDict, Annotated

class RAGState(TypedDict):
    question: str
    context: list[Document]
    answer: str

# Initialize components
llm = ChatAnthropic(model="claude-sonnet-4-6")
embeddings = VoyageAIEmbeddings(model="voyage-3-large")
vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# RAG prompt
rag_prompt = ChatPromptTemplate.from_template(
    """Answer based on the context below. If you cannot answer, say so.

    Context:
    {context}

    Question: {question}

    Answer:"""
)

async def retrieve(state: RAGState) -> RAGState:
    """Retrieve relevant documents."""
    docs = await retriever.ainvoke(state["question"])
    return {"context": docs}

async def generate(state: RAGState) -> RAGState:
    """Generate answer from context."""
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    messages = rag_prompt.format_messages(
        context=context_text,
        question=state["question"]
    )
    response = await llm.ainvoke(messages)
    return {"answer": response.content}

# Build RAG graph
builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

rag_chain = builder.compile()

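# Use: a bare top-level `await` only runs in notebooks and async REPLs,
# so a script needs an event loop via asyncio.run().
import asyncio

async def main() -> None:
    result = await rag_chain.ainvoke({"question": "What are the main features?"})
    print(result["answer"])

if __name__ == "__main__":
    asyncio.run(main())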

Detailed patterns and worked examples

Detailed pattern documentation lives in references/details.md. Read that file when the overview above does not go deep enough.