Grounded Docs MCP Server: Local RAG Documentation for AI Agents

Evgeniy Fitsner, Software Engineer

AI coding assistants are only as good as the context they can access. When a model answers questions about a library or framework, outdated training data leads to hallucinations and incorrect APIs. Grounded Docs MCP Server solves this by turning any documentation into a local, queryable RAG index that your AI agent can access through the Model Context Protocol.

What Is Grounded Docs MCP Server

Grounded Docs MCP Server is a free, open-source documentation indexing tool that fetches and embeds docs from websites, GitHub repositories, npm, PyPI, and local files. It runs entirely on your machine, so no code or queries leave your network. The server exposes a standard MCP interface, making it compatible with OpenCode, IntelliJ IDEA, Claude, Cline, VS Code Copilot, Gemini CLI and any other MCP-compatible client.

Key features include:

  • Cross-platform - works on Windows, macOS, and Linux
  • Free and open source - MIT license, self-hosted with no usage limits
  • Broad format support - HTML, Markdown, PDF, Office documents, source code, and more
  • Version-specific indexing - target the exact library version used in your project
  • Multiple sources - index websites, GitHub repos, local folders, and ZIP archives

Prerequisites

You need Node.js 22 or newer installed on your system. The server is distributed via npm and executed through npx, so no global installation is required.
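
Before going further, it is worth confirming the toolchain is in place. npx ships with Node.js, so a single version check covers both:

node --version   # should print v22.x.x or newer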

If you plan to use local embeddings instead of a cloud provider, you also need Ollama installed. This guide configures the server with the bge-m3 embedding model through Ollama for a fully offline setup.

Install Ollama and Pull the Embedding Model

The server supports several embedding providers, including OpenAI, Gemini, Azure, and Ollama. The default cloud model is OpenAI’s text-embedding-3-small, which works well for English-only content. For stronger multilingual support and better accuracy across programming docs in different languages, this guide uses bge-m3 running locally through Ollama.

Install Ollama for your platform, then pull the model:

ollama pull bge-m3

Once the download finishes, Ollama will serve the model on its default port. The Docs MCP Server will communicate with it automatically when configured.
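
To confirm that Ollama is up and the model is available, you can query its local API, which listens on port 11434 by default:

curl http://localhost:11434/api/tags   # bge-m3 should appear in the model list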

Configure the Server

Grounded Docs MCP Server stores its settings in a local config file. You can tune the embedding model, index storage path, and scraper behavior through the CLI.

Run the following commands to apply the configuration used in this guide:

npx @arabold/docs-mcp-server@latest config set app.storePath /Users/drfits/develop/mcp_servers/documentation_store && \
npx @arabold/docs-mcp-server@latest config set app.embeddingModel bge-m3 && \
npx @arabold/docs-mcp-server@latest config set scraper.maxPages 100000 && \
npx @arabold/docs-mcp-server@latest config set scraper.maxDepth 5 && \
npx @arabold/docs-mcp-server@latest config set scraper.maxConcurrency 3 && \
npx @arabold/docs-mcp-server@latest config set splitter.preferredChunkSize 1000 && \
npx @arabold/docs-mcp-server@latest config set splitter.minChunkSize 300 && \
npx @arabold/docs-mcp-server@latest config set splitter.maxChunkSize 3000

Explanation of each setting:

  • app.storePath - directory where the vector index and metadata are stored
  • app.embeddingModel - the embedding model used to vectorize documentation; set to bge-m3 here
  • scraper.maxPages - maximum number of pages to fetch during a scrape job
  • scraper.maxDepth - how many link levels to follow from the starting URL
  • scraper.maxConcurrency - number of parallel fetch requests
  • splitter.preferredChunkSize - target size for each text chunk; smaller chunks return more precise results with less noise
  • splitter.minChunkSize - minimum chunk size to avoid indexing useless fragments like single-word headings
  • splitter.maxChunkSize - upper limit for cases where an unbreakable block of text is encountered

A preferredChunkSize of 1000 creates more granular fragments. For code this is critical: it is better to retrieve three small, precise functions than one massive wall of text. Adjust storePath and concurrency to match your hardware and project size.

Running the Server Locally via CLI

You can start the Docs MCP Server directly from the command line, routing embedding requests through Ollama. This is useful when you want full control over the launch process or need to debug the server output.

export OPENAI_API_KEY="ollama"
export OPENAI_API_BASE="http://localhost:11434/v1"

npx @arabold/docs-mcp-server@latest server \
  --store-path /Users/drfits/develop/mcp_servers/documentation_store \
  --embedding-model "openai:bge-m3"

The openai: prefix tells the embedding library to send requests in the standard OpenAI API format, which Ollama understands natively. The OPENAI_API_KEY value is arbitrary here since Ollama does not require authentication; setting it to "ollama" is a common convention.
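
You can verify this compatibility layer directly before starting the server. The request below goes to Ollama's OpenAI-compatible /v1/embeddings endpoint (available in recent Ollama releases), the same kind of call the server makes internally:

curl http://localhost:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "bge-m3", "input": "hello world"}'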

Once the server starts, the Web UI is available at http://localhost:6280 and the SSE endpoint at http://localhost:6280/sse.
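
A quick way to check that the server is ready is to request the Web UI root and look for a successful status line:

curl -sI http://localhost:6280 | head -n 1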

Platform-Specific MCP Client Configuration

With the server running, you can add documentation sources through the Web UI, or use the CLI to scrape directly.

To connect your AI client, add an MCP server entry pointing to the local SSE endpoint. The JSON structure is identical across all platforms:

{
  "mcpServers": {
    "docs-mcp-server": {
      "type": "sse",
      "url": "http://localhost:6280/sse"
    }
  }
}

Where you place this configuration depends on your client and operating system.

OpenCode

Create or open opencode.json in your project root and add the MCP server entry under the mcp key:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "docs-mcp-server": {
      "type": "sse",
      "url": "http://localhost:6280/sse"
    }
  }
}

This works the same way on macOS, Windows, and Linux because the file lives inside your project.
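
Other clients follow the same pattern with a different file location and top-level key. As one example, recent versions of VS Code read MCP servers from a workspace file at .vscode/mcp.json with a top-level servers key; check your client's documentation, since the exact location and schema can change between releases:

{
  "servers": {
    "docs-mcp-server": {
      "type": "sse",
      "url": "http://localhost:6280/sse"
    }
  }
}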

Choosing an Embedding Model

The server supports multiple embedding strategies. The table below compares the two most common options:

Model | Provider | Strengths | Best For
----- | -------- | --------- | --------
text-embedding-3-small | OpenAI (cloud) | Fast, low cost, easy setup | English-only documentation, quick experiments
bge-m3 | Ollama (local) | Multilingual, high accuracy, fully offline | Teams with non-English docs, air-gapped environments

If your documentation includes languages other than English, or if you want to avoid sending data to third-party APIs, bge-m3 through Ollama is the better choice. It requires more local resources but keeps everything private.
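
Switching providers later only takes the config command shown earlier. For example, to move back to the cloud default (assuming the same model-string format; a valid OPENAI_API_KEY must also be set in your environment):

npx @arabold/docs-mcp-server@latest config set app.embeddingModel text-embedding-3-small

Bear in mind that changing the embedding model means rebuilding the index, since vectors produced by different models are not comparable.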

Sharing Indexed Documentation Across Your Team

Once a documentation index is built, you can share it with teammates without requiring everyone to scrape the same sources again. This is especially useful for large frameworks or internal documentation portals.

Set the server to read-only mode so that team members can query the index but cannot accidentally modify or reindex it:

DOCS_MCP_READ_ONLY=true DOCS_MCP_STORAGE_PATH=/Users/drfits/develop/mcp_servers/documentation_store npx @arabold/docs-mcp-server@latest

In this mode, the server loads the existing index from DOCS_MCP_STORAGE_PATH and disables all write operations. Use the same path you configured earlier with app.storePath.

To distribute the index, place the store directory on a shared cloud drive. Any service that syncs a local folder works:

  • OneDrive
  • Google Drive
  • Yandex Disk

One team member performs the initial indexing with write access. After the first sync completes, everyone else points their app.storePath to the synced folder and launches the server in read-only mode. This gives the entire team a consistent, up-to-date documentation source without duplicate network traffic or API costs.
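
Put together, a teammate's read-only launch might look like the following. The synced-folder path here is an example only; substitute wherever your cloud drive mounts locally:

DOCS_MCP_READ_ONLY=true \
DOCS_MCP_STORAGE_PATH="$HOME/Google Drive/mcp_servers/documentation_store" \
npx @arabold/docs-mcp-server@latest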

Conclusion

Grounded Docs MCP Server brings local RAG to any AI coding assistant. By indexing documentation on your own hardware and exposing it through MCP, you eliminate outdated knowledge and keep your agent grounded in the exact APIs and versions you use. The cross-platform support, free open-source license, and ability to share indexes across teams make it a practical addition to any development workflow.

Start with Ollama and bge-m3 for a fully offline setup, or use OpenAI embeddings for a lighter cloud-backed configuration. Either way, your AI agent will finally have documentation it can trust.
