md-server

HTTP API and MCP server that converts documents, web pages, and media to markdown. It auto-detects the input type (PDF, Office, images, URLs), delegates to MarkItDown or Crawl4AI, and returns clean markdown with metadata. A single /convert endpoint handles every input type.

Overview

  • Language: Python
  • Repo: peteretelej/md-server
  • Install: uvx md-server or pip install md-server
  • Status: stable (Production/Stable)

Architecture

The project runs in two modes: an HTTP server (Litestar + Uvicorn) and an MCP server (FastMCP over stdio). Both share the same core conversion logic.

Core modules (src/md_server/core/):

  • converter.py - DocumentConverter class, the central workhorse. Exposes convert_file(), convert_url(), convert_content(), and convert_text(). URL conversion has two paths: MarkItDown.convert() for static pages, and Crawl4AI’s AsyncWebCrawler for JavaScript-rendered content. File/content conversion uses MarkItDown.convert_stream() with StreamInfo for format hints. Includes markdown cleaning, four truncation modes (chars, tokens, sections, paragraphs), safe truncation that avoids breaking code blocks, and SSRF validation for URLs.
  • detection.py - ContentTypeDetector with layered detection: magic bytes (PDF, PNG, JPEG, ZIP/Office, audio), filename extension via mimetypes, and content heuristics (null bytes, non-printable character ratio, BOM detection). Also handles the unified input type dispatch (detect_input_type) that figures out whether a request contains a URL, base64 content, or raw text.
  • config.py - Settings via Pydantic Settings with env var support. Configures conversion timeout, max file size, debug mode, API key auth, and SSRF controls (allow_localhost, allow_private_networks).
  • errors.py - Error hierarchy with HTTP error classification. classify_http_error() maps upstream HTTP status codes and connection errors into typed exceptions (NotFoundError, AccessDeniedError, URLTimeoutError, etc.) with user-facing suggestions.
  • factories.py - MarkItDownFactory for creating configured MarkItDown instances.
  • validation.py - Input validation utilities.
  • browser.py - BrowserChecker for detecting Playwright/Crawl4AI availability at startup.
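
The layered detection described for detection.py can be illustrated with a minimal sketch of the first and last layers (magic bytes, then a binary-content heuristic). The signature table, the function names sniff_magic and looks_binary, and the 0.3 threshold are assumptions made for illustration; they are not taken from the project's code.

```python
from typing import Optional

# Well-known file signatures. The table and its MIME labels are illustrative,
# not the project's actual lookup table.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),  # also the container for .docx/.xlsx/.pptx
]


def sniff_magic(data: bytes) -> Optional[str]:
    """Return a MIME type if the payload starts with a known signature."""
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    return None


def looks_binary(data: bytes, threshold: float = 0.3) -> bool:
    """Heuristic: a null byte, or a high ratio of control characters, means binary."""
    if not data:
        return False
    if b"\x00" in data:
        return True
    # Count ASCII control characters other than tab, newline, and carriage return.
    # Bytes >= 128 are left alone so UTF-8 text is not misclassified.
    allowed_controls = {9, 10, 13}
    control_count = sum(
        1 for b in data if (b < 32 and b not in allowed_controls) or b == 127
    )
    return control_count / len(data) > threshold
```

A caller would typically try sniff_magic first and fall back to the filename extension and looks_binary only when no signature matches.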

HTTP layer (src/md_server/):

  • app.py - Litestar application with dependency injection. Registers ConvertController, health/formats endpoints, and optional API key auth middleware. Runs browser detection at startup to configure JS rendering capability.
  • controllers.py - ConvertController with a single POST /convert endpoint. Parses three input modes (JSON body, multipart file upload, raw binary), delegates to DocumentConverter, and supports content negotiation (JSON response by default, raw markdown via Accept: text/markdown header or output_format option).
  • models.py - Pydantic models for API requests/responses (ConvertResponse, ErrorResponse, ConversionMetadata, TruncationInfo).
  • middleware/auth.py - Optional API key authentication middleware.
  • security/url_validator.py - SSRF protection that blocks requests to private networks and localhost unless explicitly allowed.
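
From the client side, a URL conversion request against the HTTP layer might be built as follows. This is a sketch using only the standard library; the helper name build_convert_request is hypothetical, and the base URL in the usage note assumes the host and port shown in the Development section.

```python
import json
import urllib.request


def build_convert_request(
    base_url: str, target_url: str, raw_markdown: bool = False
) -> urllib.request.Request:
    """Build (but do not send) a POST /convert request for a URL conversion."""
    payload = json.dumps({"url": target_url}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if raw_markdown:
        # Ask for raw markdown instead of the default JSON response.
        headers["Accept"] = "text/markdown"
    return urllib.request.Request(
        base_url.rstrip("/") + "/convert",
        data=payload,
        headers=headers,
        method="POST",
    )
```

With a server running locally, urllib.request.urlopen(build_convert_request("http://127.0.0.1:8080", "https://example.com")) would send the request.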

MCP layer (src/md_server/mcp/):

  • server.py - FastMCP server with a single convert_to_markdown tool. Wraps the same DocumentConverter as the HTTP layer. Handles base64 file content decoding and maps errors to ToolError with suggestions.
  • handlers.py - handle_read_resource() function that dispatches URL vs file conversion, validates inputs, and returns either raw markdown or structured JSON based on output_format.
  • tools.py - MCP tool schema definition (read_resource tool with properties for URL, file_content, render_js, truncation options, etc.).
  • models.py - MCP-specific response models (MCPSuccessResponse, MCPErrorResponse).
  • errors.py - Factory functions for structured MCP error responses with suggestions.
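
The base64 decoding and error-with-suggestion pattern mentioned for the MCP layer could look roughly like this. ToolInputError is a hypothetical stand-in for the ToolError the server raises, and decode_file_content is an illustrative name; the real handler may validate differently.

```python
import base64
import binascii


class ToolInputError(ValueError):
    """Hypothetical stand-in for an MCP tool error that carries a suggestion."""

    def __init__(self, message: str, suggestion: str) -> None:
        super().__init__(message)
        self.suggestion = suggestion


def decode_file_content(file_content: str) -> bytes:
    """Decode base64 file content, turning decode failures into a helpful error."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(file_content, validate=True)
    except binascii.Error as exc:
        raise ToolInputError(
            "file_content is not valid base64",
            suggestion="Encode the raw file bytes with standard base64 and retry.",
        ) from exc
```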

SDK (src/md_server/sdk/):

  • converter.py - Programmatic SDK for using md-server as a Python library.
  • remote.py - HTTP client for talking to a remote md-server instance.

Metadata (src/md_server/metadata/):

  • extractor.py - Extracts title, token count (via tiktoken), and language detection (via langdetect) from converted markdown. Can optionally inject YAML frontmatter with extracted metadata.
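
Title extraction and frontmatter injection can be sketched with the standard library alone. The function names and the choice of the first level-1 heading as the title are assumptions; token counting and language detection are omitted because they rely on tiktoken and langdetect.

```python
import json
import re
from typing import Any, Dict, Optional


def extract_title(markdown: str) -> Optional[str]:
    """Return the text of the first level-1 ATX heading, if there is one."""
    match = re.search(r"^#[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    return match.group(1) if match else None


def add_frontmatter(markdown: str, metadata: Dict[str, Any]) -> str:
    """Prepend a YAML frontmatter block built from non-empty metadata values."""
    lines = ["---"]
    for key, value in metadata.items():
        if value is not None:
            # JSON scalars (quoted strings, numbers, booleans) are valid YAML scalars.
            lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown
```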

Data flow for a URL conversion:

  1. POST /convert with {"url": "https://..."} (or MCP convert_to_markdown call)
  2. Controller parses request, creates options dict
  3. DocumentConverter.convert_url() validates the URL (SSRF check) and checks whether JS rendering was requested
  4. If JS rendering: Crawl4AI launches headless Chromium, crawls page, returns markdown
  5. If static: MarkItDown.convert(url) fetches and converts
  6. Apply truncation and cleaning options
  7. Extract metadata (title, tokens, language)
  8. Return markdown (with optional YAML frontmatter) or JSON with full metadata
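
The steps above can be condensed into a small orchestration sketch. Every collaborator is passed in as a callable so the flow itself is visible; the function name convert_url_flow and the parameter names are hypothetical and do not mirror the signature of DocumentConverter.convert_url().

```python
from typing import Any, Callable, Dict


def convert_url_flow(
    url: str,
    render_js: bool,
    validate_url: Callable[[str], None],
    fetch_static: Callable[[str], str],
    fetch_rendered: Callable[[str], str],
    postprocess: Callable[[str], str],
    extract_metadata: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Validate, pick a backend, post-process, then attach metadata."""
    # Step 3: the SSRF check runs first; it is expected to raise on a blocked URL.
    validate_url(url)
    # Steps 4-5: choose the browser-backed or the static backend.
    backend = fetch_rendered if render_js else fetch_static
    # Step 6: cleaning and truncation are folded into one callable here.
    markdown = postprocess(backend(url))
    # Steps 7-8: metadata is computed on the final markdown.
    return {"markdown": markdown, "metadata": extract_metadata(markdown)}
```

Injecting the backends keeps the sketch runnable without MarkItDown or Crawl4AI installed and makes the ordering easy to test with stubs.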

Key Design Decisions

Two conversion backends. MarkItDown handles static content and file formats (PDF, Office, images). Crawl4AI with Playwright handles JavaScript-rendered pages. The render_js flag switches between them. This gives good coverage without making every request pay the browser startup cost.

Single endpoint, auto-detection. Instead of separate /convert/url, /convert/file, and /convert/text endpoints, a single POST /convert uses content type detection to decide how to handle the request. A JSON body with a url field triggers URL conversion, a content field triggers base64 file conversion, and a multipart body triggers file upload. This keeps the API surface small.
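
A minimal sketch of that dispatch, assuming the request's Content-Type header and an already-parsed JSON body are available. The function name classify_request and the returned labels are illustrative, not the project's identifiers.

```python
from typing import Any, Mapping, Optional


def classify_request(
    content_type: str, json_body: Optional[Mapping[str, Any]] = None
) -> str:
    """Decide which conversion path a /convert request should take."""
    # Drop parameters such as "; charset=utf-8" or "; boundary=...".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "multipart/form-data":
        return "file_upload"
    if media_type == "application/json":
        body = json_body or {}
        if "url" in body:
            return "url"
        if "content" in body:
            return "base64_content"
        raise ValueError("JSON body needs a 'url' or 'content' field")
    # Anything else is treated as a raw binary upload in this sketch.
    return "raw_binary"
```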

Dual-mode server. The same conversion logic runs as either an HTTP API (for integration with other services) or an MCP server (for direct AI assistant use). The __main__.py entrypoint uses --mcp-stdio to switch modes. This avoids code duplication and keeps behavior consistent.
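
The mode switch can be pictured as a small argparse front end. The flags --host, --port, and --mcp-stdio appear in this document; the default values and the function names below are assumptions, and the real __main__.py may be organized differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface sketch; defaults here are assumed, not confirmed."""
    parser = argparse.ArgumentParser(prog="md-server")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8080)
    parser.add_argument(
        "--mcp-stdio",
        action="store_true",
        help="run as an MCP server over stdio instead of the HTTP API",
    )
    return parser


def select_mode(argv: Optional[List[str]] = None) -> str:
    """Return "mcp" or "http"; argv=None falls back to sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    return "mcp" if args.mcp_stdio else "http"
```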

SSRF protection by default. URL validation blocks private networks (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and localhost unless explicitly enabled via settings. This matters because the server fetches arbitrary URLs supplied by users.
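
A check of this kind can be sketched with the standard ipaddress module. The flag names allow_localhost and allow_private_networks come from the settings described earlier; the function names, the injectable resolve parameter, and the decision to also block link-local addresses are assumptions for illustration.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlparse


def is_blocked_address(
    ip: str, allow_localhost: bool = False, allow_private_networks: bool = False
) -> bool:
    """True if the address is loopback or private and the matching flag is off."""
    addr = ipaddress.ip_address(ip)
    if addr.is_loopback:
        return not allow_localhost
    if addr.is_private or addr.is_link_local:
        return not allow_private_networks
    return False


def resolve_host(host: str) -> List[str]:
    """Resolve a hostname to its IP addresses (performs a DNS lookup)."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def validate_url(
    url: str,
    allow_localhost: bool = False,
    allow_private_networks: bool = False,
    resolve: Callable[[str], List[str]] = resolve_host,
) -> None:
    """Raise ValueError if the URL is not http(s) or points at a blocked address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"unsupported URL: {url}")
    try:
        # IP literals are checked directly, without a DNS lookup.
        addresses = [str(ipaddress.ip_address(parsed.hostname))]
    except ValueError:
        addresses = resolve(parsed.hostname)
    for ip in addresses:
        if is_blocked_address(ip, allow_localhost, allow_private_networks):
            raise ValueError(f"blocked address: {ip}")
```

Note that a validate-then-fetch design like this checks the address only at validation time; a hostname that resolves differently on the later fetch would not be caught by this sketch.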

Structured truncation. Four truncation modes (chars, tokens, sections, paragraphs) with safe boundary detection. Token-based truncation uses tiktoken’s cl100k_base encoding for accuracy. Section-based truncation splits at ## headings. All modes avoid breaking inside code blocks by checking fence count parity.
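
The fence-aware behavior can be illustrated for the chars mode. This sketch tracks whether a code fence is open while accumulating whole lines, which is equivalent to checking fence count parity; the function name and the choice to cut at line boundaries are assumptions, not the project's implementation.

```python
def truncate_chars_safely(markdown: str, max_chars: int) -> str:
    """Truncate to at most max_chars without ending inside a fenced code block."""
    if len(markdown) <= max_chars:
        return markdown
    kept = []
    used = 0
    open_fence_index = None  # index in `kept` of a fence line that is still unclosed
    for line in markdown.splitlines(keepends=True):
        if used + len(line) > max_chars:
            break
        if line.lstrip().startswith("```"):
            # Each fence line toggles between "inside" and "outside" a code block.
            open_fence_index = len(kept) if open_fence_index is None else None
        kept.append(line)
        used += len(line)
    if open_fence_index is not None:
        # The cut landed inside a code block: drop the whole unfinished block.
        kept = kept[:open_fence_index]
    return "".join(kept).rstrip()
```

Dropping the unfinished block is one option; appending a closing fence instead would keep the partial code at the cost of exceeding the limit slightly.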

Content negotiation. The HTTP API defaults to JSON responses with full metadata, but clients can request raw markdown via Accept: text/markdown header. The MCP server defaults to raw markdown since that is what AI assistants typically want.
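
The negotiation rule can be sketched as a single predicate. The function name wants_markdown and the output_format value "markdown" are assumptions; the sketch also ignores q-values in the Accept header, which a full implementation might weigh.

```python
from typing import Optional


def wants_markdown(
    accept_header: Optional[str], output_format: Optional[str] = None
) -> bool:
    """Return True when the client asked for raw markdown rather than JSON."""
    # An explicit option takes precedence over the Accept header in this sketch.
    if output_format:
        return output_format.lower() == "markdown"
    if not accept_header:
        return False  # JSON is the default response
    for part in accept_header.split(","):
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type == "text/markdown":
            return True
    return False
```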

Development

# Install dependencies
uv sync

# Run HTTP server
uv run md-server --host 127.0.0.1 --port 8080

# Run MCP server
uv run md-server --mcp-stdio

# Run tests
uv run pytest

# Type checking
uv run mypy src/

# Lint
uv run ruff check src/

# Docker
docker build -t md-server .
docker run -p 8080:8080 md-server