Cloudflare AI

Cloudflare’s AI stack runs on the same edge network as Workers. You get serverless GPU inference (Workers AI), a proxy layer for any AI provider (AI Gateway), a vector database for embeddings (Vectorize), managed RAG (AI Search), and a framework for building stateful AI agents (Agents SDK). Everything connects through bindings, same as D1 or R2.

The stack composes naturally: a Worker calls Workers AI for embeddings, stores them in Vectorize, proxies LLM calls through AI Gateway for caching and fallback, and wraps the whole thing in an Agent for stateful conversation. Each piece is useful on its own, but they’re designed to work together.
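The embed, store, query, and generate flow described above can be sketched in a single Worker module. This is a minimal sketch under stated assumptions, not a complete Worker. The binding names `AI` and `VECTORIZE`, the gateway name `my-gateway`, and the narrowed interfaces `AiBinding` and `VectorIndex` are hypothetical; in a real project the binding types come from `@cloudflare/workers-types`. It assumes a Vectorize index created with 768 dimensions to match `@cf/baai/bge-base-en-v1.5`. Check the model IDs against the current catalog. `ingest` and `answer` would be called from your Worker's `fetch` handler.

```typescript
// Narrowed stand-ins for the Workers AI and Vectorize binding types
// (hypothetical names; the real types live in @cloudflare/workers-types).
interface AiBinding {
  run(
    model: string,
    inputs: Record<string, unknown>,
    options?: { gateway?: { id: string } },
  ): Promise<any>;
}

interface VectorRecord {
  id: string;
  values: number[];
  metadata?: Record<string, unknown>;
}

interface VectorMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

interface VectorIndex {
  upsert(vectors: VectorRecord[]): Promise<unknown>;
  query(
    vector: number[],
    options: { topK: number; returnMetadata: "all" },
  ): Promise<{ matches: VectorMatch[] }>;
}

interface Env {
  AI: AiBinding; // Workers AI binding (name is an assumption)
  VECTORIZE: VectorIndex; // Vectorize index binding (name is an assumption)
}

const EMBED_MODEL = "@cf/baai/bge-base-en-v1.5"; // 768-dimension embeddings
const CHAT_MODEL = "@cf/meta/llama-3.1-8b-instruct";
const GATEWAY_ID = "my-gateway"; // hypothetical AI Gateway name

// Embed a batch of strings; bge embedding models return { shape, data: number[][] }.
async function embed(env: Env, texts: string[]): Promise<number[][]> {
  const result = await env.AI.run(EMBED_MODEL, { text: texts });
  return result.data;
}

// Ingest: embed each document and upsert it, keeping the raw text as metadata
// so it can be fed back to the LLM as context at query time.
async function ingest(env: Env, docs: { id: string; text: string }[]): Promise<void> {
  const vectors = await embed(env, docs.map((d) => d.text));
  await env.VECTORIZE.upsert(
    docs.map((d, i) => ({ id: d.id, values: vectors[i], metadata: { text: d.text } })),
  );
}

// Query: embed the question, fetch the nearest chunks, then generate an answer.
// The `gateway` option routes the LLM call through the named AI Gateway
// (assumes a gateway with that name has been set up).
async function answer(env: Env, question: string): Promise<string> {
  const [queryVector] = await embed(env, [question]);
  const { matches } = await env.VECTORIZE.query(queryVector, { topK: 3, returnMetadata: "all" });
  const context = matches.map((m) => String(m.metadata?.text ?? "")).join("\n");
  const result = await env.AI.run(
    CHAT_MODEL,
    {
      messages: [
        { role: "system", content: `Answer using only this context:\n${context}` },
        { role: "user", content: question },
      ],
    },
    { gateway: { id: GATEWAY_ID } },
  );
  return result.response;
}
```

Only the generation call goes through the gateway here, since that is where caching and provider fallback pay off; embedding calls could take the same `gateway` option if you want them logged too.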

Key Facts

  • Workers AI: Serverless GPU inference - LLMs, embeddings, image generation
  • AI Gateway: Proxy for any AI provider with caching, rate limiting, fallback
  • Vectorize: Vector database for similarity search and RAG
  • AI Search: Managed RAG pipeline that handles ingestion, indexing, and retrieval for you
  • Agents SDK: Stateful agents built on Durable Objects (npm install agents)
  • Pricing: Workers AI includes 10K free neurons/day; usage beyond that is billed by neurons consumed on the Workers Paid plan
  • Docs: developers.cloudflare.com/workers-ai
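Because the free allowance is counted per day, a monthly total alone does not tell you what you will pay: a spike above 10K neurons on one day is billable even if the monthly average is under the limit. A small sketch of that arithmetic, assuming the allowance resets daily with no carry-over and that overage is priced per 1,000 neurons. The function names are hypothetical, and the rate is a parameter; take it from the current Workers AI pricing page.

```typescript
// Free Workers AI allowance per day, from the Key Facts list.
const FREE_NEURONS_PER_DAY = 10_000;

// Sum the neurons that exceed the free allowance on each individual day.
// Assumes unused allowance does not carry over to the next day.
function billableNeurons(dailyUsage: number[]): number {
  return dailyUsage.reduce((total, used) => total + Math.max(0, used - FREE_NEURONS_PER_DAY), 0);
}

// Convert billable neurons to cost at a caller-supplied rate per 1,000 neurons.
function estimateCost(dailyUsage: number[], ratePerThousandNeurons: number): number {
  return (billableNeurons(dailyUsage) / 1000) * ratePerThousandNeurons;
}

// Example: 8K, 25K, and 12K neurons over three days.
const usage = [8_000, 25_000, 12_000];
console.log(billableNeurons(usage)); // 17000 (0 + 15000 + 2000)
```

Note that the first day's unused 2K neurons do not offset the second day's overage; averaging the three days (15K/day) would give a different, incorrect estimate.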

Contents

Concepts

  • AI Landscape - Map of the AI stack: Workers AI, AI Gateway, Vectorize, AI Search
  • Agents Model - How the Agents SDK works on Durable Objects

Quickstarts

  • Workers AI - LLM inference, streaming, embeddings, image generation
  • AI Gateway - Proxy setup, caching, rate limiting, provider fallback
  • Vectorize RAG - Full RAG pipeline: embed, store, query, generate
  • Agents SDK - Stateful agent with tools and conversation state

Deep Dives

  • RAG Patterns - DIY RAG vs AI Search, chunking, reranking, hybrid search
  • Agent Patterns - Scheduled agents, MCP, human-in-the-loop, multi-agent coordination

Notes

  • Model Catalog - Key models, speed tiers, neurons pricing, cost estimation
  • Gotchas - Common pitfalls across Workers AI, Vectorize, AI Gateway, Agents SDK
  • Cloudflare Platform - Workers, D1, R2, KV, Durable Objects, and the runtime that underpins the AI stack
  • Cloudflare Frameworks - Frontend frameworks for building full-stack apps that use Workers AI and Agents SDK

Resources