Day 4 — AI Gateway Integration

The Problem

Every user on ekuri gets their own isolated container running an AI agent. That agent makes API calls to Claude. Until today, those calls went directly from the container to Anthropic's API — no logging, no caching, no rate limiting, no visibility.

If something went wrong, we had zero insight. If a user's prompt triggered an expensive loop, we'd only find out when the bill arrived.

What AI Gateway Does

Cloudflare AI Gateway is a proxy that sits between your app and any supported AI provider. It gives you:

  • Request logging — every prompt and response, searchable in the dashboard
  • Caching — identical prompts return cached responses, saving both latency and cost
  • Rate limiting — protect against runaway loops or abuse
  • Analytics — token usage, latency percentiles, error rates per user

The best part: it's a one-line URL change. No SDK, no middleware, no agent modifications.

The Implementation

The change was surgical. In the sandbox worker, wherever we build the container's environment, we swapped the base URL:

# Before
ANTHROPIC_BASE_URL=https://api.anthropic.com

# After
ANTHROPIC_BASE_URL=https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/anthropic

That's it. The Anthropic SDK respects ANTHROPIC_BASE_URL, so every API call from every container now flows through our AI Gateway automatically.
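
Here's a minimal sketch of the worker side in TypeScript. The helper and config names are illustrative, not our actual code; only the URL shape comes from Cloudflare's docs:

// Illustrative sketch of the sandbox worker's env construction.
interface SandboxConfig {
  accountId: string;
  gatewayId: string;
  anthropicApiKey: string;
}

function buildContainerEnv(cfg: SandboxConfig): Record<string, string> {
  return {
    ANTHROPIC_API_KEY: cfg.anthropicApiKey,
    // The one-line change: point the SDK at AI Gateway instead of
    // api.anthropic.com. The SDK reads this env var at client construction.
    ANTHROPIC_BASE_URL: `https://gateway.ai.cloudflare.com/v1/${cfg.accountId}/${cfg.gatewayId}/anthropic`,
  };
}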

What We Learned

Caching is more useful than expected. The gateway keys its cache on the exact request body, so when an agent repeats a request verbatim (same system prompt, same tool definitions, same messages) the stored response comes back without a round trip to Anthropic. With repetitive tool calls, that exact repetition happens more often than you'd expect.
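
You can verify cache behavior from the response headers. A minimal probe with raw fetch, assuming the gateway's cf-aig-cache-status response header (HIT or MISS per our reading of the docs) and a placeholder model id:

// Fire the same request twice and compare the gateway's cache header:
// expect MISS on the first call, HIT on the repeat.
const GATEWAY = "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/anthropic";

async function probeCache(apiKey: string): Promise<string | null> {
  const res = await fetch(`${GATEWAY}/v1/messages`, {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-5", // placeholder model id
      max_tokens: 16,
      messages: [{ role: "user", content: "ping" }],
    }),
  });
  return res.headers.get("cf-aig-cache-status"); // assumed header name
}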

Logging changes how you debug. Before, debugging a user issue meant SSH-ing into their container and reading logs. Now we can see every API interaction in the Cloudflare dashboard with full request/response payloads.

Rate limiting is essential for BYOK users. Users who bring their own API key could accidentally (or intentionally) trigger infinite loops. Gateway rate limiting adds a safety net without touching the container code.
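
Requests over the limit should come back as HTTP 429, so the agent wants a graceful retry path rather than a crash. A sketch of a backoff wrapper (ours, not something the gateway provides):

// Retry 429s with exponential backoff (1s, 2s, 4s). Any other response,
// success or failure, is returned to the caller as-is.
async function fetchWithBackoff(
  send: () => Promise<Response>,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
  }
}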

What's Next

Now that we have visibility into API usage, the next step is building a usage dashboard in ekuri itself. Users should be able to see their token consumption, cost estimates, and response times — not just us.

We're also exploring per-user caching rules. Power users who run repetitive workflows (like daily summarization) could benefit from aggressive caching, while conversational users need fresh responses every time.
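
AI Gateway accepts per-request cache headers, which is likely what we'd build on. A sketch assuming the cf-aig-cache-ttl and cf-aig-skip-cache request headers from the docs; the profile type is hypothetical:

// Map a per-user profile to the gateway's per-request cache headers.
// CacheProfile is a hypothetical ekuri-side type for this sketch.
type CacheProfile = { mode: "aggressive" | "fresh" };

function cacheHeaders(profile: CacheProfile): Record<string, string> {
  return profile.mode === "aggressive"
    ? { "cf-aig-cache-ttl": "86400" } // cache for a day: fine for daily summaries
    : { "cf-aig-skip-cache": "true" }; // always bypass the cache for chat
}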


This is part of our build-in-public series. Follow along as we build ekuri from scratch.