One gateway for every LLM
Xoredge AI Platform
A cost-effective multi-provider AI gateway with failover, tool orchestration, structured outputs and built-in observability.
The Xoredge AI Platform is a single, self-hostable gateway that sits between your applications and every large-language-model provider — OpenAI, Anthropic, Google, Mistral, and local models via Ollama — behind one stable API.
Why teams build on it
Wiring an LLM into a product is easy. Keeping it cheap, reliable, observable and provider-independent in production is the hard part. The Xoredge AI Platform exists to solve exactly that.
- One API, every model — switch or mix providers with a config change, never a rewrite. No vendor lock-in.
- Automatic failover — if a provider is down, rate-limited or slow, requests fall through to a healthy one transparently.
- Cost-aware routing — send cheap, high-volume calls to small models and reserve frontier models for the hard ones.
- Structured outputs & tools — first-class JSON-schema responses and function/tool calling that behave the same across providers.
- Built-in observability — every request is traced with latency, token and cost metrics, exported to Prometheus and Grafana.
- Private by default — self-host it next to your data; nothing leaves your network unless you choose a hosted provider.
Cost-effective on purpose
Caching, prompt de-duplication and routing to the right-sized model routinely cut LLM spend by half or more versus calling a single frontier provider directly — without the application code ever knowing which model answered.
Powers our own products
This is not a side project. The same platform runs the AI assistant in College Admin and the forecasting features in Vouch, so it's hardened by the products we ship every day.
Multi-provider gateway
OpenAI, Anthropic, Google, Mistral and local Ollama models behind one API.
Automatic failover
Health-checked routing falls through to a working provider on errors or rate limits.
Cost-aware routing
Right-size every call — small models for volume, frontier models for hard tasks.
Structured outputs & tools
Provider-agnostic JSON-schema responses and function calling.
Observability built in
Per-request latency, token and cost metrics exported to Prometheus/Grafana.
Self-hosted & private
Deploy in your own network; your prompts and data never leave unless you say so.
FAQ
Which providers and models are supported?+
OpenAI, Anthropic (Claude), Google (Gemini), Mistral and any local model served through Ollama. New providers are added behind the same API, so your code never changes.
How does it actually reduce cost?+
Three ways: response caching for repeated prompts, routing routine calls to smaller/cheaper models, and falling back instead of retrying expensive models. You set the policy; the gateway enforces it.
Can we keep everything inside our own network?+
Yes. Self-host the gateway and point it only at local models, or at your own provider accounts. Nothing is sent to Xoredge.
How do we monitor what it's doing?+
Every request emits structured logs and Prometheus metrics — latency, tokens, cost and provider — which drop straight into Grafana dashboards we can set up for you.
Engineered for production, not demos
The boring guarantees that matter when AI is on your critical path.
Low latency
Connection pooling and streaming keep first-token times tight.
Resilient
Health checks, retries and failover so one provider outage isn't yours.
Caching
Deduplicate identical prompts and cache responses you can safely reuse.
Traceable
Every call is attributable to a user, feature and cost centre.
Drop-in
OpenAI-compatible surface — point your existing SDK at it.
Yours to run
Ships as one container; deploy on LIP or any Docker host.
Start free, scale when you need us
Self-host at no cost, or let us run and monitor it for you.
Put AI on your critical path with confidence
Tell us your use case and we'll show you a routing and cost plan in one call.