Shopify LLM proxy with automatic failover and internal distillation • Meteora Web Agency

Shopify has built an internal LLM proxy that gives every engineer access to multiple AI providers, with automatic failover when a model is deprecated, updated, or goes down. When Claude Fable 5 was shut down, Shopify's engineers didn't panic: the proxy moved them to Claude Opus or GPT 5.5 without interrupting their workflows.

An LLM proxy to handle model volatility

“Fable looks amazing; we used it of course,” said Farhan Thawar, Shopify's head of engineering, on VentureBeat's Beyond the Pilot podcast. “When a model comes and then it goes, or it could be as innocuous as an update, the proxy allows us to spray across the different providers.” Shopify buys tokens in bulk, and all users reach models through its proxy. The proxy provides centralized reporting and failover: when one provider has an availability issue, users are “automatically, seamlessly” moved to another. Thawar advises enterprises to learn from this example and build a solid backup plan so they are not “super tied” to a single provider. While companies such as Stripe, Anthropic, and OpenAI invest in AI health solutions, Shopify has chosen to build its own independent infrastructure.
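Shopify has not published its proxy code, so the following is only a minimal sketch of the failover idea, assuming the proxy tries providers in a fixed order of preference and falls through to the next one whenever a provider raises an error. `FailoverProxy`, `ProviderError`, and the stub providers are hypothetical names; a real proxy would also handle retries, timeouts, and per-provider authentication.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class ProviderError(Exception):
    """Hypothetical error a provider client raises when a model is down or retired."""


@dataclass
class FailoverProxy:
    # Ordered (provider_name, call) pairs; earlier entries are preferred.
    providers: List[Tuple[str, Callable[[str], str]]]
    # Successful requests per provider, a stand-in for the proxy's reporting.
    usage: Dict[str, int] = field(default_factory=dict)

    def complete(self, prompt: str) -> Tuple[str, str]:
        """Try each provider in order and return (provider_name, reply)."""
        errors = []
        for name, call in self.providers:
            try:
                reply = call(prompt)
            except ProviderError as exc:
                # Record the failure and fall through to the next provider.
                errors.append(f"{name}: {exc}")
                continue
            self.usage[name] = self.usage.get(name, 0) + 1
            return name, reply
        raise ProviderError("all providers failed: " + "; ".join(errors))


# Usage with stub providers: the primary model has been retired.
def retired_model(prompt: str) -> str:
    raise ProviderError("model retired")


def backup_model(prompt: str) -> str:
    return f"echo: {prompt}"


proxy = FailoverProxy(providers=[("primary", retired_model), ("backup", backup_model)])
served_by, reply = proxy.complete("hello")
print(served_by, reply)  # backup echo: hello
```

Because every request passes through one place, the same object that reroutes traffic can also count it, which is why failover and usage reporting naturally live together in a proxy.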

Distillation: small, fast, and accurate models

Another key strategy is distillation: a smaller student model learns from a larger teacher model and specializes in a narrow task. For such narrow tasks, small language models (SLMs) can be a better fit than general-purpose models. Shopify's flagship AI assistant, Sidekick, uses distilled models to handle numerous specialized subtasks for merchants, removing toil from their day-to-day work. According to Thawar, these models can be 2x cheaper and faster, and in extreme cases 30x cheaper and faster. “It isn’t just about cost and latency; it’s about accuracy,” he emphasizes.
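The article does not say which distillation objective Shopify uses. As an illustration only, here is the classic soft-target formulation: both models' logits are softened with a temperature, and the student is penalized by the KL divergence from the teacher's distribution (scaled by the temperature squared). The function names and the temperature of 2.0 are assumptions for this sketch; LLM distillation is often done per token or by fine-tuning the student on teacher outputs instead.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) over temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # ranks the options like the teacher
far_student = [0.5, 1.0, 4.0]     # prefers the option the teacher ranks last
print(distillation_loss(close_student, teacher) < distillation_loss(far_student, teacher))  # True
```

Minimizing this loss pushes the student toward the teacher's full output distribution, not just its top answer, which is what lets a small model pick up a narrow skill from a much larger one.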

Automated distillation pipeline with Tangle

Engineers feed the pipeline four inputs: a teacher model, training data, evals, and a target model (for example, distilling Opus 4.8 down to Qwen 3.5). The process takes about a day and returns an evaluation of speed, cost, and accuracy for that subtask. If the tradeoff looks good, the engineer can deploy the distilled model without any approval. Shopify's internal platform Tangle lets anyone visualize the pipeline as it runs. Thawar envisions a future where no target model needs to be specified: based on the data and evals, the pipeline itself could suggest the best distillation target. “Maybe it'll be such a small model it could run on a phone,” he says.
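The pipeline's report format is not public, so this is a hedged sketch of what "the tradeoff looks good" could mean in code. It assumes the pipeline runs the same evals on teacher and student and compares accuracy, cost, and latency. `DistillationJob`, `EvalResult`, `DistillationReport`, the file names, the acceptance thresholds, and all numbers are hypothetical and made up for illustration; only the four inputs and the model names come from the article.

```python
from dataclasses import dataclass


@dataclass
class DistillationJob:
    # The four inputs described in the article; field values below are illustrative.
    teacher: str
    target: str
    training_data: str
    evals: str


@dataclass
class EvalResult:
    accuracy: float      # fraction of eval cases passed, 0..1
    cost_per_1k: float   # dollars per 1,000 requests
    latency_ms: float    # median latency in milliseconds


@dataclass
class DistillationReport:
    teacher: EvalResult
    student: EvalResult

    @property
    def cost_ratio(self) -> float:
        """How many times cheaper the student is (2.0 means 2x cheaper)."""
        return self.teacher.cost_per_1k / self.student.cost_per_1k

    @property
    def speedup(self) -> float:
        """How many times faster the student is."""
        return self.teacher.latency_ms / self.student.latency_ms

    @property
    def accuracy_delta(self) -> float:
        """Student accuracy minus teacher accuracy on the same evals."""
        return self.student.accuracy - self.teacher.accuracy

    def looks_good(self, max_accuracy_drop: float = 0.01,
                   min_cost_ratio: float = 2.0) -> bool:
        # Hypothetical acceptance rule: no meaningful accuracy loss and at least 2x cheaper.
        return (self.accuracy_delta >= -max_accuracy_drop
                and self.cost_ratio >= min_cost_ratio)


job = DistillationJob(teacher="Opus 4.8", target="Qwen 3.5",
                      training_data="subtask_examples.jsonl", evals="subtask_evals.yaml")
# Made-up numbers for illustration only.
report = DistillationReport(
    teacher=EvalResult(accuracy=0.91, cost_per_1k=12.0, latency_ms=1800.0),
    student=EvalResult(accuracy=0.92, cost_per_1k=6.0, latency_ms=900.0),
)
print(report.cost_ratio, report.speedup, report.looks_good())  # 2.0 2.0 True
```

Expressing the tradeoff as ratios makes Thawar's figures easy to read off: a `cost_ratio` of 2.0 is the "2x cheaper" case, and 30.0 would be the extreme case he mentions.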

From AI reflexivity to AI leverage

Shopify has also built a usage dashboard that tracks not only token spend but also who uses the most expensive tokens, who spends the most time on reasoning, and which models are used, broken down by discipline and level. There are also circuit breakers that ping users when a model has been running for a long time (e.g., 10 hours) and consuming many tokens: “Did you mean to spend this?” The ultimate goal, Thawar explains, is to move from “AI reflexivity” to “AI leverage,” getting people to think deeply about where AI can benefit their workflows most. Large language models (LLMs) sit at the core of this strategy. With this approach, Shopify shows how robust, provider-agnostic infrastructure can reduce vendor lock-in and increase efficiency.
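As a rough sketch of the two controls described above, assume the proxy logs one event per request and tracks each long-running agent session. The dashboard side then reduces to grouping token counts by a field, and the circuit breaker to a threshold check. `UsageEvent`, `AgentRun`, `tokens_by`, `circuit_breaker`, the model names, and the 50-million-token threshold are hypothetical; only the 10-hour example and the "Did you mean to spend this?" prompt come from the article.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class UsageEvent:
    user: str
    discipline: str
    level: str
    model: str
    tokens: int


def tokens_by(events: List[UsageEvent], key: str) -> Dict[str, int]:
    """Sum tokens grouped by any UsageEvent field, e.g. 'user', 'model', 'discipline'."""
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        totals[getattr(event, key)] += event.tokens
    return dict(totals)


@dataclass
class AgentRun:
    user: str
    runtime_hours: float
    tokens_used: int


def circuit_breaker(run: AgentRun, max_hours: float = 10.0,
                    max_tokens: int = 50_000_000) -> Optional[str]:
    """Return a ping message if a run is both long-running and token-heavy, else None."""
    if run.runtime_hours >= max_hours and run.tokens_used >= max_tokens:
        return (f"{run.user}: this run has been going for {run.runtime_hours:g} hours "
                f"and used {run.tokens_used:,} tokens. Did you mean to spend this?")
    return None


events = [
    UsageEvent("alice", "backend", "senior", "model-a", 120_000),
    UsageEvent("bob", "frontend", "junior", "model-b", 30_000),
    UsageEvent("alice", "backend", "senior", "model-b", 10_000),
]
print(tokens_by(events, "discipline"))  # {'backend': 130000, 'frontend': 30000}
by_user = tokens_by(events, "user")
top_spender = max(by_user, key=by_user.get)
print(circuit_breaker(AgentRun("alice", runtime_hours=11, tokens_used=60_000_000)))
```

The breaker only asks the question rather than killing the run, which matches the article's framing: the aim is to make people notice their spend, not to block legitimate long jobs.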

Source: https://venturebeat.com/orchestration/how-shopify-built-an-ai-stack-that-doesnt-care-which-models-survive