When enterprises move AI workloads from pilot to production, data delivery often becomes the factor that determines whether those systems can scale reliably. Point-to-point architectures connecting storage directly to compute hold up under demonstration conditions, but they often break down under sustained, concurrent production traffic. The result is stalled inference pipelines, delayed RAG systems, underutilized GPUs, and SLA violations, all of which carry direct business consequences.
Hunter Smit, senior manager of product marketing at F5, states that "organizations successfully operationalize AI when their infrastructure is built to handle real-world failures, not just controlled conditions."
Production traffic exposes architectural weaknesses
In a pilot, a stalled transfer is an inconvenience; in production, the same stall is an outage that someone now owns. The underlying architecture is often identical in both cases. When a client is wired directly to storage, the system grows increasingly fragile under sustained production traffic, because that direct connection has no alternative path when a node fails and no way to absorb a traffic spike. Retries and timeouts then cascade, and the entire pipeline backs up at exactly the moment the business depends on its output. Paul Pindell, principal solutions architect for technology alliances at F5, explains: "Point-to-point architectures, where the S3 client connects directly to S3 storage, are not resilient. If a single storage node fails, all traffic to that cluster degrades, and in some cases the cluster can fail entirely."
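To make the contrast concrete, the Python sketch below shows the kind of behavior a direct client-to-storage connection lacks. It tries a pool of storage endpoints in order, fails over to the next one when a request fails, and between full rounds backs off with capped exponential delay plus full jitter, so retries do not synchronize into a storm against storage that is trying to recover. The endpoint names, the `fetch` callable, and the parameter values are hypothetical placeholders, not part of any F5 or S3 API.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when every endpoint failed in every round."""


def fetch_with_failover(
    endpoints: Sequence[str],
    fetch: Callable[[str], T],
    rounds: int = 3,
    base_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Try each endpoint in turn; back off with jitter between full rounds.

    `fetch` is a hypothetical callable that performs one request against one
    endpoint and raises on failure. `sleep` and `rng` are injectable for testing.
    """
    rng = rng if rng is not None else random.Random()
    last_error: Optional[Exception] = None
    for attempt in range(rounds):
        for endpoint in endpoints:
            try:
                return fetch(endpoint)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc  # fail over to the next endpoint immediately
        if attempt < rounds - 1:
            # Capped exponential backoff with full jitter: a random delay in
            # [0, min(max_delay_s, base_delay_s * 2**attempt)] keeps many
            # clients from retrying in lockstep.
            sleep(rng.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt)))
    raise AllEndpointsFailed(f"all {len(endpoints)} endpoints failed") from last_error


if __name__ == "__main__":
    def flaky_fetch(endpoint: str) -> str:
        # Simulates a storage node that is down.
        if endpoint == "node-a":
            raise ConnectionError("node-a unreachable")
        return f"object from {endpoint}"

    print(fetch_with_failover(["node-a", "node-b"], flaky_fetch))  # object from node-b
```

With a single endpoint, the same failure would simply exhaust its retries; with a pool, one unreachable node costs one failed attempt rather than the whole request.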
Sponsored Protocol
AI workflows, including RAG-based inference and agentic AI, increasingly treat S3 storage as a first-class citizen in the AI cluster. The problem is that the network connectivity between that storage and the cluster was never designed for the high-throughput, uninterrupted data movement needed to keep GPUs running optimally.
The real cost of stalled inference pipelines and underutilized GPUs
Tanu Mutreja, senior director of product management at F5, remarks: "Enterprise leaders tend to frame AI infrastructure around GPU utilization, but what makes AI different from traditional deterministic workloads is that infrastructure continuously influences those outcomes at every interaction. In AI environments, infrastructure is no longer just a back-end concern. It shapes customer experience, quality, resilience, and cost with every transaction."
The business consequences can be significant. When inference pipelines stall, the result is an SLA and customer-experience issue. When RAG systems are delayed, models lose access to timely, relevant context, producing inaccurate, outdated, or hallucinated responses that create operational, compliance, and reputational risks. At the same time, infrastructure issues can drive up costs by leaving expensive GPU resources idle or underutilized. Mutreja adds: "When GPUs are underutilized, it signals infrastructure inefficiencies that inflate costs while limiting scalability and responsiveness. The leadership question is whether the end-to-end AI infrastructure consistently delivers reliable, secure, high-quality, and governed AI experiences at sustainable unit economics."
Building a production-ready data delivery layer
F5 treats data delivery as a first-class infrastructure layer rather than assuming the network path will simply work. Where application delivery optimizes the flow of requests between users and applications, data delivery optimizes the flow of data between storage, networks, and compute, including AI compute. Making data delivery a first-class layer means building in three properties: observability, which provides real-time visibility into latency, throughput, and flow health; programmability, which enables policy-driven control over how data moves through dynamic routing, traffic optimization, rate management, and automated failover; and failure-awareness, which builds in resilience to degraded networks, storage throttling, and service disruptions.
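As an illustration of the observability and failure-awareness properties, the Python sketch below keeps a rolling window of request samples for one data path, computes a nearest-rank p99 latency and an error rate, and maps them to a coarse health state that a routing policy could act on. The class name, window size, and thresholds are hypothetical and chosen only for the example.

```python
import math
from collections import deque
from typing import Deque, Tuple


class FlowStats:
    """Rolling window of (latency_s, ok) samples for one storage-to-compute path."""

    def __init__(self, window: int = 1000) -> None:
        # deque(maxlen=...) discards the oldest sample once the window is full.
        self.samples: Deque[Tuple[float, bool]] = deque(maxlen=window)

    def record(self, latency_s: float, ok: bool) -> None:
        self.samples.append((latency_s, ok))

    def percentile(self, p: float) -> float:
        """Nearest-rank latency percentile, with p in (0, 100]."""
        if not self.samples:
            return 0.0
        latencies = sorted(lat for lat, _ in self.samples)
        rank = max(1, math.ceil(p * len(latencies) / 100))
        return latencies[rank - 1]

    def error_rate(self) -> float:
        if not self.samples:
            return 0.0
        return sum(1 for _, ok in self.samples if not ok) / len(self.samples)

    def health(self, p99_limit_s: float = 0.5, max_error_rate: float = 0.05) -> str:
        """Coarse state a routing policy could act on (thresholds are illustrative)."""
        if not self.samples:
            return "unknown"
        if self.error_rate() > max_error_rate:
            return "failing"
        if self.percentile(99) > p99_limit_s:
            return "degraded"
        return "healthy"
```

Tail latency (p99) is used rather than the mean because a small fraction of slow storage responses can stall a batch even when the average looks fine.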
In the architecture F5 has developed for Dell ObjectScale, F5 BIG-IP sits between ObjectScale and AI compute as a programmable control point at the storage edge. Pindell shares: "We have seen cases where a misconfiguration in the AI compute layer effectively DDoS'd the S3 storage infrastructure. Not in a malicious way, more of an 'Oh no, what did I do?' moment, but it still took storage down for the entire organization." Placing BIG-IP as the application delivery controller between the storage and compute layers protects storage with QoS, rate limits, and connection limits, keeping it resilient and operational under that load. SecureIQLab-validated testing confirmed that this protection does not come at the cost of throughput, which matters architecturally. Pindell says: "Preserving, and even improving, throughput is a must-have. It's what lets you layer on the higher-level functionality, resilience and enhanced security, without giving up performance to get there."
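The protection Pindell describes, rate limits and connection limits at the storage edge, is a standard admission-control pattern. The Python sketch below illustrates it with a token bucket (a sustained request rate with a bounded burst) combined with a cap on concurrent connections, so that a misconfigured compute job is turned away early instead of overwhelming storage. It does not represent BIG-IP configuration or any F5 API; the class names and limits are hypothetical.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at `rate` tokens/s and allows bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class StorageGuard:
    """Admission control in front of storage: a rate limit plus a concurrency cap."""

    def __init__(self, bucket: TokenBucket, max_connections: int) -> None:
        self.bucket = bucket  # only touched while holding self._lock
        self.max_connections = max_connections
        self.active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Admit one request, or return False so the caller can reject it early."""
        with self._lock:
            # Check the connection cap first so a rejected request spends no token.
            if self.active >= self.max_connections:
                return False
            if not self.bucket.allow():
                return False
            self.active += 1
            return True

    def release(self) -> None:
        """Call when an admitted request finishes."""
        with self._lock:
            self.active = max(0, self.active - 1)
```

In use, a proxy would call `try_acquire()` before forwarding a request, answer with an error such as HTTP 429 or 503 when it returns `False`, and call `release()` when the admitted request completes.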
The added complexity of hybrid and multicloud AI
AI deployments in hybrid multicloud environments face an even greater data delivery challenge because of the heterogeneity involved. Data traversing these environments must contend with inconsistent policies, security controls, identity systems, and governance requirements, as well as fragmented visibility and distinct failure boundaries. Programmable traffic management and observability address this complexity together. Observability provides a unified view of application, network, and infrastructure health across otherwise disconnected environments. Programmable traffic management uses those insights to route, balance, and fail over traffic in real time. Together, they create a closed-loop feedback system that enforces consistent policies, improves resilience across failure domains, and ensures reliable, high-performance AI data delivery regardless of where applications, data, or users reside.
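The closed loop described above can be sketched in a few lines of Python. Per-environment observations (here, p99 latency and error rate) are turned into routing weights, an environment whose error rate crosses a threshold is drained to zero, and requests are then spread by weighted random choice. The site names, metrics, threshold, and inverse-latency weighting are illustrative assumptions, not a description of how F5 products compute routing decisions.

```python
import random
from typing import Dict, Mapping, Optional, Tuple


def compute_weights(
    observations: Mapping[str, Tuple[float, float]],
    max_error_rate: float = 0.05,
) -> Dict[str, float]:
    """Turn per-site (p99_latency_s, error_rate) observations into routing weights.

    Sites above the error threshold get weight 0 (traffic fails over away from
    them); the rest are weighted inversely to latency and normalized to sum to 1.
    """
    raw = {
        site: (0.0 if err > max_error_rate else 1.0 / max(p99, 1e-6))
        for site, (p99, err) in observations.items()
    }
    total = sum(raw.values())
    if total == 0:
        raise RuntimeError("no healthy site available")
    return {site: w / total for site, w in raw.items()}


def pick_site(weights: Mapping[str, float], rng: Optional[random.Random] = None) -> str:
    """Weighted random choice, so load shifts proportionally rather than all at once."""
    rng = rng if rng is not None else random.Random()
    sites = list(weights)
    return rng.choices(sites, weights=[weights[s] for s in sites], k=1)[0]


if __name__ == "__main__":
    # Hypothetical observations from three environments in one control interval.
    observed = {
        "on-prem": (0.040, 0.00),
        "cloud-a": (0.080, 0.01),
        "cloud-b": (0.050, 0.20),  # partial outage: error rate above threshold
    }
    weights = compute_weights(observed)
    print(weights, pick_site(weights))
```

Each control interval repeats the loop: fresh observations produce new weights, so a recovering environment regains traffic gradually as its metrics improve.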
What separates production AI from perpetual pilots
The organizations that move beyond perpetual pilots share a specific engineering discipline, Smit says. "They're the ones that reach for production design with failure as the normal state, not the exception. They will assume latency, congestion, and partial outages will happen. And they build a data path observable and failure-aware enough to absorb them, with explicit mitigation for every degraded condition rather than a hope that the network will hold." Organizations stuck in perpetual pilots are still optimizing for the perfect lab result and discovering the real-world gap only when a workload goes live. The issue is not model quality or GPU count, but whether the data delivery layer was engineered with the same rigor as the compute. Pindell concludes: "Teams need to understand that a real-world network behaves very differently from an optimized lab network. They need a mitigation plan for the failure states and performance bottlenecks they will hit in production."
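Smit's point about explicit mitigation for every degraded condition maps onto well-known resilience patterns. One of them, the circuit breaker, is sketched below in Python. After a run of consecutive failures against a storage dependency, it fails fast for a cooldown period instead of letting callers pile up retries and timeouts, then lets a trial request through to test whether the dependency has recovered. The thresholds and names are hypothetical, and this is a generic pattern rather than a feature of any product mentioned in the article.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpen(Exception):
    """Raised instead of calling a dependency currently considered down."""


class CircuitBreaker:
    """Single-threaded sketch of the circuit-breaker pattern.

    closed -> open after `failure_threshold` consecutive failures;
    open -> half-open once `reset_timeout_s` has elapsed;
    half-open -> closed on a successful call, or back to open on a failure.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast instead of adding another request to a struggling backend.
            raise CircuitOpen("dependency marked down; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call in half-open, or too many failures while
            # closed, (re)opens the circuit and restarts the cooldown.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast gives the pipeline a defined behavior during a partial outage, for example falling back to another site, rather than blocking GPU workers on requests that are likely to time out.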