Building industrial AI from the inside out for a stronger digital core

BrandPost by Joe Mullich
Sep 10, 2025 | 7 mins

How engineering infrastructure for AI can dramatically improve performance and efficiency, while paving a clearer path to ROI.

Credit: iStock unkas_photo

A manufacturer was running an AI training workload on a cobbled-together system of GPUs, storage, and switching infrastructure, believing it had all the necessary tech to achieve its goals. But the company had put little thought into how the components actually worked together.

Problems surfaced quickly. Training cycles dragged on for days instead of hours. Expensive hardware sat idle. And engineering teams began to wonder whether their AI investment would ever pay off.

This experience isn’t unique. As AI becomes a critical element of industrial operations worldwide, many organizations are discovering a counterintuitive truth: the biggest breakthroughs come not from piling on more GPUs or larger models, but from carefully engineering the entire infrastructure to work as a single, integrated system.

Engineering for outcomes

What became of that cobbled-together system? When it was properly engineered to balance compute, networking, and storage, the improvement was quick and dramatic, explains Jason Hardy, CTO of AI for Hitachi Vantara: a 20x boost in output and a matching reduction in “wall clock time,” the actual time it takes to complete AI training cycles.

“The infrastructure must be engineered so you understand exactly what each component delivers,” Hardy explains. “You want to know how the GPU drives specific outcomes, how that impacts the data requirements, and demands on throughput and bandwidth.”

Getting systems to run that smoothly means confronting a challenge most organizations would rather avoid: aging infrastructure.

Hardy points to a semiconductor manufacturer whose systems performed fine—until AI entered the picture. “As soon as they threw AI on top of it, just reading the data out of those systems brought everything to a halt,” he says.

This scenario reflects a widespread industrial reality. Manufacturing environments often rely on systems that have been running reliably for years, even decades. “The only places I can think of where Windows 95 still exists and is used daily are in manufacturing,” Hardy says. “These lines have been operational for decades.”

That longevity now collides with new demands: industrial AI requires exponentially more data throughput than traditional enterprise applications, and legacy systems simply can’t keep up. The challenge creates a fundamental mismatch between aspirations and capabilities.

“We have this transformational outcome we want to pursue,” Hardy explains. “We have these laggard technologies that were good enough before, but now we need a little bit more from them.”

From real-time requirements to sovereign AI

In industrial AI, performance demands often make enterprise workloads look leisurely. Hardy describes a visual inspection system for a manufacturer in Asia that relied entirely on real-time image analysis for quality and cost control. “They wanted AI for quality control and to improve yield, while also controlling costs,” he says.

The AI had to process high-resolution images at production speed—no delays, no cloud roundtrips. The system doesn’t just flag defects; it traces them to the upstream machine causing the problem, enabling immediate repairs. It can also salvage partially damaged products by dynamically rerouting them for alternate uses, reducing waste while maintaining yield.

All of this happens in real-time while collecting telemetry to continuously retrain the models, turning what had been a waste problem into an optimization advantage that improves over time.
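The flag-trace-reroute flow described above can be sketched in a few lines. Everything here is illustrative: the verdict labels, machine names, and routing destinations are assumptions, since the article does not describe the real system's interfaces.

```python
from dataclasses import dataclass

# Hypothetical verdict labels; the article does not name the real categories.
OK, DEFECT, SALVAGEABLE = "ok", "defect", "salvageable"

@dataclass
class Inspection:
    part_id: str
    verdict: str
    source_machine: str  # upstream machine that produced the part

def route(inspection, alerts, reroute_queue):
    """Route a part based on its inspection verdict.

    Defective parts trigger an alert against the upstream machine so it
    can be repaired; partially damaged parts are rerouted for alternate
    uses instead of being scrapped outright.
    """
    if inspection.verdict == DEFECT:
        alerts.append(inspection.source_machine)
        return "scrap"
    if inspection.verdict == SALVAGEABLE:
        reroute_queue.append(inspection.part_id)
        return "alternate_use"
    return "ship"

alerts, reroute_queue = [], []
results = [route(i, alerts, reroute_queue) for i in (
    Inspection("p1", OK, "press-3"),
    Inspection("p2", DEFECT, "press-7"),
    Inspection("p3", SALVAGEABLE, "press-3"),
)]
print(results)  # ['ship', 'scrap', 'alternate_use']
print(alerts)   # ['press-7'] -> machine flagged for immediate repair
```

The same verdicts that drive routing can be logged as telemetry for the retraining loop the article mentions.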

Using the cloud exclusively introduces delays that make near-real-time processing impossible, Hardy says. The latency from sending data to remote servers and waiting for results back can’t meet manufacturing’s millisecond requirements.
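A back-of-the-envelope budget shows why the round trip is fatal. All numbers below are illustrative assumptions, not figures from the article: a 10 ms per-part cycle budget, 4 ms of model inference, and a 60 ms WAN round trip.

```python
# Illustrative latency budget for in-line inspection at production speed.
CYCLE_BUDGET_MS = 10.0  # assumed time allowed per part at line speed

def total_latency_ms(inference_ms, network_rtt_ms=0.0):
    """Time to get a verdict: model inference plus any network round trip."""
    return inference_ms + network_rtt_ms

on_prem = total_latency_ms(inference_ms=4.0)                       # local GPU, no hop
cloud = total_latency_ms(inference_ms=4.0, network_rtt_ms=60.0)    # WAN round trip

print(on_prem <= CYCLE_BUDGET_MS)  # True  -> on-prem fits the cycle budget
print(cloud <= CYCLE_BUDGET_MS)    # False -> the round trip alone blows it
```

Even with identical model speed, the network hop dominates the budget, which is why latency-critical inference stays on the factory floor.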

Hardy advocates a hybrid approach: design infrastructure with an on-premises mindset for mission-critical, real-time tasks, and leverage the cloud for burst capacity, development, and non-latency-sensitive workloads.

The approach also serves the rising need for sovereign AI, which ensures that mission-critical AI systems and data remain within national borders for regulatory and cultural compliance. As Hardy notes, countries like Saudi Arabia are investing heavily in bringing AI assets in-country to maintain sovereignty, while India is building language- and culture-specific models to accurately reflect its thousands of spoken languages and microcultures.

AI infrastructure is more than muscle

Such high-level performance requires more than just fast hardware. It calls for an engineering mindset that starts with the desired outcome and data sources. As Hardy puts it, “You should step back and not just say, ‘You need a million dollars’ worth of GPUs.’” He notes that sometimes, “85% readiness is sufficient,” emphasizing practicality over perfection.

From there, the emphasis shifts to disciplined, cost-conscious design. “Think about it this way,” Hardy says. “If an AI project were coming out of your own budget, how much would you be willing to spend to solve the problem? Then engineer based on that realistic assessment.”

This mindset forces discipline and optimization. The approach works because it considers both the industrial side (operational requirements) and the IT side (technical optimization)—a combination he says is rare.

Hardy’s observations align with recent academic research on hybrid computing architectures in industrial settings. A 2024 study in the Journal of Technology, Informatics and Engineering¹ found that engineered CPU/GPU systems achieved 88.3% accuracy while using less energy than GPU-only setups, confirming the benefits of an engineering approach.

The financial impact of getting infrastructure wrong can be substantial. Hardy notes that organizations have traditionally overspent on GPU resources that sit idle much of the time, while missing the performance gains that come from proper system engineering. “The traditional approach of buying a pool of GPU resources brings a lot of waste,” he says. “The infrastructure-first approach eliminates this inefficiency while delivering superior results.”

Avoiding mission-critical mistakes

In industrial AI, mistakes can be catastrophic—faulty rail switches, conveyors without emergency shutoffs, or failing equipment can injure people or stop production. “We have an ethical bias to ensure everything we do in the industrial complex is 100% accurate—every decision has critical stakes,” Hardy says.

This commitment shapes Hitachi’s approach: redundant systems, fail-safes, and cautious rollouts ensure reliability takes precedence over speed. “It does not move at the speed of light for a reason,” Hardy explains.

The stakes help explain why Hardy takes a pragmatic view of AI project success rates. “Though 80-90% of AI projects never go to production, the ones that do can justify the entire effort,” he says. “Not doing anything is not an option. We have to move forward and innovate.”

For more on engineering systems for balanced and optimum AI performance, see AI Analytics Platform | Hitachi iQ.


Jason Hardy is CTO of AI for Hitachi Vantara, a company specializing in data-driven AI solutions. The company’s Hitachi iQ platform, a scalable and high-performance turn-key solution, plays a critical role in enabling infrastructure that balances compute, networking, and storage to meet the demanding needs of enterprise and industrial AI.


¹ Optimizing AI Performance in Industry: A Hybrid Computing Architecture Approach Based on Big Data, Journal of Technology, Informatics and Engineering
