For the past three years, the AI trade has been simple: buy Nvidia and anything connected to it. But that era is now coming to an end.

Nvidia's GPUs were essential for training large language models, back when training was all that mattered. Nvidia's datacentre revenue went from $15 billion in FY2023 to $115 billion in FY2025, and the stock rose by nearly 1,200% from January 2023 to today.

In doing so, Nvidia pulled an entire ecosystem along for the ride: chip fabs like TSMC, HBM memory makers like SK Hynix, advanced packaging, liquid cooling, and power infrastructure.

But the value in AI is increasingly shifting from spending a trillion dollars on creating new models to delivering the business-critical results clients need inside their own environments: at lightning speed, at an affordable price, and at far greater scale. Seen through that lens, a lot of things start to make sense.

2026 is the year of the inference pivot. In this edition of Impactfull Weekly, we dive into why the winners of this new era may not look like the winners of the first.

Part 1: Training Era (2023-2025)

But first, a quick refresher on the era we just left behind. 

Think of training an LLM like building a gigafactory. It requires massive upfront capital, specialized equipment, and significant time to build from the ground up. But once it’s built, you don't rebuild it from scratch every day; you just focus on running it.

From 2023 to 2025, the entire AI industry was stuck in this "factory building" mode. 

OpenAI, Google, Anthropic, and Meta were locked in a furious arms race, competing for incremental benchmark gains. Because each new model required significant pre-training data, costly training runs, and post-training alignment, model release costs soared into the billions.

This phase belonged completely to Nvidia. Their H100 GPU became the gold standard of AI capability. Hyperscalers needed a "one-size-fits-all" chip capable of handling massive, diverse training workloads, and Nvidia held a functional monopoly on that power. 

It was simply not compelling for companies to risk billions on unproven, specialised chips when raw performance was the only metric that mattered.

The result was a historic spending spree. 

Hyperscalers bought every server rack they could find, pushing Nvidia’s gross margins from the mid-60s to roughly 75%. Capex from Microsoft, Google, Amazon, Meta, and Oracle hit $251 billion in 2024, rising to $405 billion in 2025.

Investors rewarded this boom accordingly:

  • Nvidia rose 234% in 2023, 173% in 2024, and 39% in 2025.
  • Broadcom rose 108% in 2024.
  • Arista Networks rose 88% in 2024.

However, as we close this chapter and look into 2026, the reality is more nuanced. 

We are realising that training is a one-time, episodic activity rather than a constant linear climb. As Dario Amodei (CEO of Anthropic) noted, each generation of LLM is its own CapEx cycle with its own payback timeline, turning the frontier lab revenue model into a series of overlapping S-curves.

Furthermore, we are hitting a ceiling. Models have not dramatically improved over the last quarter, not for a lack of chips, but because we are reaching the upper limit of high-quality training data.

This “model saturation” signals the end of the "Build" phase and the start of the "Run" phase. 

Now that inference accounts for a majority of workloads (up from 5% in 2023), the priorities have changed. 

The aim is no longer just capability at any cost, but efficiency at scale.

Part 2: Inference Era (2025-2028)

With the launch of Claude Opus 4.5 and Gemini 3 Pro, the nature of AI consumption has evolved rapidly.

Usage has diversified beyond simple user-led chatbot interactions into complex, agentic workflows. In this new era, a single user objective no longer generates a single query; instead, it initiates a cascade of API requests, exponentially increasing the volume of inference required per task.

Software engineers began refactoring entire codebases in one go through Claude Code. And Gemini 3 Pro brought native multimodal reasoning at a price point that finally made video analysis and rapid prototyping economically viable for developers.

The release of these two models in November 2025 marked a decisive tipping point, establishing autonomous agents as the default operating model.

As agents became the standard for executing workflows and delegating complex tasks, the consumption of inference tokens accelerated dramatically. In this environment, you are burning inference constantly, as a single high-level objective now triggers multiple, cascading calls to the LLM to verify, iterate, and complete the work.
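
To make that fan-out concrete, here is a minimal sketch (in Python, purely illustrative) of an agentic loop. The call_llm helper, the step count, and the verification budget are all assumptions for the sake of the example, not any particular vendor's API; the point is simply that one objective turns into dozens of billable model calls.

# Illustrative sketch: one user objective fans out into many billable model calls.
# call_llm() is a hypothetical stand-in for any hosted LLM API; here it just
# counts calls so the sketch runs without a real provider.

calls = 0

def call_llm(prompt: str) -> str:
    global calls
    calls += 1
    return "OK"  # a real API would return generated text, billed per token

def run_agent(objective: str, steps: int = 8, checks_per_step: int = 3) -> int:
    """One objective -> plan -> N steps -> verify/retry loops -> many calls."""
    call_llm(f"Break '{objective}' into steps")   # planning call
    for step in range(steps):
        call_llm(f"Execute step {step}")          # execution call
        for _ in range(checks_per_step):
            call_llm(f"Verify step {step}")       # verification / iteration calls
    return calls

total = run_agent("Refactor the billing module")
print(f"One objective triggered {total} model calls")   # 1 + 8 * (1 + 3) = 33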

This shift has made inference the dominant compute workload over training.

Inference is far more price-sensitive than training. Enterprise customers are increasingly scrutinising the price per token of each query, because the competitive advantage of using one frontier model over another is quickly eroding as most of these labs reach human-level capability in certain use cases.

(source: Toby Ord’s Blog)

This chart visualizes the 1,000x cost collapse that unlocks the Inference Era. 

The yellow line shows that human experts cost hundreds or thousands of dollars to complete complex tasks, while the colored lines show AI agents achieving similar results for mere pennies. 

This massive price drop is what makes agentic workflows using an exponential amount of inference possible: when intelligence is this cheap, you no longer need to run a model just once. You can afford to run it thousands of times in a loop to iterate, double-check, and refine its own work, trading cheap compute for impeccable reliability.

For a large enterprise, paying $0.01 instead of $0.001 per query is the difference between a $10 million annual bill and a $1 million one.
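
As a back-of-envelope check, that gap assumes something on the order of a billion queries a year, which is an illustrative volume rather than any specific company's figure:

# Illustrative arithmetic behind the bill comparison above.
# The one-billion-queries-per-year volume is an assumption, not a reported figure.

annual_queries = 1_000_000_000

for price_per_query in (0.01, 0.001):
    annual_bill = annual_queries * price_per_query
    print(f"${price_per_query}/query -> ${annual_bill:,.0f} per year")

# $0.01/query  -> $10,000,000 per year
# $0.001/query -> $1,000,000 per year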

Since switching costs are negligible, companies will ruthlessly migrate to the cheapest adequate provider to secure these savings. 

This creates a problem for Nvidia and an opportunity for nearly everyone else that has been trying to take them down.

The “Nvidia tax” becomes unbearable at the scale of millions of model and tool calls per day. Nvidia’s chips are expensive, general-purpose Ferraris, built to handle diverse and unpredictable workloads. What inference workloads need, on the other hand, is cheap, reliable performance, like a Toyota Camry.

It just so happens that custom silicon chips (ASICs, or Application-Specific Integrated Circuits) can deliver that efficiency at a fraction of the cost. Google's TPUs, Amazon's Trainium and Inferentia chips, and Broadcom's custom designs for Meta and ByteDance all target inference specifically.

With the majority of workloads being inference from here on out, Nvidia faces a profit and expectations cliff, as ASICs are already becoming the go-to chip architecture for one specific job: serving millions of model calls at lightning speed, at scale, and at competitive cost.

Pricing has plummeted due to rapid hardware improvements. The cost of running a GPT-4 equivalent model has dropped more than 100x, from $30 per million tokens to just $0.28 with DeepSeek V3.2.

Crucially, this token cheapness creates a rebound effect. As intelligence becomes more affordable, developers stop rationing it and start increasing their usage exponentially, consuming more capacity than ever before.

Even the new “premium” tier of reasoning models is extraordinarily cheap: DeepSeek R1 costs just $0.55 per million tokens.

As the marginal cost of intelligence approaches zero, demand has shifted from single "queries" to continuous "Agentic Loops." Developers no longer ration intelligence; they build systems that iterate, debug, and self-correct thousands of times per task.
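
Here is a rough sketch of that economics, using the $30 and $0.28 per-million-token prices quoted above; the token counts per task are illustrative assumptions about a typical agentic job, not measured figures:

# Rough cost comparison using the per-million-token prices quoted above.
# Token counts per task are illustrative assumptions.

PRICE_2023 = 30.00      # $ per million tokens, GPT-4-era pricing (quoted above)
PRICE_NOW = 0.28        # $ per million tokens, DeepSeek V3.2 (quoted above)

single_query_tokens = 2_000           # one chat-style answer (assumption)
agentic_task_tokens = 150 * 5_000     # ~150 cascading calls x ~5k tokens each (assumption)

def cost(tokens: int, price_per_million: float) -> float:
    return tokens / 1_000_000 * price_per_million

print(f"Single query at 2023 pricing:    ${cost(single_query_tokens, PRICE_2023):.2f}")
print(f"Agentic task at 2023 pricing:    ${cost(agentic_task_tokens, PRICE_2023):.2f}")
print(f"Agentic task at today's pricing: ${cost(agentic_task_tokens, PRICE_NOW):.2f}")

# A loop that would have cost ~$22.50 per task in 2023 now costs ~$0.21, which is
# why developers stop rationing calls and let agents iterate freely.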

The commoditisation of inference is being driven by providers running on specialised, non-GPU silicon (TPUs, LPUs, and custom ASICs) that prioritise massive throughput over raw training power.

In essence, as Nvidia loses hundreds of billions in Training Era revenue, that money will flow through the Inference Era supply chain: custom ASIC designers and manufacturers, distributed datacenter builders, and baseload energy providers.

Part 3: What changes in the value chain (and who wins)

In this section, we will dive deeper into the transition from training-era workloads to inference-era workloads and look at the winners and losers from each part of the AI infrastructure supply chain. 

Custom silicon design

The hyperscaler breakaway from Nvidia is accelerating faster than anticipated. Hyperscalers are realising they do not need expensive "Swiss Army Knife" GPUs for daily operations and are instead opting for cheaper, single-purpose chips optimised purely for running models.

Broadcom reported $6.5 billion in AI semiconductor revenue for Q4 FY2025, up 74% year-over-year, and guided Q1 FY2026 AI revenue to $8.2 billion (doubling year-over-year). The company now has five confirmed hyperscaler XPU customers: Google, Meta, ByteDance, plus Anthropic (which signed an $11 billion order for TPU capacity in late 2025) and a fifth customer that placed a $1 billion order in Q4.

The OpenAI partnership announced in October represents a potential $150-200 billion multi-year deal to deploy 10 gigawatts of custom chips for Stargate by 2029. Google's TPU programme alone is expected to generate $22 billion in Broadcom revenue in FY 2026.

The second player in custom ASIC/TPU design is Marvell, which has secured multi-year deals with AWS for its Trainium 2, 3, and 4 generation chips, as well as with Microsoft for its Maia 100 and Maia 200 generation chips.

Nvidia, however, is not ceding this territory without a fight. In a material evolution of the inference wars, Nvidia executed a $20 billion 'acqui-hire' of Groq in late December 2025 to secure its LPU (Language Processing Unit) architecture. By absorbing Groq's engineering team (notably including creators of Google's original TPU) and licensing its SRAM-based technology (static RAM offers faster access times and lower latency than the dynamic RAM used in HBM, making it well suited to inference), Nvidia is effectively bypassing the HBM 'memory wall' that has throttled its GPUs to date.

Winners: Broadcom (AVGO), Marvell (MRVL), ARM Holdings (ARM)

“Relative” Losers: Nvidia (NVDA), Advanced Micro Devices (AMD), Intel (INTC) 

High bandwidth memory

High Bandwidth Memory is now a critical bottleneck in the AI supply chain, with the market diverging into two distinct segments based on workload requirements: training versus inference.

(source: Korea Herald) (in picture: Samsung Electronics Chair Lee Jae-yong, Hyundai Motors Chair Chung Euisun, Nvidia CEO Jensen Huang)

Training: SK Hynix remains the primary supplier for the Nvidia ecosystem. As the lead partner for Jensen’s upcoming Vera Rubin chips, SK Hynix has secured the majority of orders for training-focused hardware. 

The company’s CFO recently confirmed that their entire 2026 HBM supply is already sold out. With mass production of next-generation HBM4 chips beginning in February 2026, SK Hynix is projected to capture approximately 70% of the HBM4 supply for Nvidia’s premium clusters. 

Inference: The market for inference workloads is evolving differently and is benefiting Samsung Electronics. Having arrived late to the Nvidia memory party (its HBM was approved more than a year after SK Hynix's), Samsung chose to pivot.

They have now become the go-to provider for the hyperscaler custom silicon market. Samsung has reportedly secured HBM4 qualification with Broadcom for Google's TPU v8 program. 

This alignment positions Samsung as a dominant memory supplier for the custom ASIC sector, which demands high-volume memory for inference and scale-out workloads rather than raw training power.

Winners: SK Hynix (000660.KS), Samsung Electronics (005930.KS), Micron (MU)

“Relative” Losers: Nanya Technology (2408.TW), Winbond (2344.TW), Lenovo (HK:0992)

Advanced packaging

TSMC’s Chip-on-Wafer-on-Substrate (CoWoS) packaging technology has been the gold standard for the entire AI industry, with no comparable alternative at scale. TSMC is nearly doubling its monthly wafer assembly capacity by the end of 2026 to accommodate demand, from 75,000 wafer starts to 130,000.

Nvidia has locked up approximately 60% of global CoWoS supply for its Vera Rubin chips, Broadcom holds 15%, and AMD has 11%, leaving less than 15% of global capacity for the remaining players.

In Taiwan, business is personal, and the only way to secure billion-dollar packaging allocations is to show up in person. Jensen Huang, Nvidia’s CEO, has visited Taiwan more than five times in 2025 alone, and has been seen dining with TSMC founder Morris Chang, with whom he has a nearly 30-year working relationship, as well as with C.C. Wei, TSMC’s Chairman, to secure this supply.

For context, Google reportedly cut its 2026 TPU production target from 4 million to 3 million units due to CoWoS access limitations from TSMC. Companies that are unable to secure packaging capacity are exploring Intel's EMIB as an alternative. 

Although Intel is an alternative for packaging, it is still seen as a risky play given its lack of a track record at the scale of millions of units. Customers like Nvidia and Apple are risk-averse (a 1% defect rate on chips worth $30,000 apiece is unacceptable), so they are sticking with TSMC.

However, Apple and Google are reportedly in discussions with Intel for future designs, though Intel's capacity remains limited.

The second-order effect of this TSMC backlog is that the “simpler” packaging steps (the oS, on-Substrate, part of CoWoS) are being delegated to players like ASE and Amkor, which are picking up the spillover.

Finally, the machines that TSMC and Intel use for this advanced packaging come from a company called BE Semiconductor, making ASE, Amkor, and BE a great play on debottlenecking CoWoS.

Winners: TSMC (TSM), ASE Technology (ASX), Amkor (AMKR), BE Semiconductor (BESI.AS)

“Relative” Losers: JCET Group (SHA:600584), Samsung Foundry

Datacenter configuration

2026 marks the pivot from building massive centralised training clusters to deploying distributed inference infrastructure. With inference, latency is paramount, and many enterprises are actively moving AI inference to edge environments for lower latency and better energy efficiency.

When frontier models like Opus 4.5 & Gemini 3 are smart enough for most tasks, we have officially reached the point of “Model Saturation”, where most models are increasingly interchangeable for business results.

This paves the way for three parallel AI grids, mirroring electricity's grid-style architecture: public cloud for training, on-premises for private agents, and the edge for devices and robots.

Dell's CTO John Roese observes that "AI is increasingly living closer to where the data and users are, which is out at the edge, on the device, in the real world." 

Winners:  Dell (DELL), Quanta Computer (2382.TW), Wiwynn Corp (6669.TW)

“Relative” Losers: Intel (INTC), Super Micro (SMCI)

Cooling

For 20 years, data centres were cooled by CRAC (computer room air conditioning) units pushing cold air through raised floors. In 2026, this method is obsolete for AI inference.

Air cannot handle the heat of AI computing. A single rack containing 72 Nvidia Blackwell GPUs generates over 120kW of heat, roughly the continuous power draw of 100 homes.

To manage this, we are moving to an era of liquid cooling. New data centres are being built with this technology as standard, while old facilities are being retrofitted. Direct-to-Chip (DTC) liquid cooling loops are replacing legacy infrastructure across every hyperscale cluster.

This infrastructure overhaul is the major bet for 2026. As construction enters high gear, the primary beneficiaries will be the specialists making it possible: Vertiv for thermal management, nVent for liquid cooling loops, and Modine for heavy-duty heat rejection.

Winners: Vertiv (VRT), nVent Electric (NVT), Modine (MOD)

Losers: Johnson Controls (JCI), Trane Technologies (TT)

Networking

The networking narrative now consists of two distinct parts. Front-end networks, which connect clusters to the outside world, remain dominated by Ethernet. Back-end networks, which connect GPUs within clusters, are currently transitioning from Nvidia's proprietary InfiniBand toward open, Ethernet-based solutions.

The datacenter networking story is no longer a zero-sum game between protocols but a rapidly expanding ecosystem in which three giants are growing simultaneously. The back-end infrastructure connecting GPUs (and soon ASICs) is evolving into a battleground of speed and standards.

Nvidia’s networking division is continuing to expand at an impressive rate. In Q3 2025, networking revenue hit $8.19 billion (up 162% year-over-year), nearly 4x the total quarterly revenue of its closest competitor. Nvidia is successfully hedging its bets by selling InfiniBand to performance purists and its new Spectrum-X platform to Ethernet proponents, ensuring it captures value regardless of the protocol customers choose.

However, the challengers are bringing the fight to Nvidia. 

Arista secured the hyperscale front-end market (retaining Meta and Microsoft) and is tracking toward $10 billion in annual revenue by 2026. Collaborating with Broadcom, whose AI switch backlog exceeds $10 billion, Arista is pushing open standards like ESUN to challenge Nvidia’s proprietary grip inside the rack.

As AI workloads shift from training (smaller footprint, ultra powerful GPUs) to inference (distributed networks & efficient ASICs), the demand for these standards (ESUN, Spectrum-X, etc.) will continue to rise.

Despite the battles among Nvidia, Arista, and Broadcom, the clearest winners in this bandwidth arms race are likely the component suppliers: Lumentum and Coherent.

As chip speeds jump to 1.6 terabits per second, copper cabling hits physical limits and cannot transmit data effectively over longer distances.

This physics problem requires a massive increase in optical lasers and transceivers for every connection. Whether the switch is made by Nvidia, Arista, or Broadcom, it requires the optical technology that these two provide.

Winners: Lumentum (LITE), Coherent (COHR)

Losers: Intel Networking Division (INTC), Cisco (CSCO)

Power infrastructure

Nuclear power has become the preferred solution for hyperscaler energy needs. 

The biggest news of January 2026: Meta announced agreements with Vistra, TerraPower, and Oklo for up to 6.6 gigawatts of nuclear capacity by 2035, on top of its existing 1.1 GW deal with Constellation. Meta's chief global affairs officer called these deals "one of the most significant corporate purchases of nuclear energy in American history."

The Vistra agreement provides immediate access to 2.1 GW from existing plants in Ohio and Pennsylvania, plus 433 MW in uprates scheduled for the early 2030s. The TerraPower (Bill Gates-backed) and Oklo (Sam Altman-backed) deals support advanced reactor development, with first power delivery expected around 2030-2032.

This follows Microsoft's Three Mile Island restart commitment, Amazon's deal with Talen Energy, and Google's Kairos Power deal.

The inference transition doesn't eliminate power constraints, as inference facilities still require significant power, but the demand profile is going to be more predictable and better suited to baseload nuclear generation, with gas turbines bridging the gap until new reactors come online.

Winners: Constellation Energy (CEG), Vistra (VST), GE Vernova (GEV)
Losers: NextEra Energy (NEE), Plug Power (PLUG)

Companies to Watch

(our selection of 20 companies bound to benefit from the AI inference thesis)

Smaller companies to invest in

To dig further into smaller companies bound to benefit from the inference shift, create your own StockScreener like we did:

Bonus: ETFScreener

To find out more about ETFs available to index the inference shift, make your own ETF Screener like we did:

Our take

The 2023-2025 cycle rewarded a simple bet: buy Nvidia and hold. That trade worked spectacularly, delivering 7x revenue growth and lifting the entire ecosystem. 

But 2026-2028 requires a fundamentally different approach, one we detailed in this essay: buying into and anticipating scarcity in every part of the new inference-driven value chain, from custom ASICs to optical interconnects.

As the focus shifts from raw performance to cost-efficiency, the "one-size-fits-all" GPU monopoly gives way to custom silicon. Nvidia’s last-minute deal to acqui-hire Groq for $20 billion is an admission that we have entered this new reality.

However, the primary beneficiaries are the enablers: as hyperscalers aggressively deploy their own chips, they create a massive tailwind for Broadcom (with its $73B backlog) and Marvell. At current valuations, we see these custom designers winning over Nvidia.

The physical landscape is shifting too. 

Inference requires proximity to the user, moving infrastructure from centralised gigafactories to distributed nodes. 

This favors cooling and networking providers like Lumentum, Coherent, and Vertiv, while agnostic components like HBM (SK Hynix/Samsung) will continue to be essential "picks and shovels" regardless of which chip wins.

Underneath it all is energy: Meta’s recent 6.6 GW nuclear deal signals that power availability is the new hard constraint for growth.

Ultimately, this pivot is about economic viability. By driving the cost of a query down 100x, the industry is unlocking mass adoption. Use cases that were prohibitively expensive in 2024 are now viable, turning AI from a luxury good into a utility just like electricity. That is how productivity gains compound into real GDP growth.

Stay invested, cautiously.

https://www.marketscreener.com/news/ai-slowdown-a-blessing-in-disguise-ce7e5bdad18ff426