Scaling bigger, faster, cheaper data centers with smarter designs


As digital transformation accelerates, data centers are becoming the backbone of the digital economy, supporting everything from AI workloads to real-time analytics and autonomous vehicles. McKinsey analysis finds that globally, capital expenditures on data center infrastructure (excluding IT hardware) are expected to exceed $1.7 trillion by 2030, largely because of the expansion of AI, the proliferation of edge computing, and advancements in high-performance computing (HPC). This increasing demand is changing the landscape for energy, real estate, and construction.

To meet the growing demand for data centers—and the power capacity they will need—over the next five years, data center campuses will have to expand from providing tens of megawatts (MW) of power to hundreds of megawatts, and in some cases to the gigawatt (GW) scale of one gigawatt or more. Recent innovations, including the emergence of distilled and distributed training models for AI, could affect the build-out of data centers and intensify existing industry challenges with scale-ups. These new models and increasing compute demand require a reevaluation of the design and construction methods used to build data centers so that stakeholders can capitalize on economies of scale.

Capturing the scale-up opportunity will require data center players across the value chain to adopt new approaches and technologies while learning from other industries that have experienced similar breakthrough moments. This article details how data center stakeholders across the industry can keep up with the digital transformation by adopting innovative designs that will enable data centers to become bigger and faster.

New approaches across the value chain are required to meet data center demand

According to McKinsey analysis, the power demand for data centers is expected to reach 1,400 terawatt-hours by 2030, equivalent to 4 percent of total global power demand (Exhibit 1).

Demand for power for data centers is expected to rise significantly in the United States.
Image description: A dot plot shows how energy consumption by US data centers could grow through 2030. By 2030, data centers are expected to account for 11.7% of total US power demand, with energy consumption rising from 147 terawatt-hours in 2023 to 606 terawatt-hours in 2030. End image description.

In the short term, AI training models are the primary driver of this increase in size and scale. The data centers of the future, however, are likely to be hybrid facilities that host a mix of training, inferencing, and cloud workloads, and their scale and size will surpass those of facilities that were considered large even two years ago. To deliver the required data center infrastructure, the United States alone will need to more than triple its annual power capacity over the next five years—from 25 GW of demand in 2024 to more than 80 GW in 2030. Worldwide demand is expected to be about 220 GW by 2030 (Exhibit 2).1

Both AI and non-AI workloads will be key drivers of global data center capacity demand growth through 2030.
Image description: A stacked bar chart shows how AI and non-AI workloads will drive global data center capacity demand through 2030. AI workload is expected to more than triple between 2025 and 2030, and the incremental AI capacity added per year is expected to be 124 gigawatts by 2030. End image description.

As AI products proliferate and inference delivery models change, data center stakeholders will need to strike the right balance between edge and cloud computing capacity. This shift will drive fundamental changes in design and architecture, setting new standards for the industry—not to mention spurring additional advances in power density at the rack level with new chip and cooling technologies.

The savings potential is great

Today, depending on the type, design, size, location, and execution performance of a data center, the length of time between a request for services and the start of construction can range anywhere from 12 to 36 months. Reaching the full potential of data center construction could shave 10 to 20 percent off that timeline, bringing new facilities online faster. It could also enable more efficient capital spending, with potential savings of 10 to 20 percent per data center on average, which could reduce the $1.7 trillion in spending expected through 2030 by up to $250 billion.

Despite recent advancements in distilled and distributed training models, the average scale of data centers is set to increase. Facilities that averaged tens of megawatts before 2020 will be expected to operate at the gigawatt scale. To handle the immense computational power required by modern AI and HPC applications, these large-scale data center campuses are designed on plots of land exceeding four million square feet.2

The shift toward large-scale data centers is about not only size but also efficiency. Efficiency is crucial because it allows operators to maximize computational output while minimizing energy consumption and, thus, environmental impact. Modern data centers are targeting a power usage effectiveness (PUE)3 as low as 1.1, compared with current industry averages of 1.5 to 1.7.
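As a back-of-the-envelope illustration of what that PUE difference means, the short Python sketch below compares the non-IT overhead implied by a PUE of 1.5 versus 1.1; the 100-MW IT load is a hypothetical figure used only to show the arithmetic.

# PUE (power usage effectiveness) = total facility power / IT equipment power.
# The 100 MW IT load below is hypothetical, used only to illustrate the arithmetic.
it_load_kw = 100_000
for target_pue in (1.5, 1.1):
    total_kw = it_load_kw * target_pue      # total facility power at this PUE
    overhead_kw = total_kw - it_load_kw     # cooling, power conversion, and other overhead
    print(f"PUE {target_pue}: {overhead_kw:,.0f} kW of non-IT overhead on a 100 MW IT load")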

An overhaul of data center design and construction could help capture economies of scale

Despite the increase in data center size, the approach to designing and building campuses has remained relatively consistent. Most developers set multiple repeatable data centers on one site with independent redundancy systems for each building.

While the current approach enables standardization and repeatability, it does not fully lead to the economies of scale that data center developers could achieve. For instance, the diesel generators used for large-scale data centers are commonly 3.5 to 4.0 MW per unit, so building a 30.0- to 60.0-MW data center would require eight to 15 generators before accounting for redundancy. Scaling this approach to a one-GW data center would require up to 290 generators, and all of them would need to be maintained and prepared for a power loss situation, demonstrating the inefficiencies of the existing data center design methodology.
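To make that arithmetic explicit, the sketch below reproduces the generator counts implied by the 3.5- to 4.0-MW unit sizes cited above. The counts exclude redundancy, matching the text, and the exact figures depend on which unit size is assumed; this is an illustration, not a design rule.

import math

UNIT_SIZES_MW = (3.5, 4.0)  # typical diesel generator unit sizes cited above

def generators_needed(facility_mw: float, unit_mw: float) -> int:
    """Whole generators required to cover a facility's load, before redundancy."""
    return math.ceil(facility_mw / unit_mw)

for facility_mw in (30, 60, 1_000):
    counts = [generators_needed(facility_mw, unit) for unit in UNIT_SIZES_MW]
    print(f"{facility_mw} MW facility: {min(counts)}-{max(counts)} generators before redundancy")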

To deliver on the ambitions of gen AI, the way data centers are deployed will have to change. Simply scaling up current methods of data center design and construction would create inefficiencies, further inflating already large expenditures. According to McKinsey analysis, hyperscalers alone expect to spend $300 billion in capital expenditures over the course of 2025.

Scaling data centers requires overcoming several challenges

As stakeholders pursue pathways to increase the scale and efficiency of data center production, four challenges must be addressed: securing power, scaling cooling, finding skilled labor, and strengthening supply chains.

Securing adequate power is one of the most significant hurdles for data center production. Traditional power grids often lack the capacity to support large-scale facilities without extensive upgrades, and long delays in interconnection to the grid have led developers to pursue alternative power sources, including behind-the-meter natural-gas turbines, nuclear plants, and nascent technologies such as small modular reactors (SMRs). These solutions, while promising, come with their own set of challenges, including regulatory hurdles and the need for substantial capital investment. Moreover, developing SMRs will take time and won’t be an immediate solution for power shortages before 2030.

As data centers grow, the need for efficient and scalable cooling systems becomes more pressing. Advanced cooling technologies, such as immersion cooling and liquid cooling, are being implemented inside data centers to manage the heat that high-density computing environments generate. These technologies are essential for ensuring the efficient operation and longevity of equipment in large-scale data centers. However, cooling technologies are evolving quickly, making it more complicated for developers to decide how and when to incorporate these long-lived systems into data center projects.

Building large-scale data centers requires significant amounts of skilled labor. The data center industry faces potential labor shortages, especially during the build phase of projects, which has the highest labor requirements. Additionally, high turnover can result in safety, quality, and project delivery issues, affecting construction timelines.

As with other large-scale construction projects, labor costs have contributed to the inflation of construction delivery costs on top of the cost of raw materials and technology. Labor costs are expected to continue escalating because of the skilled labor shortage and the geographic concentration of new data center builds.

For large-scale data centers, the labor shortage is further exacerbated: Thousands of workers are required to be on-site during peak construction, straining already stretched skilled-trade resources. Large-scale data centers will also require significant new power-generation capacity, placing additional demand on the same skilled-labor pools.

Supply chains have improved since the height of the pandemic but have not scaled to sufficiently meet the projected demand increase. Lead times for critical long-lead equipment, such as generators, switchgears, and transformers, have recovered from pandemic highs. However, incumbent and new data center developers are seeing increasing challenges in sourcing additional equipment, raw materials, and labor to build large-scale data centers.

New tariffs, export controls, and potential reciprocal measures are causing volatility in trade, especially for critical components, which has had a negative impact on supply chains. These measures also affect the cost of building data centers. On average, data center players in the United States could see cost increases of 5 to 10 percent on mission-critical equipment because of newly introduced tariffs (the exact effects of which could vary across supply chains).

Six focus areas across the value chain can redefine data center design and construction

As stakeholders pursue plans to scale their data center builds, six actions can help them reimagine traditional data center design and technology for the long term.

Develop scalable reference designs with sequential phasing

Most data center developers have reference designs for new campuses. But to fully capture economies of scale, they could create scalable horizontal and vertical reference designs that consider regional conditions. These designs should optimize layout, structure, and sequencing at the project level; serve as the source of truth for the design methodology for each developer; and enable modularization and off-site module assembly.

The reference design should be coupled with a phasing strategy for data hall expansion in which mechanical, electrical, and plumbing (MEP) systems are consolidated into auxiliary units to use space efficiently and introduce flexibility for technology changes. Coupling these strategies also helps players efficiently balance standardization with regional customization. A modular design with phased hall expansion would also support staged technology upgrades as compute demands and technology continue to evolve.

Developers should aim for their designs to be 60 to 80 percent standardized and 20 to 40 percent customized for site-specific elements. Doing so would allow designs to accept fully standardized specifications for critical long-lead equipment, consolidate procurement strategies, and reduce supply chain risk.

This approach has proved effective for data center developers that have used build-to-suit or hyperscale models. The same principles can be applied to retail colocation data centers, where the scalability of the reference design is critical to enabling standardization.

Integrate an end-to-end delivery model and find opportunities to accelerate schedules

To accelerate time to market, developers must fully integrate an end-to-end delivery model, from initial long-range planning through construction, that links site selection, design, and construction with clear accountability across the project life cycle, including checkpoints with clear approval criteria at critical milestones (such as the purchase of real estate). Initial planning and site selection should be directly linked to project design feasibility and constructability. Objectives should be clear across stakeholders and paired with quantitative assessments of their net present value. These assessments would help developers prioritize opportunities and improve the speed of delivery.

As data center projects get bigger, a generative scheduling tool can be added to the end-to-end delivery model. The tool uses a 4D model of a single data center project to run thousands of iterations of the project schedule and resource plan. It then identifies the optimal sequencing and project resourcing, such as workforce and construction equipment, that can be scaled across a portfolio of data center projects through common construction recipes.

McKinsey analysis finds that across industries, this approach can improve the delivery schedule by as much as 20 percent compared with a project team independently defining the path for the construction team to execute. An end-to-end delivery approach, coupled with standardized reference designs and design technology tools such as generative scheduling, reduces the learning curve of building a new data center, accelerates build schedules, and saves costs over time. Furthermore, generative schedules can be built and iterated on in a matter of weeks, which allows teams to make improvements quickly.
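The mechanics of generative scheduling tools vary by vendor, but the core idea is to sample many feasible activity sequences under resource constraints and keep the best one. The sketch below is a deliberately simplified illustration of that loop; the activities, durations, crew counts, and crew limit are all hypothetical.

import random

# Hypothetical activities: name -> (duration in weeks, crews required, predecessors)
ACTIVITIES = {
    "site_prep":     (6, 2, []),
    "substation":    (12, 2, ["site_prep"]),
    "foundations":   (8, 3, ["site_prep"]),
    "shell":         (10, 3, ["foundations"]),
    "mep_rough_in":  (9, 2, ["shell"]),
    "fit_out":       (7, 2, ["mep_rough_in"]),
    "commissioning": (4, 1, ["fit_out", "substation"]),
}
CREW_LIMIT = 4  # hypothetical number of crews available at any one time

def makespan(order: list[str]) -> int:
    """Schedule activities in the given priority order, respecting predecessors and crews."""
    finish: dict[str, int] = {}
    scheduled: list[tuple[int, int, int]] = []  # (start, end, crews) of placed activities
    for name in order:
        duration, crews, preds = ACTIVITIES[name]
        start = max((finish[p] for p in preds), default=0)
        # Push the start later until the crew limit holds over the activity's window
        # (a conservative check: it sums crews of every activity overlapping that window).
        while sum(c for s, e, c in scheduled if s < start + duration and e > start) + crews > CREW_LIMIT:
            start += 1
        finish[name] = start + duration
        scheduled.append((start, start + duration, crews))
    return max(finish.values())

def random_feasible_order() -> list[str]:
    """Random ordering of activities that still respects predecessor links."""
    remaining, order = dict(ACTIVITIES), []
    while remaining:
        ready = [n for n, (_, _, preds) in remaining.items() if all(p in order for p in preds)]
        choice = random.choice(ready)
        order.append(choice)
        del remaining[choice]
    return order

# Sample many sequences and keep the shortest schedule, mirroring the iterative search
# a generative scheduling tool performs at far greater scale.
best = min((random_feasible_order() for _ in range(5_000)), key=makespan)
print(f"Best sequence found: {best} ({makespan(best)} weeks)")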

Rethink power, mechanical, and electrical systems at scale, including a review of overall system redundancy

Large-scale data centers, which primarily serve AI training workloads, rely less on performance factors such as low latency and network redundancy. These factors become crucial for optimal performance only when a model is put into production to serve inference workloads. To serve 24/7 workloads, data centers have stringent redundancy requirements to minimize downtime. Tier-four data centers—the highest tier—are designed for 99.999 percent reliability. To achieve this level of uptime, each critical system in the data center requires an architecture that provides redundancy in case of system loss.4

When levels of redundancy for each system are stacked on top of each other, however, overdesign can occur. At giga scale, additional layers of redundancy result in additional costs and complexity for data center construction, commissioning, and operation. Therefore, data center developers building at scale should develop an overall campus redundancy design that integrates redundancy levels between critical systems and across facilities. At this level, developers can make risk-based decisions to optimize equipment layout and specifications to achieve high reliability.
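As a rough illustration of the math behind those decisions, the sketch below converts the five-nines figure cited above into expected annual downtime and shows how quickly duplicating a component raises its availability, assuming independent failures; the 99 percent single-unit availability is a hypothetical input, not a design value.

MINUTES_PER_YEAR = 365.25 * 24 * 60

def annual_downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per year at a given availability level."""
    return (1 - availability) * MINUTES_PER_YEAR

def redundant_availability(unit_availability: float, units: int) -> float:
    """Availability of N redundant units, assuming independent failures."""
    return 1 - (1 - unit_availability) ** units

print(f"99.999% availability -> {annual_downtime_minutes(0.99999):.1f} minutes of downtime per year")
# Duplicating a 99%-available unit already yields 99.99%; stacking further layers of
# redundancy on every subsystem adds cost and complexity for diminishing returns.
for units in (1, 2, 3):
    print(f"{units} unit(s) at 99% each -> {redundant_availability(0.99, units):.6f} availability")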

Move beyond supplier-specific modular and prefabricated solutions to integrate at-scale modules

McKinsey surveys of data center stakeholders suggest that prefabricated (prefab) and modular solutions make up an average of 40 to 60 percent of a data center’s individual parts, with some industry leaders’ designs using as much as 80 to 85 percent of these solutions. Prefab and modular solutions present an opportunity for data center developers to not only accelerate time to market but also move skilled labor off-site, reducing the risk of safety incidents and improving quality.

The use of these solutions is relatively standard practice in construction, but today’s prefab and modular solutions are often used as inputs to a stick-build strategy. For example, many solutions include precast structural elements, skid-mounted MEP, and modularized equipment packages from OEMs, which offer only prebuilt parts rather than large-scale structures.

As data center projects scale, developers can look to other industries for modularization strategies that build large-scale modules in fabrication yards and ship them to sites as integrated processing or equipment packages, such as the preassembled process units used for refineries or offshore platforms in oil and gas. This approach could be especially useful when building the entire MEP equipment package or the front-of-house package.


Integrate scalable technologies into design and make big bets on cooling technologies for the future

AI servers consume so much energy that they get hot—so hot that air-based cooling systems, which circulate cold air around the servers, often can’t keep up. As a result, more developers have shifted to an approach that removes heat directly from racks by using liquid, which is significantly more effective in absorbing and transferring heat than air.

Today, three primary rack-based cooling technologies are available. They differ in both their application and the extent to which they depart from conventional data center cooling systems.5 Developers must decide which technology to integrate into design, making a long-term bet on the utility of the technology for future workloads.

Rear-door heat exchangers (RDHXs). RDHXs combine forced cold air with liquid-cooled heat exchangers. They are typically used in space-constrained data centers with a rack density between 40 and 60 kilowatts (kW). RDHXs are the most similar to conventional technology.

Direct-to-chip (DTC) technology. DTC moves a liquid mixture through a cold plate that is in direct contact with the most power-dense electronic components. This technology is currently the most popular; it can handle power densities of 60 kW to 120 kW and be retrofitted inside existing data center infrastructure.

Liquid immersion cooling. With this option, servers are placed in a tank filled with dielectric fluid. This cooling method has two variations: both can cool racks with power densities of 100 kW, and the dual-phase variation can reach 150 kW. Immersion cooling is the least widely used of the three options, although it is used for some crypto-mining applications.

Selecting a cooling technology is a critical decision for data center developers, influencing the design, layout, and cost of data center construction as well as affecting the operating cost and workloads over the lifespan of the data center.
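One simple way to frame that decision is to map expected rack power density to the options above. The sketch below encodes the indicative density ranges cited in this article as a rough selection helper; the thresholds are illustrative, not vendor or design specifications, and real selections also weigh cost, space, and retrofit constraints.

def candidate_cooling(rack_density_kw: float) -> list[str]:
    """List cooling technologies plausibly suited to a given rack density (kW),
    based on the indicative ranges cited in this article."""
    options = []
    if 40 <= rack_density_kw <= 60:
        options.append("rear-door heat exchanger (RDHX)")
    if 60 <= rack_density_kw <= 120:
        options.append("direct-to-chip (DTC) liquid cooling")
    if rack_density_kw <= 100:
        options.append("single-phase liquid immersion")
    if rack_density_kw <= 150:
        options.append("dual-phase liquid immersion")
    return options or ["above the ranges cited here; requires a bespoke design"]

for density_kw in (50, 90, 140):
    print(f"{density_kw} kW per rack -> {', '.join(candidate_cooling(density_kw))}")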

Strategically enter collaborative contracting models with contractors and suppliers to capture scale and speed

Collaborative contracting can raise the value of data center construction by fostering a cooperative environment that aligns the interests of the stakeholders involved. By integrating contractors early in the project life cycle, owners can leverage contractors’ expertise in site selection, design constructability reviews, and long-lead procurement support.

This early involvement helps leaders identify potential risks and opportunities, leading to better-informed decisions and efficient project execution.6 For example, a North American developer of utility-scale renewable energy formed an alliance with preferred contractors. By sharing ideas for cost reduction and design improvements, the team saved 3 to 5 percent in capital expenditures per project over two years.

Moreover, collaborative contracting encourages sharing risks and rewards, which can drive better performance and innovation. By designing win–win incentive schemes linked to operational milestones, such as the first systems energization, both owners and contractors are motivated to achieve superior outcomes. This approach not only improves project delivery times and costs but also enhances safety, quality, and other performance metrics. According to a recent McKinsey survey of industry professionals, collaborative contracting also addresses two of the top three pain points for data center construction: the challenging supply and contractor environment, and the build-out speed of auxiliary infrastructure.

A lean-manufacturing approach positions developers for scale-up

Delivering data centers at scale will not be easy. Projects run the risk of exceeding target schedules, and costs may be amplified by increased pressure to meet demand. Players that link innovative construction approaches to holistic performance metrics will be best set up to deliver data centers faster and at lower cost than competitors.

To scale up, data center stakeholders need to focus on controlling costs and improving performance and then reduce costs further through a lean-manufacturing approach across a portfolio of projects. Developers can apply lean-manufacturing principles across three systems: technical, management, and people.

Technical system. Fit-for-purpose technical tools and systems can optimize fact-based decision-making. For example, these systems enable owners to simulate the construction phase and use digital twins to support ramp-up and operations.

Management system. Having the right setup and performance management offices at both internal levels (the owner organization) and external levels (suppliers and engineering, procurement, and construction [EPC] companies) is important to effectively and efficiently deploy resources while establishing accountability.

People system. Often, implementing a successful people system requires a culture shift: Companies need to embrace innovative change and break down silos. Creating innovative roles at the central and project levels (such as a blueprinting team) and building internal EPC capabilities (such as capital expenditure controllers) can help create the right mindset for tackling data center development at scale.


The proliferation of large-scale data centers represents a significant leap forward in digital infrastructure. While the challenges are substantial—including a rapidly changing technology landscape—data center developers have opportunities across the project life cycle and supply chains to scale up, improve project delivery, and meet the growing demands of the digital economy.
