West Cambridge Data Centre Upgrade and Planned Disruption June 2025 - January 2026
Important
- This page describes the timeline and milestones of the West Cambridge Data Centre upgrade project, and the expected impacts on services run from the Research Computing Data Hall (CSD3, Dawn, SRCP, RFS, RDS, RCS, Arcus/IRIS/SKA/Gaia).
- Please note the multi-day full maintenance period planned to begin on July 8th 2025.
Last updated: Fri Jun 27 10:57:19 BST 2025
Overview
The project to upgrade the power and cooling systems in the West Cambridge Data Centre (WCDC) is now in its implementation phase. It is expected to cause several disruptions to services run from the data centre between May 2025 and the project's completion in late 2025 or early 2026. The purpose of the project is to provide a sustainable increase in electrical and cooling capacity and so allow the expansion of services. There have already been some unexpected service disruptions during the project, and more are possible as it progresses. Current information about the expected service impacts and the status of the upgrade will be made available on this page.
Current Status
- [23/05/2025] The research computing data hall (DH1) has been limited to a maximum power consumption of 930 kW since 2nd May. This has reduced compute capacity on the CSD3 and Dawn systems, but not on other services.
- [17/06/2025] The onset of warm weather has further reduced the capacity of the temporary cooling system, and we are currently limited to 900 kW. We are monitoring the situation.
Key Dates
July 8-10th 2025
All services unavailable to allow disruptive changes to the temporary cooling plant, pipework and cabling.
Confirmation of this work was delayed due to the possible impact on examinations and admissions activities. The power sequencing work will now be moved to August or September (TBC once the University and CUPA have reached agreement). We will still use this downtime to carry out network and cooling improvements, which should increase our computational capacity.
- The outcome of this maintenance will be additional cooling and internal network capacity, plus progress towards the replacement of old pipework.
- Services (CSD3, Dawn, SRCP, Arcus/IRIS/SKA/Gaia, RDS, RCS, RFS) will be unavailable from the morning of 8th July, and restored on 10th July.
- Some services may be restored earlier, but all services should be considered at risk and subject to access interruptions until the end of the maintenance period.
- Disruptive core network replacement work and pipe moves will take place in this period.
- DH1 capacity increases to 1.3 MW.
July/August 2025
Pipework replacement impacting liquid-cooled systems (Icelake, Sapphire Rapids, Dawn), row by row. This will allow these systems to be connected to the new cooling system and gradually free them from the current power constraints. Note that these liquid-cooled systems include SRCP, Arcus/IRIS/SKA/Gaia VMs and UKAEA Sapphire Rapids HBM. We expect the impact to be as follows:
- A further outage of up to one day affecting VMs and Icelake login nodes, and two similar outages affecting Sapphire Rapids HBM nodes.
- Other services will be managed around this work by changing which nodes are available.
September 2025
- Disruption to rear door cooling, row by row.
- We expect to be able to manage around this with minimal service impact.
- This work will increase the resilience of the cooling, removing some single points of failure as well as increasing the cooling capacity ready for when the power capacity is increased.
- Rescheduling of the delayed power sequencing work, to prepare for the power upgrade to 1.8 MW in the new year.
- This will affect the entire data centre, and another full shutdown over multiple days will be required (date TBC).
January/February 2026
- Repeat of power switching exercise to enable installation of the new distribution board.
- We should still have cooling during this work, but the success or otherwise of the sequencing work is likely to determine the risk appetite for keeping services online.
- Migration of DH1 Rows C-F [1] to new power infrastructure, including new UPS, generators and transformer.
- This will disrupt high-power systems that are not resilient, but this is likely to be manageable by changing which nodes are available.
- DH1 capacity increases to 1.8 MW.
February 2026
- Full commissioning.
- Disruption and risk to be determined.
Questions
If you have any questions about these developments or have issues before, during or after these periods, please contact us at support@hpc.cam.ac.uk.
Change Log
- [27/06/2025] Update re July 8-10 and subsequent timeline.
- [24/06/2025] Update post June 24th events.
- [23/06/2025] Updated dates and details for work on July 8-10th.
- [17/06/2025] Warm weather update, transformer repair for 24th June added and July full maintenance update.
- [10/06/2025] Version string and change log added.
- [23/05/2025] Page created.
[1] This refers to the racks in rows C-F in data hall 1 (DH1). These contain elements of CSD3, SRCP, Arcus and storage, so parts of these services may be affected during these phases of the work.