ITER demonstrates fast data transfer to Japan and the US
The ability to move experimental data instantly and securely will determine how thousands of researchers participate in near real time when the machine comes online in the 2030s.
While Member scientists around the world will follow ITER experiments from their home countries, they will not be able to operate plant systems remotely. They will, however, be able to analyze data within seconds of an experiment and provide feedback to operators.

"It's a kind of indirect participation," explains Denis Stepanov, Computing Coordinating Engineer in ITER's Control Program. "We quickly extract scientific data from the plant network and make it widely available so researchers can run calculations and feed results back during operations."

Building the global data backbone

At the heart of this arrangement is the onsite Scientific Data and Computing Centre and its backup data centre in Marseille, about 50 km away. The Marseille centre has a dual purpose: it holds a redundant copy of all data generated by ITER and will serve as the distribution point to partners worldwide.

"By locating our backup and distribution hub in Marseille, we can protect the master data stored at ITER while providing high-speed, secure access for our international partners," says Peter Kroul, Computing Center Officer at ITER.

The Cadarache site is connected to the Marseille centre by a redundant pair of dedicated 400 Gbps lines. In turn, the centre is connected, via the French network RENATER, to the pan-European GÉANT, which provides access to other research and education networks, including ESnet (USA) and SINET (Japan). This overall structure ensures that, even during intensive experimental campaigns, data can move at full speed while the primary plant network remains isolated and protected.
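To make the chain of networks described above concrete, the toy model below sketches the distribution path as a list of links and takes the minimum capacity as the end-to-end bound. Only the 400 Gbps Cadarache-Marseille pair and the 100 Gbps trans-Atlantic figure appear in this article; the capacities assigned to the intermediate RENATER and GÉANT segments are placeholders, not published values.

    # Illustrative sketch only: a toy model of the distribution path described
    # above. Link names follow the article; capacities marked "placeholder"
    # are assumptions made for the sake of the example.
    PATH_TO_US = [
        ("ITER Cadarache -> Marseille backup centre", 400),  # dedicated redundant pair (Gbps)
        ("Marseille -> RENATER",                      100),  # placeholder capacity
        ("RENATER -> GEANT",                          100),  # placeholder capacity
        ("GEANT -> ESnet (trans-Atlantic)",           100),  # link used in the DIII-D challenge
    ]

    def end_to_end_capacity(path):
        """The slowest segment bounds the achievable throughput of the whole path."""
        return min(capacity for _, capacity in path)

    print(f"Bottleneck capacity: {end_to_end_capacity(PATH_TO_US)} Gbps")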
A summary of the August-September data transfers across a single 100 Gbps link. While long transfers were artificially capped to allow cooperative use of networks, the September peak illustrates full 100 Gbps consumption using the MMCFTP tool. (Image courtesy of GÉANT.)
To move terabytes of data efficiently across 10,000 kilometres of fibre optics, the team needs software and hardware that can handle diverse systems without running the risk of vendor lock-in. "We cannot dictate what technologies our partners use on their side," says Kroul. "So we built something flexible, able to connect to whatever they have, while still achieving high parallelization and efficiency even on high-latency links."

The result is ITER.sync, a high-performance, open-source-based data-replication framework developed at the Scientific Data and Computing Centre. Drawing on the principles of rsync but heavily optimized, ITER.sync automatically parallelizes data streams, tunes network parameters, and maintains near-saturation speeds even over long-distance connections where latency is high.

ITER.sync was also designed to operate alongside tools already used by some of the partners, such as the Massively Multi-Connections File Transfer Protocol (MMCFTP) developed by Japan's National Institute of Informatics (NII).
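ITER.sync itself is not published here, but the minimal Python sketch below illustrates the general parallel-streaming idea the article describes: split a large file into chunks and push them over several concurrent streams so that no single high-latency connection limits throughput. The chunk size, stream count, and the send_chunk() transport stub are hypothetical choices for illustration only.

    # Minimal sketch of the parallel-streaming idea behind tools like ITER.sync
    # and MMCFTP, not their actual implementation. send_chunk() is a
    # hypothetical stand-in for a real network transport.
    import concurrent.futures
    import os

    CHUNK_SIZE = 64 * 1024 * 1024   # 64 MiB per chunk (illustrative tuning value)
    NUM_STREAMS = 16                # number of parallel streams (illustrative)

    def send_chunk(path, offset, length):
        """Read one chunk from disk; a real tool would write it to an open socket."""
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read(length)
        return len(data)

    def parallel_transfer(path):
        """Split the file into chunks and send them over NUM_STREAMS workers."""
        size = os.path.getsize(path)
        with concurrent.futures.ThreadPoolExecutor(max_workers=NUM_STREAMS) as pool:
            futures = [pool.submit(send_chunk, path, off, min(CHUNK_SIZE, size - off))
                       for off in range(0, size, CHUNK_SIZE)]
            return sum(f.result() for f in concurrent.futures.as_completed(futures))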
Gbps: A measure of network transmission speed, representing one billion bits of data transferred per second. Because one byte equals eight bits, 100 Gbps corresponds to roughly 12.5 GB/s of data throughput under ideal conditions.

High latency: Describes network links where data packets experience long delays, typically due to the physical distance between systems or the number of hops through intermediate routers. Transferring data between France and Japan, for instance, involves a 10,000-kilometre journey across multiple submarine cables, introducing latency of more than 200 milliseconds. Such delays can degrade throughput unless the transfer software compensates with optimized buffering and parallel streams. ITER.sync is specifically engineered to maintain high performance even on these long, high-latency paths.

Massively Multi-Connections File Transfer Protocol (MMCFTP): A high-performance data transfer protocol engineered to maximize throughput by opening and coordinating a very large number of simultaneous network connections. Unlike traditional file-transfer tools, which typically rely on a handful of parallel streams, MMCFTP can orchestrate hundreds or even thousands of concurrent flows.

Parallelization: The process of dividing a large data transfer into multiple simultaneous streams to increase overall throughput. Instead of sending one massive file sequentially, ITER.sync splits and distributes the data across several network threads or channels, each handled independently.

Rsync: An open-source software utility designed to synchronize files and directories between two systems efficiently by sending only the data blocks that have changed. It is widely used for backups and mirroring because it minimizes network load while ensuring data integrity. ITER.sync builds on rsync's core principles but adds advanced tuning and multi-streaming capabilities for long-distance, high-speed scientific data transfers.

Saturation speed: The point at which a network link is fully utilized, with data flowing as fast as the physical and protocol limits allow.
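The figures in the definitions above combine into a quick back-of-the-envelope calculation: at 100 Gbps, and taking the roughly 200 millisecond France-Japan figure as a round-trip time, about 2.5 GB of data must be in flight at any moment to keep the link full (the bandwidth-delay product), which is why a single stream falls short and transfer tools fan the data out over many parallel streams.

    # Back-of-the-envelope numbers based on the definitions above.
    link_gbps = 100                      # nominal link speed
    rtt_s = 0.200                        # ~200 ms quoted for the France-Japan path, taken as round-trip time

    bytes_per_s = link_gbps * 1e9 / 8    # 100 Gbps is roughly 12.5 GB/s
    bdp_bytes = bytes_per_s * rtt_s      # data that must be in flight to saturate the link

    print(f"Throughput at full speed: {bytes_per_s / 1e9:.1f} GB/s")
    print(f"Bandwidth-delay product:  {bdp_bytes / 1e9:.1f} GB in flight")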
Global data network put to the test

This summer, ITER engineers carried out two large-scale data-transfer campaigns: one with Japan's Remote Experimentation Centre (REC) in Rokkasho and the other with the DIII-D National Fusion Facility in San Diego (United States). For the purpose of the tests, ITER simulated the projected data acquisition scenarios.

The campaign in Japan, conducted from mid-August to early September, built on a 2016 demonstration that reached 10 Gbps, the maximum available at the time. The new tests achieved two simultaneous 100 Gbps links, a twenty-fold increase. Engineers demonstrated continuous throughput, multi-path transfers, and resilience by simulating a submarine-cable outage between Marseille and Rokkasho. Both ITER.sync and MMCFTP were used in the tests, providing valuable insight into data transfer strategies and specific tuning for long-distance transfers.

Only a fraction of the data is expected to be needed in near real time by remote experimentalists; this data will be transferred as soon as it reaches primary storage. The bulk of the data, which needs to be available for off-line analysis, will instead be transferred via quiet overnight syncs. This second scenario was also tested.

"The key was to test not just network speed but the whole chain: hardware, software, and reliability," says Stepanov. "Building the technical link is one challenge, but coordinating with all the network providers across Europe and Asia is just as complex. It takes time, alignment, and trust."
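The two-tier strategy described above, a small near-real-time slice pushed immediately and the bulk deferred to a quiet overnight window, could be sketched as a simple queueing policy like the one below. The dataclasses, the near_real_time flag, and the example dataset names are hypothetical; this is not ITER's actual scheduling code.

    # Hypothetical illustration of the two-tier transfer policy: datasets flagged
    # for near-real-time use are queued for immediate transfer, everything else
    # waits for the overnight sync window.
    from dataclasses import dataclass, field

    @dataclass
    class Dataset:
        name: str
        size_gb: float
        near_real_time: bool          # flag assumed to come from the acquisition system

    @dataclass
    class TransferQueues:
        immediate: list = field(default_factory=list)   # sent as soon as data hits primary storage
        overnight: list = field(default_factory=list)   # bulk data for off-line analysis

        def enqueue(self, ds: Dataset):
            (self.immediate if ds.near_real_time else self.overnight).append(ds)

    queues = TransferQueues()
    queues.enqueue(Dataset("pulse_summary", 12.0, near_real_time=True))      # hypothetical names
    queues.enqueue(Dataset("pulse_raw_full", 4500.0, near_real_time=False))
    print(len(queues.immediate), "immediate,", len(queues.overnight), "queued for overnight sync")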
Network throughput graph showing the dedicated 100 Gbps path for the DIII-D data challenge, the data transfer performance for one individual 176 TB test, and the ESnet portion of the end-to-end path between ITER and DIII-D.
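As a rough sense of scale for the 176 TB runs shown in the graph, the calculation below estimates how long one such transfer takes over a 100 Gbps link; the 90 percent efficiency figure is an assumption for illustration, not a measured value.

    # Rough estimate of the duration of one 176 TB run over a 100 Gbps link.
    test_tb = 176                 # size of one full-scale run (from the caption above)
    link_gbps = 100               # trans-Atlantic link speed
    efficiency = 0.9              # assumed fraction of the theoretical maximum

    seconds = test_tb * 1e12 * 8 / (link_gbps * 1e9 * efficiency)
    print(f"~{seconds / 3600:.1f} hours per run")   # ~4.3 h; ~3.9 h at a perfect 100 Gbps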
In parallel, the ITER computing centre completed its full-scale data challenge with ESnet and the DIII-D fusion facility at General Atomics in San Diego (United States), supported by a trans-Atlantic link operated at 100 Gbps. Over ten full-scale runs, the teams achieved consistent end-to-end performance close to the link's theoretical maximum. The test also demonstrated interoperability between ITER's IBM Spectrum Scale storage and DIII-D's BeeGFS-based Science DMZ infrastructure, again confirming ITER.sync's ability to bridge heterogeneous environments.

"These results show that ITER's international data ecosystem will scale and be ready for the operations we will face in the 2030s," says Kroul. "We can already, with current technology, ensure that scientific data moves efficiently and reliably between ITER and partner institutions worldwide."

See press releases from the AER and GÉANT networks.