How to handle the Petabytes

When it was announced in 1985 that the American "Cray-2" supercomputer had achieved a performance of one Gigaflop, even some scientists had to consult the dictionary. The prefix Giga is derived from the Greek word for giant and stands for one billion. A Gigaflop computer can perform one billion floating-point operations (Flop) per second.

In 1985, this was a thousand times the performance achievable with a home computer. Today, every mobile phone contains a Gigaflop processor. And while the "big bang" hunters at CERN are dealing with Petaflops (10^15 calculations per second), the new kid on the large science block, the Square Kilometre Array (SKA), which will be built in South Africa and Australia, will require supercomputers that can digest data on the Exa scale. That is a 1 followed by 18 zeros.

The steep growth in computing capability known as Moore's Law is comparable to the progress in the performance of magnetic fusion devices ... and to their generation of data. Since the first plasma pulse on JET in 1983, the raw data collected during each discharge has roughly doubled every two years. Today, about 10 Gigabytes of data are collected during each 40-second pulse; the data collected over 70,000 JET pulses amounts to roughly 35 Terabytes.

When ITER starts operation, the data generated will again reach new dimensions. Each plasma discharge, lasting 300 to 3,000 seconds, will generate data at an estimated rate of tens of Gigabytes per second, leading to a total of a few hundred Petabytes per year. And it is not only the storage and archiving of this huge amount of data that poses a challenge, but also its accessibility in real time.
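
To give a feel for these figures, here is a rough sketch in Python of the data volume of a single discharge. It assumes a constant rate of 10 Gigabytes per second, an illustrative value at the low end of "tens of Gigabytes per second", and decimal units; the function and constant names are made up for this example and are not part of any ITER software.

```python
# Back-of-the-envelope data volume for a single plasma discharge.
# Decimal units are assumed throughout (1 GB = 10**9 bytes).
GB = 10**9
TB = 10**12

# Assumption: low end of the "tens of Gigabytes per second" estimate.
ASSUMED_RATE_GB_PER_S = 10

# From the article: a JET pulse currently yields about 10 Gigabytes.
JET_PULSE_BYTES = 10 * GB


def pulse_volume_bytes(rate_gb_per_s, duration_s):
    """Bytes produced by one pulse at a constant acquisition rate."""
    return rate_gb_per_s * GB * duration_s


short_pulse = pulse_volume_bytes(ASSUMED_RATE_GB_PER_S, 300)
long_pulse = pulse_volume_bytes(ASSUMED_RATE_GB_PER_S, 3000)

print(f"300 s pulse:  {short_pulse / TB:.0f} TB")
print(f"3000 s pulse: {long_pulse / TB:.0f} TB")
print(f"Shortest pulse vs. one JET pulse: {short_pulse / JET_PULSE_BYTES:.0f}x")
```

Under this assumption, even the shortest ITER discharge would produce about 3 Terabytes, some 300 times the data of a present-day JET pulse.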

In a recent workshop organized by Lana Abadie, who is responsible for the scientific archiving system within the CODAC team, experts from many different institutes and backgrounds addressed the challenge of storing and accessing the flood of scientific data.

"We need to store this data in almost real time to allow physicists to start their analysis code in order to allow calculations for the next pulses," explains Lana. "This data is what we call raw data, i.e., data coming from the ITER machine unfiltered. The main producers will be the various diagnostic systems. Then we need to store processed and simulated data. Different physics applications will use raw data and process it. This output needs to be stored too, and made accessible."

In other words, raw, processed, and simulated data will be accessed in the same way. But accessing the data efficiently is not an easy task. "Imagine you have a pile of 20,000,000 iPods of 16 GB each, equivalent to the yearly production of all types of ITER data. Let's say you are looking for a song that was produced last February, but you don't even know the exact title. You remember that it was something like 'I follow' and that it was a remix of an earlier song by the same artist. Of course, you could spend quite a few hours finding the song. The challenge for CODAC is to provide data access within a few seconds. It is very important to understand the different archiving techniques and to stay abreast of upcoming technologies in that area."
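
The iPod comparison can be checked with a few lines of Python. The sketch below uses decimal units and also estimates how long it would take to read through the whole pile once. The read rate of 1 Gigabyte per second is a purely hypothetical figure chosen for illustration, not a CODAC specification.

```python
# Sanity check of the "20,000,000 iPods of 16 GB" comparison.
# Decimal units are assumed (1 GB = 10**9 bytes, 1 PB = 10**15 bytes).
GB = 10**9
PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600

IPOD_COUNT = 20_000_000
IPOD_CAPACITY_GB = 16

total_bytes = IPOD_COUNT * IPOD_CAPACITY_GB * GB
print(f"Total: {total_bytes / PB:.0f} PB")

# Assumption: hypothetical sequential read rate, for illustration only.
ASSUMED_SCAN_RATE_GB_PER_S = 1

scan_seconds = total_bytes / (ASSUMED_SCAN_RATE_GB_PER_S * GB)
scan_years = scan_seconds / SECONDS_PER_YEAR
print(f"Full sequential scan: about {scan_years:.0f} years")
```

The pile comes to 320 Petabytes, in line with the "few hundred Petabytes per year" quoted earlier. At the assumed rate, reading through all of it once would take on the order of ten years, which gives a sense of how demanding a response within a few seconds is.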
 
The CODAC archiving system has to be ready for First Plasma, with well-proven scalability. The data will first be stored in the CODAC server room and then streamed to the IT computing centre. CODAC will develop a first prototype within the next two years. The team is currently studying a system based on HDF5, a well-known scientific data format used by many institutions such as NASA. HDF5 allows the storage of all types of data together with the corresponding metadata.

