In experiments of high-energy physics and at upcoming astronomical observatories, data are taken at rates much too high to be stored in long-term archives. This use case class addresses the challenge of extracting, in real time and in an automated way, the tiny subset of ``interesting data'' out of huge data streams. The tools and methods currently in use are not sufficient to cope with future demands. PUNCH4NFDI will develop methods to cope with the challenges of ``data irreversibility'': decisions on which part of the incoming data streams must be rejected have to be taken in real time, and the resulting loss of information is largely irreversible.
The High Luminosity LHC physics programme is characterized by particle physics precision measurements as probes for new physics at the highest energies, with correspondingly high data rates. Large detector systems are deployed, each consisting of several hundred million channels. A major challenge is the need to reduce the data volume collected by each experiment by many orders of magnitude, because only a tiny fraction of the collected data can be stored. Events can only be stored, and thus made available for offline data analysis, at typical rates of 10 kHz, which is to be compared to event rates of up to 40 MHz provided by the LHC. Moreover, the decision-making is irreversible and must happen on time scales from microseconds up to several seconds.
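To put the required reduction into numbers, the following back-of-the-envelope sketch assumes a typical two-stage trigger chain; the intermediate 100\,kHz hardware-trigger output rate is an illustrative assumption, not the specification of any particular experiment.
\begin{verbatim}
# Illustrative trigger rate budget (numbers are assumptions for
# illustration, not the specification of any experiment).
collision_rate_hz = 40e6   # LHC bunch-crossing rate
l1_output_hz      = 100e3  # assumed hardware-trigger output rate
storage_rate_hz   = 10e3   # rate at which events can be archived

l1_rejection  = collision_rate_hz / l1_output_hz     # factor 400
hlt_rejection = l1_output_hz / storage_rate_hz       # factor 10
total         = collision_rate_hz / storage_rate_hz  # factor 4000

print(f"hardware stage must reject {l1_rejection:.0f}x in ~microseconds")
print(f"software stage must reject {hlt_rejection:.0f}x in ~seconds")
print(f"overall, only 1 event in {total:.0f} survives")
\end{verbatim}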
Scientists often run simulations to study complex systems ab initio, i.e. by assuming the underlying physics. The aim is to compare these simulations with real-world observations to decide whether the assumptions and/or the physics are correct. Importantly, the comparison assumes that the real-world information is correct and representative. However, when data (rates) are so large that only a fraction of the information can be saved, information is lost. Without accounting for this, one will easily draw wrong conclusions from the (today often expensive) simulations. To avoid this, one needs concepts for how simulations are created, evaluated and compared under circumstances where the ``recorded'' real world is heavily filtered and hence potentially not a good reference for comparison.
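The following minimal sketch illustrates the core of the problem: comparing an unfiltered simulation with filtered data biases the comparison, whereas applying the same selection to the simulation restores comparability. The toy distributions and the threshold cut are illustrative assumptions.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(42)

# Toy example: simulated and "real" events, each with one observable.
simulated = rng.exponential(scale=1.0, size=100_000)
real      = rng.exponential(scale=1.0, size=100_000)

# The online selection keeps only events above a threshold
# (a stand-in for any irreversible real-time filter).
threshold = 2.0
recorded  = real[real > threshold]

# Naive comparison: full simulation vs. filtered data -> biased.
print(f"simulation mean: {simulated.mean():.2f}")
print(f"recorded   mean: {recorded.mean():.2f}  (biased by the filter)")

# Consistent comparison: apply the *same* selection to the simulation.
sim_filtered = simulated[simulated > threshold]
print(f"filtered sim mean: {sim_filtered.mean():.2f} (comparable again)")
\end{verbatim}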
Of the more than one hundred FRBs registered so far, only a handful are known to repeat, i.e. more than 95% of all FRBs are known only to emit a single short burst of radio emission. On the other hand, a potential cataclysmic origin of FRBs would suggest that any remnant emission (e.g. a fireball) visible at other frequencies may only be short-lived. Triggering the required multi-messenger follow-up means that data acquisition, data analysis, FRB detection and identification, localisation and the triggering of other astronomical facilities must be completed on a very short timescale, i.e.\ seconds. The whole process needs to be fully automated, with an extremely low false-alarm rate, as observing time at the triggered facilities is extremely costly. The results of the observations (i.e. the properties of new discoveries) also imply a repeated re-evaluation of the archives, while these results in turn influence the algorithms to be applied in the short time frame during the observations. This constitutes a new, dynamic life cycle of data.
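As an illustration of the detection step, the sketch below applies a robust threshold to a single dedispersed time series; a production single-pulse search would additionally scan many trial dispersion measures and apply interference excision, and the 10-sigma threshold is an illustrative assumption.
\begin{verbatim}
import numpy as np

def detect_burst(timeseries, n_sigma=10.0):
    """Flag samples exceeding n_sigma above a robust baseline.

    A deliberately minimal stand-in for a real single-pulse search.
    """
    median = np.median(timeseries)
    # Robust noise estimate from the median absolute deviation.
    mad = np.median(np.abs(timeseries - median))
    sigma = 1.4826 * mad
    snr = (timeseries - median) / sigma
    return np.flatnonzero(snr > n_sigma)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, 10_000)
data[6200] += 15.0            # inject one bright, narrow burst

candidates = detect_burst(data)
if candidates.size:           # in production: send an alert here
    print(f"trigger at sample(s) {candidates}")
\end{verbatim}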
Displaced decays of particles produced in the original (primary) interaction are predicted by many models. In many cases these distinct signatures of displaced vertices are very rare, typically due to very small coupling constants. Such displaced decay modes of relatively long-lived particles are predicted, e.g., in decay chains of hadrons and in several models beyond the Standard Model of particle physics. Notably, displaced signatures are also studied in the context of dark matter searches at particle detectors, where they emerge from weak interactions of previously produced dark matter particles. From a reconstruction perspective, these signatures are very challenging to detect, although they hold high potential for future discoveries.
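A minimal sketch of how such signatures could be selected, assuming reconstructed vertices with positions given relative to the beam line; the 4\,mm displacement threshold is purely illustrative and would in practice be tuned against the beam-spot size and tracker resolution.
\begin{verbatim}
import math
from dataclasses import dataclass

@dataclass
class Vertex:
    x: float  # mm, relative to the beam line
    y: float  # mm
    z: float  # mm

def transverse_displacement(v: Vertex) -> float:
    """Distance of a reconstructed vertex from the beam line in the
    transverse plane (often called Lxy)."""
    return math.hypot(v.x, v.y)

def is_displaced(v: Vertex, lxy_min_mm: float = 4.0) -> bool:
    # Threshold is purely illustrative.
    return transverse_displacement(v) > lxy_min_mm

vertices = [Vertex(0.02, 0.01, 0.5), Vertex(12.3, -4.1, 20.0)]
displaced = [v for v in vertices if is_displaced(v)]
print(f"{len(displaced)} displaced vertex candidate(s)")
\end{verbatim}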
While the existence of Dark Matter is arguably the best explanation for various striking astrophysical observations, all lab-based direct detection attempts have been unsuccessful so far. This suggests that the most promising route is to approach the problem from two complementary sides: the cosmic scale studied by astrophysics, and the microcosmic scale studied with particle accelerators. In both cases the answer may come from discovering unusual, anomalous signals in huge data streams. One reason for the lack of a discovery so far could be that dark matter signals are not kept by the current selection algorithms in HEP experiments. At the same time, we know today that Fast Radio Bursts (FRBs) - which occur roughly every 10\,seconds at random positions on the sky - were rejected in the past in order to cope with high-rate man-made interference signals. Ironically, FRBs turned out to be a promising probe of the dark matter halo of FRB host galaxies, hence probing the aforementioned large cosmic scale. In addition to the development of optimised and model-independent algorithms for reconstruction and a coherent approach to reduced data structures, we will work towards selections and triggers based on generic anomaly detection. The latter move away from the conventional trigger philosophy of selecting known signals, and instead reject all known Standard Model processes. One approach, for instance for astrophysics applications, is to compare the properties of observed signals with a library of known signals and to use dissimilarity as a selection criterion, maintaining the discovery potential for the ``unknown''. Common to both the particle physics and astrophysics applications would be to trigger on the remaining anomalous event signatures. While the implementation of actual solutions for real-time anomaly detection will be the focus of the ongoing HEP and Astro experiments, the use case includes studies of strategies for the necessary data-reduction schemes at future colliders in the context of dark matter searches, and for transient searches at radio telescopes. Conceptual questions of anomaly definition, anomaly validation with (simulated) data, and interpretation in the context of dark matter phenomenology will be addressed jointly by HEP and Astro.
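The library-comparison idea can be sketched as follows; the templates, the correlation-based distance and the trigger threshold are all illustrative assumptions.
\begin{verbatim}
import numpy as np

def dissimilarity(signal, library):
    """Smallest correlation distance between a signal and a library
    of known templates (0 = perfect match)."""
    s = (signal - signal.mean()) / signal.std()
    distances = []
    for template in library:
        t = (template - template.mean()) / template.std()
        corr = abs(np.dot(s, t)) / len(s)  # |Pearson correlation|
        distances.append(1.0 - corr)
    return min(distances)

n = 256
x = np.linspace(0.0, 1.0, n)
# Library of known signal shapes (sinusoids of different frequencies).
library = [np.sin(2 * np.pi * k * x) for k in (1, 2, 3)]

known   = np.sin(2 * np.pi * 2 * x)              # matches a template
unknown = np.exp(-0.5 * ((x - 0.3) / 0.01)**2)   # narrow pulse: no match

for name, sig in [("known-like", known), ("anomalous", unknown)]:
    d = dissimilarity(sig, library)
    flag = "TRIGGER" if d > 0.5 else "reject"    # illustrative threshold
    print(f"{name:10s}: distance {d:.2f} -> {flag}")
\end{verbatim}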
All LOFAR stations of the German Long Wavelength (GLOW) Consortium are currently used on a weekly basis to monitor of the order of a hundred known radio pulsars. The resulting huge database makes it possible to study the variability of these sources on many different timescales (from seconds to years). This is useful for the study of radio pulsars, but also probes the interstellar and interplanetary medium, as well as Earth's ionosphere.
The SKA-MPG telescope is the first prototype for the SKA at its South African site. It will be used for scientific studies and to test how to handle the enormous data and metadata rates of the SKA. A spectro-polarimetric, full-sky survey will be carried out in the S-band. The survey will improve our understanding of the Milky Way and its radio emission, which will help us to reveal gravitational waves in the polarised component of the cosmic microwave background and to uncover the mechanism of cosmological inflation.
Real-time multi-messenger astronomy is an emerging field in which data from different observatories are combined to obtain a more complete picture of the variable Universe. The first detection of a kilonova through optical follow-up of GW170817 serves as a prime example. Neutron star mergers, the cause of GW170817, promise to become a critical tool for testing the theory of gravity, understanding the creation of heavy elements, and mapping the expansion of the Universe. Kilonovae, however, have also been found to be faint and fast-evolving. Finding and reacting to them requires an efficient, combined use of the real-time output from multiple large observatories.
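A minimal cross-matching sketch of such a combined reaction, assuming a (highly simplified) circular localisation region; real gravitational-wave alerts carry HEALPix probability skymaps, and both the positions and the radius below are illustrative assumptions.
\begin{verbatim}
from astropy.coordinates import SkyCoord
import astropy.units as u

# Toy gravitational-wave localisation: a centre and a circular
# credible region (real skymaps are HEALPix probability maps).
gw_centre = SkyCoord(ra=197.45 * u.deg, dec=-23.38 * u.deg)
gw_radius = 15.0 * u.deg

# Candidate optical transients reported by a survey (positions assumed).
candidates = {
    "cand-001": SkyCoord(ra=197.40 * u.deg, dec=-23.40 * u.deg),
    "cand-002": SkyCoord(ra=150.00 * u.deg, dec=10.00 * u.deg),
}

for name, pos in candidates.items():
    inside = pos.separation(gw_centre) < gw_radius
    print(f"{name}: {'follow up' if inside else 'skip'}")
\end{verbatim}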
The exceptional resolving power of the SKA results in single images of the cosmos that may be as large as one Petabyte. ``Data monsters'' of that size cannot be analysed reasonably fast on traditional computing architectures. The relatively small throughput when reading data from disks is a serious bottleneck (the memory-wall problem). Memory-based computing offers a change of paradigm: from the current processor-centric architecture to a memory-based architecture. The goal is to make such ``data monsters'' available for dynamic archiving.
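As a software-level analogue of operating on data in place rather than streaming whole files through the processor, the sketch below uses memory mapping so that only the tiles of a large on-disk image that are actually touched get read; actual memory-based computing hardware goes far beyond this, and the file name and sizes are illustrative assumptions.
\begin{verbatim}
import numpy as np

# Create a large array on disk once (stand-in for a huge radio image).
shape = (4096, 4096)             # in reality: far larger than RAM
image = np.memmap("image.dat", dtype=np.float32, mode="w+", shape=shape)
image[:] = 0.0
image[2048, 2048] = 1.0          # a single bright "source"
image.flush()

# Re-open read-only: pages are mapped into memory on demand, so only
# the tiles actually accessed are ever read from disk.
view = np.memmap("image.dat", dtype=np.float32, mode="r", shape=shape)
tile = view[2000:2100, 2000:2100]
print(f"tile maximum: {tile.max():.1f}")
\end{verbatim}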