Use case class 4: community-overarching data challenges
Experimental and theoretical groups produce large amounts of research data of different origin and without a common structure. These data are often useful in further scientific studies, but in many cases they are not available within a FAIR context.
A cross-experiment analysis with combined data sets would make it possible to reduce systematic uncertainties and has the potential to provide new physics insights.
The main challenge in realizing this use case is to make such preprocessed data sets available in a format that enables a common analysis. PUNCH4NFDI will provide interfaces and corresponding physics models.
Note: The links at the individual use cases lead to internal PUNCH4NFDI pages.
Presently, it is often a tedious task to check which experiments are sensitive to a new theoretical idea, and to what degree. In most cases there is no direct access to the statistical analyses performed to calculate the sensitivity of the various experiments to specific experimental signatures; if anything, a two-dimensional plot is published in a paper. Often one can therefore only overlay the different sensitivities in different colours on a new plot, without performing a proper statistical analysis. Using the PUNCH-SDP, the underlying dynamic research products (DRPs) of these individual analyses could be accessed, and with the statistical tools provided on the platform a proper statistical combination of the experiments could be performed.
In the ideal case, one could run, e.g., a combined ATLAS, CMS, XENON1T and PLANCK analysis searching for, e.g., a SUSY model with a particular detector signature and particular dark matter properties, and do so on a platform that is officially supported by the experiments rather than via pure "likelihood integrators" such as GAMBIT (or our earlier, smaller projects such as Fittino). The goal is to combine data with a consistent statistical treatment (statistical plus dominant systematic uncertainties) from different layers of abstraction, e.g. the chain selection -> fit -> likelihood in one experiment combined with the published likelihood of another experiment.
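As a minimal sketch of such a combination: assuming each experiment (or its DRP) exposes a likelihood as a function of a common signal-strength parameter mu, statistically independent experiments combine by summing their negative log-likelihoods. The Gaussian approximation and the numerical inputs below are invented for illustration, not actual experimental results.

```python
import math

def gaussian_nll(mu, best_fit, sigma):
    """Negative log-likelihood of one Gaussian-approximated measurement."""
    return 0.5 * ((mu - best_fit) / sigma) ** 2

def combined_nll(mu, measurements):
    """Independent experiments: their negative log-likelihoods simply add."""
    return sum(gaussian_nll(mu, m, s) for m, s in measurements)

def combine(measurements, lo=-5.0, hi=5.0, steps=10001):
    """Grid scan for the combined best fit and its 1-sigma interval."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    nll = [combined_nll(mu, measurements) for mu in grid]
    i_min = min(range(steps), key=nll.__getitem__)
    # 1-sigma interval: where the NLL rises by 0.5 above its minimum
    in_band = [mu for mu, v in zip(grid, nll) if v - nll[i_min] <= 0.5]
    return grid[i_min], in_band[0], in_band[-1]

# Two hypothetical results for mu: 1.2 +/- 0.4 and 0.8 +/- 0.3
best, lo_edge, hi_edge = combine([(1.2, 0.4), (0.8, 0.3)])
```

A real combination would of course use the full published likelihood shapes (including correlated systematics) rather than Gaussian summaries, which is exactly the information the DRPs on the platform would carry.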
The spectroscopy of exotic hadrons is an extremely active field with spectacular discoveries in recent years, in particular the first observation of pentaquarks\footnote{Y. Yamaguchi et al., $P_c$ pentaquarks with chiral tensor and quark dynamics, Phys. Rev. D 101, 091502 (2020) [arXiv:1907.04684 [hep-ph]].} with over 1000 citations. These exotic hadrons are typically formed in decay cascades starting with heavy beauty quarks and subsequently decaying through a chain involving ever lighter charm, strange and finally up and down quarks. There are specialized experiments recording precision data for all of these quark flavours, but so far these data have not been combined. Such combinations would make it possible to constrain the precise dynamics of the decay cascade, which is a major source of systematic uncertainty in the study of the exotica.
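The simplest form such a combination can take is an inverse-variance weighted average of a resonance parameter measured by different experiments. The toy numbers below are invented for illustration and are not actual pentaquark results:

```python
import math

def weighted_average(measurements):
    """Inverse-variance combination of independent (value, uncertainty) pairs."""
    weights = [1.0 / s ** 2 for _, s in measurements]
    mean = sum(w * v for (v, _), w in zip(measurements, weights)) / sum(weights)
    sigma = math.sqrt(1.0 / sum(weights))  # combined uncertainty
    return mean, sigma

# Two hypothetical mass measurements in MeV: 4312 +/- 7 and 4318 +/- 5
mass, err = weighted_average([(4312.0, 7.0), (4318.0, 5.0)])
```

For the decay-cascade dynamics themselves, a full amplitude-level combination would be required; the weighted average only illustrates the gain in precision that shared data makes possible.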
Within the Collaborative Research Center 1245 (SFB 1245), a large amount of research data covering very diverse topics is produced by experimental and theoretical groups. Examples are cross sections and nuclear structure information from electron scattering experiments at the S-DALINAC, precision nuclear potentials and matrix elements from large-scale theoretical calculations, equation-of-state tables for astrophysical simulations, and elemental abundance distributions. These data are most likely useful in further scientific studies and applications and have to be made accessible in a standardized way.
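One minimal sketch of "accessible in a standardized way" is a common metadata record that every data product must carry, whatever its physics content. The field names below are assumptions for illustration, not an agreed PUNCH4NFDI or SFB 1245 schema:

```python
# Hypothetical required fields for a cross-community metadata record
REQUIRED_FIELDS = {"identifier", "title", "data_type", "creators",
                   "license", "access_url"}

def missing_fields(record):
    """Return the set of required metadata fields absent from a record."""
    return REQUIRED_FIELDS - record.keys()

# Example record for one data product (all values are placeholders)
record = {
    "identifier": "doi:10.XXXX/example",  # placeholder, not a real DOI
    "title": "Electron scattering cross sections at the S-DALINAC",
    "data_type": "cross_section_table",
    "creators": ["Example Group"],
    "license": "CC-BY-4.0",
    "access_url": "https://example.org/data/42",  # hypothetical endpoint
}

missing = missing_fields(record)  # empty set -> record is complete
```

A shared minimal schema like this is what lets a single search interface span cross sections, potentials and equation-of-state tables alike.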
Air shower physics needs results from HEP and HUK and, on the other side, extends the phase space (forward region) and energy range of hadronic interactions. A combination of these data is therefore needed.
For LHC Run 4, often referred to as the HL-LHC, the data volume on disk is expected to exceed one exabyte after a few years of data taking. The international community is working towards highly federated storage technologies that meet the required scales while becoming cheaper to operate and simpler to use. This concept is often referred to as a "Data Lake". The Data Lake concept also foresees intelligent data management and high-bandwidth connections to the compute resources. The developments are mainly targeted at the needs and requirements of the large research collaborations; integration and adaptation for use by a national community or by individual scientists are often only weakly covered and need to be addressed by other projects (e.g. PUNCH4NFDI).
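The "intelligent data management" of a Data Lake can be illustrated by a toy replica-selection policy: given several copies of a dataset across federated sites, prefer disk over tape and then the lowest-latency site. Site names, the latency numbers and the policy itself are illustrative assumptions, not an actual Data Lake API:

```python
# Hypothetical replica catalogue: one dataset, three federated copies
REPLICAS = {
    "dataset-A": [
        {"site": "Site-1", "medium": "disk", "latency_ms": 5},
        {"site": "Site-2", "medium": "tape", "latency_ms": 5},
        {"site": "Site-3", "medium": "disk", "latency_ms": 20},
    ],
}

def choose_replica(dataset, replicas=REPLICAS):
    """Prefer disk replicas over tape, then the lowest network latency."""
    candidates = replicas[dataset]
    # Sort key: (is_tape, latency) -> disk wins, ties broken by latency
    return min(candidates, key=lambda r: (r["medium"] != "disk", r["latency_ms"]))

best_site = choose_replica("dataset-A")["site"]
```

Production systems encode far richer policies (quotas, caching, cost models), but the principle, hiding replica placement behind a single logical dataset name, is the same.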