Service Class 3: Analysis software and data irreversibility services

Tools for extracting high-level information from data are collected in the PUNCH software repository. Among other things, it contains a framework for AutoML on scientific data and a framework for converting and reading data for combined analyses on heterogeneous systems. For real-time applications, algorithms optimised for sorting, hardware-specific clustering and pattern recognition will be developed, together with methods for transforming dynamic archive queries into dynamic filters (and vice versa).

3.1 the PUNCH software repository (D-TA6-WP4-2)

Common approaches to data analysis within the PUNCH community will be identified and used as a
basis for shared contributions to existing open-source packages. To open up existing software
development workflows, an example repository will be provided that demonstrates the use of
state-of-the-art open development setups, including solutions for automated testing, continuous
integration, and deployment via established package managers.

3.2 a framework for AutoML (D-TA3-WP3-1)

This project addresses two specific challenges of overarching importance for the successful and
widespread use of machine learning. The first deliverable focuses on automated machine learning (AutoML); the second provides methods to extend machine learning to very large datasets. A further challenge related to ML on large
datasets is the application of the FAIR (findable, accessible, interoperable, reusable) principles.

3.3 tools for combined analyses on heterogeneous systems (D-TA3-WP4-1)

A prerequisite for resolving the problem of heterogeneous data formats is that datasets carry
metadata describing their format, so that suitable conversion or reading tools can be selected
automatically. Conversion/reading methods will be implemented for a selection of data formats as
concrete examples. These tools will be deployable in heterogeneous computing environments. The
resulting framework will be transparent and easy to use: when several datasets are analysed, the
required converters/readers are included in the workflow automatically.
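The metadata-driven selection of readers can be sketched as a registry that maps a format identifier to a reader function. This is a minimal illustration only, assuming that each dataset's metadata carries a `format` field holding a media type such as `text/csv`; `READERS`, `register_reader`, `load_dataset` and `combined_analysis` are hypothetical names, not part of any PUNCH tool, and only two standard-library readers (CSV and JSON) are shown.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Reader = Callable[[str], List[Record]]

# Hypothetical registry: format identifier from the metadata -> reader function.
READERS: Dict[str, Reader] = {}


def register_reader(fmt: str) -> Callable[[Reader], Reader]:
    """Register a reader function under a format identifier."""
    def wrap(func: Reader) -> Reader:
        READERS[fmt] = func
        return func
    return wrap


@register_reader("text/csv")
def read_csv(content: str) -> List[Record]:
    # CSV values are returned as strings; no type conversion is done here.
    return [dict(row) for row in csv.DictReader(io.StringIO(content))]


@register_reader("application/json")
def read_json(content: str) -> List[Record]:
    data = json.loads(content)
    return data if isinstance(data, list) else [data]


def load_dataset(metadata: Dict[str, str], content: str) -> List[Record]:
    """Pick the reader from the dataset's own format metadata, not from the user."""
    fmt = metadata.get("format")
    if fmt not in READERS:
        raise ValueError(f"no reader registered for format {fmt!r}")
    return READERS[fmt](content)


def combined_analysis(datasets: List[Tuple[Dict[str, str], str]]) -> List[Record]:
    """Read heterogeneous datasets into one common list of records."""
    records: List[Record] = []
    for metadata, content in datasets:
        records.extend(load_dataset(metadata, content))
    return records


if __name__ == "__main__":
    datasets = [
        ({"format": "text/csv"}, "energy,source\n1.5,A\n2.0,B\n"),
        ({"format": "application/json"}, '[{"energy": 3.2, "source": "C"}]'),
    ]
    print(len(combined_analysis(datasets)))  # 3
```

In this sketch, supporting another format only means registering one more reader; `combined_analysis` itself does not change.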

3.4 real-time algorithms for hardware-specific clustering and pattern recognition (D-TA5-WP2-5)

The constraints of real-time systems demand a focus on algorithmic performance, highly
efficient use of hardware resources, and low latency. Among other things, PUNCH4NFDI will provide algorithms for massively parallel real-time sorting, clustering
and pattern recognition on specialised hardware.
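The text does not specify which sorting algorithms will be used. As one illustration of a hardware-friendly approach, the sketch below implements a bitonic sorting network, a technique chosen here for illustration only: within each stage, every compare-exchange acts on a disjoint pair of positions, so all of them could run simultaneously on an FPGA or GPU. The Python version executes them sequentially and assumes the input length is a power of two; `bitonic_sort` is a hypothetical name.

```python
from typing import List, Sequence


def bitonic_sort(values: Sequence[float]) -> List[float]:
    """Sort values with a bitonic sorting network.

    The input length must be a power of two. Within one (k, j) stage every
    compare-exchange works on a disjoint pair (i, i ^ j), so on parallel
    hardware a whole stage could execute at once; here it runs in a loop.
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("input length must be a power of two")
    a = list(values)
    k = 2                       # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2              # distance between compared elements
        while j > 0:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # The bit of i with value k sets this block's direction.
                    ascending = (i & k) == 0
                    if (ascending and a[i] > a[partner]) or (
                        not ascending and a[i] < a[partner]
                    ):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Because the comparison pattern is fixed and independent of the data, a network for n inputs always takes log2(n)(log2(n)+1)/2 stages, which makes its latency predictable; this is one reason such networks are attractive on specialised hardware.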

3.5 methods for transforming dynamic archive queries into dynamic filters (D-TA5-WP3-2)

Methods will be developed by which one or more dynamic archives can be jointly queried, returning potential triggers
together with an estimate of how well the query could be answered. Methods for
transforming a dynamic archive query into a dynamic filter will be tested both by converting dynamic
archives into simulated real-time streams and on real data streams.
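The query-to-filter idea can be sketched under simplifying assumptions: a query is a fixed predicate on events (here a time window and an energy threshold), the estimate of how well the query could be answered is approximated by the fraction of the requested time window covered by the archives, and a simulated real-time stream is produced by replaying archived events in time order. All names (`Event`, `Query`, `query_archives`, `to_filter`, `replay`) and the coverage metric are hypothetical and illustrative, not the methods planned for D-TA5-WP3-2.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    time: float     # timestamp (arbitrary units)
    energy: float   # hypothetical observable used by the query


@dataclass(frozen=True)
class Query:
    """Hypothetical query: events in [t_min, t_max) with energy >= min_energy."""
    t_min: float
    t_max: float
    min_energy: float

    def matches(self, event: Event) -> bool:
        return (self.t_min <= event.time < self.t_max
                and event.energy >= self.min_energy)


def query_archives(query: Query,
                   archives: List[List[Event]]) -> Tuple[List[Event], float]:
    """Jointly query several archives.

    Returns the matching events (potential triggers) and a crude quality
    estimate: the fraction of the query window covered by the time span of
    at least one archive (span = first to last archived event).
    """
    hits = sorted((e for a in archives for e in a if query.matches(e)),
                  key=lambda e: e.time)
    # Each non-empty archive's time span, clipped to the query window.
    spans = sorted(
        (max(min(e.time for e in a), query.t_min),
         min(max(e.time for e in a), query.t_max))
        for a in archives if a
    )
    covered, reached = 0.0, query.t_min
    for start, stop in spans:           # merge overlapping spans
        start = max(start, reached)
        if stop > start:
            covered += stop - start
            reached = stop
    window = query.t_max - query.t_min
    return hits, (covered / window if window > 0 else 0.0)


def to_filter(query: Query) -> Callable[[Iterable[Event]], Iterator[Event]]:
    """Turn the same query into a filter applied to events as they arrive."""
    def stream_filter(stream: Iterable[Event]) -> Iterator[Event]:
        for event in stream:
            if query.matches(event):
                yield event
    return stream_filter


def replay(archives: List[List[Event]]) -> Iterator[Event]:
    """Replay archived events in time order as a simulated real-time stream."""
    yield from sorted((e for a in archives for e in a), key=lambda e: e.time)
```

In this toy setting, passing `replay(archives)` through `to_filter(query)` yields the same events as `query_archives`, which is one simple consistency check that a test on simulated real-time streams could make.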