Presenting ML-PPA

Presentation of the ML-based Pipeline for Pulsar Analysis at the 17th Bonn Neutron Star Workshop

Under the title "The challenge of automated identification of pulsar signals from huge data streams in real time", members of PUNCH4NFDI will present ML-PPA at the 17th Bonn Neutron Star Workshop.

Abstract: Data processing in modern radio astronomy is facing unprecedented challenges. With the arrival of the new generation of radio telescopes, such as SKA pathfinders like MeerKAT, raw data can no longer be kept for long and sorted solely by human experts. In addition, high data rates demand the use of high-performance computing (HPC), but the radio astronomical data reduction packages in common use today (like CASA) are of quite limited use for parallel computing. As a pilot project for dealing with these and other problems, we have been developing ML-PPA, the ML-based Pipeline for Pulsar Analysis. It is an automated classification system that can sort through pulsar-observation data and assign each time fragment a label (such as "pulse", "pure noise", or one of several types of RFI). The data flow is represented as a sequence of 2D time-frequency images, or "frames", which are analyzed by a convolutional neural network (CNN). Since the real data has a very uneven distribution of frame types (e.g. only 0.2% are of the "pulse" type), training such systems efficiently requires a way of producing sequences of artificial data with the necessary properties. For this purpose, "digital twins" were created to simulate the signal path from the source to a telescope observing a pulsar. A corresponding pipeline was implemented in Python to generate the digital twins. In the next step, the Python code was ported to C++, which is better suited for HPC. The first version of the ML-PPA framework has been released and successfully tested. Details of the project and its current status will be presented, together with an outlook on further endeavors.
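
To illustrate the idea of labelled artificial frames mentioned in the abstract, here is a minimal Python sketch. It is not the ML-PPA code: the function names, frame size, band edges, dispersion measure and amplitudes are all assumptions chosen for illustration. Noise is plain Gaussian, RFI is reduced to a single contaminated channel, and the pulse follows the standard cold-plasma dispersion delay. Drawing the same number of frames per label is one simple way to sidestep the class imbalance of real data.

```python
import numpy as np

# Dispersion constant in s MHz^2 pc^-1 cm^3 (standard cold-plasma value).
K_DM = 4.148808e3
# Illustrative label set; the real ML-PPA labels may differ.
LABELS = ("pulse", "noise", "rfi")


def dispersed_track(freqs_mhz, dm, t_samp, t0):
    """Time-sample index of a dispersed pulse in each frequency channel.

    The delay is measured relative to the highest frequency in the band.
    """
    delay_s = K_DM * dm * (freqs_mhz ** -2 - freqs_mhz.max() ** -2)
    return t0 + np.round(delay_s / t_samp).astype(int)


def make_frame(label, rng, n_chan=64, n_time=128,
               f_lo=1200.0, f_hi=1600.0, dm=50.0, t_samp=1e-3):
    """Return one synthetic (n_chan, n_time) time-frequency frame."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    # Every frame starts as unit-variance Gaussian noise (an assumption).
    frame = rng.normal(0.0, 1.0, size=(n_chan, n_time))
    freqs = np.linspace(f_hi, f_lo, n_chan)  # MHz, highest channel first
    if label == "pulse":
        idx = dispersed_track(freqs, dm, t_samp, t0=n_time // 4)
        inside = idx < n_time  # drop channels where the pulse leaves the frame
        frame[np.arange(n_chan)[inside], idx[inside]] += 8.0
    elif label == "rfi":
        # Toy RFI: one narrow-band channel that is bright at all times.
        frame[rng.integers(n_chan), :] += 5.0
    return frame


def make_balanced_set(n_per_class, seed=0):
    """Generate the same number of frames for every label."""
    rng = np.random.default_rng(seed)
    labels = [name for name in LABELS for _ in range(n_per_class)]
    frames = np.stack([make_frame(name, rng) for name in labels])
    return frames, np.array(labels)
```

Calling `make_balanced_set(100)` returns an array of 300 frames of shape 64 x 128 together with their labels, which could serve as toy input for a CNN classifier. A realistic digital twin would model far more of the signal path than this sketch does.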