Breakpoint detection

This section describes the procedure used to detect new breakpoints in real time. It is based on the incremental algorithm described by [guralnik1999].

Incremental algorithm

The detection of new breakpoints relies on the assumption that the data is piecewise linear, i.e. linear on specific time intervals. The following steps are applied each time a new data point arrives:

  1. We assume that the data since the last breakpoint (or the beginning of the day) is linear, and we fit it by a straight line.

  2. The residual (the distance between each data point and the fitted line) is computed, and the likelihood criterion is derived from it. This criterion is given by \(L = \log(rss/m)\), where \(rss\) is the residual sum of squares and \(m\) is the number of data points on the interval.

  3. We repeat the fit, this time assuming that there is a breakpoint in the data. To speed up the computation, we further assume that this breakpoint lies five data points before the newest one. Under these hypotheses, the data is fitted by two straight lines, one on each new interval (beginning to breakpoint, and breakpoint to end).

  4. The likelihood criterion is computed on each of the two intervals, and we keep the sum of the two values.

  5. If the difference (in percentage) between the single-line likelihood and the two-line likelihood is above a threshold \(\delta\), we keep the breakpoint.
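The five steps above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the code: the function names likelihood and has_new_breakpoint and the lag argument are hypothetical, \(rss\) is taken to be the residual sum of squares, and the sign and normalisation of the percentage difference (improvement of the two-line fit relative to the single-line likelihood, expressed as a fraction) are a guess at what step 5 means.

```python
import numpy as np


def likelihood(t, y):
    """Likelihood criterion L = log(rss / m) for a straight-line fit (steps 1-2)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)      # least-squares straight line
    residuals = y - (slope * t + intercept)
    rss = float(np.sum(residuals ** 2))         # assumed: residual sum of squares
    rss = max(rss, np.finfo(float).tiny)        # avoid log(0) on a perfect fit
    return float(np.log(rss / len(t)))


def has_new_breakpoint(t, y, delta=0.1, lag=5):
    """Decide whether to keep a breakpoint `lag` points before the newest one.

    t, y hold the data since the last breakpoint (or the beginning of the day).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    split = len(t) - lag
    if lag < 2 or split < 2:
        return False                            # not enough points for two fits

    # Steps 1-2: a single straight line over the whole interval.
    l_single = likelihood(t, y)

    # Steps 3-4: two straight lines, likelihoods summed over both intervals.
    l_split = likelihood(t[:split], y[:split]) + likelihood(t[split:], y[split:])

    # Step 5 (assumed convention): relative improvement compared with delta.
    if l_single == 0.0:
        return False
    gain = (l_single - l_split) / abs(l_single)
    return gain > delta
```

In a real-time loop, such a function would be called once per new data point, and the arrays t and y would be restarted from the breakpoint whenever one is kept.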

Time resolution and \(\delta\)

In the algorithm described above, two parameters come into play. The first one, \(\delta\), is the threshold above which new breakpoints are kept. The second is the time resolution at which the data is read. Both are set by the user through the attributes delta and time_average (see Station for more information).

There is a balance between keeping the two parameters low, to detect more breakpoints, and keeping them high enough to avoid picking up noise instead of the VLF variations on solar-flare time scales. Moreover, a finer time resolution greatly increases the computation time, which we want to limit for real-time applications.
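As a rough illustration of what the time resolution controls, the sketch below averages a regularly sampled signal into blocks of a given duration. The function name average_in_time, the sample_period_s argument and the use of a plain block mean are assumptions made for this example; the actual reading of the data is handled by Station and may differ.

```python
import numpy as np


def average_in_time(samples, sample_period_s, time_average_s):
    """Average a regularly sampled signal into blocks of time_average_s seconds."""
    samples = np.asarray(samples, dtype=float)
    block = int(round(time_average_s / sample_period_s))   # samples per block
    if block < 1:
        raise ValueError("time_average_s must be at least one sample period")
    n_blocks = len(samples) // block                        # drop the incomplete tail
    trimmed = samples[: n_blocks * block]
    return trimmed.reshape(n_blocks, block).mean(axis=1)
```

With a 60 s resolution, a full day is reduced to 1440 points, and the detection steps are applied once for each of them.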

The optimal values of these parameters were determined by testing the algorithm on one year of data (from 2023/11/01 to 2024/10/31 inclusive) recorded from the NRK station by the AWESOME VLF receiver in Nançay (France). It is assumed that the optimal parameters depend neither on the receiver nor on the transmitter.

Based on this data, the best parameters were chosen to be \(\delta = 0.1\) and 60 s for the time resolution.