I remember as a child religiously reading the Argos catalogue; probably sometimes looking for Christmas presents, but often just looking at how many things you could possibly buy from one shop. As I got older I started to wonder how on earth they managed to put such a large catalogue together. Five years after the first detection of a gravitational wave signal, I have a little insight into just how hard the latter process is, and a little more appreciation for how much the Universe has to offer.

In the last couple of weeks the second catalogue of gravitational wave detections has been published. To say that the preparation of this work was a major undertaking would be an understatement (nothing is straightforward when you’ve got to coordinate cutting-edge instrumentation and analysis on this scale). To put into some perspective how much things have advanced in the last five years, our newest catalogue contains 39 candidate events, all taken from the first half of the third observing run (which we call O3a) of the LIGO and Virgo detectors. It took around 20 months to complete the analysis and see the paper through the publication process. For GW150914 it took us around five months to do the same for just one event.

I’ll write here a bit about what’s contained within the paper, but I also want to talk a bit about some of the work behind-the-scenes, and how we scaled-up our processes between GW150914 and GWTC-2. This is also the first major announcement paper I’ve had a hand in (I did some of the analysis work), so I’m rather excited about that.

Before I go on, it’s worth noting the efforts of everyone in the LIGO and Virgo collaborations on putting this paper together during the Covid-19 pandemic; there has never been a stranger time to put together something like this.

What’s new?

O3, our third observing run, brought in a new age. For the first time we would be announcing detections in something approaching real-time to the general public. So the first thing which keen-eyed observers may spot is that of the 39 events we report, 26 had previously been announced via GCN circulars (and also via Twitter). This is new territory for us.

Another very clear new feature is the naming scheme for our events. The high frequency with which we started to detect GW signals in O3 meant that it was almost inevitable that we’d find two on the same day. Our previous naming scheme, which used the year, month, and day to name the events, needed to be revised, and we now include the time, down to the second, as well. That means you can expect to start seeing events with rather lengthy names like GW190630_185205 from now on (though we’ll probably continue to use the shorter “nicknames” where they’re unambiguous for some events).
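As a rough illustration of the new naming scheme, here is a minimal Python sketch which builds a full event name from a UTC detection time, and recovers the short date-only nickname from it. The function names are my own invention for this example, not part of any LIGO or Virgo software, and the time used is simply read back from the example name above.

```python
from datetime import datetime, timezone


def event_name(utc_time: datetime) -> str:
    """Build a GWYYMMDD_HHMMSS style name from a UTC detection time."""
    return utc_time.strftime("GW%y%m%d_%H%M%S")


def short_name(full_name: str) -> str:
    """Return the date-only nickname, e.g. GW190630 from GW190630_185205."""
    return full_name.split("_")[0]


# Hypothetical detection time, read back from the example name in the text.
detected = datetime(2019, 6, 30, 18, 52, 5, tzinfo=timezone.utc)
name = event_name(detected)
print(name)
print(short_name(name))
```

The short form only remains safe to use while no other event shares the same date, which is exactly the ambiguity the longer names are there to resolve.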

This catalogue continues to push the upper bound for black hole mass upwards, with GW190521 now likely to be the heaviest binary we’ve identified, and its heavier component is likely the most massive stellar-mass black hole ever detected, coming in at around 95 times the mass of the Sun. It has competition from three other events in the catalogue, however.

One event from GWTC-2 looks to be clearly “lop-sided”. GW190412’s two component black holes look to have very different masses, with one around four times larger than the other, where other mergers seem to be between black holes of broadly comparable size.

Looking to the smaller side of things there are some stand-out events. One of these is GW190425, which is probably a binary neutron star; both of the measured components’ masses are consistent with being neutron stars, but the total measured mass of the system is a bit higher than expected.

A second unusual event at this end of the scale is GW190814, where one of the objects may or may not be a neutron star. The two components of this system are even more imbalanced than those of GW190412. If the secondary is a neutron star, this could be the first neutron star / black hole coalescence we’ve observed. It has potential competition for that honour from GW190426_152155, however. This is the weakest source in the catalogue, but it has two low-mass components, which would likely be a neutron star and a black hole if the signal is real.

How did we get there?

There are many stages to constructing a catalogue like this. The paper itself outlines these, but basically these stages are:

Signal identification
Data preparation
Preliminary analysis
Production analysis

In the first stage, all of the raw data from the detector is analysed to search for “triggers”, which are potential signals. While this stage will identify gravitational wave signals, they’ll be mixed-in with various other things which look similar to gravitational wave signals, and will require further analysis and sifting.

The data from each of the triggers then needs to be checked for noise: effects in the data which are not directly from the gravitational wave. Some of these noise sources can be disruptive, and we need to run an additional step to remove artefacts called “glitches” from the data before the analysis can be done.

Next we conduct parameter estimation on the signal. This is a type of analysis which compares the signal to known physical simulations in order to determine the properties of the astrophysical system which produced the signal. It’s slow and requires a lot of computing power, so we normally do a rough version first, where we set things up to get fairly good answers, as quickly as possible. We then use this to decide how to set-up the final analyses. Sometimes at this stage we spot more problems with the data, and there can be a bit of back-and-forth between stages two and three to get everything ready.

The final stage is in many respects the most complicated. For each potential event we perform three different analysis tasks, and each of them can take several days or even weeks. Fortunately we’re able to run most of them in parallel, but this brings its own organisational challenges. The first analysis examines the data around the signal, and tries to measure the amount of noise in the data. This step is important for correctly measuring the signal, and determining the certainty with which we can make the measurement. Next we analyse the same data using two different gravitational wave models.

It’s difficult to work out precisely what the gravitational wave for a given astrophysical system will look like, so we make models which are easier to work with, and which we can use for the analysis. Unfortunately these models don’t agree for all systems, because each uses different approximations to the underlying physics. To try and mitigate these differences we perform the same analysis with both models; hence the need for two analyses for each potential event.

This means that for GWTC-2 we had a lot of analyses to look after at various different times. For the preliminary analyses we relied on considerable amounts of person-power, and had lots of collaborators setting up exploratory analyses. When it came to the production analyses, however, it was important that everything was consistent, and because these analyses took much longer to run, it was also important that they were correct. This is where I come in, and as a result it’s where some of the weirder things about this paper come in too.

Rage against the machine

Setting up around 80 analyses, and then monitoring their progress, was going to be a monumental logistical undertaking, but fortunately it’s one which is fairly repetitive. Unfortunately, the information required to put together each analysis wasn’t in a predictable place, so automating the whole process was not completely straightforward. We also needed a way to make sure that the progress of all of the analyses could be seen at a glance.

What we ended up with was my rather Frankenstein’s monster piece of software, version 0.1 of asimov. Asimov was able to make sure that all of the configuration settings for the various analyses had been set up properly, and then submit those analyses to our computing cluster using HTCondor. The details of each of the analyses were then written to issues in a GitLab issue tracker, and an appropriate tag was added to indicate the status of the analysis.

Asimov then monitored each analysis, and waited for the first noise analysis step to complete before submitting the full analysis jobs, which depended on its results. Throughout the month or so that the analyses were ongoing we used asimov to monitor the health of the analysis in the cluster, and pass state information back to the issue tracker where it could be easily checked by human beings. Asimov also handled the final post-processing of results from the analyses.
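To give a flavour of the dependency handling described above, here is a minimal Python sketch. It is not asimov’s actual code: the class, the function, the status strings, and the analysis names are all hypothetical, and nothing here talks to HTCondor or to an issue tracker. It only shows the idea of holding back the two model analyses until the noise analysis has finished.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    name: str
    # Names of analyses which must finish before this one can start.
    needs: list = field(default_factory=list)
    # Hypothetical status labels: waiting -> running -> finished.
    status: str = "waiting"


def submit_ready(analyses):
    """Start every waiting analysis whose prerequisites have all finished."""
    by_name = {a.name: a for a in analyses}
    started = []
    for a in analyses:
        ready = all(by_name[n].status == "finished" for n in a.needs)
        if a.status == "waiting" and ready:
            # A real system would submit a job to the cluster at this point.
            a.status = "running"
            started.append(a.name)
    return started


# One event: a noise analysis, then two model analyses which depend on it.
event = [
    Analysis("noise"),
    Analysis("model-A", needs=["noise"]),
    Analysis("model-B", needs=["noise"]),
]

first = submit_ready(event)    # only the noise analysis can start
event[0].status = "finished"   # pretend the noise analysis has completed
second = submit_ready(event)   # both model analyses can now start
print(first, second)
```

In this sketch the status field plays the role of the tag on the issue tracker: a single value per analysis which a human can check at a glance.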

The preparation of the GWTC-2 catalogue was really our first foray into the issues of managing an analysis of this kind at scale. We learned a lot from it, and those lessons have been folded into version 0.3 of asimov, which is now being used for the O3b analyses (which I’ll hopefully be able to talk about quite soon). Things have become considerably more complicated in the intervening few months!