Tutorial Step 2: What's in a LIGO Data File?

In this tutorial, we will use Python to read a LIGO data file. If you don't have an HDF5 LIGO data file stored on your computer somewhere, go back to Tutorial Step 1. If you do not have Python or some of the needed modules installed on your computer, go back to Tutorial Step 0.

Start the Python Interpreter

Writing a Python script has two parts:

  • The .py file, which is your code, and should be created with a text editor such as emacs or vi. If you installed Enthought Canopy, this should come with its own text editor.
  • The Python interpreter, that executes the code
To get started, try to open the Python interpreter. This will be a little different on different platforms. If you installed Canopy, then click the Canopy icon from wherever you usually run applications (for example, the Windows start menu, or the Mac Applications directory). If you installed using MacPorts, try this command in the terminal window:
Similar commands should work for users of other Unix-like systems. You should see something along these lines:
Python 2.7.3 (default, Nov 17 2012, 19:54:34) 
[GCC 4.2.1 Compatible Apple Clang 4.1 
((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" 
for more information.
You can exit out of Python by typing
>>> exit()
Let's try to write a very simple program. Move into the directory where you placed your LIGO data file. Now you should open two windows at the same time: the Python interpreter window and a text editor window where you can see the contents of a new text file, which we'll call plot_strain.py. Canopy shows both windows by default - just create a new file called plot_strain.py. On a Mac or Unix machine, do something like this:
$ cd ~/ligoData
$ emacs plot_strain.py &
$ /opt/local/bin/python-2.7
Now, in the plot_strain.py file, add the following text:
print "Hello, world!"
and save the file. Then, in the python interpreter window, you can run the entire contents of plot_strain.py by typing:
>>> import plot_strain 
Or, if you are using IPython and/or Canopy, you should be able to use the magic command run instead:
In [1]: run plot_strain
If all goes well, you'll see the text Hello, world! appear in your interpreter window.
From this point forward, we will assume you are comfortable with the basic features of Python. For a general introduction to Python, see the links at the end of Tutorial Step 0.

Data File Overview

Let's try to read in and plot some LIGO data. In plot_strain.py, let's remove the code print "Hello, world!" and replace it with:
# Import needed modules
import numpy as np
import matplotlib.pyplot as plt
import h5py
This will tell Python that we want to use the numerical Python (numpy), HDF5 module (hypy), and plotting module (matplotlib.pyplot). We'll always put this at the top of the scripts we write in this tutorial.

Next, try reading in the LIGO data file that you downloaded in the last step of this tutorial.

# Open the File
fileName = 'H-H1_LOSC_4_V1-815411200-4096.hdf5'
dataFile = h5py.File(fileName, 'r')
dataFile is now an object which can be treated as a Python dictionary, and contains all of the information from the data file. To get an idea of what's in the file, add this code to plot_strain.py:
# Explore the file
for key in dataFile.keys():
    print key 
Trying running this code in the interpreter. You can use the command import plot_strain or reload(plot_strain) if you have already imported a previous version of the file. If you are using IPython, you can also use the command run plot_strain instead. You should see something like this:
>>> reload(plot_strain)
This gives a high level picture of what's contained in a LIGO data file. There are 3 types of information:
  • meta: Meta-data for the file. This is basic information such as the GPS times covered, which instrument, etc.
  • quality: Refers to data quality. The main item here is a 1 Hz time series describing the data quality for each second of data. This is an important topic, and we'll devote a whole step of the tutorial to working with data quality information.
  • strain: Strain data from the interferometer. In some sense, this is "the data", the main measurement performed by LIGO.

For a general introduction to gravitational waves and measurements made by LIGO, we suggest reading articles or watching videos available on ligo.org.

Plot a time series

Let's continue adding to plot_strain.py to make a plot of a few seconds of LIGO data. To store the strain data in a convienient place, add these lines to plot_strain.py
# Read in strain data
strain = dataFile['strain']['Strain'].value
ts = dataFile['strain']['Strain'].attrs['Xspacing']
The code above accesses the 'Strain' data object that lives inside the group 'strain'. The "attribute" 'Xspacing' tells how much time there is between each sample, and we store this as ts. The "value" of the data object contains the strain value at each sample - we store this as strain. To see all the structure of a LOSC data file, take a look at the View a File tutorial.

Now, let's use the meta-data to make a vector that will label the time stamp of each sample. In the same way that we indexed dataFile as a Python dictionary, we can also index dataFile['meta']. To see what meta-data we have to work with, add these lines to plot_strain.py:

# Print out some meta data
print "\n\n"
metaKeys = dataFile['meta'].keys()
meta = dataFile['meta']
for key in metaKeys:
    print key, meta[key].value
In the python interpreter, enter reload(plot_strain) to see the output of this. You should see that the GPS start time and the duration are both stored as meta-data. To calculate how much time passes between samples, we can divide the total duration by the number of samples:
# Create a time vector
gpsStart = meta['GPSstart'].value
duration = meta['Duration'].value
gpsEnd   = gpsStart + duration

time = np.arange(gpsStart, gpsEnd, ts)
The numpy command arange creates an array (think a column of numbers) that starts at gpsStart, ends at gpsEnd (non-inclusive), and has an increment of ts between each element.

Finally, let's use all of this to plot a few seconds worth of data. Since this data is sampled at 4096 Hz, 10,000 samples corresponds to about 2.4 s. Add the following to plot_strain.py

# Plot the time series
numSamples = 10000
plt.plot(time[0:numSamples], strain[0:numSamples])
plt.xlabel('GPS Time (s)')
plt.ylabel('H1 Strain')
The last command, plt.show(), may not be necessary if you are working in the matplotlib interactive mode. You can turn on interactive mode with the command plt.ion().

Now, in the interpreter window, type reload(plot_strain) or run plot_strain and you should see a plot that looks like this:

You may be surprised that the data looks smooth and curvy, rather than jagged and jumpy as you might expect for white noise. That's because white noise has roughly equal power at all frequencies, which LIGO data does not. Rather, LIGO data includes noise that is a strong function of frequency - we often say the noise is "colored" to distinguish it from white noise. The wiggles you see in the plot above are at the low end of the LIGO band (around 20 Hz). In general, LIGO noise is dominated by these low frequencies. To learn more about LIGO noise as a function of frequency, take a look at the plot gallery.

• Click here to download the entire script described in this tutorial.

What's next?