In this notebook, we will image a distant supernova event around the date of maximum magnitude. To fetch data, we will use the astroquery
python package, particularly the CADC module, which queries data provided by the Canadian Astronomical Data Centre.
To launch this notebook interactively, click the Binder button.
Warning: Binder instances have limited computational resources. A notebook will shutdown after 10 minutes of inactivity and has a maximum of 2 GB of RAM. Additionally, each repository has a maximum of 100 simultaneous users.
The Open Supernova Catalog is an open data collection of supernova and their parameters, such as position, redshifts, and light-curve shape. This tutorial will query the catalog for supernova with a redshift between 0.2 and 0.3. After selecting a supernova target, we will query the CADC database for the target, then display some images.
This tutorial will go through some of the basic functionalities of the CADC module in the astroquery package. The module can be installed in two ways:
The CADC module is only available with the 'bleeding edge' master version of the astroquery module, and can be installed using the command:
pip install https://github.com/astropy/astroquery/archive/master.zip
Alternatively, you can clone and install from the source:
# If you have a github account:
git clone git@github.com:astropy/astroquery.git
# If you do not:
git clone https://github.com/astropy/astroquery.git
cd astroquery
python setup.py install
Note that these commands can also be done in a Jupyter notebook by either declaring the code cell a bash cell by pasting %%bash
at the top of the cell, or preceding each line with a !
. More information about astroquery can be found at the astroquery github repository.
For this tutorial, we want to display type Ia supernova and their host galaxy in the CFHT Legacy Survey field. To find a suitable supernova, we can query the Open Supernova Catalog then refine the results to get a list of supernova with claimedtype
of Ia
and redshift
between 0.2
and 0.3
that are part of the Supernova Legacy Survey (SNLS) where the discoverdate
is before the maxdate
so we can see the supernova before maximum light. The Open Supernova Catalog API was used to build the catalog query.
from urllib.parse import urlencode
import pandas as pd
# We want results with claimedtype of Ia in csv format
params = {'claimedtype': 'Ia', 'format': 'csv'}
catalog_domain = 'https://api.sne.space/catalog'
query_str = urlencode(params)
# The 'first' paramter returns the first value in a list
url = '{}?{}&first'.format(catalog_domain, query_str)
# Grab the data from the url and convert to a pandas dataframe
sn_data = pd.read_csv(url)
# Define the min and max redshift range
z_min, z_max = 0.2, 0.3
# Select SNLS events in redshift range where discoverdate is before maxdate
sn_data = sn_data.dropna(subset=['discoverdate', 'maxdate', 'redshift'])
sn_data = sn_data[(sn_data['redshift'] > z_min)
& (sn_data['redshift'] < z_max)]
sn_data = sn_data[sn_data['event'].str.startswith('SNLS')]
sn_data = sn_data[sn_data['discoverdate'] < sn_data['maxdate']]
print('Query url: {}'.format(url))
print('Number of results: {}'.format(len(sn_data)))
sn_data
Now that we have a list of suitable supernovae, we can get the area and date range that we need to query the CADC database for. Let's choose the first result from the sn_data
table and extract it's R.A., Dec., and date of the maximum observation. We will also grab the "supernova" name for plotting later on. Using the R.A. and Dec., we can construct an astropy SkyCoord object, and using the maximum observation date, we can create a date window starting 20 days before and ending 100 days after to query inside of.
from astropy import units as u
from astropy.coordinates import SkyCoord
from astropy.time import Time, TimeDelta
# Select a supernova and gather data
sn = sn_data.iloc[0]
sn_name, sn_ra, sn_dec, sn_maxdate = sn['name'], sn['ra'], sn['dec'], sn[
'maxdate']
# Build the SkyCoord object
coords = SkyCoord(sn_ra, sn_dec, unit=(u.hourangle, u.deg))
# Build the start and end date objects
days_before = 20.0
days_after = 100.0
maxdate = Time(sn_maxdate.replace('/', '-'), format='isot')
start_date = maxdate - TimeDelta(days_before, format='jd')
end_date = maxdate + TimeDelta(days_after, format='jd')
print('Supernova coordinates: {}'.format(coords))
print('Supernova max date: {}'.format(maxdate))
print('Query date range: {} to {}'.format(start_date, end_date))
The astroquery CADC python module has a exec_sync
function which allows you to query the CADC archive database using ADQL. The query syntax requires a string and will execute the query in a batch job, which once finished, will contain the query output results. In this tutorial, we want to query images that:
start_date
and end_date
calculated abover
band filterquality_flag
that is not junkIn addition to these results, we also want an image taken much after the supernova event has occurred in order to give a good view of the host galaxy. So similarly, we will query the same coordinates, r
filter, CFHT collection, etc, but the time the image is taken will be as new as possible.
Once we fetch the results, we will display a subset of the columns (since there are so many!). First, let's start with the query.
from astroquery.cadc import Cadc
# Build the query
query_outline = """SELECT {num} * FROM caom2.Plane AS Plane
JOIN caom2.Observation AS Observation ON Plane.obsID = Observation.obsID
WHERE (INTERSECTS( INTERVAL( {mjd_start}, {mjd_end} ), Plane.time_bounds_samples ) = 1
AND CONTAINS( POINT('ICRS', {ra}, {dec}), Plane.position_bounds ) = 1
AND LOWER(Plane.energy_bandpassName) LIKE '{filter}%'
AND collection = '{collection}'
AND calibrationLevel >= {cal_level}
AND (Plane.quality_flag IS NULL OR Plane.quality_flag != 'junk'))
ORDER BY time_bounds_lower {order}"""
# Select the parameters for the ADQL query
query_params = {
'num': '', # Restricts the number of results (empty string returns all)
'mjd_start': start_date.mjd,
'mjd_end': end_date.mjd,
'ra': coords.ra.degree,
'dec': coords.dec.degree,
'filter': 'r',
'collection': 'CFHT',
'cal_level': 2,
'order': 'ASC' # Order the results from oldest to newest
}
# Instantiate the CADC module
cadc = Cadc()
# Run the query and fetch the results
results = cadc.exec_sync(query_outline.format(**query_params))
# Select a subset of columns to preview
columns_subset = [
'productID', 'collection', 'energy_bandpassName', 'time_bounds_samples',
'time_bounds_lower', 'time_exposure'
]
print('Total number of results: {}'.format(len(results)))
# Showing a sample of the results
results[columns_subset][0:5]
Then we can search for a high signal to noise image after the supernova explosion. We will use the CFHT MEGAPIPE archive which contains deep image stacks processed by the Megapipe pipeline.
from datetime import datetime
# Select the parameters for the ADQL query to get sn host galaxy much after event
# We select only one result where the start date is the end date of the previous query
query_params = {
'num': 'TOP 1', # Select only the first result
'mjd_start': end_date.mjd,
'mjd_end': Time(datetime.now(), format='datetime').mjd,
'ra': coords.ra.degree,
'dec': coords.dec.degree,
'filter': 'r',
'collection': 'CFHTMEGAPIPE',
'cal_level': 2,
'order': 'DESC' # Order the results from newest to oldest
}
# Run the query and fetch the results
comparision_results = cadc.exec_sync(query_outline.format(**query_params),
'sync')
comparision_results[columns_subset]
Now let's access the query results, download the requested images, and visualize a region centered on the Supernovae. We will use the get_image_list
function to get cutout urls of the results. Since that function takes a table, we will stack the results and comparisons table. We also want to calculate the time difference between the time the image was captured and the supernova maximum date and append it to the results table.
from astropy.table import vstack
# Reduce results for easier displaying
interval = len(results) // 12
# Stack supernova and host galaxy results
results_subset = vstack([results[::interval], comparision_results])
# Calculate the time from maxdate for each result
results_subset['time_from_maxdate'] = [
date_taken - maxdate.mjd
for date_taken in results_subset['time_bounds_lower']
]
Using the get_image_list
function, we will fetch the data access urls and then use the astropy.io.fits
module to access the data.
import re
import warnings
from astropy import units as u
from astropy.io import fits
from astropy.wcs import WCS
# Supress fits processing warnings
warnings.simplefilter('ignore')
data_list = []
# Grab cutout urls of the table of results around the supernova position
cutout_urls = cadc.get_image_list(results_subset, coords.icrs, 0.01 * u.deg)
# Access the url and fetch the data
for cutout_url, time_from_maxdate in zip(cutout_urls,
results_subset['time_from_maxdate']):
try:
with fits.open(cutout_url) as hdulist:
wcs = WCS(hdulist[0].header)
image_data = hdulist[0].data
data_list.append({
'image_data': image_data,
'wcs': wcs,
'time_from_maxdate': time_from_maxdate
})
except:
print('Error: Problem with {}'.format(cutout_url))
continue
print("Finished processing images.")
Now that we have the image data and the coordinate data of all our cutouts, we can begin to plot the data! First we will grab the image data for all the results and check if it is valid. As a crude filtering procedure, we will remove the images that have more than 5% blank pixels, as well as images with aspect ratio less than 0.8 and greater than 1.2. Then we will display the list of supernova, as well as the number of days to and from the maximum date.
import numpy as np
def clean_image_list(data_list):
"""Cleans out the incomplete and ill-formatted images and returns the datalist"""
def exists(data):
return data['image_data'] is not None
def percent_nonzero_above_x(data, x=0.95):
image_data = data['image_data']
return np.count_nonzero(image_data) / image_data.size > x
def square_image(data, ratio_bounds=(0.8, 1.2)):
x, y = data['image_data'].shape
return (x / y > ratio_bounds[0] and x / y < ratio_bounds[1])
# Clean the images on number of nonzero pixels and aspect ratio
return list(
filter(
lambda data: exists(data) and percent_nonzero_above_x(data) and
square_image(data), data_list))
cleaned_data_list = clean_image_list(data_list)
print('Number of images pre-clean: {}'.format(len(data_list)))
print('Final number of cleaned images: {}'.format(len(cleaned_data_list)))
In this particular case, no images were removed during cleaning!
%matplotlib inline
import math
import matplotlib.pyplot as plt
from astropy.visualization import ImageNormalize, LinearStretch, ZScaleInterval
np.seterr(divide='ignore', invalid='ignore')
def plot_image_list(data_list, coords, sn_name, geo=None):
""" Plot the supernova image list"""
# Build a layout if none is passed in
if geo is None:
geo = (len(data_list), 1) # ncols, nrows
# Find the brightest and dimest dates
brightest_date = min(
[abs(data['time_from_maxdate']) for data in data_list])
dimmest_date = max([abs(data['time_from_maxdate']) for data in data_list])
# Get the WCS for projection
wcs_trans = data_list[0]['wcs']
# Dynamically create figsize based on layout
fig = plt.figure(figsize=(geo[0] * 4, geo[1] * 4))
fig.suptitle('Images of Supernova {}'.format(sn_name),
fontsize=14,
fontweight='bold',
y=0.92)
for idx, data in enumerate(data_list):
ax = plt.subplot(geo[1],
geo[0],
idx + 1,
projection=wcs_trans,
facecolor='y')
# Draw yellow circle around supernova position
ax.scatter(coords.icrs.ra.degree,
coords.icrs.dec.degree,
transform=ax.get_transform('icrs'),
s=800,
edgecolor='yellow',
facecolor='none')
# Normalize and plot the data
image_data_norm = ImageNormalize(data['image_data'],
interval=ZScaleInterval(),
stretch=LinearStretch())
ax.imshow(data['image_data'],
norm=image_data_norm,
transform=ax.get_transform(data['wcs']),
cmap='gray')
# Print days to date of maximum luminosity
# Use red if closest to maxdate, green for comparison photo, and blue otherwise
color = 'blue'
if data['time_from_maxdate'] == brightest_date:
ax.set_title('Maximum date')
color = 'red'
elif data['time_from_maxdate'] == dimmest_date:
ax.set_title('{:5.1f} days after event'.format(
data['time_from_maxdate']))
color = 'green'
ax.text(20,
20,
'days_from_max = {:5.1f}'.format(data['time_from_maxdate']),
fontsize=12,
color='white',
bbox={
'facecolor': color,
'alpha': 0.5,
'pad': 3
})
# Add grid and remove axis labels
ax.coords.grid(color='white', ls='solid')
ra, dec = ax.coords['ra'], ax.coords['dec']
ra.set_ticklabel_visible(False)
dec.set_ticklabel_visible(False)
ra.set_axislabel('')
dec.set_axislabel('')
ncols = 4
nrows = math.ceil(len(cleaned_data_list) / ncols)
plot_image_list(cleaned_data_list, coords, sn_name, geo=(ncols, nrows))