Total points: 30
Due: Friday October 14 7pm CEST
Format: IPython Notebook or python program
Author: Thomas Robitaille
Solved by: Marta Reina-Campos
The purpose of this problem set is to become familiar with working with image data, plotting it, and combining it in various ways for analysis.
The data used in this problem set were collected by the AMSR-E instrument on the Aqua satellite. The data consist of maps of the concentration of ice in the Arctic collected between 2006 and 2011. The data were downloaded and extracted from here and converted into a format that can be more easily used.
The data you should use is in here (located at MPIA, use this link if you need a zip archive) and can also be accessed in /local/py4sci/ice_data if you are using the CIP computers (no need to copy over that directory, just read the files from there). To use it, copy the tgz file into your day3 directory and type:
tar -xvzf ice.tgz
The data is in 'Numpy' format, which means that you can read it as a Numpy array using:
>>> import numpy as np
>>> data = np.load('ice_data/20080415.npy')
which will give you a 2-d array. Just for information, this was created with:
>>> np.save('ice_data/20080415.npy', data)
%matplotlib inline
import matplotlib.pyplot as plt
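Before plotting, it can help to check the shape and type of the loaded array and how many pixels are undefined. The sketch below is only an illustration: it writes a small synthetic array to a temporary directory and reads it back, so the file name and the values are placeholders, not the real ice data.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for an ice map (values in percent, NaN = no data);
# the real maps are larger and their contents are not reproduced here.
fake_map = np.array([[0.0, 55.0, np.nan],
                     [100.0, 99.5, 20.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "20080415.npy")  # placeholder name
    np.save(file_path, fake_map)   # write the array in .npy format
    data = np.load(file_path)      # read it back as a Numpy array

print(data.shape)                  # (rows, columns) of the 2-d array
print(data.dtype)                  # element type stored in the file
n_undefined = int(np.isnan(data).sum())  # number of NaN pixels
print(n_undefined)
```

Knowing whether a map contains NaN values matters later on, because comparisons and sums behave differently when NaNs are present.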
Start off by reading in one of the maps as shown above, and plot it with Matplotlib. Note that to get the correct orientation, you will need to call the imshow command with the origin='lower' option, which ensures that the (0,0) pixel is at the bottom left, not the top left. You can try different colormaps if you like (set by the cmap option) - see here for information on the available colormaps. You can specify a colormap to use with e.g. cmap=plt.cm.jet (i.e. cmap=plt.cm. followed by the name of the colormap). Note that you can make figures larger by specifying e.g.
>>> plt.figure(figsize=(8,8))
where the size is given in inches. Try and find a way to plot a colorbar on the side, to show what color corresponds to what value. Examples can be found in the Matplotlib Gallery. You may also want to try to remove the tick labels (100, 200, etc.) since they are not useful.
import numpy
path_file = 'PS02-ice_data/ice_data/20080415.npy' # path of the input file
data = numpy.load(path_file) # open the input file
plt.figure(figsize = (8,8)) # set the figure size
plt.imshow(data, cmap = plt.cm.GnBu_r, origin = "lower") # plot the data in the input file
ticks_colorbar = numpy.linspace(numpy.nanmin(data), numpy.nanmax(data), 11) # determine what ticks will appear
# in the colorbar
plt.colorbar(ticks = ticks_colorbar) # add colorbar to the figure
plt.axis('off') # take out axis
plt.title("Map of the concentration of ice in the Arctic - 2008/04/15") # title of the figure
plt.show() # show the figure
We now want to make a plot of the ice concentration over time. Reading in a single map is easy, but since we have 137 maps, we do not want to read them all in individually by hand. Write a loop over all the available files, and inside the loop, read in the data to a variable (e.g. data), and also extract the year, month, and day as integer values (e.g. year, month, and day). Then, also inside the loop, compute a variable time which is essentially the fractional time in years (so 1st July 2012 is 2012.5). You can assume for simplicity that each month has 30 days - this will not affect the results later. Finally, also compute for each file the total number of pixels that have a value above 50%. After the loop, make a plot of the number of pixels with a concentration above 50% against time.
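The fractional time described above can be sketched as a small helper. The function name fractional_year is hypothetical, and the sketch assumes the files are named YYYYMMDD.npy and that every month has 30 days, as suggested in the text.

```python
import os


def fractional_year(filename):
    """Return the time in fractional years for a file named YYYYMMDD.npy.

    Assumes 12 months of 30 days each, as suggested in the text.
    """
    stem = os.path.splitext(os.path.basename(filename))[0]  # e.g. '20120701'
    year = int(stem[:4])
    month = int(stem[4:6])
    day = int(stem[6:8])
    # January 1st maps to exactly `year`; each month adds 1/12 of a year
    return year + (month - 1) / 12.0 + (day - 1) / 360.0


print(fractional_year("ice_data/20120701.npy"))  # 2012.5
```

Subtracting one from the month and the day makes 1st January fall exactly on the integer year, which is what puts 1st July at 2012.5.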
You will likely notice that the ticks are in a strange format, where they are given in years since 2006, but you can change this with the following code:
>>> from matplotlib.ticker import ScalarFormatter
>>> plt.gca().xaxis.set_major_formatter(ScalarFormatter(useOffset=False))
Describe what you see in the plot.
We now want something a little more quantitative than just the number of pixels, so we will try and compute the area where the ice concentration is above a given threshold. However, we first need to know the area of the pixels in the image, and since we are looking at a projection of a spherical surface, each pixel will be a different area. The areas (in km^2) are contained inside the file named ice_data_area.npy (if you are using the CIP pool, this is /local/py4sci/ice_data_area.npy). Read in the areas and make a plot (with colorbar) to see how the pixel area is changing over the image.
Now, loop over the files again as before, but this time, for each file, compute the total area where the concentration of ice is 99% or above. Make a new plot showing the area of >99% ice concentration against time.
Describe what you see - how does the minimum change over time?
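As a minimal sketch of the two measurements asked for so far, the example below counts pixels above 50% and sums the pixel areas where the concentration is 99% or above. The arrays are tiny and made up for illustration; the variable names are assumptions and do not come from the data files.

```python
import numpy as np

# Synthetic 2x3 example; the values are made up for illustration only.
concentration = np.array([[100.0, 99.0, 98.9],
                          [np.nan, 60.0, 99.7]])   # ice concentration in percent
pixel_area = np.array([[30.0, 35.0, 38.0],
                       [30.0, 36.0, 39.0]])        # area of each pixel in km^2

# Treat undefined pixels as ice-free so comparisons behave predictably
filled = np.where(np.isnan(concentration), 0.0, concentration)

n_above_half = np.count_nonzero(filled > 50)       # pixel count above 50%
mask = filled >= 99                                # True where concentration >= 99%
area_above_99 = pixel_area[mask].sum()             # sum areas of selected pixels only

print(n_above_half)    # 5
print(area_above_99)   # 104.0
```

Note that the area is obtained by summing the areas of the selected pixels, not by multiplying the areas by the concentration values, which would inflate the result by roughly a factor of 100.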
### PART 1 - Plot the variation of the number of pixels over 50% with time
import glob # import modules
from matplotlib.ticker import ScalarFormatter
import os
path = 'PS02-ice_data/ice_data' # path of the input files
list_filenames = sorted(glob.glob(os.path.join(path,'*.npy'))) # list of ice maps, sorted by date
date = []
time = numpy.ndarray(shape=len(list_filenames)) # we define empty arrays of the
pixels_over_half = numpy.ndarray(shape=len(list_filenames)) # appropriate length
for index, filename in enumerate(list_filenames): # per each ice map, together with its index
    date.append(os.path.basename(filename)[:-4]) # get its date from the name
    year = int(date[index][:4]) # determine the year, month and day
    month = int(date[index][4:6])
    day = int(date[index][6:])
    time[index] = year + (month-1)/12.0 + (day-1)/(30.0*12) # compute its fractional time
    data = numpy.load(filename) # load the ice map data
    data[numpy.isnan(data)] = 0 # set all NaNs to zero
    pixels_over_half[index] = numpy.count_nonzero(data > 50) # count the number of pixels over 50 %
plt.figure(figsize=(8,8)) # set the size of the figure
plt.plot(time, pixels_over_half/10**4) # plot the number of pixels vs time
plt.ylabel(r"Pixels over 50% [$\times10^4$]") # set the y-label
plt.xlabel("Time") # set the x-label
plt.xticks(time[time%0.25 == 0], rotation = 'vertical') # use as xticks times multiples of 0.25
plt.gca().xaxis.set_major_formatter(ScalarFormatter(useOffset=False)) # show years without bias
plt.show() # show the figure
### PART 1&2 - Plot the area per pixel and the variation of the number of pixels over 50% and the area covered with time
import glob # import modules
from matplotlib.ticker import ScalarFormatter
import os
path = 'PS02-ice_data/ice_data' # path of the input files
list_filenames = sorted(glob.glob(os.path.join(path,'*.npy'))) # list of ice maps, sorted by date
path_areas = 'PS02-ice_data/'
data_areas = numpy.load(path_areas+'ice_data_area.npy')
plt.figure(figsize = (8,8)) # set the figure size
plt.imshow(data_areas, cmap = plt.cm.GnBu_r, origin = "lower") # plot the data in the input file
ticks_colorbar = numpy.linspace(numpy.nanmin(data_areas),
numpy.nanmax(data_areas), 11) # determine what ticks will appear
# in the colorbar
plt.colorbar(ticks = ticks_colorbar) # add colorbar to the figure
plt.axis('off') # take out axis
plt.title(r"Area per pixel [km$^2$]") # title of the figure
plt.show() # show the figure
date = []
time = numpy.ndarray(shape=len(list_filenames)) # we define empty arrays of the
pixels_over_half = numpy.ndarray(shape=len(list_filenames)) # appropriate length
total_area_over_99 = numpy.ndarray(shape=len(list_filenames))
for index, filename in enumerate(list_filenames): # per each ice map, together with its index
    date.append(os.path.basename(filename)[:-4]) # get its date from the name
    year = int(date[index][:4]) # determine the year, month and day
    month = int(date[index][4:6])
    day = int(date[index][6:])
    time[index] = year + (month-1)/12.0 + (day-1)/(30.0*12) # compute its fractional time
    data = numpy.load(filename) # load the ice map data
    data[numpy.isnan(data)] = 0 # set all NaNs to zero
    pixels_over_half[index] = numpy.count_nonzero(data > 50) # count the number of pixels over 50 %
    mask_over_99 = data >= 99 # select the pixels with a concentration of 99% or above
    total_area_over_99[index] = numpy.sum(data_areas[mask_over_99]) # total area = sum of the selected pixel areas
plt.figure(figsize=(8,8)) # set the size of the figure
plt.plot(time, pixels_over_half/10**4) # plot the number of pixels vs time
plt.ylabel(r"Pixels over 50% [$\times10^4$]") # set the y-label
plt.xlabel("Time") # set the x-label
plt.xticks(time[time%0.25 == 0], rotation = 'vertical') # use as xticks times multiples of 0.25
plt.gca().xaxis.set_major_formatter(ScalarFormatter(useOffset=False)) # show years without bias
plt.figure(figsize=(8,8)) # set the size of the figure
plt.plot(time, total_area_over_99/10**6) # plot the total area vs time
plt.ylabel(r"Total area over 99% [$\times10^6\,$km$^2$]") # set the y-label
plt.xlabel("Time") # set the x-label
plt.xticks(time[time%0.25 == 0], rotation = 'vertical') # use as xticks times multiples of 0.25
plt.gca().xaxis.set_major_formatter(ScalarFormatter(useOffset=False)) # show years without bias
plt.show() # show the figure
Find the date at which the area of the region where the ice concentration is above 99% is the smallest. What is the value of the minimum area?
Next, read in the map for this minimum, and the map for the same day and month but from 2006. Make a side-by-side plot showing the 2006 and the 2011 data.
Compute the difference between the two maps so that a loss in ice over time will correspond to a negative value, and a gain in ice will correspond to a positive value. Make a plot of the difference, and use the RdBu colormap to highlight the changes (include a colorbar).
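The sign convention for the difference can be checked on a toy example. The sketch below uses two made-up 2x2 maps (the names map_2006 and map_2011 are placeholders) and also computes a symmetric colour limit, which is one possible way to keep zero change at the neutral centre of a diverging colormap such as RdBu.

```python
import numpy as np

# Made-up maps for the same calendar day in two different years (percent)
map_2006 = np.array([[100.0, 80.0], [50.0, np.nan]])
map_2011 = np.array([[100.0, 30.0], [65.0, np.nan]])

# Later minus earlier: negative values mean ice was lost over time
difference = map_2011 - map_2006

# Symmetric colour limits keep zero change at the neutral centre of RdBu
limit = np.nanmax(np.abs(difference))

print(difference)
print(limit)  # 50.0
```

The limit could then be passed to imshow as vmin=-limit and vmax=limit together with cmap=plt.cm.RdBu, so that losses and gains of the same size get colours of the same intensity.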
# Part 1 - Find the minimum area where the ice concentration is above 99 % and its corresponding date
year_month_day = date[numpy.argmin(total_area_over_99)]
minimum_area = min(total_area_over_99)
print("Date and area of the minimum ice concentration above 99%: ", year_month_day, minimum_area, " km^2")
# Part 2 - Open the file corresponding to the minimum and its equivalent day on 2006 and plot them side to side
# set the filenames for both files
filename_minimum_ice_coverage = path + "/" + year_month_day + ".npy"
filename_2006_ice_coverage = path + "/2006" + year_month_day[4:] + ".npy"
# open and load both input files
data_minimum_ice_coverage = numpy.load(filename_minimum_ice_coverage)
data_2006_ice_coverage = numpy.load(filename_2006_ice_coverage)
fig = plt.figure(figsize=(10,5)) # create a figure of size (10,5) inches
ax1 = fig.add_subplot(121) # add the first subplot
im1 = ax1.imshow(data_2006_ice_coverage, cmap = plt.cm.GnBu_r,\
origin = "lower") # plot the data
ax1.set_title("2006" + year_month_day[4:]) # add a title with the date
ax1.axis('off') # take out axis
ticks_colorbar = numpy.linspace(numpy.nanmin(data_2006_ice_coverage),\
numpy.nanmax(data_2006_ice_coverage), 11) # determine what ticks will appear
# in the colorbar
plt.colorbar(im1, ticks = ticks_colorbar) # add colorbar to the first subplot
ax2 = fig.add_subplot(122) # add second subplot
im2 = ax2.imshow(data_minimum_ice_coverage, cmap = plt.cm.GnBu_r,\
origin = "lower") # plot the data
ax2.set_title(year_month_day) # add a title with the date
ax2.axis('off') # take out axis
ticks_colorbar = numpy.linspace(numpy.nanmin(data_minimum_ice_coverage),\
numpy.nanmax(data_minimum_ice_coverage), 11) # determine what ticks will appear
# in the colorbar
plt.colorbar(im2, ticks = ticks_colorbar) # add colorbar to the second subplot
plt.show() # show the figure
# Part 3: determine the difference between the two maps
data_difference_maps = data_minimum_ice_coverage - data_2006_ice_coverage
plt.figure(figsize=(7,7)) # create a figure of size (7,7) inches
plt.imshow(data_difference_maps, cmap = plt.cm.RdBu, origin = "lower") # plot the data
plt.title("Difference between maps (Negative is loss)") # add a title
plt.axis('off') # take out axis
ticks_colorbar = numpy.linspace(numpy.nanmin(data_difference_maps),\
numpy.nanmax(data_difference_maps), 11) # determine what ticks will appear
# in the colorbar
plt.colorbar(ticks = ticks_colorbar) # add colorbar to the figure
plt.show()
Compute average ice concentration maps for 2006 and 2011, and plot them side by side.
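How undefined pixels are treated changes the average, so it is worth deciding this explicitly. The sketch below compares two options on made-up 2x2 maps: counting NaN as zero ice, or averaging only over the maps where a pixel is defined. Both the data and the variable names are illustrative assumptions.

```python
import numpy as np

# Three made-up 2x2 maps standing in for all the maps of one year
maps = [
    np.array([[100.0, 40.0], [np.nan, 0.0]]),
    np.array([[90.0, 20.0], [np.nan, 0.0]]),
    np.array([[80.0, np.nan], [np.nan, 30.0]]),
]

stack = np.stack(maps)                       # shape (n_maps, ny, nx)
valid = ~np.isnan(stack)                     # True where a pixel is defined
filled = np.where(valid, stack, 0.0)         # NaN replaced by zero

# Option 1: count NaN as zero ice and divide by the number of maps
mean_filled = filled.mean(axis=0)

# Option 2: divide by the number of defined values per pixel;
# pixels that are never defined stay NaN
counts = valid.sum(axis=0)
mean_valid = np.divide(filled.sum(axis=0), counts,
                       out=np.full(counts.shape, np.nan),
                       where=counts > 0)

print(mean_filled)
print(mean_valid)
```

Option 1 gives a map with no gaps, while option 2 avoids pulling the average down at pixels that are only occasionally undefined.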
# small function to quickly determine the average maps given a list of filenames
def func_average_maps(list_filenames):
    sum_data = 0 # running sum, becomes an array after the first addition
    for filename in list_filenames: # per each map in the list
        data = numpy.load(filename) # load the data
        data[numpy.isnan(data)] = 0 # replace the NaN values with zero
        sum_data += data # add them all together
    average_map = sum_data/(len(list_filenames)) # average over the number of maps
    return average_map # return the average map
list_filenames_2011 = glob.glob(os.path.join(path,'2011*.npy')) # list of ice maps
list_filenames_2006 = glob.glob(os.path.join(path,'2006*.npy')) # list of ice maps
average_data_2006 = func_average_maps(list_filenames_2006) # determine average maps for 2006
average_data_2011 = func_average_maps(list_filenames_2011) # determine average maps for 2011
# plot the averages side by side
fig = plt.figure(figsize=(10,5)) # create a figure of size (10,5) inches
ax1 = fig.add_subplot(121) # add the first subplot
im1 = ax1.imshow(average_data_2006, cmap = plt.cm.GnBu,\
origin = "lower") # plot the data
ax1.set_title("Average for 2006") # add a title
ax1.axis('off') # take out axis
ticks_colorbar = numpy.linspace(numpy.nanmin(average_data_2006),\
numpy.nanmax(average_data_2006), 11) # determine what ticks will appear
# in the colorbar
plt.colorbar(im1, ticks = ticks_colorbar) # add colorbar to the first subplot
ax2 = fig.add_subplot(122) # add second subplot
im2 = ax2.imshow(average_data_2011, cmap = plt.cm.GnBu,\
origin = "lower") # plot the data
ax2.set_title("Average for 2011") # add a title
ax2.axis('off') # take out axis
ticks_colorbar = numpy.linspace(numpy.nanmin(average_data_2011),\
numpy.nanmax(average_data_2011), 11) # determine what ticks will appear
# in the colorbar
plt.colorbar(im2, ticks = ticks_colorbar) # add colorbar to the second subplot
plt.show() # show the figure
The data that we have here only cover five years, so we cannot reliably extract information about long-term trends. However, it is worth noting that the minimum ice coverage you found here was a record minimum - never before (in recorded history) had the extent of the sea ice been so small. This is part of a long-term trend due to global warming. In 2012, the record was again beaten, and most scientists believe that by ~2050, the Arctic will be completely ice-free for at least part of the summer.