The natural language of any signal that is periodic in space or time, or both, is **Fourier**. A large part of Optics *f*2*f* (all about light waves that are periodic in space and time) is devoted to Fourier analysis, and many of the figures are made using Fourier transforms. In our view, Fourier is the key to understanding light propagation, and many other topics in physics. To get more insight into Fourier techniques it is worth exploring other contexts. This article illustrates a data analysis example where Fourier techniques provide useful insight. Not optics, but still interesting and very important.

One important dataset (both historically and looking ahead) is the CO₂ record collected at the Mauna Loa Observatory, first by Charles Keeling, then Ralph Keeling, and others. Everyone should analyse the CO₂ record and share their conclusions. This exercise highlights some of the important choices we have to make when we analyse data. Would you make the same choices?

The data is available via FTP (link to .txt file) from NOAA. The code below (partly inspired by this post) is written in Python and run using Anaconda. You can cut and paste the code into your own Python notebook or download the full notebook here (right click and save target). First, some preliminaries.

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dpi = [196.0/255.0, 59.0/255.0, 142.0/255.0]  # Durham pink
dr = [170.0/255.0, 43.0/255.0, 74.0/255.0]    # Durham red
dy = [232.0/255.0, 227.0/255.0, 145.0/255.0]  # Durham yellow
dg = [159.0/255.0, 161.0/255.0, 97.0/255.0]   # Durham green
db = [0, 99.0/255.0, 136.0/255.0]             # Durham blue
dp = [126.0/255.0, 49.0/255.0, 123.0/255.0]   # Durham purple
dv = [216.0/255.0, 172.0/255.0, 244.0/255.0]  # Durham violet
```

Now import the data and plot.

```python
co2_file = "ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_weekly_mlo.txt"
df_co2 = pd.read_csv(co2_file, sep='\s+', comment='#', header=None,
                     usecols=[0,1,2,3,4,5,6,7],
                     names=['year','mm','dd','time','CO2_ppm','days','CO2_1yr','CO2_10yr'])
#print(df_co2.head(6))
#print(6*" . . . ")
#print(df_co2.tail(6))

x = pd.Series(df_co2['time']).values
y = pd.Series(df_co2['CO2_ppm']).values
for ii in range(0, np.size(y)):  # set missing data to the previous value
    if y[ii] < 0:
        y[ii] = y[ii-1]

plt.rcParams.update({'font.size': 18})
plt.rcParams.update({'figure.figsize': (6,6)})
fig = plt.figure(1, facecolor='white')
ax = fig.add_axes([0.05, 0.3, 0.8, 1.0])
p1 = plt.plot(x, y, '.')
plt.xlim(1974, 2020)
plt.ylim(320, 418)
plt.ylabel('CO$_2$ (ppm)')
plt.xlabel('Year')
```

This outputs a plot of the complete data set from 1974 to May 2019.

As the CO₂ data is periodic, we can fit it using Fourier analysis. To see which harmonics contribute, we perform a numerical Fourier transform using NumPy's built-in fft module.

```python
from numpy.fft import fft, ifft, fftshift

num = np.size(x)
f = fft(y)
n_yrs = x[num-1] - x[0]                  # number of years spanned by the data
freq = np.arange(0, num/n_yrs, 1/n_yrs)  # lowest freq, i.e. freq step, is 1/n_yrs
spectrum = f.real*f.real + f.imag*f.imag
nspectrum = spectrum/spectrum[0]

plt.rcParams.update({'font.size': 24})
plt.rcParams.update({'figure.figsize': (8,6)})
fig = plt.figure(1, facecolor='white')
ax = fig.add_axes([0.05, 0.3, 0.8, 1.0])
p1 = plt.semilogy(freq, nspectrum, color=db)
plt.xlim(0, 5.5)
plt.ylim(1e-8, 4e-5)
plt.xlabel('Frequency (yr$^{-1}$)')
plt.ylabel('Normalised spectrum')
```

The plot shows the power spectrum; see p. 134 in Optics *f*2*f*.

A crucial part of this code is the frequency scale (the line defining freq). The trick (see p. 252 in Optics *f*2*f* for how the fft algorithm works) is that the smallest frequency, i.e. the frequency increment, is one over the total time spanned by the data series (n_yrs), so the increments are 1/n_yrs. The number of points along the frequency axis must be the same as the number of points in the time series (num), so the maximum frequency value is num/n_yrs. We are lucky with this dataset because we know the fundamental is once per year, so if we get the scaling wrong it is pretty obvious.
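As a sanity check on this scaling, here is a minimal synthetic sketch (all numbers illustrative): a pure once-per-year cosine, sampled 52 times per year for 10 years, should produce a spectral peak at exactly 1 yr⁻¹ on a frequency axis constructed as above.

```python
import numpy as np
from numpy.fft import fft

# Synthetic check of the frequency-axis construction:
# a once-per-year cosine sampled weekly (52/yr) for 10 years.
n_yrs = 10.0
num = 520                                       # 52 samples/yr x 10 yr
t = np.arange(num) * (n_yrs/num)                # time in years
y_syn = np.cos(2*np.pi*t)                       # fundamental: once per year

spectrum = np.abs(fft(y_syn))**2
freq = np.arange(0, num/n_yrs, 1/n_yrs)         # step 1/n_yrs, max num/n_yrs
peak = freq[np.argmax(spectrum[1:num//2]) + 1]  # skip the DC term
print(peak)                                     # the fundamental, in cycles/yr
```

If the axis were scaled wrongly (say by the number of samples rather than the number of years), the peak would land at 52 instead of 1, which is exactly the kind of error the paragraph above warns about.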

The Fourier spectrum suggests that there is a significant second harmonic (a twice-yearly oscillation) and some power in the third and fourth harmonics. Let's choose to fit the data with the fundamental and second harmonic only. The next piece of code does the curve fit (the first few lines) and makes some plots (all the rest). We can call our fit the Business-As-Usual (BAU) model, and in principle we can use it to predict the future, assuming that the patterns of the last 45 years continue to be followed. The fit function has seven parameters: A, B and C characterise the secular trend, D and E the amplitude and phase of the seasonal variation, and F and G the second harmonic. We can either fit the whole dataset or choose a cut-off date to see how well the model predicted the recent history. Below we omit the last two years and then see how the model is doing over those two years.
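Before fitting the real record, a quick synthetic sketch (parameter values below are illustrative, not fitted values) confirms that curve_fit can recover this seven-parameter model when the data are generated from it exactly:

```python
import numpy as np
from scipy.optimize import curve_fit

# The seven-parameter BAU model: quadratic secular trend plus the
# fundamental and second-harmonic seasonal oscillations.
def fit3(x, A, B, C, D, E, F, G):
    return (A + B*x + C*x*x
            + D*np.cos(2*np.pi*x + E)
            + F*np.cos(2*2*np.pi*x + G))

x_syn = np.linspace(0, 45, 2350)                   # ~45 years of weekly samples
true_p = [330.0, 1.3, 0.012, 3.0, 0.5, 0.8, 1.0]   # illustrative A..G values
y_syn = fit3(x_syn, *true_p)

popt, pcov = curve_fit(fit3, x_syn, y_syn, p0=[330, 1, 0, 1, 0, 1, 0])
max_err = np.max(np.abs(fit3(x_syn, *popt) - y_syn))
```

With an exact, noise-free model the fitted curve should reproduce the synthetic data to high precision (note that the parameters themselves have a sign/phase degeneracy: (D, E) and (-D, E+π) give the same curve, so it is the curve, not the raw parameters, that one should compare).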

```python
from scipy.optimize import leastsq, curve_fit

def fit3(x, A, B, C, D, E, F, G):
    return A + B*x + C*x*x + D*np.cos(2*np.pi*x+E) + F*np.cos(2*2*np.pi*x+G)

date_index = 2250  # cut-off date for checking progress
# exclude the last two years, i.e. take only the first 2250 points
A, B, C, D, E, F, G = curve_fit(fit3, x[0:date_index], y[0:date_index])[0]
x_fit = np.arange(x[0], x[num-1], (x[num-1]-x[0])/num)
y_fit = fit3(x_fit, A, B, C, D, E, F, G)

plt.rcParams.update({'font.size': 18})
plt.rcParams.update({'figure.figsize': (6,6)})
fig = plt.figure(1, facecolor='white')
ax = fig.add_axes([0.05, 0.3, 0.8, 1.0])
p1 = plt.plot(x, y, '.')
p1 = plt.plot(x_fit, y_fit, '.')
plt.xlim(1990, 1999)
plt.ylim(345, 415)
plt.ylabel('CO$_2$ (ppm)')
plt.xlabel('Year')
ax = fig.add_axes([0.25, 0.75, 0.5, 0.5])
p1 = plt.plot(x, y, '.')
p1 = plt.plot(x_fit, y_fit)
plt.xlim(1994, 1996.2)
plt.ylim(354.2, 365)
ax = fig.add_axes([1.0, 0.3, 0.8, 1.0])
p1 = plt.plot(x, y, '.')
p1 = plt.plot(x_fit, y_fit, '.')
plt.xlim(2010, 2019)
plt.ylim(345, 415)
plt.xlabel('Year')
ax = fig.add_axes([1.2, 0.4, 0.5, 0.5])
p1 = plt.plot(x, y, '.')
p1 = plt.plot(x_fit, y_fit)
plt.xlim(2017, 2019.2)
plt.ylim(402.2, 413)
plt.show()
```

Although the model reproduces the seasonal variation, the secular variation sometimes drifts lower (as in the 1990s) or higher (after 2015) than the model value. We shall come back to this.

As an aside, we compare the BAU model secular variation to the temperature anomaly (HadCRUT4; you need to copy the data file and save it locally in the same directory).

```python
temp_file = "HadCRUT_2019_06_02.txt"
df_temp = pd.read_csv(temp_file, sep='\s+', comment='#', header=None)
temp = pd.Series(df_temp[1]).values
time = np.arange(1850.08333, 2019.3, 1.0/12.0)

def fit1(x, A1, B1, C1):
    return A1 + B1*x + C1*x*x

A1, B1, C1 = curve_fit(fit1, time[1400:2031], temp[1400:2031])[0]  # data range to match CO2
temp_fit = fit1(time, A1, B1, C1)

fig = plt.figure(1, facecolor='white')
ax = fig.add_axes([0.05, 0.3, 0.8, 1.0])
p1 = plt.plot(time, temp, '.')
p1 = plt.plot(time[1400:2300], temp_fit[1400:2300], '.')
plt.xlim(1974, 2019.3)
plt.ylim(-0.5, 1.2)
plt.xlabel('Year')
plt.ylabel('Temp. anomaly $^\circ$C')
ax = fig.add_axes([1.2, 0.3, 0.8, 1.0])
p1 = plt.plot(x, y, '.')
#p1 = plt.plot(x_fit, y_fit, '.')
p1 = plt.plot(x_fit, y_fit - D*np.cos(2*np.pi*x_fit+E) - F*np.cos(2*2*np.pi*x_fit+G), '.')
plt.xlim(1974, 2019.3)
plt.ylim(285, 455)
plt.xlabel('Year')
plt.ylabel('CO$_2$ (ppm)')
```

This plot illustrates the strong correlation between the level of CO₂ and temperature (at least for the period 1975-2020). In principle we can use this to predict the future, and this is how we arrive at the +3 to +4 °C rise by 2100 under the BAU scenario. But if we reduce our emissions, how soon might we know whether we are doing better?
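One way to put a number on such a correlation is the Pearson coefficient. The sketch below uses synthetic stand-ins (an illustrative quadratic CO₂ trend and a temperature series that follows it linearly plus noise), not the actual Mauna Loa or HadCRUT4 data:

```python
import numpy as np

# Synthetic illustration of quantifying a CO2-temperature correlation.
rng = np.random.default_rng(42)
t = np.linspace(1975, 2020, 540)
co2_secular = 330 + 1.5*(t - 1975) + 0.012*(t - 1975)**2     # assumed trend
temp_anom = -0.3 + 0.01*(co2_secular - 330) + rng.normal(0, 0.05, t.size)

r = np.corrcoef(co2_secular, temp_anom)[0, 1]                # Pearson coefficient
```

A value of r close to 1 indicates a strong linear relationship, though of course correlation alone does not establish the direction of causation; that comes from the physics.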

To answer this we can look at the residuals (the difference between the actual measurement and the BAU model prediction). If the recent trend drops below the statistical uncertainty, we can start to say that we are making progress. The plot below shows the residuals for the two time periods considered above. We have included 5-week average data with error bars.
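The error bar on each 5-week average is the standard error of the mean, i.e. the sample standard deviation of the block divided by √5. A minimal sketch on synthetic residuals (illustrative numbers only):

```python
import numpy as np

# 5-week block averages with the standard error of the mean,
# as used for the error bars below (std/sqrt(5) per block).
rng = np.random.default_rng(7)
weekly = rng.normal(0.0, 0.5, 250)            # 250 weeks of fake residuals (ppm)

blocks = weekly.reshape(50, 5)                # 50 non-overlapping 5-week blocks
block_mean = blocks.mean(axis=1)
block_sem = blocks.std(axis=1) / np.sqrt(5)   # standard error of each block mean
```

The block means scatter less than the weekly points, which is why averaging helps when judging whether a short-term trend is real or just noise.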

```python
new_y = y - y_fit

# 5-week averages
x_re = np.reshape(x[0:2350], (470, 5))
y_re = np.reshape(new_y[0:2350], (470, 5))
x_av = np.mean(x_re, axis=1)
y_av = np.mean(y_re, axis=1)
x_std = np.std(x_re, axis=1)
y_std = np.std(y_re, axis=1)

fig = plt.figure(1, facecolor='white')
ms = 25  # marker size
ax = fig.add_axes([0.05, 0.3, 0.75, 1.0])
p1 = plt.plot(x_fit, new_y, '.', color=db)
for ii in range(0, 470):
    p2 = plt.plot([x_av[ii], x_av[ii]],
                  [y_av[ii]-y_std[ii]/np.sqrt(5), y_av[ii]+y_std[ii]/np.sqrt(5)],
                  linewidth=4, color=dr, alpha=0.5)
plt.xlim(1990, 1999)
plt.ylim(-3.0, 3.0)
plt.xlabel('Year')
plt.ylabel('Residuals: Data $-$ BAU Model (ppm)')
ax = fig.add_axes([0.85, 0.3, 0.75, 1.0])
p1 = plt.plot(x_fit, new_y, '.', color=db)
p1 = plt.plot(x_fit[date_index:2350], new_y[date_index:2350], '.', color=dg)
for ii in range(0, 470):
    p2 = plt.plot([x_av[ii], x_av[ii]],
                  [y_av[ii]-y_std[ii]/np.sqrt(5), y_av[ii]+y_std[ii]/np.sqrt(5)],
                  linewidth=4, color=dr, alpha=0.5)
plt.xlabel('Year')
plt.xlim(2010, 2019)
plt.ylim(-3.0, 3.0)
ax.set_yticklabels([])
ax = fig.add_axes([1.65, 0.3, 0.55, 1.0])
nbins = 120
n, bins, patches = ax.hist(new_y[0:date_index], nbins, range=(-3.0, 3.0),
                           orientation='horizontal', facecolor=db, edgecolor=db, alpha=0.25)
n1, bins1, patches = ax.hist(new_y[0:date_index], 40, range=(-1, 1),
                             orientation='horizontal', facecolor=db, edgecolor=db, alpha=0.25)
n2, bins2, patches = ax.hist(new_y[date_index:2350], nbins, range=(-3.0, 3.0),
                             orientation='horizontal', facecolor=dg, edgecolor=dg, alpha=0.75)
ax.set_yticklabels([])
plt.xlabel('No. of weeks')
plt.ylim(-3.0, 3.0)
print(np.array2string(100*np.sum(n[40:80])/np.size(x[0:date_index]), precision=1),
      '% within 1 ppm of BAU prediction')
```

What these plots show is that the BAU model is not perfect, because it does not include a long-term drift in the secular level. This drift is between 1 and 2 ppm, and in the short term it makes it hard to say that any policy change is having an effect until we see an undershoot of more than 2 ppm.

On the right we show a histogram of the residuals. Over 75% of the data are within ±1 ppm of the BAU prediction, but only 11% are within the standard error of ±0.15 ppm. So we cannot say that the BAU model is a 'good fit' [1], because it does not capture this long-term drift. The data for the last two years are shown in green. Clearly, the recent trend is worse than predicted by the BAU model for the previous four decades. My conclusion so far: no progress. Rather, things seem to be getting worse, although it is too early to say whether this is something more than the type of drift that we have seen before.
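The 'good fit' criterion of Hughes and Hase [1] can be stated via the reduced chi-squared, which should be close to 1 for a good fit. A sketch on synthetic residuals (white noise plus an unmodelled slow drift; the 0.15 ppm uncertainty is the figure assumed above, the drift shape is illustrative):

```python
import numpy as np

# Reduced chi-squared on synthetic residuals: white noise of the assumed
# per-point uncertainty plus a slow drift the model does not capture.
rng = np.random.default_rng(0)
sigma = 0.15                                   # assumed uncertainty (ppm)
n_pts, n_par = 2250, 7                         # data points and BAU parameters
drift = 1.0 * np.sin(np.linspace(0, 2*np.pi, n_pts))  # unmodelled ~1 ppm drift
residuals = drift + rng.normal(0, sigma, n_pts)

chi2_nu = np.sum((residuals/sigma)**2) / (n_pts - n_par)
```

Without the drift term chi2_nu would sit near 1; the ~1 ppm drift inflates it far above 1, which is the quantitative sense in which the BAU model fails to be a 'good fit'.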

Finally, we can repeat the Fourier transform on the residuals to see if there is any structure left. This time we have chosen a log-log scale so that we can look at the long-term drift.

```python
f = fft(new_y)
spectrum = f.real*f.real + f.imag*f.imag
nspectrum = spectrum/spectrum[1]

plt.rcParams.update({'font.size': 24})
plt.rcParams.update({'figure.figsize': (8,6)})
fig = plt.figure(1, facecolor='white')
ax = fig.add_axes([0.05, 0.3, 0.8, 1.0])
#p1 = plt.semilogy(freq, nspectrum, color=db)
p1 = plt.loglog(freq, nspectrum, color=db)
plt.xlim(5e-2, 5.5)
plt.ylim(1e-5, 2e0)
plt.xlabel('Frequency (yr$^{-1}$)')
plt.ylabel('Normalised spectrum')
```

The result looks like 1/f noise, as we might expect.
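On a log-log plot, a 1/f spectrum is a straight line of slope -1, so one way to test the claim is to fit that slope. The sketch below does this on a synthesised signal whose power spectrum is 1/f by construction (not the actual residuals): white phases are shaped with an f^(-1/2) amplitude, transformed to the time domain, and the slope is recovered.

```python
import numpy as np

# Build a signal with an exactly 1/f power spectrum, then recover the
# power-law exponent by a straight-line fit in log-log space.
rng = np.random.default_rng(1)
n = 4096
freqs = np.fft.rfftfreq(n)                 # 0 ... 0.5 in units of 1/sample
amp = np.zeros_like(freqs)
amp[1:] = freqs[1:]**-0.5                  # amplitude ~ f^-1/2 -> power ~ 1/f
spec = amp * np.exp(1j * rng.uniform(0, 2*np.pi, freqs.size))
signal = np.fft.irfft(spec)

power = np.abs(np.fft.rfft(signal))**2
# fit log(power) vs log(f), excluding the DC and Nyquist terms
slope = np.polyfit(np.log(freqs[1:-1]), np.log(power[1:-1]), 1)[0]
```

A slope near -1 confirms 1/f behaviour; applied to the real residual spectrum, a noticeably different slope would indicate structure beyond simple 1/f noise.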

**References:**

[1] Hughes and Hase, Measurements and their uncertainties, OUP 2010.