# Dissecting Data: CO2

Temperature isn’t the only thing about the Earth that’s changing, right before our eyes, so of course we’ll look at lots of graphs of lots of data about Earth’s changes. But because we want to get scientific about it, we’ll do more than just look at graphs. We’ll take the data and pick it apart, try to understand what it’s made of and what it’s telling us. We’ll pick up a scalpel and slice into it, dissect it, looking for the information that’s there in the most succinct and compact possible form.

And we’ll start in early 1958. That’s when Charles Keeling began regularly measuring how much CO2 is in the air, monitoring it continuously. Thanks to his efforts, we have a record of atmospheric CO2 from then until now with just a few gaps, from the Mauna Loa Atmospheric Observatory in Hawaii.

CO2 is measured in parts per million by volume (ppmv), sometimes just called parts per million (ppm). If you plucked a million molecules out of the air at random, it’s the number of them you’d expect to be CO2. The current CO2 concentration is 408 ppm, but it changes from day to day, month to month, year to year. The data we’ll focus on is Mauna Loa’s monthly average CO2 concentration (in ppm) for each month from March 1958 through January 2018, albeit with a few months missing. And here it is:

It’s pretty clear just by looking that, overall, CO2 concentration is going up. I often remind people that “looks like” is not valid statistical evidence, and that visual inspection of a graph can give you great ideas but also false impressions, so it should be regarded with caution. But in this case, go with your gut. It’s going up.

That raises the question, how fast?

The first guess (educated guess, that is) usually comes from fitting a straight line to the data: finding the straight line that stays as close to the data as a straight line can. There are different ways to define “close,” so there are different ways to define the line, but the most common by far is called least squares regression, and it chooses this line:

It follows the data pretty closely, increasing at a rate of 1.54 ppm/year. But looking at the graph again, it seems there is still some pattern: the data are consistently above the line very early and very late, while consistently below the line during the 1980s and 1990s. And there are those wiggles up and down, which look like they might be happening on a yearly basis, that we haven’t even mentioned yet. Clearly there’s more going on here than just following a straight line.
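A least squares line fit takes only a couple of lines with standard tools. Here’s a minimal sketch in Python, using a synthetic series as a stand-in for the Mauna Loa record (the values are illustrative, not the real data):

```python
import numpy as np

# Synthetic stand-in for the monthly CO2 record: a linear trend plus
# a yearly cycle (illustrative values only, not the Mauna Loa data).
t = np.arange(1958.0, 2018.0, 1.0 / 12.0)              # time in years
co2 = 315.0 + 1.5 * (t - 1958.0) + 3.0 * np.sin(2.0 * np.pi * t)

# Least squares regression: fit a degree-1 polynomial (a straight line).
slope, intercept = np.polyfit(t, co2, deg=1)
print(f"estimated trend: {slope:.2f} ppm/year")
```

Because the yearly wiggles average out over the full record, the fitted slope recovers the underlying trend.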

We often gain real insight by looking at the difference between what the data values are, and what the “model we’ve got so far” says they would be. Since the “model so far” is a straight line, those are just the straight-line values. Let’s take each data value, subtract what it would be if it followed the straight line at that time, and call the results the residuals. Then the data are equal to the “model so far” (straight line) plus residuals.
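The “data = model + residuals” bookkeeping can be sketched like so (again with an illustrative synthetic series, not the real record):

```python
import numpy as np

# Illustrative synthetic series (not the real Mauna Loa data).
t = np.arange(1958.0, 2018.0, 1.0 / 12.0)
co2 = 315.0 + 1.5 * (t - 1958.0) + 3.0 * np.sin(2.0 * np.pi * t)

coeffs = np.polyfit(t, co2, deg=1)   # the "model so far": a straight line
line = np.polyval(coeffs, t)         # what the model says at each time
residuals = co2 - line               # data minus model

# By construction the decomposition is exact: data = model + residuals.
print(np.allclose(co2, line + residuals))   # True
```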

For the straight line model so far, here are the residuals:

Yes, there’s some regular up-and-down fluctuation that just keeps going, and yes, there’s still long-term change that we haven’t included in our model so far.

So it’s going up, but not just following a straight line. That means the rate of increase, the “how fast” it’s going up, isn’t constant. It changes. The linear model fails to capture this because it always has a constant rate of increase. That’s what makes it “linear.”

To include rate change we need another model, and often the first guess (educated guess, that is) comes from fitting a quadratic function to the data. A quadratic has the form $x = \beta_0 + \beta_1 t + \beta_2 t^2$, where $\beta_0, \beta_1, \beta_2$ are constants, t is the time, and x is the data value. “Fit” means to find the values for those constants that bring the quadratic curve as close to the data as possible. And again, “closest” can be defined in several ways, so lots of different ways of picking the “best” values are available. We’ll stick to the most common way, least squares regression.
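A quadratic fit works the same way as the linear one, just with one more coefficient. A sketch, again on a synthetic accelerating series rather than the real record:

```python
import numpy as np

# Synthetic series with an accelerating trend (illustrative values only).
tau = np.arange(0.0, 60.0, 1.0 / 12.0)       # years since the record began
co2 = 315.0 + 0.8 * tau + 0.0125 * tau**2 + 3.0 * np.sin(2.0 * np.pi * tau)

# Least squares quadratic fit; np.polyfit returns highest degree first.
b2, b1, b0 = np.polyfit(tau, co2, deg=2)
residuals = co2 - np.polyval([b2, b1, b0], tau)
print(f"b2 = {b2:.4f} ppm/yr^2")
```

Measuring time from the start of the record (tau) rather than calendar years keeps the fit numerically well conditioned.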

And that gives us this (the red line):

That matches the data a lot better than the linear model — at least it “looks like!” When we look at the residuals, the differences between the data and what this quadratic model would give, it’s this:

We’re now at the point where the long-term changes in these residuals are no bigger than (perhaps even smaller than) those regular cyclic up-and-down fluctuations that keep repeating year after year.

Let’s switch gears and look at the quick probably-yearly up-and-down pattern. The first thing I want to do is isolate it from other changes. To do that, I’ll estimate what those other, slower changes are with a fancy smoothing function:

Now I’ll subtract this “slow changes model” and look at its residuals:
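The post’s actual smoother is fancier than anything shown here; as a stand-in, a centered 12-month moving average illustrates the idea of estimating the slow changes and subtracting them (synthetic data again):

```python
import numpy as np

# Synthetic stand-in (illustrative values only, not the real record).
t = np.arange(1958.0, 2018.0, 1.0 / 12.0)
tau = t - 1958.0
co2 = 315.0 + 0.8 * tau + 0.0125 * tau**2 + 3.0 * np.sin(2.0 * np.pi * t)

# Stand-in for the "fancy smoothing function": a centered 12-month
# moving average. Averaging over a full year wipes out the yearly
# cycle while still tracking the slow changes.
window = 12
slow = np.convolve(co2, np.ones(window) / window, mode="same")

# Residuals from the slow-changes model; drop the edge months, where
# the moving-average window runs off the ends of the data.
fast = (co2 - slow)[window:-window]
print(f"residual range: {fast.min():.2f} to {fast.max():.2f} ppm")
```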

This looks like it’s mostly yearly fluctuation. Let’s plot this same data, but instead of plotting residual by time, let’s plot it by month. That way, all the Januarys will be plotted in one place, all the Februarys, etc. up to all the Decembers. When we do that, these CO2 residuals paint this picture of a seasonal pattern:

With this view of the seasonal pattern, with as much of the non-seasonal pattern cleared away as practical, we see that CO2 peaks each May at about 3.02 ppm above the long-term yearly average value, and reaches its minimum in September or October at about 3.23 ppm below it.
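Grouping the residuals by calendar month amounts to averaging all the values that share a month. A sketch with a purely sinusoidal stand-in for the detrended residuals (the real seasonal pattern isn’t a perfect sine wave):

```python
import numpy as np

# Purely sinusoidal stand-in for the detrended residuals; the real
# seasonal cycle has a different shape and phase.
t = np.arange(1958.0, 2018.0, 1.0 / 12.0)
resid = 3.0 * np.sin(2.0 * np.pi * t)

# Group by calendar month: month 0 is the first month of the series.
months = np.arange(len(t)) % 12
seasonal = np.array([resid[months == m].mean() for m in range(12)])
print(f"seasonal peak: {seasonal.max():+.2f}, trough: {seasonal.min():+.2f}")
```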

Now we can subtract this “seasonal pattern model” from the data to get residuals from the seasonal pattern. We’ll call them de-seasonalized, and just note that they’re strongly related to taking anomaly values. In any case, when we subtract the seasonal pattern from the data we get de-seasonalized data:
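De-seasonalizing, in this simple form, just means subtracting each month’s average seasonal value from that month’s data. A sketch (synthetic series, illustrative only):

```python
import numpy as np

# Synthetic series: trend plus a yearly cycle (illustrative only).
t = np.arange(1958.0, 2018.0, 1.0 / 12.0)
co2 = 315.0 + 1.5 * (t - 1958.0) + 3.0 * np.sin(2.0 * np.pi * t)

# Estimate the seasonal pattern from residuals about a straight line,
# then subtract each month's seasonal value from that month's data.
months = np.arange(len(t)) % 12
resid = co2 - np.polyval(np.polyfit(t, co2, deg=1), t)
seasonal = np.array([resid[months == m].mean() for m in range(12)])
deseason = co2 - seasonal[months]
```

What remains in `deseason` is the long-term change, with the yearly wiggles removed.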

We can fit our quadratic long-term model to this:

and again we get an impressive fit, with these residuals left over:

Those residuals still show hints of pattern, but it’s more complex and it’s smaller in size, in amplitude. This becomes clear if we plot all our components on the same graph on the same scale. We’ll put a red line for the long-term quadratic pattern (shifted to average value zero), a blue line for the seasonal pattern, and a black line for the residuals left over, so you can see how they size up next to each other:

Probably the most important result is encompassed in that long-term quadratic model — it’s certainly the bulk of the change over the last 60 years. It fits the data well enough that it gives a good estimate, not only of the value of the long-term trend, but of the rate of increase as well.

By the quadratic model, the rate of increase is itself getting faster, speeding up consistently from year to year. CO2 rose by only 0.79 ppm/year back in early 1958, but the rate is now up to 2.29 ppm/year, more than twice as fast. Other, more sophisticated estimates than just using a quadratic model suggest the early rise rate was around 0.7 ppm/year, but now it’s up to 2.5 ppm/year.
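That acceleration is built into the quadratic model itself: differentiating $x = \beta_0 + \beta_1 t + \beta_2 t^2$ gives the rate of increase

$$\frac{dx}{dt} = \beta_1 + 2\beta_2 t,$$

which grows steadily whenever $\beta_2 > 0$. A rise from 0.79 ppm/year to 2.29 ppm/year over the 60-year record corresponds to $2\beta_2 \approx 0.025$ ppm/year per year.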

There’s other interesting stuff yet to be uncovered. For instance, the seasonal cycle, which we approximated and then removed for convenience, is not the same throughout the whole time period. The earlier seasonal cycle was smaller, but the present seasonal cycle tends to be on the bigger side.

We know the physics behind the main changes we found. For the seasonal cycle it’s the fact that plants take in CO2 when they grow (they need the carbon), and release it when they die, rot, and decay. This happens on a yearly cycle in the northern hemisphere, where most of the world’s land plants are because that’s where most of the world’s land is. Plant growth from spring through early autumn lowers CO2 in the air, then plant decay from autumn until the next spring raises it up again.

As for the long-term rise, that’s because of human activity. It’s mostly from burning fossil fuels, which releases CO2 directly into the air. Over the years we’ve emitted more and more CO2, with the result that not only has CO2 risen, it has risen at an ever-faster rate, and so far there’s no evidence that pattern has changed. CO2 growth isn’t slowing down, it looks like it’s still speeding up.

## 5 thoughts on “Dissecting Data: CO2”

1. Very nice exposition–and yes, I know I keep writing that! I was wondering early on why the seasonal signal was being retained, but dealing with that issue later in the essay started to make a lot of sense as the discussion unfolded.


2. I’ve previously heard people make a big deal of the effect of the El-Niño Southern Oscillation (ENSO) on the annual change in CO2 concentration so I’m surprised that it doesn’t seem at all important once the quadratic fit is removed.

I’ve always ascribed the temporary slowdown in the increase in CO2 emissions to the collapse of the Eastern Bloc.


3. Excellent.

It’s interesting that the “smallest” component, the residuals after quadratic removal and after seasonal, reveal the secular variation …. the stuff that is a more-or-less one-off kind of change.


4. Larry

In a short analysis (“Global Warming Acceleration Plus Miscellaneous”, http://www.columbia.edu/~jeh1/) sent out by email today, James Hansen does brief separate analyses of the maxima and minima for surface temperatures from 1995 to the present and the differing trends if those are considered as separate datasets.

On your Open Mind blog (where I would have posted this note if I could find a relevant place to put it) I have suggested a few times in the past that if you were to analyze separately the trends in the maxima and minima for sea ice extent, that might be quite interesting as opposed to looking just at the general trend.

I never got a response from you or others in the comments I left, asking you to consider doing such an analysis. But I think it is a unique approach that could produce very interesting results. So, I’m asking again here if you might consider doing such an analysis. Best regards.


1. What would also be interesting is why minima, as Professor Hansen suggests, are better to use for such estimates than maxima, which were Jeremy Grantham’s original proposal.
