Keep your eye on the man, not the dog

A time series consists of a set of data (almost always for a single variable — like temperature) at a set of times. The word “series” implies more than one — if we have taken many measurements, or computed many averages or transformations or whatever, and we know the times at which each applies, then we have a time series.

If we have N such data values at N times, let’s call them x_j, where the subscript j can go from 1 to N. Let’s call the times at which they apply t_j. We need to know both, so each observation consists of a pair of numbers, with (t_j, x_j) being the jth data pair.

As an example, consider the daily temperature (the average of the daily high and low temperatures) at Kremsmünster, Austria (the data are available from ECA&D, the European Climate Assessment and Dataset Network). There’s quite a bit of data, N=51,861 days of it, so to keep things clearer without overcrowding a graph, I’ll start with a graph of just a small piece, the year 1969:

The first thing we notice is that it changes, a lot. The changes don’t seem to follow a simple, smooth progression; rather, they jitter around in a random dance that never stops. But there’s also a semblance of a pattern. In the early and late parts of the year (the season we call “winter”) temperatures seem to be colder than average, while in the middle of the year (“summer”) they seem to be hotter.

Looking at more of the data, the entire decade from 1960 through 1969, it’s clear that the seasonal pattern is consistent. Summer wasn’t hotter, nor winter colder, only during 1969, but in all the years:

We can make a model of this pattern in many ways. I’ll use Fourier analysis to mimic the seasonal pattern which best matches the actual data — all of it, not just the decade of the 1960s. That looks like this (plotting just the 1960s so the graph isn’t so crowded):

Our mathematical model, shown as the thick red line, repeats exactly every year, as a good seasonal pattern should.
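One common way to build a seasonal model like this is a least-squares fit of a few sine and cosine harmonics of the annual cycle. The sketch below is only an illustration under stated assumptions: it uses synthetic daily temperatures rather than the Kremsmünster record, and the names `seasonal_design` and `fit_seasonal`, the choice of 2 harmonics, and the year length of 365.2425 days are choices made for the example, not details of the fit shown above.

```python
import numpy as np

YEAR = 365.2425  # assumed mean length of a year, in days


def seasonal_design(t_days, n_harmonics=2):
    """Design matrix: a constant plus sine/cosine pairs at multiples of 1 cycle per year."""
    t = np.asarray(t_days, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        phase = 2.0 * np.pi * k * t / YEAR
        cols.append(np.cos(phase))
        cols.append(np.sin(phase))
    return np.column_stack(cols)


def fit_seasonal(t_days, x, n_harmonics=2):
    """Least-squares coefficients of the seasonal model f(t)."""
    A = seasonal_design(t_days, n_harmonics)
    coef, *_ = np.linalg.lstsq(A, np.asarray(x, dtype=float), rcond=None)
    return coef


# Synthetic stand-in data (NOT the real record): about 10 years of daily values.
rng = np.random.default_rng(0)
t = np.arange(3650)
true_cycle = 9.0 - 10.0 * np.cos(2.0 * np.pi * t / YEAR)
x = true_cycle + rng.normal(0.0, 3.0, size=t.size)

coef = fit_seasonal(t, x)
model = seasonal_design(t) @ coef                   # f(t_j) at every observation time
one_year_later = seasonal_design(t + YEAR) @ coef   # the same model, one year on
print("model repeats exactly every year:", np.allclose(model, one_year_later))
```

Because every column of the design matrix has a period of one year, the fitted curve is forced to repeat from year to year, whatever the data do.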

Of course the data don’t match the model exactly, as the data do not repeat exactly every year. We can learn a lot by computing the differences between the data values and our seasonal-model values, i.e. what the values actually are minus what they would have been if they followed the seasonal pattern. Such differences are called residuals, and for the 1960s they look like this:

The residuals look, and for the most part are, just random. Hence our actual mathematical model of the data (so far) is that it is a seasonal pattern plus random fluctuations. Let’s call the seasonal pattern f(t_j), where f is a function of time and the t_j are the times for which we have data. Let’s call the random deviations \varepsilon_j. Then our model is

x_j = f(t_j) + \varepsilon_j.

In order for the pattern to be seasonal, the function f(t) has to be periodic (perpetually repeating) with a period of 1 year. Hence it can change throughout the year but it can’t change from one year to the next; every year is just like the others.
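The periodicity requirement can be written down explicitly. As a sketch, assuming time t is measured in years and the Fourier series is truncated at K harmonics (K and the coefficient names a_k, b_k are notation introduced here for illustration only):

```latex
f(t) = a_0 + \sum_{k=1}^{K} \left[ a_k \cos(2\pi k t) + b_k \sin(2\pi k t) \right],
\qquad f(t + 1) = f(t) \ \text{for all } t.
```

Every term repeats after exactly one year, so any choice of coefficients gives a pattern that varies within the year but is identical from one year to the next.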

Because it repeats, the seasonal pattern is predictable. The deviations from that pattern, being random, are not predictable. Therefore what we’ve done is to separate the data into two components: the deterministic part (that seasonal function f(t)) and the stochastic part (the random deviations \varepsilon_j). We can call the predictable part the signal and the random part the noise. Our model is

data = signal + noise,

and the signal is periodic, the seasonal pattern.

The predictable part, which so far we’ve identified with the seasonal pattern, is the climate. The data themselves make up the weather. The climate includes both the long-term average and its seasonal variation, but the weather also includes the residuals on top of that. That’s why one definition of climate (due to John Herbertson, but often attributed to Mark Twain) is: “Climate is what, on an average, we expect. Weather is what we actually get.”

When “what, on an average, we expect” remains the same from year to year, when expectation changes only in cyclic fashion and the cycle repeats year after year, we say the climate is stable. That too is what we tend to expect. If winters tend to be extremely cold year after year, we expect that to continue in the years to come. When summers tend to be extremely hot, or very mild, or for that matter drenched in rain or parched in drought, if such conditions repeat year after year, with random fluctuations of course but with consistency, we expect those conditions to repeat in future years. It’s the nature of human experience that we expect climate to be stable.

We do not expect weather to be stable. Weather includes the fluctuations as well as the long-term average. Those fluctuations are random in the sense of being unpredictable in the long term. But they will happen; the one thing about random fluctuations that we can reliably anticipate is that they won’t stop. They’ll keep on fluctuating.

If our seasonal pattern includes all the signal, all the part that we expect to persist in future years, then the residuals (what’s left over after we subtract away the signal) are just noise. They’re just random. When the residuals are random, of course we can’t predict them in detail but we can say some things about them. For one thing, their average value should be zero. The typical size of the fluctuations is also predictable.
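Both of those properties are easy to check numerically. Here is a minimal sketch, again on synthetic data rather than the real record; the noise level of 3 degrees, the single annual harmonic, and the 365.2425-day year are all assumptions made for the example.

```python
import numpy as np

YEAR = 365.2425  # assumed mean length of a year, in days

# Synthetic stand-in data (NOT the real record): seasonal cycle plus random noise.
rng = np.random.default_rng(1)
t = np.arange(7300)
x = 9.0 - 10.0 * np.cos(2.0 * np.pi * t / YEAR) + rng.normal(0.0, 3.0, size=t.size)

# Seasonal model: a constant plus one annual sine/cosine pair, fitted by least squares.
phase = 2.0 * np.pi * t / YEAR
A = np.column_stack([np.ones(t.size), np.cos(phase), np.sin(phase)])
coef, *_ = np.linalg.lstsq(A, x, rcond=None)

residuals = x - A @ coef               # data minus seasonal model
mean_resid = residuals.mean()          # zero up to rounding error
typical_size = residuals.std(ddof=3)   # standard deviation; 3 coefficients were fitted
print(f"mean residual: {mean_resid:.2e}, typical size: {typical_size:.2f}")
```

When the fit includes a constant term, the residuals average to zero over the fitted span by construction. The more informative question is whether their average stays near zero within shorter stretches of the record.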

It’s like rolling dice: you can’t predict what the dice roll will be (unless the dice are loaded!), but you can predict that the average of many rolls will be very near 7 (for rolling a pair of dice and adding their results together). You can also know, in advance, that the total can’t be lower than 2 or higher than 12. We don’t know, and can’t know, what the next roll will be, but we’re not entirely ignorant either. If we do our job well, we should be able to tell what the average will be and how extreme the fluctuations will be. At best, we can even understand the probability of each possible result. We can certainly do this with dice rolls; in fact, for a pair of dice we can compute the probability for each and every possible outcome.
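For a pair of fair dice the whole calculation fits in a few lines. This sketch simply enumerates the 36 equally likely outcomes; the variable names are arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) outcomes give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
probs = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

mean = sum(total * p for total, p in probs.items())
variance = sum((total - mean) ** 2 * p for total, p in probs.items())

for total, p in probs.items():
    print(f"{total:2d}: {p}")
print("mean =", mean, " variance =", variance)
```

The totals run from 2 to 12, the most likely total is 7 (probability 1/6), and the long-run average is exactly 7.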

Casinos that offer the gambling game “craps” know those probabilities. They know all the odds, and they use that information to structure the bets they offer, and the price for losing or payoff for winning, in order to have an advantage (a “house edge”) in the long-term average. Any individual roll might pay off for the player, not the house, but the casinos know that in the long run the average of the payoffs will be on the house side. That’s how casinos stay in business; they might lose any single bet, but averaged over a vast number of bets the odds are so strongly in their favor that operating a casino is a sound business strategy.

The casino takes advantage of the fact that the average, the degree of variation, even the probabilities for each possible outcome, remain the same. The average, the probabilities, we could call the climate of dice rolls, while the individual rolls are the weather. Casinos rely on the climate (of dice rolls) being stable, and it is — which is how they make a handsome living from operating gambling games. Stable “dice climate” is good for the casino business. Changes to the average, or the probabilities, are bad for the casino business. That’s why they go to such lengths to prevent you from replacing their dice with “loaded” dice. Their entire business model, the reliability of the house winning in the long run, depends on the odds remaining the same. When you use loaded dice, you change the “dice climate” and if the casino doesn’t find out, if they let you play in the changed climate that you know about but they don’t, it’s not only bad for their business, it can be disastrous.
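To see how loading changes the “dice climate,” compare a fair pair with a loaded pair. The loading used here (a six comes up 1/4 of the time, the other faces 3/20 each) is purely hypothetical, picked to make the arithmetic easy, and `sum_distribution` is just a helper name for this sketch.

```python
from fractions import Fraction
from itertools import product


def sum_distribution(die):
    """Probability of each total for two independent dice with the same face probabilities."""
    dist = {}
    for (a, pa), (b, pb) in product(die.items(), repeat=2):
        dist[a + b] = dist.get(a + b, Fraction(0)) + pa * pb
    return dist


def mean(dist):
    """Long-run average total for a given distribution."""
    return sum(total * p for total, p in dist.items())


fair = {face: Fraction(1, 6) for face in range(1, 7)}
# Hypothetical loading, chosen only for illustration: six comes up 1/4 of the time.
loaded = {face: Fraction(3, 20) for face in range(1, 6)}
loaded[6] = Fraction(1, 4)

fair_sum = sum_distribution(fair)
loaded_sum = sum_distribution(loaded)
print("mean total:", mean(fair_sum), "->", mean(loaded_sum))
print("P(12):", fair_sum[12], "->", loaded_sum[12])
```

Any single roll of the loaded pair still looks like an ordinary roll. What has changed is the average and the probabilities, which is exactly the part the casino depends on.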

Let’s look at the residuals again, but instead of just the 1960s, let’s view all 142+ years from January 1st 1876 through January 31st 2018:

It certainly has kept fluctuating the whole time. But there’s a hint — just a hint, mind you — of a recent change in the average. It looks like the residuals, which we’ll call anomaly values, seem to be a wee bit higher — on average, mind you — in the last few decades than they were before.

I’ve often emphasized (and will continue to do so) that “looks like” isn’t proper justification for making a conclusion. But it’s a great way to get ideas. The way to test those ideas is to use rigorous statistical methods. In the next post, we’ll test whether or not the changes that seem to have happened to daily temperature in Kremsmünster, Austria, are real changes to the nature of things, or just “look like” it because they’re random, and random means unpredictable and sometimes surprising.

