I’ve had questions about the use of ensembles of climate models for a while. I was helped by working through a bunch of papers which explain the origin of ensemble forecasting. I still have my questions but maybe this post will help to create some perspective.
The Stochastic Sixties
Lorenz encapsulated the problem in the mid-1960s like this:
The proposed procedure chooses a finite ensemble of initial states, rather than the single observed initial state. Each state within the ensemble resembles the observed state closely enough so that the difference might be ascribed to errors or inadequacies in observation. A system of dynamic equations previously deemed to be suitable for forecasting is then applied to each member of the ensemble, leading to an ensemble of states at any future time. From an ensemble of future states, the probability of occurrence of any event, or such statistics as the ensemble mean and ensemble standard deviation of any quantity, may be evaluated.
Between the near future, when all states within an ensemble will look about alike, and the very distant future, when two states within an ensemble will show no more resemblance than two atmospheric states chosen at random, it is hoped that there will be an extended range when most of the states in an ensemble, while not constituting good pin-point forecasts, will possess certain important features in common. It is for this extended range that the procedure may prove useful.
Epstein picked up some of these ideas in two papers in 1969. Here’s an extract from The Role of Initial Uncertainties in Prediction.
While it has long been recognized that the initial atmospheric conditions upon which meteorological forecasts are based are subject to considerable error, little if any explicit use of this fact has been made.
Operational forecasting consists of applying deterministic hydrodynamic equations to a single “best” initial condition and producing a single forecast value for each parameter..
..One of the questions which has been entirely ignored by the forecasters.. is whether or not one gets the “best” forecast by applying the deterministic equations to the “best” values of the initial conditions and relevant parameters..
..one cannot know a uniquely valid starting point for each forecast. There is instead an ensemble of possible starting points, but the identification of the one and only one of these which represents the “true” atmosphere is not possible.
In essence, the realization that small errors can grow in a non-linear system like weather and climate leads us to ask what the best method is of forecasting the future. In this paper Epstein takes a look at a few interesting simple problems to illustrate the ensemble approach.
Let’s take a look at one very simple example – the slowing of a body due to friction.
The rate of change of velocity (dv/dt) is proportional to the velocity, v. The constant of proportionality is k, which increases with more friction.
dv/dt = -kv, therefore, v = v0.exp(-kt), where v0 = initial velocity
With a starting velocity of 10 m/s and k = 10^-4 (in units of 1/s), how does velocity change with time?
Figure 1 – note the logarithm of time on the time axis, time runs from 10 secs – 100,000 secs
Probably no surprises there.
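For anyone who wants to reproduce the curve, here is a minimal sketch in Python (not the Matlab code used for the figures; the function name is just an illustrative choice). It evaluates v = v0.exp(-kt) at a few times across the plotted range:

```python
import math

def velocity(t, v0=10.0, k=1e-4):
    """Velocity (m/s) at time t (s) for dv/dt = -k*v with v(0) = v0."""
    return v0 * math.exp(-k * t)

# Sample the same range as the figure: 10 s to 100,000 s, one point per decade
for t in (10, 100, 1_000, 10_000, 100_000):
    print(f"t = {t:>7} s   v = {velocity(t):.4f} m/s")
```

Most of the decay happens between about 1,000 s and 100,000 s, because 1/k = 10,000 s sets the timescale.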
Now let’s consider that in the real world we don’t know the starting velocity precisely, and we don’t know the coefficient of friction precisely either. Instead, we might have some idea about the possible values, which could be expressed as a statistical spread. Epstein looks at the case for v0 with a normal distribution and k with a gamma distribution (for specific reasons that are not that important here).
Mean of v0: <v0> = 10 m/s
Standard deviation of v0: σv = 1 m/s
Mean of k: <k> = 10^-4 /s
Standard deviation of k: σk = 3×10^-5 /s
The particular example he gave has equations that can be easily manipulated, allowing him to derive an analytical result. In 1969 that was necessary. Now we have computers and some lucky people have Matlab. My approach uses Matlab.
What I did was create a set of 1,000 random normally distributed values for v0, with the mean and standard deviation above. Likewise, a set of gamma distributed values for k.
Then we take each pair in turn and produce the velocity vs time curve. Then we look at the stats of the 1,000 curves.
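The procedure just described can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the original Matlab: the function names, the seed and the conversion from mean and standard deviation to gamma shape and scale are my own choices. The conversion uses the standard identities mean = shape × scale and variance = shape × scale².

```python
import math
import random
import statistics

V0_MEAN, V0_SD = 10.0, 1.0      # m/s
K_MEAN, K_SD = 1e-4, 3e-5       # 1/s

# Gamma parameters that give the required mean and standard deviation
K_SHAPE = (K_MEAN / K_SD) ** 2
K_SCALE = K_SD ** 2 / K_MEAN

def make_ensemble(n=1000, seed=0):
    """Draw n (v0, k) pairs: v0 normally distributed, k gamma distributed."""
    rng = random.Random(seed)
    return [(rng.gauss(V0_MEAN, V0_SD), rng.gammavariate(K_SHAPE, K_SCALE))
            for _ in range(n)]

def ensemble_stats(members, t):
    """Mean and standard deviation of velocity across the ensemble at time t."""
    v = [v0 * math.exp(-k * t) for v0, k in members]
    return statistics.fmean(v), statistics.stdev(v)

members = make_ensemble()
for t in (0, 1_000, 3_000, 9_000, 30_000, 100_000):
    mean, sd = ensemble_stats(members, t)
    best = V0_MEAN * math.exp(-K_MEAN * t)   # single run with the mean parameters
    print(f"t = {t:>7} s   ensemble mean = {mean:7.4f}   sd = {sd:6.4f}   "
          f"best-value run = {best:7.4f}")
```

The last column is the single run using the mean v0 and mean k, so it can be compared directly with the ensemble mean at each time.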
Interestingly the standard deviation increases before fading away to zero. It’s easy to see why the standard deviation will end up at zero – because the final velocity is zero. So we could easily predict that. But it’s unlikely we would have predicted that the standard deviation of velocity would start to increase after 3,000 seconds and then peak at around 9,000 seconds.
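For this particular pairing of distributions the ensemble statistics can also be written in closed form, which gives a check on the sampled result. The sketch below assumes v0 and k are independent and uses the standard result for a gamma variable with shape a and scale s, E[exp(-kt)] = (1 + s.t)^(-a). It is my own reconstruction and not necessarily the form Epstein wrote down; the function names are illustrative.

```python
import math

V0_MEAN, V0_SD = 10.0, 1.0      # m/s
K_MEAN, K_SD = 1e-4, 3e-5       # 1/s
SHAPE = (K_MEAN / K_SD) ** 2    # gamma shape for k
SCALE = K_SD ** 2 / K_MEAN      # gamma scale for k

def expected_decay(t, n=1):
    """E[exp(-n*k*t)] for gamma-distributed k."""
    return (1.0 + n * SCALE * t) ** (-SHAPE)

def mean_velocity(t):
    """Ensemble mean of v = v0*exp(-k*t), with v0 and k independent."""
    return V0_MEAN * expected_decay(t)

def sd_velocity(t):
    """Ensemble standard deviation of v, from E[v^2] - E[v]^2."""
    second_moment = (V0_MEAN ** 2 + V0_SD ** 2) * expected_decay(t, n=2)
    return math.sqrt(second_moment - mean_velocity(t) ** 2)

times = range(0, 100_001, 100)
t_peak = max(times, key=sd_velocity)
print(f"sd at t = 0: {sd_velocity(0):.3f} m/s")
print(f"sd is largest near t = {t_peak} s, where it is {sd_velocity(t_peak):.3f} m/s")
```

At t = 0 the spread is just the spread in v0. The later rise comes from the uncertainty in k, which has no effect at the start and only matters once the decay is well under way.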
Here is the graph of standard deviation of velocity vs time:
Now let’s look at the spread of results. The blue curves in the top graph (below) are each individual ensemble member and the green is the mean of the ensemble results. The red curve is the calculation of velocity against time using the mean of v0 and k:
The bottom curve zooms in on one portion (note the time axis is now linear), with the thin green lines being 1 standard deviation in each direction.
What is interesting is the significant difference between the mean of the ensemble members and the single value calculated using the mean parameters. This is quite usual with “non-linear” equations (aka the real world). In this example the equation is linear in v, but the solution v0.exp(-kt) depends non-linearly on k, so averaging exp(-kt) over the spread of k does not give the same answer as exp(-<k>t).
So, if you aren’t sure about your parameters or your initial conditions, taking the “best value” and running the simulation may well give you a completely different result from sampling the parameters and initial conditions and taking the mean of this “ensemble” of results.
Epstein concludes in his paper:
In general, the ensemble mean value of a variable will follow a different course than that of any single member of the ensemble. For this reason it is clearly not an optimum procedure to forecast the atmosphere by applying deterministic hydrodynamic equations to any single initial condition, no matter how well it fits the available, but nevertheless finite and fallible, set of observations.
Epstein’s other 1969 paper, Stochastic Dynamic Prediction, is more involved. He uses Lorenz’s “minimum set of atmospheric equations” and compares the results after 3 days from using the “best value” starting point vs an ensemble approach. The best value approach has significant problems compared with the ensemble approach:
Note that this does not mean the deterministic forecast is wrong, only that it is a poor forecast. It is possible that the deterministic solution would be verified in a given situation but the stochastic solutions would have better average verification scores.
One of the important points in the earlier work on numerical weather forecasting was the understanding that parameterizations also have uncertainty associated with them.
For readers who haven’t seen them, here’s an example of a parameterization, for latent heat flux, LH:
LH = L·ρ·C_DE·U_r·(q_s - q_a)
which says: latent heat flux = latent heat of vaporization × density × “aerodynamic transfer coefficient” × wind speed at the reference level × (humidity at the surface – humidity in the air at the reference level)
The “aerodynamic transfer coefficient” is somewhere around 0.001 over ocean to 0.004 over moderately rough land.
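To get a feel for the numbers, here is a small Python sketch of the bulk formula. The default latent heat of vaporization (2.5×10^6 J/kg) and air density (1.2 kg/m³) are typical near-surface values that I have assumed, and the wind speed and humidity values are made up for illustration, not taken from any dataset.

```python
def latent_heat_flux(c_de, wind_speed, q_surface, q_air,
                     latent_heat=2.5e6, air_density=1.2):
    """Bulk aerodynamic estimate of latent heat flux, in W/m^2.

    c_de        aerodynamic transfer coefficient (dimensionless)
    wind_speed  wind speed at the reference level (m/s)
    q_surface   specific humidity at the surface (kg/kg)
    q_air       specific humidity at the reference level (kg/kg)
    """
    return latent_heat * air_density * c_de * wind_speed * (q_surface - q_air)

# Illustrative conditions: 5 m/s wind and a 5 g/kg humidity difference
for c_de in (0.001, 0.004):
    lh = latent_heat_flux(c_de, wind_speed=5.0, q_surface=0.020, q_air=0.015)
    print(f"C_DE = {c_de}: LH = {lh:.0f} W/m^2")
```

Because the flux is directly proportional to the transfer coefficient, going from the ocean value to the rough-land value multiplies the result by four for the same wind and humidity.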
The real formula for latent heat transfer is much simpler:
LH = the covariance of upwards moisture with vertical eddy velocity × density × latent heat of vaporization
These are values that vary even across very small areas and across many timescales. Across one “grid cell” of a numerical model we can’t use the “real formula” because we only get to put in one value for upwards eddy velocity and one value for upwards moisture flux and anyway we have no way of knowing the values “sub-grid”, i.e., at the scale we would need to know them to do an accurate calculation.
That’s why we need parameterizations. By the way, I don’t know whether this is a current formula in use in NWP, but it’s typical of what we find in standard textbooks.
So right away it should be clear why we need to apply the same approach of ensembles to the parameters describing these sub-grid processes as well as to initial conditions. Are we sure that over Connecticut the parameter C_DE = 0.004, or should it be 0.0035? In fact, parameters like this are usually calculated from the average of a number of experiments. They conceal as much as they reveal. The correct value probably depends on other parameters. In so far as it represents a real physical property it will vary depending on the time of day, seasons and other factors. It might even be, “on average”, wrong, because “on average” over the set of experiments was an imperfect sample, and “on average” over all climate conditions is a different sample.
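As a rough illustration of what sampling a parameter looks like, the Python sketch below draws the transfer coefficient from a hypothetical uniform range of 0.0030 to 0.0040 and looks at the resulting spread in latent heat flux. The range, the wind speed, the humidity difference and the physical constants are all assumptions made for the example. A real perturbed-parameter ensemble would run the full model for each sampled value.

```python
import random
import statistics

def latent_heat_flux(c_de, wind_speed=5.0, dq=0.005,
                     latent_heat=2.5e6, air_density=1.2):
    """Bulk latent heat flux (W/m^2) for an assumed wind speed and humidity difference."""
    return latent_heat * air_density * c_de * wind_speed * dq

rng = random.Random(42)
# Hypothetical uncertainty in the transfer coefficient
samples = [rng.uniform(0.0030, 0.0040) for _ in range(1000)]
fluxes = [latent_heat_flux(c) for c in samples]

print(f"LH at C_DE = 0.0035: {latent_heat_flux(0.0035):.1f} W/m^2")
print(f"LH at C_DE = 0.0040: {latent_heat_flux(0.0040):.1f} W/m^2")
print(f"ensemble mean = {statistics.fmean(fluxes):.1f} W/m^2, "
      f"sd = {statistics.stdev(fluxes):.1f} W/m^2")
```

In this toy case the flux is linear in the coefficient, so the ensemble mean sits close to the value at the middle of the range and the ensemble mostly tells us about the spread. Inside a full model the flux feeds back on other variables, and the mean can shift as well.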
The Numerical Naughties
The insights gained in the stochastic sixties weren’t so practically useful until some major computing power came along in the 90s and especially the 2000s.
Here is Palmer et al (2005):
Ensemble prediction provides a means of forecasting uncertainty in weather and climate prediction. The scientific basis for ensemble forecasting is that in a non-linear system, which the climate system surely is, the finite-time growth of initial error is dependent on the initial state. For example, Figure 2 shows the flow-dependent growth of initial uncertainty associated with three different initial states of the Lorenz (1963) model. Hence, in Figure 2a uncertainties grow significantly more slowly than average (where local Lyapunov exponents are negative), and in Figure 2c uncertainties grow significantly more rapidly than one would expect on average. Conversely, estimates of forecast uncertainty based on some sort of climatological-average error would be unduly pessimistic in Figure 2a and unduly optimistic in Figure 2c.
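The state-dependence of error growth is easy to see for yourself. The Python sketch below integrates the Lorenz (1963) equations with the usual parameter values (sigma = 10, rho = 28, beta = 8/3) and measures how much a tiny perturbation grows over one time unit from several different starting points on the attractor. The integration scheme (fourth-order Runge-Kutta), step size, perturbation size and choice of starting points are my assumptions, and this is not a reproduction of the figure in the paper.

```python
import math

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivatives of the Lorenz (1963) system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz63(state)
    k2 = lorenz63(shift(state, k1, dt / 2))
    k3 = lorenz63(shift(state, k2, dt / 2))
    k4 = lorenz63(shift(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, steps, dt=0.01):
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state

def error_growth(start, steps, eps=1e-6, dt=0.01):
    """Factor by which a perturbation of size eps in x has grown after `steps` steps."""
    perturbed = (start[0] + eps, start[1], start[2])
    a = integrate(start, steps, dt)
    b = integrate(perturbed, steps, dt)
    return math.dist(a, b) / eps

# Spin up onto the attractor, then sample starting points along the trajectory
state = integrate((1.0, 1.0, 1.0), 2000)
for i in range(5):
    growth = error_growth(state, 100)
    print(f"start {i}: x = {state[0]:7.3f}   growth over 1 time unit = {growth:10.2f}")
    state = integrate(state, 137)   # move to a different part of the attractor
```

The growth factor changes from one starting point to the next, which is the point being made in the extract: a single climatological-average error estimate cannot capture this.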
The authors then provide an interesting example to demonstrate the practical use of ensemble forecasts. In the top left are the “deterministic predictions” using the “best estimate” of initial conditions. The rest of the charts 1-50 are the ensemble forecast members each calculated from different initial conditions. We can see that there was a low yet significant chance of a severe storm:
Figure 5
In fact a severe storm did occur so the probabilistic forecast was very useful, in that it provided information not available with the deterministic forecast.
This is a nice illustration of some benefits. It doesn’t tell us how well NWP models perform in general.
One measure is the forecast spread of certain variables as the forecast time increases. Generally single-model ensembles don’t do so well – they under-predict the spread of results at later time periods.
Here’s an example of the performance of a multi-model ensemble vs single-model ensembles on saying whether an event will occur or not. (Intuitively, the axes seem the wrong way round). The single model versions are over-confident – so when the forecast probability is 1.0 (certain) the reality is 0.7; when the forecast probability is 0.8, the reality is 0.6; and so on:
We can see that, at least in this case, the multi-model did a pretty good job. However, similar work on forecasting precipitation events showed much less success.
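The comparison of forecast probability against what actually happened can be sketched in a few lines of Python. The function below groups forecasts into probability bins and reports the observed frequency of the event in each bin. The function name and binning scheme are my own. The example data are made up to match the two over-confident points quoted above (forecast 1.0 verifying 70% of the time and forecast 0.8 verifying 60% of the time), and they are not taken from the paper.

```python
from collections import defaultdict

def reliability_table(forecast_probs, outcomes, n_bins=10):
    """Observed event frequency for each forecast-probability bin.

    Returns a list of (mean forecast probability, observed frequency, count),
    ordered from the lowest bin to the highest.
    """
    bins = defaultdict(list)
    for p, occurred in zip(forecast_probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)   # p = 1.0 goes in the top bin
        bins[idx].append((p, occurred))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        obs_freq = sum(o for _, o in pairs) / len(pairs)
        table.append((mean_p, obs_freq, len(pairs)))
    return table

# Made-up verification data: 1 = event occurred, 0 = it did not
probs = [1.0] * 10 + [0.8] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 6 + [0] * 4
for mean_p, obs_freq, n in reliability_table(probs, outcomes):
    print(f"forecast {mean_p:.2f}   observed {obs_freq:.2f}   (n = {n})")
```

A well-calibrated system would show observed frequencies close to the forecast probabilities in every bin. An over-confident one shows observed frequencies that are pulled in towards the middle.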
In their paper, Palmer and his colleagues contrast multi-model vs multi-parameterization within one model. I am not clear what the difference is – whether it is a technicality or something fundamentally different in approach. The example above is multi-model. They do give some examples of multi-parameterizations (with a similar explanation to what I gave in the section above). Their paper is well worth a read, as is the paper by Lewis (see links below).
The concept of taking a “set of possible initial conditions” for weather forecasting makes a lot of sense. The concept of taking a “set of possible parameterizations” also makes sense although it might be less obvious at first sight.
In the first case we know that we don’t know the precise starting point because observations have errors and we lack a perfect observation system. In the second case we understand that a parameterization is some empirical formula which is clearly not “the truth”, but some approximation that is the best we have for the forecasting job at hand, and the “grid size” we are working to. So in both cases creating an ensemble around “the truth” has some clear theoretical basis.
Now what is also important for this theoretical basis is that we can test the results – at least with weather prediction (NWP). That’s because of the short time periods under consideration.
A statement from Palmer (1999) will resonate in the hearts of many readers:
A desirable if not necessary characteristic of any physical model is an ability to make falsifiable predictions
When we come to ensembles of climate models the theoretical case for multi-model ensembles is less clear (to me). There’s a discussion in IPCC AR5 that I have read. I will follow up the references and perhaps produce another article.
References
The Role of Initial Uncertainties in Prediction, Edward Epstein, Journal of Applied Meteorology (1969) – free paper
Stochastic Dynamic Prediction, Edward Epstein, Tellus (1969) – free paper
On the possible reasons for long-period fluctuations of the general circulation, EN Lorenz, Proc. WMO-IUGG Symp. on Research and Development Aspects of Long-Range Forecasting, Boulder, CO, World Meteorological Organization, WMO Tech. (1965) – cited from Lewis (2005)
Roots of Ensemble Forecasting, John Lewis, Monthly Weather Review (2005) – free paper
Predicting uncertainty in forecasts of weather and climate, T.N. Palmer (1999), also published as ECMWF Technical Memorandum No. 294 – free paper
Representing Model Uncertainty in Weather and Climate Prediction, TN Palmer, GJ Shutts, R Hagedorn, FJ Doblas-Reyes, T Jung & M Leutbecher, Annual Review of Earth and Planetary Sciences (2005) – free paper