In Part Four – The Thirty Year Myth we looked at the idea of climate as the “long term statistics” of weather. In that usage, climate, as the statistics of weather, has been arbitrarily defined over a 30-year period. In certain chaotic systems, “long term statistics” might be repeatable and reliable, but “long term” can’t be arbitrarily defined for convenience. Climate, when defined as the predictable statistics of weather, might just as well require 100,000 years (note 1).
I’ve had a question about the current approach to climate models for some time and found it difficult to articulate. In reading Broad range of 2050 warming from an observationally constrained large climate model ensemble, Daniel Rowlands et al, Nature (2012) I found an explanation that helps me clarify my question.
This paper by Rowlands et al is similar in approach to that of Stainforth et al 2005 – the idea of much larger ensembles of climate models. The Stainforth paper was discussed in the comments of Models, On – and Off – the Catwalk – Part Four – Tuning & the Magic Behind the Scenes.
For new readers who want to understand a bit more about ensembles of models – take a look at Ensemble Forecasting.
The basic idea behind ensembles for weather forecasts is that we have uncertainty about:
- the initial conditions – because observations are not perfect
- parameters in our model – because our understanding of the physics of weather is not perfect
So multiple simulations are run and the frequency of occurrence of, say, a severe storm tells us the probability that the severe storm will occur.
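As a minimal sketch of that idea – a toy chaotic map standing in for a weather model, with made-up parameter spreads and an arbitrary “severe storm” threshold – running many perturbed simulations and counting how often the event occurs gives an estimated probability:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_weather_model(initial_state, param, n_steps=100):
    """Stand-in for a weather model: a noisy logistic map.
    Returns the peak value over the forecast window."""
    x = initial_state
    peak = 0.0
    for _ in range(n_steps):
        x = param * x * (1.0 - x) + rng.normal(0.0, 0.001)
        x = min(max(x, 0.0), 1.0)   # keep the toy state bounded
        peak = max(peak, x)
    return peak

# Uncertainty in initial conditions (imperfect observations) and in a
# model parameter (imperfect physics)
n_members = 1000
initial_states = rng.normal(0.5, 0.02, n_members)
params = rng.normal(3.7, 0.05, n_members)

# "Probability of a severe storm" = fraction of members exceeding a threshold
peaks = np.array([toy_weather_model(x0, p) for x0, p in zip(initial_states, params)])
print(f"Ensemble probability of the 'severe storm': {np.mean(peaks > 0.93):.1%}")
```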
Given the short-term nature of weather forecasts, we can compare the actual frequency of occurrence of particular events with the probability that our ensemble produced.
Let’s take an example to make it clear. Suppose the ensemble prediction of a severe storm in a certain area is 5%. The severe storm occurs. What can we make of the accuracy of our prediction? Well, we can’t deduce anything from that event.
Why? Because we only had one occurrence.
Out of 1,000 future forecasts, the “5%ers” are going to occur 50 times – if we are right on the money with our probabilistic forecast. We need a lot of forecasts to be compared with a lot of results. Then we might find that 5%ers actually occur 20% of the time. Or only 1% of the time. Armed with this information we can a) try to improve our model because we know its deficiencies, and b) temper our ensemble forecast with our knowledge of how well it has historically predicted the 5%, 10%, 90% chances of occurrence.
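A minimal sketch of that verification step, with made-up numbers (the “5%ers” actually occurring 20% of the time, as in the scenario above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical verification data: for each past forecast we record the
# probability the ensemble issued and whether the event actually occurred.
n_forecasts = 10_000
forecast_prob = rng.choice([0.05, 0.10, 0.50, 0.90], size=n_forecasts)

# Suppose the ensemble is mis-calibrated at the low end: events flagged
# at 5% actually occur 20% of the time.
true_prob = np.where(forecast_prob == 0.05, 0.20, forecast_prob)
occurred = rng.random(n_forecasts) < true_prob

# Reliability check: within each group of forecasts, compare the issued
# probability with the observed frequency of occurrence.
for p in np.unique(forecast_prob):
    mask = forecast_prob == p
    print(f"forecast {p:.0%}: observed frequency {occurred[mask].mean():.1%} "
          f"over {mask.sum()} forecasts")
```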
This is exactly what currently happens with numerical weather prediction.
And if instead we run one simulation with our “best estimate” of initial conditions and parameters the results are not as good as the results from the ensemble.
The idea behind ensembles of climate forecasts is subtly different. Initial conditions are no help with predicting the long term statistics (aka “climate”). But we still have a lot of uncertainty over model physics and parameterizations. So we run ensembles of simulations with slightly different physics/parameterizations (see note 2).
Assuming our model is a decent representation of climate, there are three important points:
- we need to know the timescale of “predictable statistics”, given constant “external” forcings (e.g. anthropogenic GHG changes)
- we need to cover the real range of possible parameterizations
- the results we get from ensembles can, at best, only ever give us the probabilities of outcomes over a given time period
Item 1 was discussed in the last article, and I have not been able to find any discussion of this timescale in climate science papers (that doesn’t mean there aren’t any; hopefully someone can point me to a discussion of this topic).
Item 2 is something that I believe climate scientists are very interested in. The limitation has been, and still is, the computing power required.
Item 3 is what I want to discuss in this article, around the paper by Rowlands et al.
Rowlands et al 2012
In the latest generation of coupled atmosphere–ocean general circulation models (AOGCMs) contributing to the Coupled Model Intercomparison Project phase 3 (CMIP-3), uncertainties in key properties controlling the twenty-first century response to sustained anthropogenic greenhouse-gas forcing were not fully sampled, partially owing to a correlation between climate sensitivity and aerosol forcing, a tendency to overestimate ocean heat uptake and compensation between short-wave and long-wave feedbacks.
This complicates the interpretation of the ensemble spread as a direct uncertainty estimate, a point reflected in the fact that the ‘likely’ (>66% probability) uncertainty range on the transient response was explicitly subjectively assessed as −40% to +60% of the CMIP-3 ensemble mean for global-mean temperature in 2100, in the Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (AR4). The IPCC expert range was supported by a range of sources, including studies using pattern scaling, ensembles of intermediate-complexity models, and estimates of the strength of carbon-cycle feedbacks. From this evidence it is clear that the CMIP-3 ensemble, which represents a valuable expression of plausible responses consistent with our current ability to explore model structural uncertainties, fails to reflect the full range of uncertainties indicated by expert opinion and other methods..
..Perturbed-physics ensembles offer a systematic approach to quantify uncertainty in models of the climate system response to external forcing. Here we investigate uncertainties in the twenty-first century transient response in a multi-thousand-member ensemble of transient AOGCM simulations from 1920 to 2080 using HadCM3L, a version of the UK Met Office Unified Model, as part of the climateprediction.net British Broadcasting Corporation (BBC) climate change experiment (CCE). We generate ensemble members by perturbing the physics in the atmosphere, ocean and sulphur cycle components, with transient simulations driven by a set of natural forcing scenarios and the SRES A1B emissions scenario, and also control simulations to account for unforced model drifts.
[Emphasis added]. So this project runs a much larger ensemble than the CMIP3 models produced for AR4.
Figure 1 shows the evolution of global-mean surface temperatures in the ensemble (relative to 1961–1990), each coloured by the goodness-of-fit to observations of recent surface temperature changes, as detailed below.
The raw ensemble range (1.1–4.2 K around 2050), primarily driven by uncertainties in climate sensitivity (Supplementary Information), is potentially misleading because many ensemble members have an unrealistic response to the forcing over the past 50 years.
And later in the paper:
..On the assumption that models that simulate past warming realistically are our best candidates for making estimates of the future..
So here’s my question:
If model simulations give us probabilistic forecasts of future climate, why are climate model simulations “compared” with the average of the last few years of “weather” – and why are those that don’t match up well rejected or devalued?
It seems like an obvious thing to do, of course. But current averaged weather might be in the top 10% or the bottom 10% of probabilities. We have no way of knowing.
Let’s say that the current 10-year average of global mean surface temperature (GMST) = 13.7ºC (I haven’t looked up the right value).
Suppose for the given “external” conditions (solar output and latitudinal distribution, GHG concentration) the “climate” – i.e., the real long term statistics of weather – has an average of 14.5ºC, with a standard deviation for any 10-year period of 0.5ºC. That is, 95% of 10-year periods would lie inside 13.5 – 15.5ºC (2 std deviations).
If we run a lot of simulations (and they truly represent the climate) then of course we expect 5% to be outside 13.5 – 15.5ºC. If we reject that 5% as being “unrealistic of current climate”, we’ve arbitrarily and incorrectly reduced the spread of our ensemble.
If we assume that “current averaged weather” – at 13.7ºC – represents reality then we might bias our results even more, depending on the standard deviation that we calculate or assume. We might accept outliers of 13.0ºC because they are closer to our observable and reject good simulations of 15.0ºC because they are more than two standard deviations from our observable (note 3).
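Here is a minimal Monte Carlo sketch of this point, using the made-up numbers above; the screening rule is hypothetical, just to illustrate how constraining a perfect ensemble to one observed decade biases both its mean and its spread:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed "true climate" for the example above: 10-year GMST averages are
# distributed N(14.5, 0.5). The single observed 10-year average is 13.7.
true_mean, true_sd = 14.5, 0.5
observed = 13.7

# A perfect ensemble samples the true distribution
ensemble = rng.normal(true_mean, true_sd, 10_000)

# Constrain the ensemble: keep only members within 2 standard deviations
# of the single observed value (one plausible screening rule)
screen_sd = 0.5
kept = ensemble[np.abs(ensemble - observed) < 2 * screen_sd]

print(f"full ensemble:     mean {ensemble.mean():.2f}, sd {ensemble.std():.2f}")
print(f"screened ensemble: mean {kept.mean():.2f}, sd {kept.std():.2f}")
print(f"fraction rejected: {1 - len(kept) / len(ensemble):.1%}")
```

The screened ensemble is pulled toward the single observation and its spread shrinks, even though every member was a valid sample of the assumed “true” climate.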
The whole point of running an ensemble of simulations is to find out what the spread is, given our current understanding of climate physics.
Let me give another example. One theory is that the initiation of El Nino is essentially a random process during certain favorable conditions. Now we might have a model that reproduced El Nino starting in 1998 and 10 models that reproduced El Nino starting in other years. Do we promote the El Nino model that “predicted in retrospect” 1998 and demote/reject the others? No. We might actually be rejecting better models. We would need to look at the statistics of lots of El Ninos to decide.
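A small sketch of why selecting on a single realization is misleading, with a made-up onset probability: eleven statistically identical models, of which some happen to “hit” 1998 by chance:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical: 11 equally good models, each simulating 1960-2009, in which
# El Nino initiation is a random process with the same underlying probability.
models = 11
years = np.arange(1960, 2010)
p_onset = 0.25  # same El Nino statistics in every model (assumed)

onsets = rng.random((models, years.size)) < p_onset

# "Promote" the models that happened to produce an El Nino in 1998
hit_1998 = onsets[:, years == 1998].flatten()
print(f"models that 'predicted' the 1998 El Nino: {hit_1998.sum()} of {models}")

# But over all years the models are statistically indistinguishable
print("El Nino frequency per model:", onsets.mean(axis=1).round(2))
```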
Kiehl 2007 & Knutti 2008
Here are a couple of papers that don’t articulate the point of view of this article – however, they do comment on the uncertainties in parameter space from a different and yet related perspective.
First, Kiehl 2007:
Methods of testing these models with observations form an important part of model development and application. Over the past decade one such test is our ability to simulate the global anomaly in surface air temperature for the 20th century.. Climate model simulations of the 20th century can be compared in terms of their ability to reproduce this temperature record. This is now an established necessary test for global climate models.
Of course this is not a sufficient test of these models and other metrics should be used to test models..
..A review of the published literature on climate simulations of the 20th century indicates that a large number of fully coupled three dimensional climate models are able to simulate the global surface air temperature anomaly with a good degree of accuracy [Houghton et al., 2001]. For example all models simulate a global warming of 0.5 to 0.7°C over this time period to within 25% accuracy. This is viewed as a reassuring confirmation that models to first order capture the behavior of the physical climate system..
One curious aspect of this result is that it is also well known [Houghton et al., 2001] that the same models that agree in simulating the anomaly in surface air temperature differ significantly in their predicted climate sensitivity. The cited range in climate sensitivity from a wide collection of models is usually 1.5 to 4.5°C for a doubling of CO2, where most global climate models used for climate change studies vary by at least a factor of two in equilibrium sensitivity.
The question is: if climate models differ by a factor of 2 to 3 in their climate sensitivity, how can they all simulate the global temperature record with a reasonable degree of accuracy.
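Kiehl’s question can be made concrete with a zero-dimensional energy-balance sketch – this is not any GCM, and all parameter values below are illustrative assumptions, roughly tuned so both cases produce similar twentieth-century warming despite very different sensitivities:

```python
import numpy as np

# Zero-dimensional energy balance: C dT/dt = F(t) - (F2x/S)*T, where S is
# equilibrium climate sensitivity and F2x ~ 3.7 W/m2 per CO2 doubling.
F2X = 3.7            # W/m2 per CO2 doubling
C = 30.0             # effective ocean heat capacity, W yr m-2 K-1 (assumed)
years = np.arange(1900, 2001)
ghg = np.linspace(0.0, 2.5, years.size)    # ramped GHG forcing, W/m2 (assumed)

def warming(sensitivity, aerosol_2000):
    """Euler-integrate the energy balance with a linear aerosol forcing ramp."""
    aerosol = np.linspace(0.0, aerosol_2000, years.size)
    T = 0.0
    for f in ghg + aerosol:
        T += (f - (F2X / sensitivity) * T) / C    # one-year time step
    return T

# High sensitivity + strong negative aerosol forcing vs
# low sensitivity + weaker aerosol forcing: similar simulated warming
print(f"S = 4.5 K, aerosols -1.7 W/m2: warming {warming(4.5, -1.7):.2f} K")
print(f"S = 2.0 K, aerosols -1.0 W/m2: warming {warming(2.0, -1.0):.2f} K")
```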
Second, Why are climate models reproducing the observed global surface warming so well? Knutti (2008):
The agreement between the CMIP3 simulated and observed 20th century warming is indeed remarkable. But do the current models simulate the right magnitude of warming for the right reasons? How much does the agreement really tell us?
Kiehl recently showed a correlation of climate sensitivity and total radiative forcing across an older set of models, suggesting that models with high sensitivity (strong feedbacks) avoid simulating too much warming by using a small net forcing (large negative aerosol forcing), and models with weak feedbacks can still simulate the observed warming with a larger forcing (weak aerosol forcing).
Climate sensitivity, aerosol forcing and ocean diffusivity are all uncertain and relatively poorly constrained from the observed surface warming and ocean heat uptake [e.g., Knutti et al., 2002; Forest et al., 2006]. Models differ because of their underlying assumptions and parameterizations, and it is plausible that choices are made based on the model’s ability to simulate observed trends..
..Models, therefore, simulate similar warming for different reasons, and it is unlikely that this effect would appear randomly. While it is impossible to know what decisions are made in the development process of each model, it seems plausible that choices are made based on agreement with observations as to what parameterizations are used, what forcing datasets are selected, or whether an uncertain forcing (e.g., mineral dust, land use change) or feedback (indirect aerosol effect) is incorporated or not.
..Second, the question is whether we should be worried about the correlation between total forcing and climate sensitivity. Schwartz et al. recently suggested that ‘‘the narrow range of modelled temperatures [in the CMIP3 models over the 20th century] gives a false sense of the certainty that has been achieved’’. Because of the good agreement between models and observations and compensating effects between climate sensitivity and radiative forcing (as shown here and by Kiehl), Schwartz et al. concluded that the CMIP3 models used in the most recent Intergovernmental Panel on Climate Change (IPCC) report [IPCC, 2007] ‘‘may give a false sense of their predictive capabilities’’.
Here I offer a different interpretation of the CMIP3 climate models. They constitute an ‘ensemble of opportunity’, they share biases, and probably do not sample the full range of uncertainty [Tebaldi and Knutti, 2007; Knutti et al., 2008]. The model development process is always open to influence, conscious or unconscious, from the participants’ knowledge of the observed changes. It is therefore neither surprising nor problematic that the simulated and observed trends in global temperature are in good agreement.
The idea that climate models should all reproduce global temperature anomalies over a 10-year or 20-year or 30-year time period presupposes that we know:
a) climate, as the long term statistics of weather, can be reliably obtained over these time periods. Remember that with a simple chaotic system, where we have “deity-like powers”, we can simulate the results and find the time period over which the statistics are reliable.
b) climate, as the 10-year (or 20-year or 30-year) statistics of weather, is tightly constrained within a small range, to a high level of confidence, and therefore we can reject climate model simulations that fall outside this range.
Given that Rowlands et al 2012 is attempting to better sample climate uncertainty with a larger ensemble, it’s clear that this answer is not known in advance.
There are a lot of uncertainties in climate simulation. Constraining models to match the past may be under-sampling the actual range of climate variability.
Models are not reality. But if we accept that climate simulation is, at best, a probabilistic endeavor, then we must sample what the models produce, rather than throwing out results that don’t match the last 100 years of recorded temperature history.
Broad range of 2050 warming from an observationally constrained large climate model ensemble, Daniel Rowlands et al, Nature (2012) – free paper
Uncertainty in predictions of the climate response to rising levels of greenhouse gases, Stainforth et al, Nature (2005) – free paper
Why are climate models reproducing the observed global surface warming so well? Reto Knutti, GRL (2008) – free paper
Twentieth century climate model response and climate sensitivity, Jeffrey T Kiehl, GRL (2007) – free paper
Note 1: We are using the ideas that have been learnt from simple chaotic systems, like the Lorenz 1963 model. There is discussion of this in Part One and Part Two of this series. As some commenters have pointed out, that doesn’t mean the climate works in the same way as these simple systems; it is much more complex.
The starting point is that weather is unpredictable. With modern numerical weather prediction (NWP) on current supercomputers we can get good forecasts 1 week ahead. But beyond that we might as well use the average value for that month in that location, measured over the last decade. It’s going to be better than a forecast from NWP.
The idea behind climate prediction is that even though picking the weather 8 weeks from now is a no-hoper, what we have learnt from simple chaotic systems is that the statistics of many chaotic systems can be reliably predicted.
Note 2: Models are run with different initial conditions as well. My only way of understanding this from a theoretical point of view (i.e., from anything other than a “practical” or “this is how we have always done it” approach) is to see different initial conditions as comparable to one model run over a much longer period.
That is, if climate is not an “initial value problem”, why does varying the initial values across ensemble members help climate model output? Running 10 simulations of the same model for 100 years, each with different initial conditions, should be equivalent to running one simulation for 1,000 years.
Well, that is not necessarily true because that 1,000 years might not sample the complete “attractor space”, which is the same point discussed in the last article.
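For the Lorenz 1963 model we can check this directly. A minimal sketch (simple Euler integration, illustrative run lengths) comparing one long run with ten shorter runs started from different initial conditions:

```python
import numpy as np

def lorenz_run(state0, n_steps, dt=0.002, sigma=10.0, r=28.0, b=8.0/3.0):
    """Integrate the Lorenz 1963 system with Euler steps; return the z values."""
    x, y, z = state0
    zs = np.empty(n_steps)
    for i in range(n_steps):
        dx = sigma * (y - x)
        dy = x * (r - z) - y
        dz = x * y - b * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        zs[i] = z
    return zs

rng = np.random.default_rng(4)

# One long run vs ten shorter runs from different initial conditions
long_run = lorenz_run((1.0, 1.0, 1.0), n_steps=300_000)
short_runs = [lorenz_run(rng.normal(1.0, 0.5, 3), n_steps=30_000) for _ in range(10)]
pooled = np.concatenate(short_runs)

print(f"one long run:   mean z = {long_run.mean():.2f}, sd = {long_run.std():.2f}")
print(f"ten short runs: mean z = {pooled.mean():.2f}, sd = {pooled.std():.2f}")
```

For Lorenz 1963 the two sets of statistics come out similar; the caveat above is that for a system whose attractor is explored slowly, neither the single long run nor the pooled short runs might sample it fully.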
Note 3: Models are usually compared to observations via temperature anomalies rather than via actual temperatures, see Models, On – and Off – the Catwalk – Part Four – Tuning & the Magic Behind the Scenes. The example was given for simplicity.