## Natural Variability and Chaos – Three – Attribution & Fingerprints

I’ve been somewhat sidetracked on this series, mostly by starting up a company and having no time, but also by the voluminous distractions of IPCC AR5. The subject of attribution could be a series by itself but as I started the series Natural Variability and Chaos it makes sense to weave it into that story.

In Part One and Part Two we had a look at chaotic systems and what that might mean for weather and climate. I was planning to develop those ideas a lot more before discussing attribution, but anyway..

AR5, Chapter 10: Attribution is 85 pages on the idea that the changes over the last 50 or 100 years in mean surface temperature – and also some other climate variables – can be attributed primarily to anthropogenic greenhouse gases.

The technical side of the discussion fascinated me, but has a large statistical component. I’m a rookie with statistics, and maybe because of this, I’m often suspicious about statistical arguments.

### Digression on Statistics

The foundation of a lot of statistics is the idea of independent events. For example, spin a roulette wheel and you get a number between 0 and 36 and a color that is red, black – or if you’ve landed on a zero, neither.

The statistics are simple – each spin of the roulette wheel is an independent event – that is, it has no relationship with the last spin of the roulette wheel. So, looking ahead, what is the chance of getting 5 two times in a row? The answer (with a 0 only and no “00” as found in some roulette tables) is 1/37 x 1/37 = 0.073%.

However, after you have spun the roulette wheel and got a 5, what is the chance of a second 5? It’s now just 1/37 = 2.7%. The past has no impact on the future statistics. Most of real life doesn’t correspond particularly well to this idea, apart from playing games of chance like poker and so on.
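For anyone who wants to check the arithmetic, here is a minimal Python sketch. The names (`POCKETS`, `as_percent`) are just illustrative, and it assumes a fair single-zero wheel as in the text.

```python
from fractions import Fraction

POCKETS = 37  # single-zero wheel: numbers 0 to 36 (assumption from the text)

p_single = Fraction(1, POCKETS)        # chance of a 5 on any one spin
p_two_in_a_row = p_single * p_single   # two independent spins, looking ahead


def as_percent(p):
    """Format a probability as a percentage string with three decimals."""
    return f"{float(p) * 100:.3f}%"


print("two 5s in a row, before any spin:", as_percent(p_two_in_a_row))
print("a second 5, given the first was a 5:", as_percent(p_single))
```

The second line is the whole point of independence: once the first 5 has happened, it drops out of the calculation.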

I was in the gym the other day and although I try and drown it out with music from my iPhone, the Travesty (aka “the News”) was on some of the screens in the gym – with text of the “high points” on the screen aimed at people trying to drown out the annoying travestyreaders. There was a report that a new study had found that autism was caused by “Cause X” – I have blanked it out to avoid any unpleasant feeling for parents of autistic kids – or people planning on having kids who might worry about “Cause X”.

It did get me thinking – if you have, let’s say, 10,000 potential candidates for causing autism, and you test each one at the 95% confidence level (that is, you accept a 5% chance of a false positive on each test), what is the outcome? Well, if there is a random spread of autism among the population with no actual cause (let’s say it is caused by a random genetic mutation with no link to any parental behavior, parental genetics or the environment) then you will expect to find about 500 “statistically significant” factors for autism simply by testing at the 95% level. That’s 500, when none of them are actually the real cause. It’s just chance. Plenty of fodder for pundits though.
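The 500 figure can be checked with a small simulation. This is only a sketch under a stated assumption: when a candidate has no real effect and the test statistic is continuous, the p-value is uniformly distributed between 0 and 1, so each test is modelled as one uniform random draw. The function name and the seeds are made up for illustration.

```python
import random


def count_false_positives(n_candidates=10_000, alpha=0.05, seed=0):
    """Count candidates flagged as significant when none has any real effect.

    Each test is simulated by a single uniform draw standing in for the
    p-value of a true null hypothesis (an assumption, see the text above).
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_candidates))


for seed in range(3):
    print(f"seed {seed}: {count_false_positives(seed=seed)} 'significant' factors")
```

Each run should land somewhere near 10,000 x 0.05 = 500, with scatter of a few tens either way, even though no candidate is a real cause.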

That’s one problem with statistics – the answer you get unavoidably depends on your frame of reference.

The questions I have about attribution are unrelated to this specific point about statistics, but there are statistical arguments in the attribution field that seem fatally flawed. Luckily I’m a statistical novice so no doubt readers will set me straight.

On another unrelated point about statistical independence, only slightly more relevant to the question at hand, Pirtle, Meyer & Hamilton (2010) said:

In short, we note that GCMs are commonly treated as independent from one another, when in fact there are many reasons to believe otherwise. The assumption of independence leads to increased confidence in the ‘‘robustness’’ of model results when multiple models agree. But GCM independence has not been evaluated by model builders and others in the climate science community. Until now the climate science literature has given only passing attention to this problem, and the field has not developed systematic approaches for assessing model independence.

.. end of digression

In my efforts to understand Chapter 10 of AR5 I followed up on a lot of references and ended up winding my way back to Hegerl et al 1996.

Gabriele Hegerl is one of the lead authors of Chapter 10 of AR5, was one of the two coordinating lead authors of the Attribution chapter of AR4, and one of four lead authors on the relevant chapter of AR3 – and of course has a lot of papers published on this subject.

As is often the case, I find that to understand a subject you have to start with a focus on the earlier papers because the later work doesn’t make a whole lot of sense without this background.

This paper by Hegerl and her colleagues uses the work of one of the co-authors, Klaus Hasselmann – his 1993 paper “Optimal fingerprints for detection of time dependent climate change”.

Fingerprints, by the way, seems like a marketing term. Fingerprints evokes the idea that you can readily demonstrate that John G. Doe of 137 Smith St, Smithsville was at least present at the crime scene and there is no possibility of confusing his fingerprints with John G. Dode who lives next door even though their mothers could barely tell them apart.

This kind of attribution is more in the realm of “was it the 6ft bald white guy or the 5’5″ black guy”?

Well, let’s set aside questions of marketing and look at the details.

### Detecting GHG Climate Change with Optimal Fingerprint Methods in 1996

The essence of the method is to compare observations (measurements) with:

• model runs with GHG forcing
• model runs with “other anthropogenic” and natural forcings
• model runs with internal variability only

Then based on the fit you can distinguish one from the other. The statistical basis is covered in detail in Hasselmann 1993 and more briefly in this paper: Hegerl et al 1996 – both papers are linked below in the References.
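To make the comparison described above a little more concrete, here is a toy Python sketch of the fingerprint idea. It is not the code or the data of Hegerl et al or Hasselmann. Everything in it is hypothetical: five "regions", a made-up GHG response pattern, made-up noise levels, and the simplifying assumption that the regions are uncorrelated so that the noise covariance is diagonal. Under that assumption the "optimal" fingerprint is just the expected signal pattern weighted by the inverse of the noise variance in each region.

```python
import math
import random
import statistics

rng = random.Random(0)

# Hypothetical setup: five regions with different levels of internal variability.
true_noise_sd = [0.1, 0.2, 0.4, 0.8, 1.6]
# Hypothetical GHG response pattern, as if taken from a forced model run.
signal = [0.3, 0.3, 0.3, 0.3, 0.3]

# "Control runs": internal variability only, no forcing.
control_runs = [[rng.gauss(0.0, sd) for sd in true_noise_sd] for _ in range(2000)]

# Noise variance per region, estimated from the control runs.
# Simplification: regions are treated as uncorrelated (diagonal covariance).
noise_var = [statistics.variance(region) for region in zip(*control_runs)]

# Fingerprint = signal pattern weighted by inverse noise variance.
fingerprint = [g / v for g, v in zip(signal, noise_var)]


def signal_to_noise(weights, obs):
    """Detection variable divided by its standard deviation under pure noise."""
    d = sum(w * o for w, o in zip(weights, obs))
    sigma = math.sqrt(sum(w * w * v for w, v in zip(weights, noise_var)))
    return d / sigma


print("unweighted pattern: ", round(signal_to_noise(signal, signal), 2))
print("weighted fingerprint:", round(signal_to_noise(fingerprint, signal), 2))
```

The weighted version gives a higher signal-to-noise ratio because it leans on the quiet regions and discounts the noisy ones. Note that `noise_var` comes entirely from the control runs, so the result is only as good as the model's simulation of internal variability.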

At this point I make another digression.. as regular readers know I am fully convinced that the increases in CO2, CH4 and other GHGs over the past 100 years or more can be very well quantified into “radiative forcing” and am 100% in agreement with the IPCC’s summary of the work of atmospheric physics over the last 50 years on this topic. That is, the increases in GHGs have led to something like a “radiative forcing” of 2.8 W/m² [corrected, thanks to niclewis].

And there isn’t any scientific basis for disputing this “pre-feedback” value. It’s simply the result of basic radiative transfer theory, well-established, and well-demonstrated in observations both in the lab and through the atmosphere. People confused about this topic are confused about science basics and comments to the contrary may be allowed or more likely will be capriciously removed due to the fact that there have been more than 50 posts on this topic (post your comments on those instead). See The “Greenhouse” Effect Explained in Simple Terms and On Uses of A 4 x 2: Arrhenius, The Last 15 years of Temperature History and Other Parodies.

Therefore, it’s “very likely” that the increases in GHGs over the last 100 years have contributed significantly to the temperature changes that we have seen.

To say otherwise – and still accept physics basics – means believing that the radiative forcing has been “mostly” cancelled out by feedbacks while internal variability has been amplified by feedbacks to cause a significant temperature change.

Yet this work on attribution seems to be fundamentally flawed.

Here was the conclusion:

We find that the latest observed 30-year trend pattern of near-surface temperature change can be distinguished from all estimates of natural climate variability with an estimated risk of less than 2.5% if the optimal fingerprint is applied.

With the caveats, that to me, eliminated the statistical basis of the previous statement:

The greatest uncertainty of our analysis is the estimate of the natural variability noise level..

..The shortcomings of the present estimates of natural climate variability cannot be readily overcome. However, the next generation of models should provide us with better simulations of natural variability. In the future, more observations and paleoclimatic information should yield more insight into natural variability, especially on longer timescales. This would enhance the credibility of the statistical test.

Earlier in the paper the authors said:

..However, it is generally believed that models reproduce the space-time statistics of natural variability on large space and long time scales (months to years) reasonably realistic. The verification of variability of CGCMs [coupled GCMs] on decadal to century timescales is relatively short, while paleoclimatic data are sparse and often of limited quality.

..We assume that the detection variable is Gaussian with zero mean, that is, that there is no long-term nonstationarity in the natural variability.
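The Gaussian, zero-mean assumption quoted above is what turns a detection variable into a “risk of less than 2.5%”. Here is a small Python sketch of how sensitive that number is to the noise estimate. It assumes a one-sided test at 1.96 standard deviations (my reading of the 2.5% figure, not something the paper's quoted text confirms) and asks what the real false-alarm rate would be if the true natural variability were larger than the model-based estimate by some factor. The function names and the factors tried are hypothetical.

```python
import math


def one_sided_tail(z):
    """P(Z > z) for a standard Gaussian variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


Z_CRIT = 1.96  # threshold giving a 2.5% one-sided risk if the noise estimate is right


def true_false_alarm_rate(underestimate_factor):
    """False-alarm rate if the true noise sd is `underestimate_factor` times
    the sd assumed when the threshold was set."""
    return one_sided_tail(Z_CRIT / underestimate_factor)


for k in (1.0, 1.25, 1.5, 2.0):
    print(f"true sd = {k:.2f} x assumed sd -> risk = {true_false_alarm_rate(k):.1%}")
```

With the noise estimate exactly right the risk is 2.5%. If the true standard deviation were twice the assumed value, the same threshold would be crossed by chance about 16% of the time.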

The climate models used would be considered rudimentary by today’s standards. Three different coupled atmosphere-ocean GCMs were used. However, each of them required “flux corrections”.

This method was pretty much the standard until the post 2000 era. The climate models “drifted”, unless, in deity-like form, you topped up (or took out) heat and momentum from various grid boxes.

That is, the models themselves struggled (in 1996) to represent climate unless the climate modeler knew, and corrected for, the long term “drift” in the model.

### Conclusion

In the next article we will look at more recent work in attribution and fingerprints and see whether the field has developed.

But in this article we see that the conclusion of an attribution study in 1996 was that there was only a “2.5% chance” that recent temperature changes could be attributed to natural variability. At the same time, the question of how accurate the models were in simulating natural variability was noted but never quantified. And the models were all “flux corrected”. This means that some aspects of the long term statistics of climate were considered to be known – in advance.

So I find it difficult to accept any statistical significance in the study at all.

If the finding instead was introduced with the caveat “assuming the accuracy of our estimates of long term natural variability of climate is correct..” then I would probably be quite happy with the finding. And that question is the key.

The question should be:

What is the likelihood that climate models accurately represent the long-term statistics of natural variability?

• Virtually certain
• Very likely
• Likely
• About as likely as not
• Unlikely
• Very unlikely
• Exceptionally unlikely

So far I have yet to run across a study that poses this question.

### Articles in the Series

Natural Variability and Chaos – One – Introduction

Natural Variability and Chaos – Two – Lorenz 1963

Natural Variability and Chaos – Three – Attribution & Fingerprints

Natural Variability and Chaos – Four – The Thirty Year Myth

Natural Variability and Chaos – Five – Why Should Observations match Models?

Natural Variability and Chaos – Six – El Nino

Natural Variability and Chaos – Seven – Attribution & Fingerprints Or Shadows?

Natural Variability and Chaos – Eight – Abrupt Change

### References

Bindoff, N.L., et al, 2013: Detection and Attribution of Climate Change: from Global to Regional. In: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change

Detecting greenhouse gas induced climate change with an optimal fingerprint method, Hegerl, von Storch, Hasselmann, Santer, Cubasch & Jones, Journal of Climate (1996)

What does it mean when climate models agree? A case for assessing independence among general circulation models, Zachary Pirtle, Ryan Meyer & Andrew Hamilton, Environ. Sci. Policy (2010)

Optimal fingerprints for detection of time dependent climate change, Klaus Hasselmann, Journal of Climate (1993)

### 48 Responses

1. In good statistical analysis the goal is always to reach conclusions that are as objective as possible, but the dilemma is that statistical inference is never fully objective, as explained by the literature on Bayesian inference.

The basic idea of trying to determine which observations can best differentiate warming due to GHGs from other changes in the climate is sound, but suffers significantly from typical problems of statistical inference from historical data.

The approach of the Hegerl et al paper is to use climate models to tell how various changes in the input affect the observable output variables in those models. Using that kind of analysis it’s possible to derive statistical confidence levels valid assuming that the models are right on all relevant details and that also other assumptions used in the analysis are correct, but telling objectively how likely it is that models and other assumptions are right enough on those details is obviously impossible.

One fundamental problem that persists still in the present models – and is likely to persist also in future models – is that all relevant models have been developed using knowledge about historical climate both explicitly and implicitly at innumerable points during the development process. Their authors have known that certain choices will lead to problems in explaining the past, and therefore avoided those choices even when these problems are the only real reason for that. After this kind of development process no data that may have influenced the choices or later tuning of the model can provide a fully independent test of its validity. Such tests are still useful and meaningful, but deriving quantitative confidence limits from such tests breaks purist rules, making the resulting numerical values suspect. Only predictions about the future are totally free of this problem, but it will take very long before a sufficient amount of new observational data becomes available. New observations that tell about some previously unknown feature of the past are not as good, because there are correlations between various parts of the past Earth system.

These kinds of problems have led the IPCC to present the Guidance Note for Lead Authors of the IPCC Fifth Assessment Report on Consistent Treatment of Uncertainties. If the issue were not difficult, no such guidance note would have been necessary, but there are good reasons to ask how far this kind of guidance can really resolve the important issues.

One essential factor in the guidance is the acceptance of the partially subjective nature of all inference. That has led to the use of two metrics:

– Confidence in the validity of a finding, based on the type, amount, quality, and consistency of evidence (e.g., mechanistic understanding, theory, data, models, expert judgment) and the degree of agreement. Confidence is expressed qualitatively.
– Quantified measures of uncertainty in a finding expressed probabilistically (based on statistical analysis of observations or model results, or expert judgment).

The first is fully based on subjective assessment, while the second is to a large part objective with subjective contribution, as I have discussed above.

2. The AR5 attribution statement simply fails if Natural variability is found to be non-zero on decadal time frames. However, if the analysis of Tung and Zhou is correct then the AMO has been responsible for about 40% of the observed 20th century warming since 1950.

So where is the evidence that models accurately describe Natural Variability ?
Do models simply use stochastic noise and if so, what is the amplitude and why?

AR5 is a little ambiguous on this point:

The observed recent warming hiatus, defined as the reduction in GMST trend during 1998–2012 as compared to the trend during 1951–2012, is attributable in roughly equal measure to a cooling contribution from internal variability and a reduced trend in external forcing (expert judgement, medium confidence). The forcing trend reduction is primarily due to a negative forcing trend from both volcanic eruptions and the downward phase of the solar cycle. However, there is low confidence in quantifying the role of forcing trend in causing the hiatus because of uncertainty in the magnitude of the volcanic forcing trends and low confidence in the aerosol forcing trend. Many factors, in addition to GHGs, including changes in tropospheric and stratospheric aerosols, stratospheric water vapour, and solar output, as well as internal modes of variability, contribute to the year-to-year and decade-to-decade variability of GMST.

The Atlantic Multi-decadal Oscillation (AMO) could be a confounding influence but studies that find a significant role for the AMO show that this does not project strongly onto 1951–2010 temperature trends.

Then later on it says

Zhou and Tung (2013a) show that GMST are consistent with a linear anthropogenic trend, enhanced variability due to an approximately 70-year Atlantic Meridional Oscillation (AMO) and shorter-term variability. If, however, there are physical grounds to expect a nonlinear anthropogenic trend (see Box 10.1 Figure 1a), the assumption of a linear trend can itself enhance the variance assigned to a low-frequency oscillation. The fact that the AMO index is estimated from detrended historical temperature observations further increases the risk that its variance may be overestimated, because regressors and regressands are not independent. Folland et al. (2013), using a physically based estimate of the anthropogenic trend, find a smaller role for the AMO in recent warming. To summarize, recent studies using spatial features of observed temperature variations to separate AMO variability from externally forced changes find that detection of external influence on global temperatures is not compromised by accounting for AMO-congruent variability (high confidence). There remains some uncertainty about how much decadal variability of GMST that is attributed to AMO in some studies is actually related to forcing, notably from aerosols. There is agreement among studies that the contribution of the AMO to global warming since 1951 is very small (considerably less than 0.1°C; see also Figure 10.6) and given that observed warming since 1951 is very large compared to climate model estimates of internal variability (Section 10.3.1.1.2), which are assessed to be adequate at global scale (Section 9.5.3.1), we conclude that it is virtually certain that internal variability alone cannot account for the observed global warming since 1951.

I get the impression that the authors are not over-confident that natural variability is insignificant.

• The AR5 statement refers to likelihood of anthropogenic forcing being responsible for >50% of observed warming. If internal variability contributed 40% that would be consistent with the statement. Though note that the 40% figure refers to the period 1961-2010. Their method would produce a smaller internal variability attribution for 1951-2010, which is the proper comparison to AR5.

Regarding modelled internal variability, models are not homogeneous on this issue. See Ed Hawkins’ plot of pi control runs: http://www.climate-lab-book.ac.uk/2013/variable-variability/

I’ve used Berkeley Earth land data back to 1800 to check against a model simulation of the past millennium+historical (MPI-ESM-P) and on 60-year timescales there was absolutely no indication of variability behaviour which would be clearly different from what the model could produce. Trend magnitudes tracked quite nicely all the way to the present. Also, the peaks and troughs of 60-year trends substantially coincided, indicating a dominant forced component and relatively weak internal variability on such timescales: about +/-0.2K at most.

• Found a quick+dirty plot I made relating to the last paragraph here.

x-axis is start years of 60-year trend periods, y-axis is K/60yr.

• Paul S,

Models get their 60 year periodic variation by tweaking aerosol magnitude and sensitivity. It’s not at all clear that’s what’s happening in the real world. Quasi-periodic oscillations are common in chaotic systems. The triggers can be quite small and don’t qualify as a forcing. Reasonably accurate hindcasting is a necessary condition for validation. It isn’t sufficient.

• DeWitt,

Note that this is partly an “out-of-sample” comparison. The simulations were performed circa 2011, prior to the Berkeley release.

What is clear is that the 1815 Tambora eruption was huge in relative terms and we would therefore expect a large cooling trend for 60-year trends with end points after about 1820. We would then expect 60-year trends ending around 1870-1880 to show large warming due to recovery from this volcanic cool period. We would then expect settling down into a relatively calm period due to lesser volcanic activity.

None of these expectations relies on any form of “tweaking” and yet each of these expected effects is what we see in the Berkeley observations. Likewise with anthropogenic forcing over the past 60 years. We expect, based on basic understanding, a relatively large warming trend and we see a relatively large warming trend. Of course it could be that all of these feature matches are purely coincidental, and the variability is instead the result of unforced oscillations… but you’re playing with long odds.

• Paul S,

But we don’t expect to see the reduction in trend from ~1950-1975, nor do we expect to see the current slowing in the trend. The recent faster than expected loss of Arctic Sea ice, not to mention the warming around Svalbard in the early twentieth century are consistent with quasi-periodic fluctuations in ocean currents in the Atlantic, i.e. the AMO. Not so long odds after all, IMO.

• By the way, calculating a 60 year trend removes any variability introduced by an oscillation with a period of 60 years, so I don’t really see that your plots are relevant with respect to the AMO.

• Dewitt,

But you’re now talking about variability over shorter periods. The AR5 attribution statement refers to a 60-year period, so I’ve been addressing potential for internal variability to affect trends over that timescale.

Of course as you go to smaller timescales there is increased potential for internal variability to cancel out or amplify forced changes. And as you go to regional spatial scales internal variability could dominate.

• Paul S,

From the attribution paper cited above:

We find that the latest observed 30-year trend pattern of near-surface temperature change can be distinguished from all estimates of natural climate variability with an estimated risk of less than 2.5% if the optimal fingerprint is applied.

So again, where is your justification for 60 year trends? When you filter out frequencies above 0.017/year, the climate sensitivity looks a lot lower.

• From the comment to which I replied:

• Paul S,

A serious problem with a 60 year period is that it consumes a lot of degrees of freedom. Calculating 60 year trends is even worse than a 60 year moving average. Each trend calculated consumes 2 degrees of freedom. So if you start with 180 years of data, you only have about 60 degrees of freedom left after you construct your 120 year plot. 180 years of data is not enough. The rule of thumb is that you don’t want to consume more than 10% of the original degrees of freedom. You would need 1200 years of data for that.

AR5 calculates one 50 or 60 year trend, not 120.

• A 60 year moving trend analysis may only consume 62 degrees of freedom. But the problem is that the time series is autocorrelated to start with so the number of degrees of freedom is already smaller than the number of data points.

Then there’s the possibility that the temperature time series obeys Hurst-Kolmogorov statistics, i.e. the noise structure is fractionally integrated with an integration parameter 0<d≤0.5. That means that instead of the variability becoming constant at longer intervals, it continues to increase. So your statement that the natural variability would be greater for intervals less than 30 years than for 60 years may be incorrect. The instrumental record isn’t long enough to test for this, but there are indications from the ice core data that Hurst-Kolmogorov statistics do apply. See the work of Demetris Koutsoyiannis.
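As an illustration of the autocorrelation point in the comment above (not part of the original comment), here is a short Python sketch. It uses the standard approximation for a first-order autoregressive, AR(1), series: the effective number of independent values is roughly n(1 - r)/(1 + r), where r is the lag-1 autocorrelation. The AR(1) model, the coefficient 0.6 and the series length are all assumptions chosen for the example. It does not attempt to model Hurst-Kolmogorov behaviour, where the loss of information is more severe.

```python
import random


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def effective_sample_size(x):
    """Approximate number of independent values, assuming AR(1) noise."""
    r1 = lag1_autocorrelation(x)
    return len(x) * (1.0 - r1) / (1.0 + r1)


def ar1_series(n, phi, seed=0):
    """Generate a synthetic AR(1) series: x[t] = phi * x[t-1] + white noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


series = ar1_series(n=180, phi=0.6)
print("data points:          ", len(series))
print("lag-1 autocorrelation:", round(lag1_autocorrelation(series), 2))
print("effective sample size:", round(effective_sample_size(series)))
```

With a lag-1 autocorrelation near 0.6 the effective sample size is only about a quarter of the number of data points, before any degrees of freedom are spent on fitting trends.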

3. All major climate models used for attribution studies contain many parameters that must be tuned one at a time. The best set of parameters for a given model are those that best represent our current climate, but the parameters also influence climate sensitivity. Few scientists and funding agencies would be interested in a model that didn’t reproduce the historical warming record, no matter how well it reproduced current climate. The IPCC probably wouldn’t use it. So attribution may be an inherently circular process: Models with certain parameters survive because they attribute warming to humans, not to chaotic or oscillatory unforced variability. The surviving models attribute warming to humans.

For example, climate models have long attributed the pause in warming around 1960 to increasing aerosols. Aerosol parameters may have been tuned to fit this pause. According to Nic Lewis, all current models attribute more cooling to aerosols than the “likely” range reported in AR5. And the recent hiatus in warming (not accompanied by a measurable increase in aerosols) suggests that both the 1960s and current pauses may be oscillations.

4. WebHubTelescope (November 5, 2014 at 2:30 pm):

It’s clear that the natural variability on the year-to-year time scale is controlled by ENSO and the sporadic volcanic activity. No real way to predict eruptions, so we are left with predicting ENSO. And it turns out to be not so difficult to predict ENSO after all: http://arxiv.org/abs/1411.0815

5. The idea that natural variation due to the AMO caused about 40% of the variation over the last century or so still does not address any other sources of natural variation (PDO, ENSO, solar changes in UV or other effects, magnetic field variation, etc.). Since the average temperature over the Holocene is thought (by proxies) to have varied up and down, and been warmer than present over much of the period, and since natural variation had to be the main cause before, why do you think the recent variation had to be over 50% caused by human activity? What caused the previous variation? Since the climate is clearly a bounded chaotic process, with variations in ocean currents and storage, seasons, high latitude ice variation, and more, there is no basis for assuming the recent warm level is significantly based on any human activity. The argument that negative feedback can’t possibly reduce the small human effect to an even smaller level is without any basis. Cloud variation and thus albedo change is a reasonable model to explain some negative feedback, but the actual cause of the recent plateau may simply be that natural variation was much larger than the human effect.

6. Studies assessing ability of CMIP models to reproduce long term variability statistics:

Knutson et al. 2013
http://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-12-00567.1

van Oldenborgh et al. 2013
http://iopscience.iop.org/1748-9326/8/1/014055

7. SOD: Recently, McShane and Wyner showed that Brownian noise (but not noise with less autocorrelation) performed about as effectively as real proxy data in reconstructing the historical temperature record. From the CMIP5 project, we apparently learned that models are incapable of hindcasting decadal climate variability. However, Hegerl (1996) claims to have separated a warming signal from the typical climate noise present in far less sophisticated models by looking for patterns in 15-30 year trends. In your reading, have you ever come across anyone who has tested the reliability of “optimal fingerprinting” by using something besides a climate model to generate spatially- and temporally-autocorrelated noise?

One thing that annoys me is the use of “scaling factors”. Box 10.1, Figure 1 of AR5 WG1 basically says to me that climate models overestimate the anthropogenic component of warming by about 10% and the natural component of warming (volcanoes) by about 40%. (The reciprocal of the 0.7 gradient is 1.4.) The latter is presumably mostly due to the high sensitivity to aerosols used by models to reproduce the 1960’s pause in warming. If this apparent error were corrected, the scaling factor for human induced warming would have to decrease to compensate. The total error in this fitting process is the distance from the cross to the point where both scaling factors are 1.0.

• Frank,

What is the McShane and Wyner paper title?

I’m very weak on the “optimal fingerprinting” method so I don’t know how it has been used / tested in other areas.

If it comes down to the detailed mechanics of how the method works I will need to study. But in principle I think I understand the idea.

The scaling factors, as I understand them, are a result – not an input. So the method of “fingerprinting” looks for a “shape fit” of observations to the model results vs a “shape fit” to the “natural variability”. The scaling factors provide some insight into how accurately the magnitude of observations matches the magnitude of the model results. If the scaling factors are way off then it implies maybe the fit is a bit lucky rather than being a good model.. this is what the papers appear to say.

There’s quite a history of papers covering this mechanism. What’s strange is that the “pattern test” method gets great attention but the “null test” (i.e. the test against natural variability) seems to be just assumed to be right on the money, with the text of the papers usually noting in passing that “we probably need some work in this area”.

So what power has the statistical test got?
Or what are the premises upon which the statistical test relies?

• Frank refers probably to the paper that contributed to the argument on the value of multiproxy analyses of the past millennium. The paper is published in this openly accessible issue of The Annals of Applied Statistics (Vol 5, Nr 1, 2011).

I give the link to the issue, because the issue contains also 15 related discussion papers.

• Thanks, it looks fascinating. I’ve read the editorial and started on the discussion paper..

• Frank,

On my earlier comment on scaling, here is a comment from IPCC TAR, ch. 12, p.700:

This interpretation only makes sense, however, if it can be assumed that important sources of model error, such as missing or incorrectly represented atmospheric feedbacks, affect primarily the amplitude and not the structure of the response to external forcing.

The majority of relevant studies suggest that this is the case for the relatively small-amplitude changes observed to date, but the possibility of model errors changing both the amplitude and structure of the response remains an important caveat..

There are more detailed explanations in the various papers, such as Hasselmann and Hegerl et al cited in this article.

• SOD: For each type of forcing, we know from laboratory experiments how to convert a certain amount of forcing agent into a radiative forcing in terms of W/m². (Given the heterogeneous and complicated behavior of particulates, there is significant uncertainty in this process.) For no-feedbacks climate sensitivity, we know how to convert a radiative forcing into a temperature change. Climate models are simply more sophisticated tools for converting forcing agents (amount or W/m²) into a temperature change. So, if the climate models are correct, the scaling factor should be 1.0. If the scaling factor is 2, it is my interpretation that the attribution analysis concluded that the best fit to observations would be twice what the model predicted. If the error bar for the scaling factor includes 1.0, the model is consistent with the data. If the error bar for a particular forcing includes 0, one can’t reject the null hypothesis that forcing caused no change. (It is, of course, also possible I didn’t fully understand or remember what I read several years ago on this subject.)
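As an illustration of the interpretation in the paragraph above (not part of the original comment), here is a toy Python sketch of a scaling factor as a regression result. It fits observations = beta x model response by ordinary least squares with no intercept and a single pattern, and builds a rough 95% interval from the residual scatter. Real attribution studies take the noise from control runs, allow for correlated noise and fit several patterns at once, so this is only a simplified stand-in. The data are synthetic and the names are hypothetical.

```python
import math
import random


def scaling_factor(model, obs):
    """Least-squares fit of obs = beta * model + noise (no intercept).

    Returns beta and a rough 95% interval based on the residual scatter,
    assuming independent Gaussian residuals (a simplification).
    """
    sum_mm = sum(m * m for m in model)
    beta = sum(m * o for m, o in zip(model, obs)) / sum_mm
    residuals = [o - beta * m for m, o in zip(model, obs)]
    resid_var = sum(r * r for r in residuals) / (len(model) - 1)
    half_width = 1.96 * math.sqrt(resid_var / sum_mm)
    return beta, (beta - half_width, beta + half_width)


rng = random.Random(1)
model = [0.1 * i for i in range(1, 31)]          # hypothetical model response
obs = [m + rng.gauss(0.0, 0.1) for m in model]   # hypothetical observations

beta, (low, high) = scaling_factor(model, obs)
print(f"beta = {beta:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
print("consistent with model (interval includes 1):", low <= 1.0 <= high)
print("response detected (interval excludes 0):", not (low <= 0.0 <= high))
```

If the observations were exactly twice the model response, `scaling_factor` would return a beta of 2, matching the "twice what the model predicted" reading above.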

Consider reading an attribution paper that separately considers both the first and second halves of the 20th century. I believe you will find that the error bar for GHG forcing includes zero for the first half (which is why all of the attribution statements are limited to warming in the second half of the 20th century). I think you will see how the scaling factor is used as a “fudge factor” so that the dynamic range of temperature change from the model matches the dynamic range of the observations. There is more temperature change during this period than expected from the [uncertain] natural and anthropogenic forcing, so the average scaling factor is greater than 1.
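The reading of the scaling factor described above can be illustrated with a toy regression. This is only a minimal sketch under stated assumptions: it uses ordinary least squares with white noise, whereas the fingerprint studies weight the fit by an estimate of internal variability, which is omitted here. The synthetic data, the function name `scaling_factor` and the 1.96 multiplier for a roughly 95% interval are all illustrative choices, not taken from any attribution paper.

```python
import numpy as np

def scaling_factor(obs, model_signal):
    """OLS estimate of beta in obs = beta * model_signal + noise, with a ~95% interval."""
    obs = np.asarray(obs, dtype=float)
    x = np.asarray(model_signal, dtype=float)
    beta = np.dot(x, obs) / np.dot(x, x)
    resid = obs - beta * x
    # One fitted parameter, so n - 1 degrees of freedom for the residual variance.
    sigma2 = np.dot(resid, resid) / (len(obs) - 1)
    se = np.sqrt(sigma2 / np.dot(x, x))
    return beta, (beta - 1.96 * se, beta + 1.96 * se)

rng = np.random.default_rng(0)
years = np.arange(60)
model_signal = 0.01 * years  # hypothetical modelled response, in K
# Synthetic "observations": the modelled response plus white noise (an assumption).
obs = 1.0 * model_signal + rng.normal(0.0, 0.1, size=years.size)

beta, (lo, hi) = scaling_factor(obs, model_signal)
consistent_with_model = lo <= 1.0 <= hi  # interval includes 1: model amplitude is consistent
detected = not (lo <= 0.0 <= hi)         # interval excludes 0: "no response" is rejected
print(beta, (lo, hi), consistent_with_model, detected)
```

If the noise in the real system is not white, the interval from a sketch like this will be too narrow, which is exactly where the estimate of natural variability enters.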

8. “That is, the increases in GHGs have led to something like a “radiative forcing” of 3.7 W/m².”

Did you mean to say that? I don’t think it is true. Per AR5, 1750-2011 increases in (well-mixed) GHGs caused a radiative forcing of 2.83 W/m². The forcing from a doubling of atmospheric CO2 concentration is ~3.7 W/m².
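For reference, the ~3.7 W/m² figure for a doubling can be reproduced from the commonly used simplified logarithmic expression for CO2 forcing, ΔF = 5.35 ln(C/C0). The coefficient 5.35 comes from Myhre et al. (1998) and is brought in here as an outside assumption; the expression covers CO2 only, so it does not reproduce the 2.83 W/m² total for all well-mixed GHGs. The 280 ppm baseline below is illustrative, since only the ratio matters.

```python
import math

def co2_forcing(c_ppm, c0_ppm, coeff=5.35):
    """Simplified CO2 radiative forcing in W/m^2: coeff * ln(C / C0)."""
    return coeff * math.log(c_ppm / c0_ppm)

doubling = co2_forcing(560.0, 280.0)
print(round(doubling, 2))  # 3.71
```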

I agree that detection & attribution studies are far from perfect. Even if the AOGCM control runs that they use to estimate natural variability do so realistically, I suspect that when D&A studies are based on the short period 1951-2010, as for the main AR5 attribution statement, the division of the warming between GHG and other anthropogenic causes is affected by the pattern of the AMO’s influence on surface temperatures.

So far as natural variability being realistic goes, the power spectra in Fig. 9.33 of AR5 WG1 may be worth taking a look at.

• Thanks niclewis. I was relying on memory for the value. I’ve fixed up the article.

• Here is the figure from AR5:

As another perspective on the same topic over a longer time period/frequency domain, here is figure 2 from Links between annual, Milankovitch and continuum temperature variability, Peter Huybers & William Curry, Nature (2006):

(For newcomers, the frequency axis is reversed in the two graphs).

9. Paul S, you said (your comment of November 7, 2014 at 5:50 pm):

..Of course it could be that all of these feature matches are purely coincidental, and the variability is instead the result of unforced oscillations… but you’re playing with long odds.

The way I think about the problem is not to pose it as “choose A or B, make your choice“.

As I said in the article, the fact that there has been radiative forcing is clear to everyone who grasps atmospheric physics basics. And this means that we expect some warming.

The question I prefer to pose is how reliable are our estimates of “natural variability”? Or more specifically, how reliable are our estimates of the statistics of “natural variability”?

I plan to take a look at your other points in more detail, I have been very short of time and I want to spend time on this important topic.

This article hasn’t yet covered the AR5 assessment of natural variability. I was led, from AR5, to one of the source papers that this article does cover.

This earlier paper intrigued me for two reasons:

1. No real question about the reliability of estimates of natural variability. That is, the question was not posed. However, enormous effort has been exerted on the question of distinguishing between models of climate response to external forcing vs “natural variability”. (And this enormous effort has continued to the present day).

2. The statistical significance for claiming “we have distinguished between the climate response to GHGs and natural variability” is about the same in 1996 as it is in 2013 in AR5. And yet in 1996 the models were quite spatially coarse and required modelers to “top up” momentum and heat from the atmosphere-ocean interaction to avoid “drift”.
Somehow the climate modelers’ assessment of the drift issue was so accurate – and their fix so accurate – that 17 years later the statistical significance for claiming “we have distinguished between the climate response to GHGs and natural variability” is about the same as for the rudimentary models of 1996. Did we just luck out?

I find this last point hard to fathom. Of course, it’s easy to pose the question instead as a dichotomy and then everyone can just take their position in the red corner or the blue corner..

• The way I think about the problem is not to pose it as “choose A or B, make your choice“.

That’s not how I’m posing things either. Clearly both are occurring. However, when the largest multi-decadal trends occur, of either sign, they occur when we expect such large magnitude trends to occur based on known radiative forcings. When expected trends from forcing are small observed trends have been small. This says to me that unforced temperature changes over 60-year timescales are small, about 0.2K at most.

If you look at 30-year periods I would say there is a greater potential for internal variability to influence.

The statistical significance for claiming “we have distinguished between the climate response to GHGs and natural variability” is about the same in 1996 as it is in 2013 in AR5.

I don’t think this is necessarily a like-for-like comparison. IPCC authors assess the results from papers for uncertainties not counted in the original error bars, such as those you’ve described here, which can result in larger assessed uncertainty. The TAR is the relevant IPCC report for that 1996 paper, in which the attribution statement suggested dominant anthropogenic influence was likely (>66% chance).

• Paul S,

Can you elaborate on this:

However, when the largest multi-decadal trends occur, of either sign, they occur when we expect such large magnitude trends to occur based on known radiative forcings. When expected trends from forcing are small observed trends have been small. This says to me that unforced temperature changes over 60-year timescales are small, about 0.2K at most.

What time period are you reviewing?

If you look at 30-year periods I would say there is a greater potential for internal variability to influence.

Why?

• Paul S.,

You say that the IPCC TAR applies a lower confidence to the result than the Hegerl et al 1996 study.

I’m a little confused reading the whole chapter on attribution as to where that comes from. What is clear is that this IPCC report concludes it is “very unlikely” that the observed temperature changes are “entirely due to internal variability”.

Anyone familiar with atmospheric radiation and the effect of increasing GHGs would surely go along with that. I’d say that it’s a red herring to introduce the idea of observed temperature changes being entirely due to internal variability.

Here are some extracts from that chapter:

p.697

The warming over the past 100 years is very unlikely to be due to internal variability alone as estimated by current models. Estimates of variability on the longer time-scales relevant to detection and attribution studies are uncertain.

Nonetheless, conclusions on the detection of an anthropogenic signal are insensitive to the model used to estimate internal variability and recent changes cannot be accounted for as pure internal variability even if the amplitude of simulated internal variations is increased by a factor of two or more. In most recent studies, the residual variability that remains in the observations after removal of the estimated anthropogenic signals is consistent with model-simulated variability on the space- and time-scales used for detection and attribution.

Note, however, that the power of the consistency test is limited. Detection studies to date have shown that the observed large-scale changes in surface temperature in recent decades are unlikely (bordering on very unlikely) to be entirely the result of internal variability

p.728

Changes over the past 30 to 50 years are very unlikely to be due to internal variability as simulated by current models.

p.730

The observed warming is inconsistent with model estimates of natural internal climate variability.

While these estimates vary substantially, on the annual to decadal time-scale they are similar, and in some cases larger, than obtained from observations. Estimates from models and observations are uncertain on the multi-decadal and longer time-scales required for detection.

Nonetheless, conclusions on the detection of an anthropogenic signal are insensitive to the model used to estimate internal variability. Recent observed changes cannot be accounted for as pure internal variability even if the amplitude of simulated internal variations is increased by a factor of two or more.

It is therefore unlikely (bordering on very unlikely) that natural internal variability alone can explain the changes in global climate over the 20th century (e.g., Figure 12.1).

[I don’t know whether TAR applies % confidence levels to words in the same way that AR4 and AR5 did].
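The logic of the “factor of two” statement in these extracts can be sketched with a toy calculation: take a long unforced series, compute the spread of 50-year trends, and ask whether a given observed trend still stands out once the variability is doubled. Everything below is an assumption for illustration: the AR(1) process stands in for a control run, and `observed_trend`, `phi` and `sigma` are hypothetical values, not numbers from the TAR or from any model.

```python
import numpy as np

def ar1_series(n, phi, sigma, rng):
    """AR(1) series x[t] = phi * x[t-1] + noise, a stand-in for a control run."""
    x = np.zeros(n)
    eps = rng.normal(0.0, sigma, size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def segment_trends(series, length):
    """Least-squares trend (per time step) of each non-overlapping segment."""
    t = np.arange(length)
    n_seg = len(series) // length
    return np.array([
        np.polyfit(t, series[i * length:(i + 1) * length], 1)[0]
        for i in range(n_seg)
    ])

rng = np.random.default_rng(1)
control = ar1_series(10000, phi=0.6, sigma=0.1, rng=rng)
trends = segment_trends(control, 50)

observed_trend = 0.012  # hypothetical value, K per year over 50 years
for inflate in (1.0, 2.0):
    # A trend is linear in the data, so inflating the series by a factor
    # inflates every trend by the same factor.
    threshold = np.percentile(np.abs(inflate * trends), 95)
    print(inflate, threshold, observed_trend > threshold)
```

The test is only as good as the control series: if the real system has more low-frequency variability than the stand-in, both thresholds are too low, which is the caveat the extracts themselves raise about multi-decadal time-scales.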

The main study on internal variability seems to come from A Comparison of Surface Air Temperature Variability in Three 1000-Yr Coupled Ocean–Atmosphere Model Integrations, Stouffer, Hegerl & Tett, Journal of Climate (2001).

As with Hegerl et al 1996, what one hand gives, the other hand takes away:

The interpretation of the above results depends critically on the stationarity of the observed global mean SAT time series. From Figs. 12 and 13, it is clear that on long timescales, the model’s global mean SAT is nearly stationary..

..The geographical distribution of the variance computed from 1-yr mean and 5-yr mean SAT time series from the models and observations is largest over the extratropical continents and relatively small over the oceans. The tropical Pacific Ocean is the exception to this generalization. In this region, only the HadCM2 model adequately simulates the magnitude of the observed tropical SAT variability. However, the spatial extent and timescale of the HadCM2 tropical variability appear questionable.

It is clear that higher-resolution ocean models and improved parameterizations are needed in order to properly simulate the variability in this region..

..The assessment of the quality of the model’s variability simulation on longer than decadal timescales is very difficult as discussed earlier. The results presented here and by MS96 suggest that as the timescales become longer, the oceans play an increasingly important role in generating variability. The horizontal resolution of the oceanic component of the coupled models used here is coarse and does not resolve many oceanic processes such as oceanic eddies. The effects of the oceanic eddies are parameterized by subgrid-scale mixing schemes and the use of relatively large diffusion coefficients. The use of these subgrid-scale parameterizations may have an adverse effect on the simulation of oceanic variability.

And of course the models used in this paper also have “the hand of god” flux adjustments as described in the article.

10. My problem with attribution using models is that I don’t believe the frequency spectrum of the variability of an unforced model is anything like realistic. There are questionable levels of dissipation built into the model physics so they don’t run away when unforced. Comparing the frequency spectrum of actual and hindcast temperatures is meaningless, IMO. The models have been designed and the forcings tweaked to make the hindcast work. It’s not at all surprising that the spectra more or less match the spectra of the instrumental data.
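The concern about the spectrum of unforced variability can be made concrete with a toy comparison. The sketch below assumes an AR(1) process as a crude stand-in for unforced variability, with the persistence `phi` playing the role of (inverse) damping; that mapping, the cutoff frequency and the function names are illustrative assumptions, not a description of how any climate model works. Both series are rescaled to the same total variance, so any difference is purely in how the variance is distributed across frequency.

```python
import numpy as np

def ar1_series(n, phi, rng):
    """AR(1) series x[t] = phi * x[t-1] + noise, rescaled to unit variance."""
    eps = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x / x.std()

def periodogram(x, dt=1.0):
    """Raw periodogram: frequencies (cycles per time step) and power."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=dt)
    power = np.abs(np.fft.rfft(x)) ** 2 * dt / len(x)
    return freqs, power

def low_freq_fraction(x, cutoff=0.02):
    """Share of total power at frequencies below `cutoff` (periods longer than 1/cutoff)."""
    freqs, power = periodogram(x)
    return power[freqs < cutoff].sum() / power.sum()

rng = np.random.default_rng(42)
weak = ar1_series(4096, phi=0.3, rng=rng)    # weakly persistent
strong = ar1_series(4096, phi=0.9, rng=rng)  # strongly persistent
print(low_freq_fraction(weak), low_freq_fraction(strong))
```

Two series with identical variance can carry very different amounts of power at long periods, so agreement on year-to-year variability says little about multi-decadal variability.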

11. Thanks for the post, an interesting topic.

Curiosity and cynicism make me wonder exactly what “flux correction” and “drift” are. What is the process that determines whether energy should be vacuumed up? And post-2000 they no longer drift or has a ‘tool’ been put in place to stop/compensate for the drift?

• http://www.ipcc.ch/ipccreports/tar/wg1/315.htm

Ok I read up on it myself, some detail in the link above. If I understand it correctly then flux-adjustment is sorta like manipulating the output(?) data so it’s consistent with observations, whereas non-flux adjusted models post-2000 have had much more careful consideration given to input parameters in order to tune the models to the historical data. I understand this is probably crude/incorrect but if roughly true then I don’t necessarily see why either would be more desirable when it comes to attribution. Both seem to have a god-like property, only the second is coated with a veneer of physical plausibility.

• HR,

As I remember, flux adjustment used to be required to maintain conservation of energy. So adding or removing joules from an individual cell occurred between time steps or every so many time steps. Part of it had to do with running the model with prescribed sea surface temperatures rather than a coupled air/ocean model. I think. I should go look it up, but I’m not sufficiently motivated at the moment.

• HR,

The problem is that the atmosphere and ocean march to very different timescales. You have a very light fluid moving very fast across a very heavy fluid which is moving slowly. Likewise for heat transfer, you have a fast moving fluid with a very small heat capacity moving over a slow moving fluid with a very large heat capacity.

So that creates a technical problem with models. Each grid is way too big really.

I’ve never run a climate model (apart from the toy ones you see in this blog) but my understanding is that some problems manifested themselves very obviously – one grid or area going “off the rails”. This is normal with numerical solutions to differential equations until you are down to a small enough grid or an accurate enough parameterization.
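A toy version of “going off the rails” can be shown with the simplest possible numerical scheme. The sketch below is an analogy in time step rather than grid size, and is an assumption for illustration only: explicit Euler applied to a simple relaxation dy/dt = -y/tau, with hypothetical fast and slow relaxation times standing in for atmosphere-like and ocean-like components. For this scheme each step multiplies y by (1 - dt/tau), so it is stable only when dt is less than 2 × tau.

```python
def euler_decay(tau, dt, steps, y0=1.0):
    """Explicit Euler integration of dy/dt = -y / tau."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-y / tau)
    return y

fast_tau, slow_tau = 1.0, 1000.0  # hypothetical relaxation times, arbitrary units

print(abs(euler_decay(fast_tau, dt=0.5, steps=100)))  # decays toward 0, as it should
print(abs(euler_decay(fast_tau, dt=3.0, steps=100)))  # grows without bound: dt > 2 * tau
print(abs(euler_decay(slow_tau, dt=3.0, steps=100)))  # slow component is fine with dt = 3
```

A step that is perfectly adequate for the slow component wrecks the fast one, which is one reason coupling components with very different timescales is technically awkward.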

Other problems manifested themselves by the climate becoming “obviously” wrong, where “obviously” is a judgement call.

As models improved various problems got sorted out. Some of this comes from improved resolution, some from better parameterization. All models get compared with observations to constrain the results.

But if you have a parameterization which makes the model work better – is this an “after the fact” curve fit? Or an improvement to the model’s representation of climate physics? That is, can it predict the future better?

Which I think is the question you are asking. Kind of the crux of the climate question. And the answer is..

• SOD wrote: “But if you have a parameterization which makes the model work better – is this an “after the fact” curve fit? Or an improvement to the model’s representation of climate physics? That is, can it predict the future better?

Which I think is the question you are asking. Kind of the crux of the climate question. And the answer is..”

What do you mean by a parameterization that makes the model “work better”? One can make models reproduce the historical record better (especially total warming and the 1960’s pause) by making them more sensitive to aerosols. That definition of “work better” is based on the hypothesis that unforced variability is negligible and that all warming is forced warming.

On the other hand, you can tune parameters to match what is currently observed. Any sensible model must have TOA OLR and reflected SWR and total precipitation within experimental error of the observed values since these are the parameters that control the major energy fluxes (radiation and latent heat). I think that the annual seasonal changes are interesting tests because these are experiments that can be repeated (every year is a replicate experiment) and don’t require stable measurements over many decades of slow change. I find the following paper (recommended by Pekka) on observed and modeled annual variations in TOA fluxes informative.

http://www.pnas.org/content/110/19/7568.full

• Frank,

That’s a very interesting paper.

In regard to parameterizations making models “work better”, I was thinking less of some large scale fudge and more about specific details, which is the bulk of the work that goes on – the strength of the jet stream seems wrong, the precipitation profile doesn’t fit observations, the MOC inter-annual variation is too small, etc.

In all of the modeling which involves fluid flow there will be parameterizations to get the model to match observables. Models are just empirical formulas that match some observable data.

But it’s not clear whether you have matched the process with the right dependent variables or just have a pointless curve fit which will be invalid with changes to variables not captured by your parameterization.

The climate science literature (that I’ve read) makes it clear that this is very widely understood, no one appears confused by it. And yet..
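The difference between matching a process and fitting a curve can be shown with a deliberately simple example. This is a sketch under stated assumptions: the saturating function is a made-up “true process”, the cubic is a made-up “parameterization”, and the ranges are arbitrary. The fit is tuned only on the range that has been “observed” and is then applied well outside it.

```python
import numpy as np

def true_process(x):
    """Hypothetical 'real' process: a response that saturates at 1."""
    return 1.0 - np.exp(-x)

# Fit a cubic to samples taken only from the "observed" range 0..2.
x_obs = np.linspace(0.0, 2.0, 21)
coeffs = np.polyfit(x_obs, true_process(x_obs), deg=3)

def fitted(x):
    """Evaluate the cubic 'parameterization'."""
    return np.polyval(coeffs, x)

in_sample_error = np.max(np.abs(fitted(x_obs) - true_process(x_obs)))
x_new = 6.0  # conditions well outside anything the fit has seen
out_of_sample_error = abs(fitted(x_new) - true_process(x_new))
print(in_sample_error, out_of_sample_error)
```

The cubic matches the observed range closely and fails badly outside it, and nothing in the in-sample error warns of that.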

12. This ‘scientist’ writing this blog is no scientist. He admits this by claiming to shut down his mind when hearing news about genetics on TV.

About chaos: we fix chaos with history. History, when it comes to climate matters, has been all about Ice Ages for the last several million years and the story is, Ice Ages last over 100,000 years and Interglacial Periods last roughly 10,000 years which is 10% of the length of the Ice Ages.

These Ice Ages all start suddenly. And we still know little about these things except that they happen over and over again like literal clockwork.

We are now nearing the end of our precious Interglacial 10,000 years! And scientists are worried we will be too warm? Seriously, that is irrational.

13. […] « Natural Variability and Chaos – Three – Attribution & Fingerprints […]

14. Just to comment on the start of this post — there’s WAY too much to comment on the whole, especially with Comments — there are several alternative ways to develop and define applied probability, each roughly equivalent in rigor. Which way is preferable can be a question of taste, but that means neither the “definition of independence” idea here nor the significance test framework have a monopoly on “objective means of inference”. They happen to be what was predominantly taught in science and engineering 1950-1980 in university. *shrug*

There are alternatives, for example approaches based instead upon a definition of conditional probability where interdependence is the cornerstone of inference, and conclusions are drawn and compared only with respect to explicitly drawn frames, not some illusory notion of objectivity. (Sure, “objective frames” can be modeled as uninformative densities within this method, but at least there their silly natures are exposed.) The point is that quantitative comparisons only make sense within a coherent semantics, and in the case of Science, that depends upon properly encoding commonly accepted mechanistic models of systems and processes.

Moreover, if one proceeds with conditional probability primary, it is no longer true that assessments of outcomes (as posterior densities) depend in a make-or-break way upon perfect inferences along the way, nor even upon inferences based upon proper Bayesian methodology. An approximately correct inference is no different from a measurement having observational errors of some degree, and they all do. Some provide more bits of information regarding the true picture than others, but for complicated systems and this perspective, it is no longer useful to describe and argue in terms of direct refutation. Moreover, the complexity of a model is directly penalized because it has more dimensions and, so, whatever probability mass it wins in prediction is spread out over more parts.

I do not know the details of the Hegerl and Hasselmann claims or transgressions. I do know relative stationarity, if you will, is a really slippery idea, but remember that stationarity talks to distributional invariance over time. That idea is a construct not a phenomenon. One might, for example, and using an exceptionally simple case, assert nonstationarity if hell-bent on modeling what is really a Gamma or a Cauchy density with a Gaussian, because of the coupling between the mean and variance of these two alternative descriptions.

It does seem counterproductive — and to my personal mind quite unscientific — to pursue critiques of past work in a field without actively offering an alternative, coherent and simpler synthesis or hypothesis which explains observations and simulations better (and not merely as well as), and arguing why therefore it should be accepted.

15. […] Part Three – Attribution & Fingerprints we looked at an early paper in this field, from 1996. I was led there by following back through […]

16. […] Natural Variability and Chaos — Three — Attribution and Fingerprints […]

17. SoD

Hegerl et al 1996 paragraph 3.a caught my attention with the following statement.

“To estimate the internal variability of climate we use as model data two 1000-yr control simulations with the HAML and GFDL models and the 210-yr control simulation with HAMO.”

It appears that one of the starting points in developing its footprint is to “estimate” internal variability. So how is this done? Near as I can tell it uses paleoceanography data, represented in figure 1. But graph 1c is much different than the following paper on paleo SSTs.

Newton, A., R. Thunell, and L. Stott (2006), Climate and hydrographic variability in the Indo-Pacific Warm Pool during the last millennium, Geophys. Res. Lett., 33, L19710, doi:10.1029/2006GL027234.

Could AR5 conclusions be in error due to flawed paleoceanography data?

Regards

Richard

18. […] I recently spent some time reading AR4 and AR5 (the IPCC reports) on Attribution (Natural Variability and Chaos – Seven – Attribution & Fingerprints Or Shadows? and Natural Variability and Chaos – Three – Attribution & Fingerprints). […]


20. […] activity is lower than 50%. I have no idea. It is difficult to assess, likely impossible. See Natural Variability and Chaos – Three – Attribution & Fingerprints for […]