Hadcrut 3, innovative view

Posted: December 16, 2012 by tchannon in Analysis, climate, Dataset, methodology, weather

Figure 1

Hadcrut 3 was last updated in October 2012, so I assume it is still active.

I was looking at something else but updated Hadcrut 3 at the same time and took a quick look. The innovation is a presentation of annual data I have not shown before, and opinions on it are wanted.

As you can see, this seems to highlight some curiosities in the Hadcrut data.

Firstly, what I am doing that is different

I try to stay on the right side of Nyquist and Shannon, which is made very difficult by the near-universal violations of sampling theory before data is provided to the public. This is a bad situation.

In this case I am working from published gridded data, computing my own weighted means[1].

Time series.

One solution to the “annual” data problem is to low-pass filter at just over one year and then output at the minimum additional sampling needed to meet Nyquist, which is what is done here: sampling every third of a year. I intend to go into this in more detail later, but I need to say a certain amount now.

Conventional practice in field after field is to treat a time series like a bundle of wood, where any sequence will do if you want to know the mean length. This is wrong, as has been known for many years. It is a good way to make an organisation a laughing stock, as when the UK Met Office declared 2011 a very hot year in CET while the public (correctly) knew otherwise; the answer is herein (neither was 2010 as cold as claimed). Some concern ought to be expressed over financial data and the many other fields putting such corruptions into published results, where people follow the latest knee-jerk of a data point.

Unfortunately it is not valid to take the mean of time-related items without taking into account when they occurred. This literally gives spurious results, and it does so every day. In effect, time has to be averaged too, with both the past and the future taken into account. (The 2010/11 cold snap occurred in December 2010, so the conventional 2011 mean didn't see it; this spins off into the meaning of life and practicalities.)
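
As a toy illustration (synthetic numbers, nothing to do with real CET), a hard calendar bin splits an event that straddles the year boundary into two diluted values and loses its timing entirely:

    import numpy as np

    # 24 months of synthetic anomalies, Jan 2010 .. Dec 2011, flat except a
    # sharp cold spell spanning Dec 2010 and Jan 2011 (illustrative values only)
    x = np.zeros(24)
    x[11:13] = -5.0

    mean_2010 = x[:12].mean()    # calendar-year bin for 2010
    mean_2011 = x[12:].mean()    # calendar-year bin for 2011
    print(mean_2010, mean_2011)  # both about -0.42: one event smeared across two years,
                                 # and nothing records that it happened at the boundary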

This is what competent low pass filters do. It could also be viewed as an extension of [data] sampling theory.

The correct approach is to remove from the data all frequencies above the Nyquist limit for the output sample rate. Annual data conventionally means one sample a year, so the Nyquist limit sits at a period of two years, and therefore all components varying faster than two years must be removed first. In practice this needs a guard band too.
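
Putting numbers on that (the guard-band figure below is only an illustrative assumption):

    fs_out = 1.0                  # output rate: 1 sample per year
    f_nyquist = fs_out / 2.0      # 0.5 cycles per year
    min_period = 1.0 / f_nyquist  # 2.0 years: anything varying faster must be removed
    cutoff_period = 2.5           # hypothetical guard band: cut a little longer than 2 years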

I’ll show what that looks like in a later post; it is an alternative ploy.

In this case I low-pass at just longer than one year, so it is annual in the sense of a year at a time, but I output every third of a year. Strictly, if you want to recreate the true analogue original, or even to plot it, you should use a reconstruction filter; that is beyond the scope here and not worth the effort for this kind of thing. In real-world analogue work you certainly use one, at least for professional equipment.

Hadcrut, for example, is provided as monthly data (not a valid time series either), i.e. 12 samples a year. A minor problem appears: three evenly spaced samples a year do not fall on the monthly sample instants. Enter skulduggery.

Oversample by a factor of two, filter at the new rate of 24 samples a year, and decimate by 8: 24/8 = 3, but now there are samples at 1/6th, 3/6 and 5/6 of a year, with the twist that this is zero-based whereas months are conventionally labelled one-based (1, 2, 3…), modulo math. Now there are six months before and six months after the halfway point.
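
For what it is worth, here is a minimal Python sketch of that recipe (numpy/scipy assumed). It substitutes an off-the-shelf windowed-sinc FIR with zero-phase filtering for the brick-wall filter and end correction actually used; the function name, tap count, cutoff and decimation offset are illustrative assumptions:

    import numpy as np
    from scipy import signal

    def thirds_of_a_year(monthly, cutoff_years=1.05, numtaps=193):
        """Low-pass a monthly series at just over one year and output three
        samples per year, landing near 1/6, 3/6 and 5/6 of each year.
        The series must be long compared with the filter length."""
        x = np.asarray(monthly, dtype=float)
        up = np.repeat(x, 2)                       # 2x oversample: duplicate the preceding sample
        fs_up = 24.0                               # samples per year after oversampling
        taps = signal.firwin(numtaps, 1.0 / cutoff_years, fs=fs_up)  # cutoff just below 1 cycle/year
        smooth = signal.filtfilt(taps, [1.0], up)  # zero-phase, so feature timing is not shifted
        return smooth[4::8]                        # decimate by 8 with an offset of 4/24 = 1/6 year

Run on a century and a half of monthly anomalies this returns roughly three values per year of input; it is a sketch of the recipe, not a reproduction of the filter behind the figures.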


Figure 2

Oversample? [2]

This proper method retains exact time information as we are about to see.

The data is restricted to timescales of a year and slower in how fast it can move (with faster variation automatically taken into account, not lost), but the when is retained. If a reconstruction filter is used there is essentially infinite time resolution.

Put another way, a reconstructed analogue signal has exact timing at any resolution and could, for example, be resampled at completely different times. (This is roughly how sample rate conversion is done.)
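
For the curious, a minimal sketch (Python/numpy assumed) of ideal band-limited reconstruction, the Whittaker–Shannon sum, which is the textbook version of what a reconstruction filter does; the function name and brute-force loop are mine:

    import numpy as np

    def sinc_reconstruct(samples, fs, t_new):
        """Evaluate the band-limited reconstruction of `samples` (taken at
        rate fs, first sample at t = 0) at the arbitrary times in t_new."""
        n = np.arange(len(samples))
        # Whittaker-Shannon interpolation: a sinc kernel centred on every sample
        return np.array([np.sum(samples * np.sinc(fs * t - n)) for t in t_new])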

The Hadcrut 3 plots

Figure 1, the head image, doesn’t show time in detail; Figure 3 is provided for a closer look.

I found Figure 1 a surprise, which is a reason for this post. What is going on during 1888…1920 and 1972…2000? Why is the hemispheric data pretty much synchronous sometimes and not at others?

Keep in mind the data is my own computation from the gridded values, not the official series, to which statistical adjustments are apparently made.


Figure 3

Plot with annual grid. Now you can see the timing scatter.

There are some surprises in what precedes what, or doesn’t at all. Why?

Comments on the methodology are welcome, since figuring out a good method is important. What I am doing is one way, probably of many.

Mistakes, quite possibly.

What do I know anyway.


[1] Computation from gridded data to a mean is done the same way for all gridded datasets here. It matches the official series exactly (quantisation excepted) for some of them, e.g. RSS (some cells missing), UAH (no missing cells); whether it matches really seems to depend on whether the providers do post-processing. The method used for all is a cosine-weighted mean, i.e. correct on area.
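
A minimal sketch of such a cosine-weighted (area-correct) mean, assuming a regular latitude–longitude grid with NaN marking missing cells; the layout and function name are assumptions, not the exact Hadcrut file format:

    import numpy as np

    def area_weighted_mean(grid, lats_deg):
        """Cosine-of-latitude weighted mean of a 2-D field (lat x lon).
        Missing cells are NaN and drop out of numerator and denominator."""
        w = np.cos(np.radians(lats_deg))[:, None] * np.ones_like(grid)
        w = np.where(np.isnan(grid), 0.0, w)   # ignore missing cells
        return np.nansum(grid * w) / w.sum()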

[2] Oversampling can be done in one of two ways. In this case, duplicate the preceding sample into a new sample, then low-pass filter suitably; here the final filter does this automatically. The alternative is to zero-fill and scale the result. More can be found in signal-processing basics. If you are not convinced, you need hands-on experience.
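
A toy illustration of the two alternatives (Python/numpy assumed); either version is then low-pass filtered at the original Nyquist limit:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])

    hold = np.repeat(x, 2)        # duplicate the preceding sample: 1 1 2 2 3 3
    stuff = np.zeros(2 * len(x))
    stuff[::2] = 2.0 * x          # zero fill, scaled by the factor of 2: 2 0 4 0 6 0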

Data and plot data in OpenOffice .ods inside a .zip here (334k)

Post by Tim Channon

Comments
  1. redcords says:

    Well I’m going to say it, I can’t be the only one.

    Every post here by Tim Channon is written in some unintelligible variant of English that is barely comprehensible. Is it too much to ask that Tim uses normal grammar and English?

    All the points he is trying to make are buried under a mountain of confusion that the reader has to wade through.

    Terseness is great, get to the point in as few words as possible. People with an engineering background will often communicate in this way, I understand this. He is similar to Mosher on this point.

    But if everybody has to read it 3 times to maybe get an idea of the meaning (with all the possibilities of misinterpretation) then it’s not worth the effort. Which is pretty similar to what I think of almost everything Mosher says.

  2. marchesarosa says:

    Why is there a wider gap between the hemispheres from 1920 to 1965 and from 2000 onwards?

  3. Steve Richards says:

    Very interesting.

    It would be useful to see the macros that do the filtering etc

    Did this technique improve the accuracy of the results?

    If we are going to approach this from a signal processing perspective (about time too) has anyone tried to use the Kalman filter to get an average temp from these data sets?

    Kalman filters are what enable fibre optic gyros to seek ‘true north’ by extracting very weak signals, very quickly from very noisy signals (sounds like what we need to do).

  4. vukcevic says:

    I have no problem with Tim’s writing, his investigative work is very detailed and meticulous, for which he should be congratulated. Tim expresses himself in a way which he considers does justice to his work; his articles are always highly technical and the meaning has to be very precise.
    Of course you are entitled to your opinion, but in my view your request is unjustified and unfair, and instead we all should express our gratitude, that we are able to advance our knowledge by having access to his findings. Tim’s work on the blog is purely voluntary, and highly appreciated by many of us.
    My English is not exactly excellent, particularly in the writing form, either grammar or the sentence construction.

  5. graphicconception says:

    The way the hemispheric data seem to converge and diverge on a roughly 80-year cycle looks interesting.

    As Steve Richards implied, time series analysis is a powerful tool that has hardly been applied to climate data yet as far as I know. For example, frequency analysis would have fun with diurnal, lunar, planetary, solar and Milankovitch cycles. If the “known” cycles could be removed then we would see what was left for CO2 etc to influence.

    However, I have some concerns about tchannon’s post. For instance:

    1. There is mention of Shannon and Nyquist which implies frequency analysis to me but there does not seem to be any.

    2. I did not understand the “bundles of wood” analogy.

    3. If the original data was sampled then it is already too late to sample above the Nyquist frequency.

    4. “… this is conventionally one sample a year therefore the Nyquist limit is at two years therefore all frequencies faster than two years must be removed first.” Isn’t that confusing frequency and wavelength? For one output sample per year you will need at least two input samples per year so frequencies faster than once per six months will need to be removed.

    5. “Oversample by two times …” and “This proper method retains exact time information …”. Won’t it, in fact, add a half sample time delay and smear the input signal?

    6. “If a reconstruction filter is used there is essentially infinite time resolution.” Yes and no. You can plot as many output points as you like but the extra points that were not in the original signal are just interpolated. OK, it may be a band-limited interpolation method but interpolation nonetheless.

    As I said at the start, the graph looked interesting but when I tried to follow the method I ended up with lots of questions. As far as I can tell, none of my points above casts doubt on the result but it would be nice to have that extra bit of confidence in the method.

    Thanks for the post.

  6. tchannon says:

    redcords,
    I know there is a clash of cultures in addition to my poor command of English. At my age there is nothing to be done.
    Perhaps hacking down a longer post, losing things I dare not write, was a bad idea. Never mind, is how it struck you.

    marchesarosa ,
    Quite so. Why?

  7. tchannon says:

    Steve Richards,
    The code is in C, written by me, with the post processing in this case done in the spreadsheet. The actual filter is an arbitrary brick wall; the reported characteristic is in the spreadsheet. From experience it doesn’t make much difference what detailed characteristic the filter has, hence I don’t bother with the finer points of design.
    The FIR filter core is a convolution engine with end correction wrapped around; it is unique.

    Part of my writing this was coming in from outside and wondering what could be done with finite data instead of the contiguous streams usually handled.

    Kalman, not looked as such but it has crossed my mind although I have no experience.
    I suspect there is a distinct difference between handling properly done data and systems, vs. the mess which is climatic data.

  8. Roger Andrews says:

    Tim:

    There are two problems with HadCRUT3 which in my view make it unsuitable for analytical purposes.

    First, it combines air temperatures measured with thermometers five feet or so above the ground with sea temperatures measured at depths of up to five fathoms below the sea surface. Combining these two variables would be OK if they tracked each other, but they don’t.

    Second, the series that are used to construct HadCRUT3 (CRUTEM3 and HadSST2) are both heavily adjusted relative to the raw SST and SAT data. And I don’t think I need to add anything about the quality of the adjustments.

    But if you accept the official HadCRUT3 numbers at face value the NH-SH differences seem to show a relationship to the 60-year planetary cycle.

    You might try subtracting your NH and SH numbers to see what you get.

  9. tchannon says:

    graphicconception,
    I’ve done no frequency analysis there, been done many times in the past. It is the strange divergence which got my eye, also seen before but not that I recall with this data. There again, hemispheric is rarely shown, not published by the authors so far as I know.

    3.
    Yup

    4.
    The mention of Shannon and Nyquist is to do with sampling theory, and the bundles of wood analogy is all about trying to get a message across about illegal math: what works just fine with everyday things does not work where time is part of the math.

    Getting this message across is very difficult because wrong math is endemic in science. There is no easy way to tell someone they were taught wrongly, the books are wrong, their colleagues are wrong. Try and you are the one getting a cold shower.

    In the case of meteorology a difficult semantic problem is present. Humans like to know about things they invented, such as divisions of time, a year, yet the real world has no such distinction.

    How cold was 1991? The real answer is it all depends on what you mean. For most people it is add up say 365 days and divide by 365. This is right in that particular context but wrong for a proper sequence. Maybe it helps if I point out the answer might be dependent on which hemisphere you live in, the human calendar flips year either mid-winter or high-summer.

    With time honouring math a time window must not have hard edges, otherwise fake wiggles are added.

    If you get the rough concept that is a starting point. Main thing is to be aware of this, be dragons close by.

    5. Oversampling?

    Blue is the input, low-pass filtered to create red as the result. The curve at the ends is the first-order end correction starting to fail near the ends. (Bit of an optical illusion with that plot; to me the reds wobble.)

    6. Interpolation by filtering, fine. I can’t show anything here, but if you literally go from analogue, through a correct sampled-data system, and then turn back to analogue, you will find the timing is accurate. If it isn’t, the system is broken, non-linear, distorting, whatever. Intuitively this can seem wrong.

  10. tchannon says:

    Like this RA.

    Looks dodgy prior to 1880. The period looks like 79 years, a figure which turns out to be mentioned by Landscheidt. All very unclear, so I would take that as coincidence.

  11. Brian H says:

    Tim;
    When you go for terseness, perhaps randomly sampling about 1/3 of the original sentence’s words isn’t the best approach.

  12. tchannon says:

    Yep, you got a two thirds discount, a recurring theme.

  13. Roger Andrews says:

    Tim:

    Getting back to your original question: Why is hemispheric data pretty much synchronous sometimes and not others?

    Well, it’s because NH temperatures fluctuate while SH temperatures are close to monotonic. So when you subtract the two you see cyclicity, and the cyclicity matches the AMO index. But this doesn’t tell us anything because the AMO index is calculated from detrended SSTs, making a correlation inevitable.

    So I think your question basically boils down to; what causes the AMO?

  14. tchannon says:

    wikipedia… “The AMO has a strong effect on Florida rainfall.”
    Turns into beer or is that only rains Sundays and saint’s days? :-)

    Why would this be Atlantic only?

    There is some kind of modal pattern between north and south between Australia and the far east, a pattern which moves in the far east too. This kind of pairing seems to exist in a number of things.

    I reckon it’s as likely a global oscillation, but who can say, given this is pushing the limits of measurement anyway.

    Here is
    Hadcrut3 diff
    RSS TLT diff normalised to Had3 ref period 1980…2000
    UAH TLT diff ditto

    Overlay of the wikipedia AMO, plot version “Atlantic Multidecadal Oscillation according to the methodology proposed by van Oldenborgh et al.”


    Suggesting “AMO” is local to the Atlantic looks unwise.

    In the past I have suggested this is about solar asymmetry, pointing to a magnetic connection. The next couple of generations will get to find out if things swing back.

  15. Roger Andrews says:

    “Why would this be Atlantic only?”

    Well it isn’t just the Atlantic. The AMO cyclicity is dimly visible in the Southern Hemisphere records and plainly visible over much of the Northern Hemisphere, including in the North Pacific SST record.

    Which raises a question. In the North Pacific we have the PDO, which is based on the difference in SSTs between the east and west Pacific, but we have no AMO-equivalent index based on detrended SST values. Why don’t we have one? I could think of no reason, so I made a Pacific Multidecadal Index up. And here it is plotted against the AMO since 1910, with both indices calculated by simple linear detrending of North Atlantic and North Pacific ICOADS SST anomalies for consistency.

    The AMO and PMO aren’t identical but they are very similar, which the AMO and PDO aren’t.

  16. Paul Vaughan says:

    The multidecadal oscillation is global, but some regions are more strongly amplified or less strongly amplified according to how land, mountains, & ice lay relative to the winds & sea.

    GRADIENTS drive flows. Many things in the climate discussion are grey, but the role of gradients in circulation is black & white.

    Gradients across the Southern Ocean are steep and they basically form a ring at constant latitude. There are few north-south features to deflect the ring meridionally, so in a sense the deep south including Antarctica is sealed off in its own room behind a circulatory wall.

    In the north, there are meridional deflectors where east continental coasts meet western ocean boundaries.

    The steep winter land-ocean temperature gradients give rise to steep pressure gradients and hence strong midlatitude westerly winds. Note the waviness of the winter gradient in the north:

    The amplitude of flow-driving equator-pole gradients and resultant midlatitude westerly winds is modulated by the solar cycle in cross-ENSO aggregate. We know this from AAM & LOD records via the laws of large numbers & conservation of angular momentum. This is not speculation; it’s observed via any of dozens of methods.

    The PDO is orthogonal to the multidecadal temperature wave, preceding it by a 1/4 cycle. It represents gradient-driven pumping up the Pacific western boundary, so the integral appears in temperature. Something analogous happens in the Atlantic.

    Wyatt, Kravtsov, & Tsonis (2011) have illustrated the multidecadal northern hemisphere winter coupling holistically:

    http://link.springer.com/article/10.1007/s00382-011-1071-8/fulltext.html#Fig4

    Jean Dickey (NASA JPL) & colleagues have known what Wyatt+ (2011) illustrated for decades:

    “The decadal ‘‘noise’’ involves coupled variations in the distributions of temperature, mass, and velocity (21, 22) and so is manifested in the steric sea level, moments of inertia, and the Earth’s variable rotation.” (bold emphasis added)

    Munk, W. (2002). Twentieth century sea level: an enigma. PNAS 99(10), 6550-6555.

    http://www.pnas.org/content/99/10/6550.full.pdf

    (“(21, 22)” is a reference to Jean Dickey’s work.)

    The multidecadal rate of change of solar-terrestrial weave pitch indicates a simple equator-pole heat & water pump doppler effect.

    The simple reason this was not widely noticed before: ENSO (which scrambles conventional exploration).

    Important:
    Thermal wind is a bit of a misnomer, so be careful to research how the term is actually used by climatologists. (Guessing wrong from context is almost assured, leading to stubborn misunderstandings that are hampering collective awareness & discussion progress.) Strenuous suggestion: Make it a priority to understand thermal wind. It’s the primary solar-terrestrial mechanism. Discussion won’t advance until everyone understands this.

    Warning: The conventional narrative on solar-terrestrial relations is fundamentally flawed. Solar activity modulates terrestrial GRADIENTS. Solar variation does not modulate terrestrial global averages directly, but rather through the temporal integral of spatial-gradient-driven circulation.

  17. Edim says:

    Detrending the AMO makes no sense to me. It’s just a temperature index like any other and it shows similar oscillations to any other.

    There’s a nice correlation with the SCL, especially considering there must be other factors affecting global climate.