Met Office cost saving

Posted: January 15, 2013 by tchannon in Analysis, Cycles, Forecasting, methodology


While taking a break from working on unpublished data, after discovering two unexpected curiosities, it crossed my mind that a spot of twiddling with fantasy might go down well.

Brief explanation: what happens to extrapolation when withholding known data? Input is through end 1979, through end 1989 and so on; then see what is forecast to 2030 based on the shape. Four coefficients have been allowed (one more than sensible). For comparison there is a signal-processed, end-corrected, low-pass-filtered version of the input monthly HadCRUT4. All done on “automatic”, no prompting by me.
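The tool itself stays unpublished, but the withhold-and-extrapolate idea is easy to sketch. The following stand-in (not my actual method) fits a fixed-period sinusoid plus linear trend by ordinary least squares to a truncated synthetic series, then extrapolates past the cut-off. The 60-year period, the trend value and all names are illustrative assumptions; real data will not behave this tidily.

```python
import math

def design_row(t, periods):
    # basis: constant, linear trend, then a sin/cos pair per fixed period
    row = [1.0, t]
    for P in periods:
        w = 2 * math.pi / P
        row += [math.sin(w * t), math.cos(w * t)]
    return row

def lstsq(A, y):
    # solve the normal equations (A^T A) x = A^T y by Gauss-Jordan elimination
    n = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * yk for r, yk in zip(A, y))] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[col][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# synthetic "temperature": slow trend plus a 60-year cycle, monthly samples
def truth(t):
    return 0.005 * t + 0.2 * math.sin(2 * math.pi * t / 60.0)

times = [i / 12.0 for i in range(12 * 130)]          # years 0..130, monthly
series = [truth(t) for t in times]

# withhold everything after year 100, fit, then check the extrapolation
kept = [i for i, t in enumerate(times) if t <= 100.0]
coef = lstsq([design_row(times[i], [60.0]) for i in kept],
             [series[i] for i in kept])

def model(t):
    return sum(c * b for c, b in zip(coef, design_row(t, [60.0])))

err = max(abs(model(t) - truth(t)) for t in times)   # includes withheld span
print("max extrapolation error:", err)
```

With noiseless data that really does contain the assumed period the extrapolation is exact; the interesting (and dangerous) part with real temperature series is whether the shape found persists.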

The result is the same as from gridded HadCRUT3 when I compute from that (did this ages ago) without the stats messing: the same flat period, then a second ramp up around 2060, suggesting not a lot has changed.

Do I get a 10% cut from the savings? (insert grin)

We seem to have some new faces on the blog, writes the co-moderator and contributor, so I had better not run too fast. I am using an unpublished tool, many thousands of lines of C. One thing it can do is create a non-discrete Fourier model of a dataset and write out the magic incantation to recreate a discrete time series which is not constrained to the input timeframe. It involves a lot of random numbers and fast code. And other stuff.

I hate misleading anyone, so please do not take this work any more seriously than “Hmm, how curious”.

A lot of data passes by me, but until today I had never looked at HadCRUT4; it was of no interest because in my view the underlying data is so dire that looking is a waste of time. Decision: omit justification and background, too long and involved.

My background is more from signal processing.

And also…

As a taster from work in progress


Scale is Kelvin, dates are dates, 4 ft ground temperature against air min/max, no time skew. The published data is wrong; this is largely corrected by reverse-engineering the incorrect homogenisation, which arose from a lack of understanding of the physics, perfectly understandable back then (although few today seem to get it).

Roughly speaking it takes 14 days for heat to travel 4 feet (just over 1 metre) in the ground at this site, maybe a couple of days to 1 foot. A solid acts as a thermal delay line, heat flows in and out, so there is time delay. Something mind bending has turned up to do with the relationship with air temperature. Let this sulk at the back of what passes for a brain.
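The lag figure is site specific, but the standard one-dimensional conduction model gives the flavour: a sinusoidal surface temperature penetrates soil as an exponentially damped wave, delayed in proportion to depth. A minimal sketch, where the diffusivity value is an assumed typical soil figure, not a measurement from this site:

```python
import math

def soil_lag_days(depth_m, period_days, alpha=1.0e-6):
    """Time lag at depth for a sinusoidal surface temperature cycle.
    T(z,t) ~ exp(-z/d) * cos(w*t - z/d), d = sqrt(2*alpha/w), w = 2*pi/P,
    so the wave at depth z lags the surface by (z/d)/w seconds.
    alpha: soil thermal diffusivity in m^2/s (assumed, site dependent)."""
    w = 2 * math.pi / (period_days * 86400.0)
    d = math.sqrt(2 * alpha / w)
    return (depth_m / d) / w / 86400.0

lag_4ft = soil_lag_days(1.22, 365.25)    # 4 ft, annual cycle: order of weeks
lag_1ft = soil_lag_days(0.305, 365.25)   # 1 ft: a few days, linear in depth
print(round(lag_4ft, 1), round(lag_1ft, 1))
```

For the annual cycle this gives roughly three weeks at 4 ft with the assumed diffusivity; shorter-period weather swings arrive sooner but are far more heavily damped, so a 14-day observed figure plausibly reflects the mix of periods and the actual soil properties.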

From a sampling point of view you can’t sample temperature using a min/max thermometer, nor is the meteorological maths correct. Lots of awkward problems where pragmatics comes into it. On the other hand a deep ground temperature thermometer is integrating; it can produce valid data, except the technology and reading method were poor.

This is also the true temperature of the Earth as a body: the ephemeral air, which humans seem to think is the temperature, is not.

Oh, in case you were wondering, the joke at the start of this article expands to this:


This isn’t rational yet perhaps it gives a feel. That is the point of this stuff, trying to get an understanding rather than any final answer.

A brief look at the official HadCRUT4 global plot suggests the Met Office have at long last moved slightly on end correction. I might write on this later, and note: I have no particular bone to pick with these people when they do the right thing, difficult with all eyes watching. I would far prefer they lost the climate arm and put the work into improving the operational side, which shows signs of lack of care, probably starved of resources. They also have to take Hurst on board when it comes to statistics.

If my English seems strange, that is how it comes out unless I spend an inordinate time refining it (i.e. it saves anyone complaining).

Post by Tim Channon

  1. tallbloke says:

    Interesting, thanks Tim. I think if we do get a protracted solar minimum, all bets are off with the predictive work in this analysis.

  2. Doug Proctor says:

    Interesting. The downturn, I take it, would be “normal” variability. Is this what the IPCC use to say that, without CO2, we’d be in a cooling phase?

    Works both ways. CO2 forcing will have to be increasing if “natural” variation is cooling AND temperatures go up.

    The US contiguous 2012 figures are now out and show a great jump; I'm sure Hansen is happy/excited. Will HadCRUT show it?

    Next couple of years … the narrative either gets redesigned or goes away. I bet on the rewriting.

    If Ehrlich can say he was never wrong, just had his dates a bit off, so can the CAGW crowd.

  3. Greg Goodman says:

    Tim, it may be interesting to do a similar thing (Fourier) on the time derivative. Firstly, since we are all obsessed with climate _change_, we would do better to study the rate of change rather than trying to guess it by eye from the time series.

    Secondly, it would mean that the irrelevant base temperature of the “anomaly” disappears, any constant longer-term rate of increase becomes the DC Fourier term, and probably most importantly there is very little difference between the values at each end of the sample.

    This may mean that you can avoid the need to further distort the data with something like a Hamming window function before doing the FT.

    I presume that you are doing this with the temp time series, thus totally artificially bending down the end of the data.

    Having got a probably less distorted FT model of dT/dt you can then integrate it back to get a time series if you wish.

    I did a somewhat similar exercise here, though I simply fitted the cosines rather than doing an FT:

    I did the same processing for ICOADS (unprocessed by Hadley) and HadSST3, take your pick.

    Both of these are quite close to the new Met Office 5 year “decadal” forecast. They also probably show why this year’s ten year forecast only shows five years. They didn’t want us to see the 2017-2022 bit 😉

    If you want to copy those images into this blog entry, feel free.


  4. Greg Goodman says:

    PS the middle panel of each of those models fits a long term 160 y rate of change of 0.42 K/century (ICOADS) and 0.49 K/century (Hadley), neither of which is enough to save us from a rather chilly future for the next 20 y or so 😦

    PPS. I’ve just noticed an x-label error on those plots that no one picked up at the time. There is nothing FFT about those plots; it’s just date on the x axis.

  5. tchannon says:

    I am struggling to reply, I think because we are coming at this so differently.

    You are going into an area I avoid, Greg, because it makes little sense to me. I assume by time derivative you mean integrate. That is generally a problem with improperly sampled data, as this is. Similarly it is very noisy.
    Tried a low noise Lanczos integrator, N=7: gets a strong 1 year, a 0.5 year, and a load of noise thereabouts. Longer term nothing large or making sense to me. Can try filtering out the noise but the whole thing is arbitrary. Maybe I did it wrong.

    I am not using D(F)FT, no binning, no end effect, no windowing. In this case there is a good low pass cross check.

    A “DC” Fourier term is not. It is some irresolvable something, beyond the transform capability all lumped in one bin. This is one of the reasons I decided to do something about it.

    HadCRUT4 as shown was a quick chuck-something-together, not particularly serious, since I have large reservations about the underlying data and methods.

    Perhaps I need to do something clearer which shows how it is constructed.

    I’ll take another look at the JC stuff, not seen it before.

  6. Greg Goodman says:

    Tim: “I assume by time derivative you mean integrate. ”

    err, no, I mean the derivative, the time differential, dT/dt. It’s the exact opposite of integration. I guess you just mis-read or something.

    If you are not using a windowing function that’s fine. You’ve never said exactly what you mean by your Fourier processing so I’m guessing.

    Tim: “A “DC” Fourier term is not. It is some irresolvable something”
    Indeed. It will include any stationary term but is also influenced by anything longer than the window of data available. I would have done better to use the term residual.

    Since we are primarily interested in rate of change it makes sense to me to study dT/dt rather than trying to infer rate of change by eyeballing the time series.

    It should show the same thing except for a scaling factor that changes linearly with frequency. If there is a difference, looking at how and why should be informative.

    Indeed, the residual will be different. Looking at dT/dt gives a slightly different view of the data. Viewing from a different perspective can often be useful.

    With evenly spaced data dT/dt is simply the first difference, so it is trivial to calculate and re-run processing such as the Fourier analysis.
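    A minimal sketch of that first difference, with the one detail that trips people up: for monthly data the step is 1/12 year, so the difference must be divided by 1/12 (i.e. multiplied by 12) to read as K per year. Names here are illustrative:

```python
def dTdt(series, dt_years=1.0 / 12.0):
    """First difference of an evenly spaced monthly series, in K/year.
    The step is one month = 1/12 year, so divide by 1/12, not by 12."""
    return [(b - a) / dt_years for a, b in zip(series, series[1:])]

# a series warming steadily at 0.01 K per month should read 0.12 K/year
monthly = [0.01 * i for i in range(24)]
rates = dTdt(monthly)
print(rates[0])
```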

  7. Greg Goodman says:

    [setting notify option]

  8. tchannon says:

    Oops, brain noise: integrate, differentiate, which is what I did. Sorry about that.

    Most science data violates Nyquist and Shannon so there is a lot of junk, artefacts. This is the worst possible situation for differencing.

    I don’t really understand what rate of change has to do with the basic dataset.

    *if* we could remove all cyclic entities there might be what some people call a trend but we are ignorant of H (Hurst) so there is unknown variance; how long is a piece of string. Moreover the data must have a central tendency, so a linear trend is impossible.

    I have written software which uses multi-dimensional optimisation to fit successive or concurrent functions to sampled data, irregular or not, usually in a least squares sense. Hence some of the limitations of a normal transform are sidestepped. Usually straight Fourier is most useful.
    No papers on this. Optimiser is my own from the 1980s, created for hard problems.

    This can attempt to fit e.g. a very long period sine in the limit case of being given a straight line, which is an approximation. Or perhaps it is a fragment of a longer curve, ambiguous but all we have; ultimately it comes down to a human call on whether there is actual meaning. I’ll have to leave a lot unsaid. This has been under development for a number of years so it is quite stable today, but like so many things it is living, needs more doing.

    There are novel uses. A fun one is finding doublets in data where Fourier cannot split them. Lock to one and subtract it out, there is the other (not theoretic, I do it). Such as in lunar data where the causal difference works out at 18.6 years, the lunar nodal period. Got that as Fourier, it can be turned on its head and used as a function generator, including outside the original timeline.

    Plots I was showing are function output.

    Maybe that is clearer.

  9. Greg Goodman says:

    Tim: “Most science data violates Nyquist and Shannon so there is a lot of junk, artefacts. This is the worst possible situation for differencing. I don’t really understand what rate of change has to do with the basic dataset”

    Differentiation does amplify HF noise. However, if it is just random noise rather than bias or data corruption (bias “correction”) it can be filtered out. I have only found the S/N a problem in the second diff, and it is still manageable with a heavier filter.
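    The amplification is easy to quantify: for the one-step difference the gain at normalised frequency f (cycles per sample) is 2·sin(πf), so noise near Nyquist is boosted to the maximum gain of 2 while a slow cycle is barely touched. A quick illustrative check:

```python
import math

def diff_gain(period_samples):
    """Gain of the one-step first difference at a sinusoid of the given
    period in samples: |H(f)| = 2*sin(pi*f) with f = 1/period."""
    return 2 * math.sin(math.pi / period_samples)

slow = diff_gain(120)   # 10-year cycle in monthly data: gain ~0.05
fast = diff_gain(2)     # Nyquist period: maximum gain of 2
print(round(slow, 4), round(fast, 4))
```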

    One advantage is that instrument changes or crappy corrections only have a localised effect and don’t pollute the rest of the record. I’d rather have a negative glitch than a permanent offset halfway through the data. Also, having the end at a level close to the beginning is generally helpful for Fourier type techniques.

    Tim: *if* we could remove all cyclic entities there might be what some people call a trend

    I may have misled you into thinking I was out to find a linear trend. Quite the opposite: I have often criticised climate science for seeking linear trends in chaotic systems and the nasty habit of “detrending”.

    If the constant residual term in dT/dt shows a “trend” in T, this will more likely be a segment of a much longer variation beyond the length of the data and should be read in that way. However, since there is very likely just such a “trend” in the last 300 y, a Fourier based technique will be less perturbed if it falls in the residual of dT/dt and does not affect the cyclic decomposition. And as I said before, differentiating will attenuate the longer periods (lower frequencies) anyway.

    Just as an exercise, try it on your hadcrut4 example and see whether it shows any interesting differences.

  10. tchannon says:

    Already looked, forced a result; I don’t like doing this. In effect I have to either filter or force where to look (because the automatic director latches on to the wrong thing).
    I don’t think I saved anything.

    Wait. Might be a new blog post, not decided.

  11. tchannon says:

    It doesn’t like it; this happens with some data. It can take a long time to figure out. In this case I think the periodicity is irregular, inconsistent between early and late data. I could start to do fancy stuff but I rarely show that; this is about getting a feel. I could for example put a confidence on the samples.

    In some cases working with a function of the data linearises it in the analysis domain.

    Best bet is simply to show it filtered; it is a tiny signal anyway. The snag with that is where to draw the line, given the long-period content is very small.

    Following is thrown together fast, no plot descriptions

    Diff and filtered

    Filtered at about 15 years, end corrected.

    .ods in zip

  12. Greg Goodman says:

    Yes, your first plot here is similar to what I saw working with hadSST3 and icoads for the article on ‘climate etc.’. I was able to fit my 3 cosine + const model by non-linear least squares as I detailed in the article.

    It is necessary to provide initial values for periods etc., but provided the periods were well spaced, e.g. 20, 60, 200 y, it would latch on. This was done with a variety of initial values to ensure it was not just latching onto local minima.

    Your second plot is close to the middle panel (dTdt) of my hadSST3 plots except that I used a lighter filter:

    If you are doing a full spectrum Fourier type method you will probably need to do some kind of anti-alias filtering. But I’m sure you know all about that.

    What I was interested in was seeing how your Fourier spectrum of this looked. My variable window FFT was useful in looking at the extent to which the Hadley processing was affecting the spectrum but it was a bit crude. The spectra you derived for the planetary data looked a lot more sophisticated.

    Aren’t you able to do the same sort of thing here?

    PS. “Filtered at about 15 years, end corrected.”
    What’s this end correction? That’s a couple of times I’ve seen you mention that. This is a perennial problem with filtering; if you have a solution I’d like to know how it’s done.

  13. Greg Goodman says:

    Tim: “I don’t really understand what rate of change has to do with the basic dataset.”

    If we are looking for the effects of changing “forcings” (measured in W/m2) we should be looking not at temperature (an energy quantity) but at dT/dt which is a power term (watts).

    An example, here, shows how looking at the derivative of Arctic ice cover makes it instantly much clearer what is happening.

    If you look at the time series of ice extent/area it will be a lot less obvious.

  14. tchannon says:

    Ah, this spins off into other things.
    Not about you, my take: the usage of the word “force” in politics of the kind involved is the usage of a “hot” word, emotional, propagandist. Unfortunately most of these conflict with my understanding.

    The entropy of the body, if in this case we ignore the hot core as negligible, is the entropy of the entire body. That of a gas at a point where humans seem to care about it has an indeterminate relationship to the whole.
    Temperature is the measure of entropy, all else being the same.

    Watts are the dissipation of a flux in a resistance, raising the entropy of a thermal capacity, assuming nothing else is going on.

    Dire problems appear when Stefan-Boltzmann is applied to a real body, which is not part of the equation. If sufficient other parameters are known, what might be called extended S-B could be used. I contend we do not know well enough.

    The air near the surface to a good approximation knows nothing about flux or watts. It is a mass with thermal capacity at some temperature. How it acquires or loses heat is somewhat complex.
    How this air is related to other factors is, I think, unclear (which is why I am looking at the ground temperature data above).

    Polar sea ice is another large subject riddled with problems. I get the impression we would need days of discussion to come up to speed on our positions. You might find this link worth a quick look
    I forgot to mention one critical thing, orbital inclination.

    Quite a bit more around on sea ice, won’t mention more for now.

  15. Greg Goodman says:

    OK, I was not intending to discuss sea ice, it was just an illustration of why I think studying dTdt directly is more informative. I’ve commented on the post you linked to so, no more sea ice here.

    I asked this earlier and things got distracted. This is a persistent problem, so if you have a technique I’d like to know about it.

    Tim: “Filtered at about 15 years, end corrected.”
    What’s this end correction? That’s a couple of times I’ve seen you mention that. This is a perennial problem with filtering; if you have a solution I’d like to know how it’s done.


  16. Greg Goodman says:

    Just had a look at your .ods. What’s the basis for that filter kernel? Nice and compact.

  17. Greg Goodman says:

    OK, I got it. Not sure that getting that fancy with the diff is relevant with the state of the data, but it’s cheap enough to do.

    I’d still like to know about your filter end correction 😉

  18. Greg Goodman says:

    I think you slipped up on the formula: /12 should be /(1/12); the increment is one month, not twelve years. That could explain why the filtered dT/dt looked so small. Once I corrected it I got your final plot with a range of +/- 3e-02 K/annum, i.e. about a 3 K/century maximum swing, which seems credible.
    [ Ah yes, I wondered whether I had finger trouble. –Tim]

    Pretty much the same as what I showed for HadSST3 in the Curry post.

    Actually, the coefficients in this sort of formula normally alternate and are usually larger for an 11-point formula, though I have not been able to find the same ones you are using. Maybe I have hit unreliable sources. Do you have a link?

    Anyway thanks for pointing this out. The data is noisy and maybe a broader sample is better than the simple first differences I was using.

  19. tchannon says:

    I’ve never seen a mention of how this is done so I invented my own. Unlikely it is anything new.

    What I call first order end correction involves padding with the correct value.

    The effect is disconnection from offsets.

    In practice I use a different method which is wrapped around a convolution engine; it has to work even where a filter is longer than the data. Took me about 6 weeks to figure out this part of the code. How do you correct in the other domain? Hair tearing time.

    A twist, contrary to what everyone knows, is that the length of the filter is irrelevant; only the characteristic matters.

    As usual, there are human calls on what is safe, what can practically be done. I mention this because not doing so would lose trust. How clearly depends on whether the work is casual.
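    My actual correction is wrapped in the convolution engine and stays unpublished, but the first order idea, and why padding choice matters, can be shown with a toy running mean. The pad values, filter length and the “hold” strategy here are illustrative stand-ins, not my method:

```python
def moving_average(x, n, pad):
    """Centred running mean, odd length n, with explicit end padding.
    pad='zero' extends with zeros; pad='hold' repeats the end values,
    a crude stand-in for padding with a "correct" value."""
    h = n // 2
    if pad == "zero":
        ext = [0.0] * h + list(x) + [0.0] * h
    else:
        ext = [x[0]] * h + list(x) + [x[-1]] * h
    return [sum(ext[i:i + n]) / n for i in range(len(x))]

data = [10.0] * 50                     # constant series offset from zero
zero_end = moving_average(data, 11, "zero")
hold_end = moving_average(data, 11, "hold")
print(zero_end[0], hold_end[0])        # zero padding bends the end down
```

    The point of “disconnection from offsets”: with zero padding the filtered series droops toward zero at both ends even though nothing happened in the data; with hold padding the constant survives intact.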

  20. tchannon says:

    Filter kernel?

    The link is in the .ods but I will give it here anyway; the following is taken from a draft post about several snippets which might be published sometime:

    “Second up, Pavel Holoborodko from Japan. He has clearly written practical work on Numerical Differentiation, handling reality, noisy data. Web pages are here.”

    Wonderful stuff.
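    For the curious, the 7-point central smooth noise-robust differentiator from those pages looks like the following; the coefficients are quoted as I read them there, so check against the source before relying on them:

```python
def holoborodko7(y, h=1.0):
    """Smooth noise-robust first derivative, 7-point central kernel:
    f' ~ (5*(f[+1]-f[-1]) + 4*(f[+2]-f[-2]) + (f[+3]-f[-3])) / (32*h).
    Exact for polynomials up to quadratic; loses 3 points at each end."""
    return [(5 * (y[i + 1] - y[i - 1])
             + 4 * (y[i + 2] - y[i - 2])
             + (y[i + 3] - y[i - 3])) / (32.0 * h)
            for i in range(3, len(y) - 3)]

# exact on a parabola: d/dx x^2 = 2x
ys = [(0.5 * i) ** 2 for i in range(20)]
d = holoborodko7(ys, h=0.5)
print(d[0])   # derivative at x = 1.5: should be 3.0
```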

  21. Greg Goodman says:

    Thanks, there are some useful explanations on Pavel’s pages. I’d noticed that your differential has notably reduced the noise. How much does it distort the data in the process? I’m not sure I like the look of the frequency responses. Too much like a sinc function for my liking, and then I may as well do a runny mean before I start.

    I’ve never found a magic wand for this kind of problem. A short kernel is always pretty ugly. Maybe there’s an advantage in doing the diff and low pass together in terms of losing fewer data points. Interesting stuff.

    “What I call first order end correction involves padding with the correct value.”

    What is a “correct value” and how does one know when one has been found? Sounds like you’ve given it some serious thought, so what is the criterion for “correct”?

  22. tchannon says:

    There are spreadsheet examples around but it isn’t something you would use for significant stuff.
    Trying to remember where these are. Ah yes, there was a demo a couple of years ago or so, probably still valid. I will have to look around and see if it is online somewhere.

    The convolution version is a whole different game although off the top of my head I can’t remember how it works.

  23. tchannon says:

    I’ve dug out a couple of files which are demo of end correction.

    Looks like I was avoiding a bridge too far for readers by using filter characteristics they would know.

    At some point I have been intending to publish an article on end correction, probably as part of other things to do with data handling. One of the obstacles is the apparent very little interest in the subject out there, mostly armchair spectators. Given this takes me time and effort there is little point.

    I’ll think on what to do.

  24. Greg Goodman says:

    “One of the obstacles is the apparent very little interest in the subject out there”

    This is a significant problem for many sorts of data processing, especially climate related, where the most recent developments are very important and no one wants to wait 10 y for the filter window to fill up.

    There are also applications in image processing, where any filtering or convolution based processing like blur softening, edge enhancement etc. usually resorts to some fairly crude techniques like continuing the last pixel, the trend, or mirroring.

    If you have some techniques, which I suppose must ultimately be based on continuation of analysed trends or frequencies, I can think of a number of domains that would be very interested.

  25. Greg says:

    I’ve been thinking about these low-pass diff/filters.

    The differential is defined to be the limit of the dy/dx slope as dx tends to zero. The closest we have to this in discrete data is the one-step difference. So what are these fancy n-point expressions calculating?

    The 3-point diff has a 1, 0, -1 weighting, which is in essence the mean of the slopes either side of the point. It has the advantage of producing values on the same grid as the data. Also, the simple first difference needs to be offset by 0.5 samples to preserve phase, which may sometimes be inconvenient.

    So 3-point is identical to first difference plus a two point running mean. A very light LP filter.
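    That identity is quick to verify numerically (integer data keeps the arithmetic exact):

```python
data = [3, 17, 2, 29, 11, 40, 25]

# 3-point centred difference: (f[i+1] - f[i-1]) / 2
central = [(data[i + 1] - data[i - 1]) / 2 for i in range(1, len(data) - 1)]

# first difference, then 2-point running mean of adjacent differences
first = [b - a for a, b in zip(data, data[1:])]
smoothed = [(a + b) / 2 for a, b in zip(first, first[1:])]

print(central == smoothed)   # identical term by term
```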

    The higher centred difference formulae are extensions of the same idea with somewhat more sophisticated filtering applied. The frequency response is monotonic for the standard centred differences, at least for the first diff. The length of the n-point centred difference determines the cut-off frequency of the filtering.

    The point is that this is not a more accurate formula for the differential; it is simply a calculation based on the surrounding data that builds some kind of weighted averaging into the calculation. It is diff plus low pass filter all in one. There is nothing closer to the base definition of the differential for discrete data than the one-step first difference.

    Both convolution (hence conv filters) and differentiation (ie kernel based diffs as here) are linear transformations, so the order in which they are done does not affect the result.
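    A check of that commutativity, using an unnormalised 3-point sum as the filter so integer arithmetic stays exact:

```python
def first_diff(x):
    return [b - a for a, b in zip(x, x[1:])]

def convolve(x, k):
    # valid-mode convolution; the kernel here is symmetric so no flip needed
    n = len(k)
    return [sum(x[i + j] * k[j] for j in range(n))
            for i in range(len(x) - n + 1)]

data = [4, 9, 1, 16, 7, 3, 12, 8]
kernel = [1, 1, 1]

diff_then_filter = convolve(first_diff(data), kernel)
filter_then_diff = first_diff(convolve(data, kernel))
print(diff_then_filter == filter_then_diff)   # the order is irrelevant
```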

    If there is too much H.F. content in the data or it is too noisy, then filtering can be applied before or after (or at the same time as) running the diff.

    As always, choosing a filter requires determining the character of the signal and the noise in order to make an appropriate choice.

    Averaging (the basis of runny means) is a suitable technique for removing random (Gaussian distributed) noise from a constant or slowly changing signal. It is certainly NOT a good choice when the data is known to contain periodicity between one and two times the window width. It generally has a lousy frequency and phase response.

    What the Japanese chap is calling a “noise resistant differential” has a frequency response that looks uncannily close to the sinc function, i.e. the frequency response of a running mean.
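    The sinc-like behaviour can be made concrete. The gain of an N-point running mean at a sinusoid of period P samples is sin(πN/P)/(N·sin(π/P)), the periodic-sinc (Dirichlet) form; in the first negative lobe the filter does not just attenuate a cycle, it inverts it. Values below are illustrative:

```python
import math

def running_mean_gain(N, period):
    """Gain of an N-point running mean at a sinusoid of the given period
    in samples (the periodic-sinc / Dirichlet form)."""
    f = 1.0 / period
    return math.sin(math.pi * f * N) / (N * math.sin(math.pi * f))

g_pass = running_mean_gain(12, 120.0)   # long cycle: passed almost intact
g_flip = running_mean_gain(12, 8.0)     # inside the negative lobe: inverted
print(round(g_pass, 3), round(g_flip, 3))
```

    An 8-sample cycle under a 12-point mean comes out with a negative gain, i.e. phase-flipped, which is exactly the lousy phase response complained about above.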

    It is clear that all this is doing is applying an arbitrary choice of filter and assuming one size fits all. This is the essential problem with a lot of digital signal processing done by scientists in general and climate scientists in particular.

    If you need to run a filter because of disruptive noise or h.f. then, as always, this should be done with forethought by studying the data, not by pulling a magic “noise resistant” formula out of a box.

    In essence the NRD makes that choice arbitrary, as do the n-point formulae (though they are probably less disruptive filters).

    Filtering always needs to be tailored to the data. I see no advantage in combining differentiation with filtering. It simply makes it less clear what the properties of the filter are.

    Since this is mathematically equivalent to doing the two processes separately, and a simple first diff is so fast to run, I see no advantage and a lot of disadvantages in mixing the two processes.

    It is always worth challenging the methods one is using, and understanding what these diff/filter kernels do is interesting, so thanks for pointing it out.