Hottest ever in England: Why meteorological mathematics misleads

Posted: March 28, 2013 by tchannon in Analysis, climate, Dataset, methodology, Natural Variation, weather


Figure 1

Today I am tackling a very difficult and widely misunderstood subject: the correct treatment of digital data.

Let’s go back to childhood: scribbling lines with a pencil on paper. Today, as grown-ups, we represent the squiggle by a series of X and Y numbers in a computer. There is no other way to make a digital representation of a continuous, or analogue, line. Confusingly in English, one is the analogue of the other.

Even here I am not being strictly correct, since I am omitting the reconstruction filter.

Figure 1: all three sets are for the same data.

The red traces are identical and serve as a reference: meteorological maths, simply the common average of all days (or months) in a year, then truncated at a tenth of a degree.

This process handles averaging in amplitude, but that is all; it makes a total mess of everything else. Errors range up to +1 C.
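To make that concrete, here is a minimal sketch in C of what the meteorological maths amounts to. It is not the code used for the figures, and the monthly values are invented; it simply averages the twelve monthly means for a calendar year and truncates to a tenth of a degree.

    /* Sketch only: the standard "meteorological" annual value.
     * Average the 12 monthly means of a calendar year, then truncate
     * (not round) to a tenth of a degree.  Monthly values are invented. */
    #include <stdio.h>
    #include <math.h>

    static double met_annual(const double monthly[12])
    {
        double sum = 0.0;
        for (int m = 0; m < 12; m++)
            sum += monthly[m];
        return floor((sum / 12.0) * 10.0) / 10.0;   /* truncate at 0.1 C */
    }

    int main(void)
    {
        const double y[12] = { 4.2, 4.8, 6.1, 8.3, 11.5, 14.6,
                               16.9, 16.4, 13.8, 10.2, 6.7, 4.9 };
        printf("annual value: %.1f C\n", met_annual(y));
        return 0;
    }

One number per year, pinned to the calendar, and all timing information inside the year is thrown away.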

Top set: the signal processed at the correct Nyquist limit for annual output points, which means the data is limited to variation on two-year or longer timescales.

Middle set: a sensible minimum sample rate for displaying annual data is three times a year. Here I have kept the Nyquist limit the same but output at this new sample rate. Now you start to see reality: proper processing averages in time too, so when events happen becomes visible.

Bottom set: this is the closest to correct I can do. Nyquist is now set at annual, and with three samples per year this is valid (the theoretical limit is two).
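As a rough picture of the idea behind the bottom set, the sketch below keeps a one-year averaging window but outputs three points per year, at 1/6, 1/2 and 5/6 of each year. It uses a crude centred 12-month box average on an invented series, not the proper resampling filter used for the figures, so treat it as an illustration of the principle only.

    /* Crude sketch: a one-year averaging window output three times per
     * year, centred at 1/6, 1/2 and 5/6 of each year.  Uses a simple
     * centred 12-month box average on invented data, NOT the resampling
     * filter used for the figures. */
    #include <stdio.h>
    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define NMONTHS 360                       /* 30 years of monthly data */

    /* Mean of the monthly values whose mid-month positions fall inside a
     * one-year window centred on c (c measured in months from the start). */
    static double box12(const double *t, int n, double c)
    {
        double sum = 0.0;
        int count = 0;
        for (int m = 0; m < n; m++) {
            double pos = m + 0.5;             /* centre of month m */
            if (pos > c - 6.0 && pos <= c + 6.0) {
                sum += t[m];
                count++;
            }
        }
        return count ? sum / count : 0.0;
    }

    int main(void)
    {
        double t[NMONTHS];
        /* invented series: a seasonal cycle plus a slow drift */
        for (int m = 0; m < NMONTHS; m++)
            t[m] = 9.5 + 6.5 * cos(2.0 * M_PI * (m - 6.5) / 12.0) + 0.002 * m;

        /* three output points per year, at 1/6, 1/2 and 5/6 of the year */
        for (int year = 0; year < NMONTHS / 12; year++)
            for (int k = 0; k < 3; k++) {
                double c = 12.0 * year + 2.0 + 4.0 * k;
                printf("%7.3f yr  %6.2f C\n", c / 12.0, box12(t, NMONTHS, c));
            }
        return 0;
    }

Any output rate of two or more points per year preserves the shape at the one-year filter limit; one point per year, locked to the calendar, does not.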

Figure 1 has revealed two things.

  • the highest temperature in CET at an annual level occurred as a spike at the turn of 2006/2007 but this is hidden by Met Office / Hadley / CRU practice of insisting on annual.
  • 1996 had a cool spike which because it is within the hard limits of a calendar year is captured by standard math

As you will see in a moment, winters of both 1947 and 1963 are grossly understated by standard annual CET.

The recent cold 2010 and hot 2011 were not understated, though that is luck rather than good math. There is a trivial explanation: 2010 had two winter cold snaps, 2011 had none. Remember, much of the cold winter of 2010/11 came in December 2010, an unusually early arrival.


Figure 2

Here is the bottom trace for all of CET.

A fun time can be had by comparing this new representation with Martin’s CET commentary.

Root of site here (no new browser window)

Selectable timeline here (new browser window)


Figure 3

This is the difference between the top trace and standard met practice; I can only do this for the case where the output sample points are at the middle of a year.

A word of caution. I’ve spent some time enhancing the software I am using to properly handle irrational input/output data sample points, I’ve implemented full data resampling, written in C. Fun. So far this works well but I am unhappy about the handling of dataset ends, always am. Beware, the ends might be significantly inaccurate. Please rest assured I do try my best and any error is not intended to mislead. I am prepared to risk some ridicule rather than play completely safe and show nothing.

In this instance the output sample rate is easily produced from monthly data: time offset by 1.5 and divide by 4, giving sample points at 1/6th, 1/2 and 5/6ths of a year, with 1/3rd of a year between all points.
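In code, and assuming each monthly value is taken to sit at mid-month, i.e. at (m + 0.5)/12 of a year, the sample-point arithmetic looks like this:

    /* Output sample points for monthly data decimated by 4 with a 1.5
     * month offset, assuming mid-month input positions.  Prints 1/6,
     * 1/2 and 5/6 of a year. */
    #include <stdio.h>

    int main(void)
    {
        for (int k = 0; k < 3; k++) {
            double month_index = 1.5 + 4.0 * k;            /* 1.5, 5.5, 9.5 */
            double year_fraction = (month_index + 0.5) / 12.0;
            printf("output point %d at %.4f of the year\n", k, year_fraction);
        }
        return 0;
    }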

2006/7 spike?

Now, when was the exceptional Arctic ice melt? I think this spike came in the winter before the melt. Perhaps it extended over a much wider geographic area, weakening the season’s freezing.

I have some surprising news to do with Arctic ice but that is for a later article. Fear not, things seem to be very different from the official idea.

At present no data supplied. Ask if you want it.

Post by Tim Channon, co-moderator.

Comments
  1. Paul Vaughan says:

    “[…] I am unhappy about the handling of dataset ends, always am.”

    A topic worthy of years of intense focus, I agree.

  2. Ninderthana says:

    I have to assume that the original data is a time series of daily (or monthly?) land surface temperatures which have already been spatially filtered to be representative of the central England region?

    You seem to be complaining because the Met Office is grouping temperature data into annual bins, effectively smoothing out any features with periods shorter than twice the new annual sampling interval (i.e. 2 x 1 years). Is that correct?

    You seem to have corrected the problem by re-sampling (digitizing) the data at a thrice-annual rate, in order to retain features that vary on annual (or longer) timescales.

    If this is the case, then you are correct in the case where the Met Office is claiming the hottest ever year.

  3. tchannon says:

    The underlying data is monthly. There is a story about this which I will be revealing soon (I almost tossed a coin on which article to write first).

    What I have done is hard to explain even though it is simple. I’ve kept the time window at 1 year, but doing this “legally” demands two or more sample points per year, hence three is the smallest practical number.

    Standard math hard-limits at calendar divisions, so it is blind to all variation in time. I know I bang on about this, yet the problem is real. A peak on the year crack does not show; one in the middle of a year does.

    Doubt this is clear to everyone. Wish I knew how to explain in a few words.
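    A tiny made-up illustration of the year-crack point, nothing to do with the real CET numbers:

        /* Invented numbers only.  The same two-month cold spell (10 C below
         * a flat 10 C background) is placed once across the December/January
         * boundary and once in mid-year.  Calendar-year averaging halves the
         * first and captures the second in full. */
        #include <stdio.h>

        int main(void)
        {
            double a[24], b[24];                /* two years of monthly data */
            for (int m = 0; m < 24; m++)
                a[m] = b[m] = 10.0;             /* flat background */

            a[11] = a[12] = 0.0;                /* spike on the year crack */
            b[5]  = b[6]  = 0.0;                /* same spike, mid-year    */

            for (int y = 0; y < 2; y++) {
                double sa = 0.0, sb = 0.0;
                for (int m = 0; m < 12; m++) {
                    sa += a[12 * y + m];
                    sb += b[12 * y + m];
                }
                printf("year %d: crack %.2f C, mid-year %.2f C\n",
                       y + 1, sa / 12.0, sb / 12.0);
            }
            return 0;
        }

    The mid-year version shows one clearly cold year; the boundary version shows two slightly cool ones and the event all but disappears.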

  4. Doug Proctor says:

    Is this not of a similar bent: take lots of regional data or individual proxies which have their individual ways, merge them and create a mathematically “correct” mean, average or mode, and yet have something that is without a shred of accurate representation of what is going on. An artifact of process, not an article of occurrence.

  5. Sparks says:

    Tim,
    I only know enough about sample rates to get by, but looking at what you presented here on sampling data over timescales gives me the impression that anyone can programmatically run sequences of sample rates on the data and then pick a plot that fits a required result, or pick one based on how it looks.
    Basically, in the time it takes me to carefully plot one chart, someone with the right set of skills can effortlessly and systematically produce hundreds of model dry runs with mix-and-match data sets.

  6. wayne says:

    TimC, same principles I try to adhere to — “Please rest assured I do try my best and any error is not intended to mislead. I am prepared to risk some ridicule rather than play completely safe and show nothing.” Put it out there and see if something develops, sets off a lightbulb in someone else’s mind, it often does.

    Been a long, long time since I delved into Nyquist, and then it was on two-dimensional images. But it has always amazed me what such a monthly time series looks like when you pick each of the twelve months in turn as the starting point for 12 annualized averaged views. Sometimes the differences are quite striking. It’s like you normally only see 1/12th of the true complete picture.

    Nice article.

  7. Chaeremon says:

    Hello tchannon, good points 😎 In my work I face similar challenges with the peculiarities of math, here’s my short list (not every issue may / can be applicable to temperature or climate).

    1] my “raw” data has no equidistant heart beat (it’s from close angular/spatial approach in astronomy, and only events that can be timed; you may perhaps have 1 max / min temperature at 1 time in 1 day). I therefore plot the data with “x/y scatter” diagrams, either along julian date (incl. hour), or along ecliptic longitude°, on the abscissa (x).

    2] mathematical mean/average can be good for visual comparison of results (e.g. of two diagrams or curves) but not as factor in modelling the unknown (to be sought for) relation. In a model they can only produce fictitious coordinates, which create their own mathematical bias and have no existence in reality. Two plain examples: 29.53059 days cannot be observed for the individual month, 365.24219 days cannot be observed for the individual year.

    3] on the ordinate (y) I want to see the same scale/offset to every other data point, therefore I subtract the column’s median (which b.t.w. is a genuine data point) and then plot the resultant column, also for further analysis (of e.g. spectra or frequencies, or with http://formulize.nutonian.com/ ).

    4] I want to see that every sinusoid (of which there can be many that make the shape of the curve) has a beginning and an end in the same diagram, and if not: I want to assess why not. Therefore I require that the sum of the (data point – median) column is as close to zero as possible. This is not easy. I try by dropping data points from the series’ effective begin or end, also try with a [slightly] adjacent median. And if that’s not possible then I know the reason for the first anomaly (it can be that the data series is inappropriate).

    Once this is all done, the analysis is no longer driven by the preconceptions that can come with mathematical bias.

  8. It’s a good point you make about Dec 2010.

    If Met years were used (i.e. Dec-Nov), 2010 would not look quite so cold, and 2011 would not look as mild.

  9. tallbloke says:

    OT:
    Met Office @metoffice 28m

    Provisional statistics show March is set to be the coldest in 50 years for the UK http://bit.ly/16jCtw3

  10. grumpydenier says:

    Thanks. I’ll let the Knights of Delingpole know. lol

  11. tchannon says:

    Sparks, if I read that as saying one can produce results to fit an agenda: that is not what I am getting at, but it is one of the effects.

    I am trying to show the minimum necessary information which correctly displays what many professionals are claiming to be showing but are not. Any output sample rate higher than the 3 shown will look exactly the same. Now consider what happens if too few are used.

    If I am technically wrong then someone please speak up.

  12. “1996 had a cool spike which because it is within the hard limits of a calendar year is captured by standard math”

    Where has the word ‘math’ come from, surely it’s maths, short for mathematics?

    [Reply] It came from Amerikay (and can go back there for me) -TB 🙂

  13. Roger Andrews says:

    “Winters of both 1947 and 1963 are grossly understated by standard annual CET.”

    Whenever you take an annual mean you will inevitably smooth out cold winters, particularly when they’re followed by warm summers, which was the case in 1947 (1963 still shows up as one of the coldest years on record however you average it.) If you want cold winters and warm summers to show up on the plot you have to plot seasonal or monthly means.

    “the highest temperature in CET at an annual level occurred as a spike at the turn of 2006/2007 but this is hidden by Met Office / Hadley / CRU practice of insisting on annual.”

    You can make warm summers (or cold winters) show up in annual means by plotting a running 12-month average, and when you do this you find that the warmest twelve-month period occurred between the beginning of May 2006 and the end of April 2007. But a calendar year is a calendar year is a calendar year and it’s defined by the Earth’s orbital period, not by the Met Office, and it’s the accepted interval for calculating annual mean temperatures. So I don’t think the Met Office is hiding anything here.

    Nyquist frequencies and sampling intervals must be taken into consideration when selection of too large a sampling interval could create aliasing of an important signal. With the CET monthly readings, however, the only signal you are going to alias is the diurnal variation.

  14. tchannon says:

    I have failed again to get the message across.

    “Whenever you take an annual mean you will inevitably smooth out cold winters …”

    No. In the southern hemisphere the opposite will happen but this gives me an idea for a quick demonstration.

    The maths used here is the standard meteorological kind; all I have done is shift the centre.

    The result ought to be identical, but actually it shows the picket fence: now you see it, now you don’t, which is what I am getting at.

    This

    The second set of points is technically[1] necessary but the two traces have to be combined. Or do it properly in one.

    1. Given the 1-year average, it is necessary to have at least two sample points per year to reproduce the original line at that 1-year filter limit; this is what Nyquist means.
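    For anyone who wants to play, here is the picket fence with invented numbers (nothing to do with the plots above):

        /* Picket fence with invented data: the same monthly series averaged
         * over Jan-Dec and over Jul-Jun "years".  Both are legal one-year
         * averages, yet a warm spell sitting across the calendar boundary
         * shows strongly in one and is halved in the other. */
        #include <stdio.h>

        static double mean12(const double *t, int start)
        {
            double s = 0.0;
            for (int m = 0; m < 12; m++)
                s += t[start + m];
            return s / 12.0;
        }

        int main(void)
        {
            double t[36];
            for (int m = 0; m < 36; m++)
                t[m] = 10.0;                 /* flat 10 C background       */
            t[11] = t[12] = 15.0;            /* warm Dec yr 1 and Jan yr 2 */

            printf("Jan-Dec years: %.2f %.2f %.2f\n",
                   mean12(t, 0), mean12(t, 12), mean12(t, 24));
            printf("Jul-Jun years: %.2f %.2f\n",
                   mean12(t, 6), mean12(t, 18));
            return 0;
        }

    Same series, same window length, different phasing, different picture.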

  15. Roger Andrews says:

    Tim:

    Your link presents two figures. The first plots running 12-month CET means since 2000. It shows when the warmest and coldest 12-month periods occurred, no ifs, buts or probables, without regard to month start.

    The second superimposes the January through December and July through June annual mean plots. These plots simply alias the running monthly mean plot in different ways. An April through March plot would give you something different and a September through August plot something different again.

    But this is what you get when you average the data in different monthly batches. It isn’t symptomatic of a data analysis problem.

  16. tchannon says:

    Met Office and others use annual. This is not valid and misleads.

    Yes, I agree: e.g. a running mean, which uses a high number of output samples, does not suffer this particular problem.

    The correct method, if they insist on one point per year, would be the Figure 1 top trace, which uses the necessary 2-year data filter.

    It doesn’t matter how it is cut up; what they are doing is wrong. I am suggesting 3 samples a year if they really do want a 1-year filter, otherwise a 2-year one.

  17. Paul Vaughan says:

    wayne (March 28, 2013 at 6:37 am) wrote: “TimC, same principles I try to adhere to — “Please rest assured I do try my best and any error is not intended to mislead. I am prepared to risk some ridicule rather than play completely safe and show nothing.” Put it out there and see if something develops, sets off a lightbulb in someone else’s mind, it often does.”

    Agree.

    And that’s why it’s unethical to block it from being out there — it could be the trigger for someone else’s important revelation, even if it’s wrong.

    Good to see due consideration being given to aggregation criteria. Same applies to spatiotemporal sampling — an even messier can of worms.

    One of the things I’ve always found simultaneously goofy, tragic, & comical is just how many there are who are willing to completely ignore spatial aliasing (effectively pretending it doesn’t exist and/or it isn’t important) and then proceed to very rigidly mathematically conceptualize climate data analysis as a strictly temporal problem (in rather uppity, self-superior fashion). This puts one on both very firm & very weak ground simultaneously …perhaps ok on average(!) (/warm, light humor).

    …So maybe best to simply let everyone harmoniously have their say every day — and collectively we’ll find a way to sort it all out in the long run.

    Tim: I’m looking forward to articles on sunspot area asymmetry & arctic sea ice. You never know what revelations you might end up triggering, for example at NASA JPL, where solar & climate thinking is a few orders of magnitude clearer than in your typical university climate science department.

    best regards to all

  18. Sparks says:

    Tim,
    Thanks, I think I understand a bit better; the further comments on it were very helpful. Basically, the Met should use a more representative sampling of their data when they produce their plots. Am I getting closer? 🙂

  19. tchannon says:

    Sparks, yes. The trouble is with me, although what I really want to say is so long and complex that no-one would read it.

    One thing I omitted, but which might help with context, is to show how the annual figures are used:


    http://www.cru.uea.ac.uk/cru/info/warming/


    http://www.metoffice.gov.uk/hadobs/hadcet/

    I can reproduce this exactly. I’ve even got a large plot showing the errors; fairly dramatic, but it adds little here. Don’t, though, go taking CET too seriously.