Rewriting History, Time and Time Again
By John Goetz
Update:
As noted in the comments below, GISS updated the GLB.Ts+dSST anomalies which show a large 0.67 degC value for March. This addition of March 2008 temperature data to the record caused a corresponding drop in annual average temperature for the years 1946 and 1903. According to GISS, 1946 is now colder than 1960 and 1972, and 1903 dropped into a tie with 1885, 1910 and 1912.
That’s really neat.
End update.
In February I wrote a post asking "How much Estimation is too much Estimation?" I pointed out that a large number of station records contained estimates for the annual average. Furthermore, the number of stations used to calculate the annual average had been dropping precipitously for the past 20 years.
One was left to wonder just how accurate the reported global average really was and how meaningful rankings of the warmest years had become.
One question that popped into my mind back then was whether or not - with all of the estimation going on - the historical record was static. One could reasonably expect that the record is static. After all, once an estimate for a given year is calculated there is no reason to change it, correct? That would be true if your estimate did not rely on new data added to the record, in particular temperatures collected at a future date. But in the case of GISStemp, this is exactly what is done.
Last September I noted that an estimate of a seasonal or quarterly temperature when one month is missing from the record depends heavily on the long-term averages for all three months in that quarter. This can be expressed by the following equation, where {m}_{a}, {m}_{b}, {m}_{c} are the months in the quarter (in no particular order) and one of the three months, {m}_{a}, is missing:
{T}_{q,n} = \frac{1}{3}{\overline{T}}_{{m}_{a},N} + \frac{1}{2}\left({T}_{{m}_{b},n} + {T}_{{m}_{c},n}\right) - \frac{1}{6}\left({\overline{T}}_{{m}_{b},N} + {\overline{T}}_{{m}_{c},N}\right)
In the above, T is temperature, q is the given quarter, n is the given year, and N is all years of the record.
One can readily see that as new temperatures are added to the record, the average monthly temperatures will change. Because those average monthly temperatures change, the estimated quarterly temperatures will change, as will the estimated annual averages.
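To make the dependence on future data concrete, here is a minimal Python sketch of the equation above. It is not GISS code: the function name `estimate_quarter`, the layout of `history`, and all temperatures are made up for illustration. It estimates a 1990 spring quarter with March missing, then re-estimates the same quarter after one more year of data is appended to the record.

```python
from statistics import mean

def estimate_quarter(history, year, missing, present):
    """Estimate a quarterly mean for `year` when month `missing` has no value.

    history: dict mapping month name -> {year: temperature} (hypothetical layout).
    present: the two months of the quarter that do have a value in `year`.
    """
    def long_term_mean(month):
        # Average over all years N currently in the record for this month.
        return mean(history[month].values())

    m_b, m_c = present
    return (long_term_mean(missing) / 3
            + (history[m_b][year] + history[m_c][year]) / 2
            - (long_term_mean(m_b) + long_term_mean(m_c)) / 6)

# Made-up station record: March 1990 is missing.
history = {
    "Mar": {1991: 5.0, 1992: 6.0},
    "Apr": {1990: 10.0, 1991: 11.0, 1992: 12.0},
    "May": {1990: 15.0, 1991: 16.0, 1992: 17.0},
}
before = estimate_quarter(history, 1990, "Mar", ("Apr", "May"))

# Append one later year (1993) and estimate the same 1990 quarter again.
extended = {month: dict(values) for month, values in history.items()}
extended["Mar"][1993] = 8.0
extended["Apr"][1993] = 12.0
extended["May"][1993] = 17.0
after = estimate_quarter(extended, 1990, "Mar", ("Apr", "May"))

print(f"1990 estimate before adding 1993: {before:.3f}")
print(f"1990 estimate after adding 1993:  {after:.3f}")
```

Nothing measured in 1990 changed between the two calls, yet the 1990 estimate moves because the long-term monthly averages moved.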
Interestingly, application of the “bias method” used to combine a station’s scribal records can have a ripple effect all the way back to the beginning of a station’s history. This is because the first annual average in every scribal record is estimated, and the bias method relies on the overlap between all years of record, estimated or not. Recall that annual averages are calculated from December of the prior year through November of the current year. However, all scribal records begin in January (well, I have not found one that does not begin in January), so that first winter average is estimated due to the missing December value. Thus, with the bias method, at least one of the two records contains estimated annual values.
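The missing-December point can be illustrated with a short Python sketch. The record below is hypothetical (a station whose data start in January 1950), and the helper names are my own, not anything from GISS.

```python
def winter_keys(year):
    """(year, month) pairs forming the Dec-Feb winter of a Dec-Nov annual average."""
    return [(year - 1, 12), (year, 1), (year, 2)]

def missing_winter_months(record, year):
    """Return the winter months of `year` that have no value in `record`."""
    return [key for key in winter_keys(year) if key not in record]

# Hypothetical record: every month of 1950 and 1951, starting in January 1950.
record = {(y, m): 0.0 for y in (1950, 1951) for m in range(1, 13)}

first_year_gaps = missing_winter_months(record, 1950)
second_year_gaps = missing_winter_months(record, 1951)

print("Missing from first winter: ", first_year_gaps)
print("Missing from second winter:", second_year_gaps)
```

Only the first year lacks its December, so only the first winter (and hence the first annual average) has to be estimated.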
Of course, it is fair to ask whether or not this ultimately has any effect on the global annual averages reported by GISS. One does not have to look very hard to find out that the answer is “yes”.
On March 29 I downloaded the GLB.Ts.txt file from GISS and compared it to a copy I had from late August 2007. I was surprised to find several hundred differences in monthly temperature. Intrigued, I decided to take a trip back in time via the “Wayback Machine”.
Here I found 32 versions of GLB.Ts.txt going back to September 24, 2005. I was a bit disappointed the record did not go back further, but was later surprised at how many historical changes can occur in a brief 2 1/2 years.
The first thing I did was eliminate versions where no changes to the data were made. I then compared the number of monthly differences between the remaining sequential records and built the following table. Here I show the “Prior” record compared to the next sequential record (referred to as “Current”). The number of changes made to the monthly record between Prior and Current is shown in the “Updates” column (this column does not count additions to the record - only changes to existing data are counted). The number of valid months contained in the Prior record is in the “Months” column. “Change” is simply the percent Updates made to Months.
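The counting described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script I used: each file version is assumed to be already parsed into a dict keyed by (year, month), anomalies are held as integer hundredths of a degree, `None` marks a missing month, and a value that disappears in the later file is counted as a change. The function name and the sample values are made up.

```python
def count_updates(prior, current):
    """Compare two parsed versions of the monthly record.

    Returns (updates, months, change_percent), where `months` is the number of
    valid months in `prior` and `updates` is how many of those differ in `current`.
    Months that exist only in `current` are additions and are not counted.
    """
    valid = [key for key, value in prior.items() if value is not None]
    updates = sum(1 for key in valid if current.get(key) != prior[key])
    months = len(valid)
    change_percent = 100.0 * updates / months if months else 0.0
    return updates, months, change_percent

# Made-up example values in hundredths of a degree C.
prior = {(1880, 1): -25, (1880, 2): -10, (1880, 3): 5, (1880, 4): None}
current = {(1880, 1): -27, (1880, 2): -10, (1880, 3): 5,
           (1880, 4): -12, (1880, 5): 1}

updates, months, change = count_updates(prior, current)
print(f"Updates: {updates}  Months: {months}  Change: {change:.1f}%")
```

In this toy case one of three valid Prior months changed, while April and May 1880 count as additions rather than updates.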
On average, 20% of the historical record was modified at each of 16 updates in the last 2 1/2 years. The largest single jump was 0.27 C. This occurred between the Oct 13, 2006 and Jan 15, 2007 records, when Aug 2006 changed from an anomaly of +0.43C to +0.70C, a change of about 63%.
Wow.
The next question I had was “how often are the months within specific years modified?” As can be seen in the next chart, a surprising number of the earliest monthly averages are modified time and again.
 
I was surprised at how much of the pre-Y2K temperature record changed! My personal favorite change was between the August 16, 2007 file and the March 29, 2008 file.
Suddenly, in the later file, the J-D annual temperature for 1880 could now be calculated. In all previous versions the temperature could not be determined.
But some will want to know only how this process affects the rankings for the top 10 warmest years. Because the history goes back to the middle of 2005, I explored this question only for the years before 2005. While the overall ranking from top to bottom does change from one record to the other, the top 10 prior to 2005 does not change much. However, the top two do exchange position frequently, as can be seen from the following table:
 
I will note that the overall trend in changes between now and Sep. 24, 2005 is very close to zero. If one compares the latest file with the one from Sep. 24, 2005, it can be seen that the earliest and latest years are adjusted lower today than in 2005, while the middle years are adjusted higher. However, this is purely coincidental. If one compares the file from Aug. 2007 with the latest file, it appears the earliest temperatures have been adjusted downward, leading to an overall upward trend. Surely other comparisons will yield a downward trend. It is by pure chance that we have selected two endpoint datasets that appear to have no effect on the trend.
It is at this point I would like to ask, does anyone have a copy of the GISS monthly and annual temperatures - the equivalent to GLB.Ts.txt - from a date earlier than Sep. 24, 2005?
In the meantime, will the real historical record please stand up?