Tuesday, December 01, 2009

This Global Warming Scandal

Clayton Cramer has combed through the comments at Eugene Volokh's blog and pulled out some interesting bits. (And bytes.)

 
 


 
 

via Clayton Cramer's BLOG by Clayton on 12/1/09


It just keeps getting worse. Along with statements by the scientists that strongly suggest intentional efforts to suppress alternative points of view and to manipulate the data to get the "right" results, the programs actually used to convert the raw data have a lot of very serious problems. Some of these problems are sloppy programming that raises serious questions about whether the outputs can be trusted. The comments at Volokh Conspiracy are quite enlightening:

There are four major classes of such bugs known definitively to be in the code.

1) Use of static data or static places: This means the code sometimes stores things in a particular place, but always the same place. If two instances of the code are ever running at the same time, they will step on each other, producing incorrect output. This is the least serious, as someone could simply assure us that they never ran two instances at once. We'd have to trust them on this, but under normal circumstances, that wouldn't be a disaster.

2) Failure to test for error conditions. In many places, the code fails to test for obviously insane error conditions but just goes on processing. That means that if the input is somehow fundamentally broken, the output may be subtly broken. Again, we don't have the input to retest.

3) Reliance on the user to select the correct input sets. The code relies on the user to tell it what data to process; it does not check that the data are correct, but simply trusts the user. CRU had data sets with identical names but fundamentally different data. There is no way now to be sure that uncorrected data wasn't used where corrected data was appropriate, or that data wasn't corrected twice.

4) Reliance on the user to select the correct run-time options. During the run, many of the programs relied on the user to select the correct options as the run progressed. The options were not embedded in the results. A single mis-key could cause the output to be invalid.

Unfortunately, these types of defects combine in a multiplicative way. A mis-key during a run could produce subtly bad input; that input could trigger an error condition that goes undetected for lack of validation; and the result could be radically bad output that nobody ever notices ...
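
To make those four defect classes concrete, here is a minimal Python sketch. It is emphatically not the CRU code: the file format, the plausible-temperature range, and the interactive prompt are all invented for illustration. The first function shows the fragile pattern the commenter describes; the second avoids the shared scratch file, checks its input, and records which data set and which options produced the output:

import hashlib
import tempfile
from pathlib import Path

# The fragile pattern (classes 1-4): a fixed scratch file that two
# concurrent runs would clobber, no sanity checks on the input, and a
# run-time option typed in by the operator but never saved with the result.
def average_fragile(input_path):
    values = [float(x) for x in Path(input_path).read_text().split()]
    Path("/tmp/scratch.dat").write_text(repr(values))      # class 1: fixed location
    correction = float(input("correction factor? "))       # class 4: option not recorded
    return sum(values) / len(values) + correction          # class 2: no validation anywhere

# A safer pattern: per-run scratch space, explicit sanity checks, and the
# input checksum and options embedded in the output so the run can be audited.
def average_safe(input_path, correction=0.0):
    raw = Path(input_path).read_bytes()
    values = [float(x) for x in raw.decode().split()]
    if not values:
        raise ValueError(f"{input_path}: no data")
    implausible = [v for v in values if not -90.0 <= v <= 60.0]   # assumed range, deg C
    if implausible:
        raise ValueError(f"{input_path}: implausible values {implausible[:5]}")
    with tempfile.NamedTemporaryFile("w", delete=False) as scratch:  # unique per run
        scratch.write(repr(values))
    return {
        "input": str(input_path),
        "sha256": hashlib.sha256(raw).hexdigest(),   # class 3: exactly which data set
        "correction": correction,                    # class 4: options travel with output
        "mean": sum(values) / len(values) + correction,
    }

The arithmetic is trivial; the point is the bookkeeping. With the checksum and the correction stored next to the result, a later reader can tell which data set was processed and with what settings, which is exactly what the commenter says cannot be done with the CRU output.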

The defenses of this bad programming are enough to make me wonder if I am some sort of weirdo in how I do my job (or did my job, back when I had one):

Minor quibble here: If the program gives garbage output for "obviously insane" or "fundamentally broken" inputs, this may not be the fault of the program. If I'd spent my time doing input validity tests on all the possible inputs to subroutines I wrote, I wouldn't get any real work done. At some point, you have to assume some good nature in others, and get on with the tasks at hand. After all, they're using your code; you aren't using their data..... Sounds cruel, but if you've worked in the field... If they pay me for taking care of their excessive stupidity, then again....

Not to mention, what do you care if the output is wrong for "fundamentally broken" input?

Perhaps if I had not written defensive code that checked inputs for validity, I would still be employed writing bad code? Or are the defenses of this absurdity just ad hoc, to make the Climate "Research" Unit look good?
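
For what it is worth, the kind of defensive check being argued about here does not have to be elaborate. A few guard clauses at the boundary of a routine, as in this hypothetical Python sketch (the value ranges are assumptions, not anything from the CRU code), are enough to turn subtly broken output into an immediate, visible failure:

import math

def monthly_anomaly(readings, baseline):
    """Mean of readings minus a baseline, refusing obviously broken input."""
    if not readings:
        raise ValueError("no readings supplied")
    if any(math.isnan(r) for r in readings):
        raise ValueError("NaN found in readings")
    if not -90.0 <= baseline <= 60.0:             # plausible range in deg C, assumed
        raise ValueError(f"implausible baseline {baseline!r}")
    return sum(readings) / len(readings) - baseline

Nobody validates every conceivable input; the trade-off is to catch the failure modes that would otherwise pass through silently, which is precisely the "fundamentally broken input, subtly broken output" case described above.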

Along with bad programming that involved no dishonest intent, there seems to be dishonest intent as well. John Lott over at Fox News discusses this problem:
But the CRU's temperature data and all of the research done with it are now in question. The leaked e-mails show that the scientists at the CRU don't know how their data was put together. CRU took individual temperature readings at individual stations and averaged the information out to produce temperature readings over larger areas. The problem comes in how they did the averaging. One of the leaked documents states that "our flagship gridded data product is produced by [a method that] renders the station counts totally meaningless" and "so, we can have a proper result, but only by including a load of garbage!" There were also significant coding errors in the data. Weather stations that are claimed to exist in Canada aren't there -- leading one memo to speculate that the stations "were even invented somewhere other than Canada!"

The computer code used to create the data the CRU has used contains programmer notes that indicate that the aggregated data were constructed to show an increase in temperatures. The programmer notes include: "Apply a VERY ARTIFICAL correction for decline!!" and "Low pass filtering at century and longer time scales never gets rid of the trend -- so eventually I start to scale down the 120-yr low pass time series to mimic the effect of removing/adding longer time scales!" The programmers apparently had to try at least a couple of adjustments before they could get their aggregated data to show an increase in temperatures.

All this could in theory be correctable by going back and starting from scratch with the original "raw" data, but the CRU apparently threw out much of the data used to create their temperature measures. We now only have the temperature measures that they created.
Yet as Lott points out, the vast majority of American news media are barely covering this story, if they are covering it at all.
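
For readers who want to picture what "averaged the information out to produce temperature readings over larger areas" involves, here is a deliberately simplified Python sketch of gridding. It is not CRU's method: the five-degree cell size, the equal weighting of stations, and the sample readings are assumptions made for illustration.

from collections import defaultdict

CELL = 5.0  # grid resolution in degrees; an assumption, not CRU's value

def grid_cell(lat, lon):
    """Index of the CELL x CELL degree box containing (lat, lon)."""
    return (int(lat // CELL), int(lon // CELL))

def grid_average(stations):
    """stations: iterable of (lat, lon, temperature); returns {cell: mean temperature}."""
    cells = defaultdict(list)
    for lat, lon, temp in stations:
        cells[grid_cell(lat, lon)].append(temp)
    return {cell: sum(temps) / len(temps) for cell, temps in cells.items()}

# Hypothetical readings: the first two stations fall in the same grid box.
readings = [(51.2, 0.5, 9.8), (52.9, 1.3, 10.1), (45.0, -73.6, 6.4)]
print(grid_average(readings))   # {(10, 0): 9.95, (9, -15): 6.4}

A toy example obviously cannot settle whether CRU's gridding was sound; it only shows where the choices the leaked note complains about, which stations go into a cell and how they are counted, enter the calculation.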
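
The programmer notes Lott quotes, the "VERY ARTIFICAL correction for decline" and the scaled 120-year low-pass series, describe adding a hand-chosen adjustment on top of a smoothed series. The sketch below shows that operation in the abstract; the smoothing window, the adjustment values, and the scale factor are invented, not the numbers in the leaked code.

def low_pass(series, window=11):
    """Crude moving-average smoother standing in for the long-time-scale filtering."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def apply_adjustment(series, adjustment, scale=1.0):
    """Add a hand-chosen, per-point adjustment (scaled) to a series."""
    return [x + scale * a for x, a in zip(series, adjustment)]

# Invented example: a flat series with a made-up late-period boost tacked on.
raw = [0.0] * 20
adjustment = [0.0] * 10 + [0.1 * k for k in range(10)]
trend = apply_adjustment(low_pass(raw), adjustment, scale=0.5)

Whether any particular adjustment is scientifically defensible is a separate question; the complaint in the notes is that the correction is hard-coded and described as artificial by the person who wrote it.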

 
 
