To be as transparent as possible, here is the actual data shown under the climate trends tab:
Download the raw WeatherSpark climate data
The data was created by collecting the yearly statistics for each station. For each year and each station we compute the mean temperature, median temperature, and what fraction of the data was missing (FractionNan).
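As a rough illustration of the three yearly statistics, here is a minimal sketch. It assumes the input is one station-year of hourly temperature readings in which missing readings are stored as NaN. The function name `yearly_summary` and the dictionary keys other than `FractionNan` are hypothetical, and the actual WeatherSpark pipeline may differ.

```python
import math
import statistics


def yearly_summary(hourly_temps_c):
    """Summarize one station-year of temperatures (NaN marks a missing reading).

    Assumption: statistics are taken over the non-missing readings only.
    """
    valid = [t for t in hourly_temps_c if not math.isnan(t)]
    total = len(hourly_temps_c)
    # Fraction of readings that were missing; an empty year counts as fully missing.
    fraction_nan = 1 - len(valid) / total if total else 1.0
    if not valid:
        return {"mean": math.nan, "median": math.nan, "FractionNan": fraction_nan}
    return {
        "mean": statistics.fmean(valid),
        "median": statistics.median(valid),
        "FractionNan": fraction_nan,
    }
```

For example, `yearly_summary([10.0, 20.0, float("nan"), 30.0])` gives a mean and median of 20.0 and a `FractionNan` of 0.25.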
The file consists of a set of records. The records are separated by two newlines ("\n\n"). Each record consists of a header line and then several data lines, all separated by a single newline ("\n").
The header line consists of entries separated by the pipe character ("|"). The third entry (zero-based index 2) is the date of the first data point in the data lines, and the fifth entry (zero-based index 4) is the station id. The other entries should be considered reserved.
The data lines consist of a label ("temperatureC/FractionNan"), a separating colon (":"), and a comma-separated list of numbers ("0.13,0.15,1"). Missing data points are encoded as "NaN" (Not-a-Number).
Header line, with date and station id in bold.
Data lines, with the series label, a colon, and the comma-separated data values.
Extra newline separating the records.
The lines have been shortened to simplify the presentation.
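The record layout described above could be read with a short parser along the following lines. This is a sketch under assumptions, not an official reader: the function name `parse_records` and the output dictionary keys are hypothetical, the date is kept as a plain string because its exact format is not specified here, and reserved header entries are ignored.

```python
def parse_records(text):
    """Parse the climate data file into a list of record dictionaries."""
    records = []
    # Normalise Windows line endings so the separators are plain "\n".
    text = text.replace("\r\n", "\n")
    # Records are separated by a blank line, i.e. two newlines.
    for block in text.strip().split("\n\n"):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        if not lines:
            continue
        header = lines[0].split("|")
        if len(header) < 5:
            continue  # not a valid header line; skip this block
        record = {
            "start_date": header[2],  # third entry: date of the first data point
            "station_id": header[4],  # fifth entry: station id
            "series": {},
        }
        for line in lines[1:]:
            # Each data line is "label:comma,separated,values".
            label, _, values = line.partition(":")
            # float("NaN") yields a NaN, so missing points need no special case.
            record["series"][label.strip()] = [
                float(v) for v in values.split(",") if v.strip()
            ]
        records.append(record)
    return records
```

Each returned record then exposes its series by label, for example `record["series"]["temperatureC/FractionNan"]`.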
If you use this data, we kindly request that you reference WeatherSpark as the source and include a link to the site.
Counting backwards in time from today, we have excluded any station that had more than 20 years with more than 20% outage (that's 73 days or 1,752 hours of outage in a year) before having at least 5 years with less than 20% outage. The thought here is that a station with so much outage for so long is too unreliable to be meaningfully included in the data set.
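One possible reading of this exclusion rule is sketched below. It assumes the years are counted cumulatively rather than consecutively, that a year with exactly 20% outage counts toward neither total, and that a station whose record ends before either limit is reached is kept. The function and constant names are hypothetical, and the actual filter may differ in these details.

```python
import math

OUTAGE_THRESHOLD = 0.20  # 20% of a year: 0.20 * 365 = 73 days = 1752 hours
MAX_BAD_YEARS = 20       # more than this many high-outage years excludes a station
MIN_GOOD_YEARS = 5       # this many low-outage years keeps a station


def is_excluded(fraction_nan_by_year):
    """Decide exclusion from yearly FractionNan values, ordered oldest first."""
    bad = good = 0
    # Walk from the most recent year backwards in time.
    for frac in reversed(fraction_nan_by_year):
        if math.isnan(frac) or frac > OUTAGE_THRESHOLD:
            bad += 1
        elif frac < OUTAGE_THRESHOLD:
            good += 1
        if good >= MIN_GOOD_YEARS:
            return False  # enough reliable years were found first
        if bad > MAX_BAD_YEARS:
            return True   # too many unreliable years were found first
    return False  # assumption: short records are kept
```

Under these assumptions, a station with 21 recent high-outage years is excluded even if its older years are complete, while one with exactly 20 such years followed by 5 good years is kept.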
The hourly data is not available for separate download at this time. You can, however, inspect it for any station you like by using the regular WeatherSpark interface.