Adding average temperatures to recordtemps.in

The most requested change for recordtemps.in (a site visited by tens of people) is that it should show averages alongside the records. A little context goes a long way.

I finally got around to adding this a few weeks ago:

It's not in the screenshot, but if you visit a page on recordtemps.in, you'll see that I added attribution at the bottom that says "Averages from NCDC 1981-2010 U.S. Climate Normals."

If I were starting from scratch, this would have been a lot of effort for an arguably small return. Luckily, I'd done some work previously to massage NCDC normals from their standard .txt form to a mongo database.

With NCDC normals at my fingertips, I made a plan:

  1. Get a list of all the ThreadEx stations.
  2. Find the closest NCDC station for each ThreadEx station.
  3. Query min, max, and average daily temperatures out of mongo for each station.
  4. Insert new "avg" records in the record temps database for the NCDC normals.
  5. Tweak the site to include the new data.

I had lots of options for point one but the easiest is to hit recordtemps.in's .json endpoint.

With all the stations as JSON, I could use mongo's geoNear to find the closest NCDC normal location for each ThreadEx station and use turf's distance() to calculate how far the closest NCDC location was to each ThreadEx location. I printed the results as a table as a sanity check.

After a brief review, I was happy that there was an NCDC location close enough for all ThreadEx stations.

Querying out min, max, and average for each location for was done with another script. I wrote those results to .json files so that I could later load into the record temps database.

Then I hit a wall. The record temps DB schema includes things like rank and whether or not a record was a tie. Neither apply for the temperature normals data. I also realized I'd need an additional query to get the normals data out which meant I'd be slowing things down since sqlite doesn't run queries in parallel.

I gave up on the idea of wedging normals data into the records database and decided to add an additional sqlite DB. All the code to do this ended up looking similar to the stuff in my threadex-to-sqlite repo. I went this route so that I could fire off all sqlite queries and build the page once they all finish. That ended up working pretty well. Pages for station records with average temperature data were coming back consistently in under one second.

There's a single commit in the repo showing the changes I made to run queries across databases as well as incorporate the new info in my express and handlebars code. Of course after that commit I had to tweak a few things and that's what the subsequent commits show.

Averages on every page that shows daily records is a huge improvement but I hope I'm not regretting the addition the next time ThreadEx adds stations. I'll figure that out when the time comes.