How the mirror stats are produced

After I made the URL public on the openSUSE mirror list, I received a couple of emails asking how I produce the stats. There is no black magic, but there is also no ready-to-go package you can just install.

Data collection

The Apache server access logs are archived daily. We use GeoIP data (e.g. MaxMind) to look up a country code for each client IP address and add it to the log file. Once a log file has been archived, we extract the data (timestamp, client address, country code, size, URL) and load it into a MySQL database.
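
To illustrate the extraction step, here is a minimal Python sketch. It is not the script we actually run; it assumes the logs are in Apache's combined format with the two-letter country code appended as the last field on each line, and the function name parse_line is made up for this example.

```python
import re
from datetime import datetime

# Assumed layout: Apache combined log format, with a two-letter
# country code appended as the final field of each line.
LOG_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?:\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'.* (?P<cc>[A-Z]{2})$'
)

def parse_line(line):
    """Return (timestamp, client, country, size, url), or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    # Apache logs "-" when no bytes were sent; treat that as zero.
    size = 0 if m.group("size") == "-" else int(m.group("size"))
    return (ts, m.group("client"), m.group("cc"), size, m.group("url"))

sample = ('192.0.2.10 - - [10/Oct/2023:13:55:36 +0200] '
          '"GET /distribution/leap/15.5/iso/file.iso HTTP/1.1" 200 4096 '
          '"-" "curl/8.0" CH')
record = parse_line(sample)
print(record)
```

Each tuple returned this way maps directly onto one row to be loaded into the database.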

Data extraction

As the mirror_access table grows, the queries get slower, which makes them unsuitable for serving web pages directly. Instead of running queries straight from the web page, we keep a separate table, mirror_stats, with the most pertinent daily numbers, and generate the necessary page data once the database has been updated.
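
The idea of a small summary table can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 in place of MySQL so that it runs anywhere, and the column names (ts, client, country, size, url, day, hits, bytes) as well as the choice of "hits and bytes per day and country" are assumptions, not our real schema.

```python
import sqlite3

def build_daily_stats(conn):
    """Roll mirror_access up into one mirror_stats row per day and country."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS mirror_stats (
            day TEXT, country TEXT, hits INTEGER, bytes INTEGER,
            PRIMARY KEY (day, country))""")
    # Re-running the roll-up replaces existing rows instead of duplicating them.
    conn.execute("""
        INSERT OR REPLACE INTO mirror_stats (day, country, hits, bytes)
        SELECT date(ts), country, COUNT(*), SUM(size)
        FROM mirror_access
        GROUP BY date(ts), country""")
    conn.commit()

# Demo with an in-memory database and a few made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mirror_access
                (ts TEXT, client TEXT, country TEXT, size INTEGER, url TEXT)""")
conn.executemany("INSERT INTO mirror_access VALUES (?, ?, ?, ?, ?)", [
    ("2023-10-10 13:55:36", "192.0.2.10",   "CH", 4096, "/a.iso"),
    ("2023-10-10 14:01:02", "198.51.100.7", "CH", 1024, "/b.rpm"),
    ("2023-10-10 15:00:00", "203.0.113.5",  "DE", 2048, "/a.iso"),
    ("2023-10-11 09:00:00", "192.0.2.10",   "CH",  512, "/c.rpm"),
])
build_daily_stats(conn)
rows = conn.execute(
    "SELECT day, country, hits, bytes FROM mirror_stats ORDER BY day, country"
).fetchall()
for row in rows:
    print(row)
```

The web page then only ever reads the small mirror_stats table, so its response time no longer depends on how large mirror_access has become.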

Country codes?

I expect someone will ask exactly how we use the GeoIP data. Well, we keep it available internally over DNS, served by rbldnsd. That makes it easily available from the command line (with "dig" or "host"), from a PHP script or even as a service: geoip.jessen.ch. There are many ways to skin a cat; this is merely one that works well for us.
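
For the curious, a lookup over DNS might look roughly like the Python sketch below. It assumes the zone is queried DNSBL-style (octets reversed, zone name appended), which is how rbldnsd's IPv4 datasets are normally addressed. The zone name geoip.example.internal is a placeholder, and returning the country code in a TXT record is also an assumption; our internal setup may differ in both respects.

```python
import ipaddress

# Placeholder zone name, for illustration only.
GEOIP_ZONE = "geoip.example.internal"

def geoip_query_name(ip, zone=GEOIP_ZONE):
    """Build a DNSBL-style query name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything but valid IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def dig_command(ip, zone=GEOIP_ZONE):
    """Return the argument list for an equivalent dig lookup (TXT record assumed)."""
    return ["dig", "+short", "TXT", geoip_query_name(ip, zone)]

name = geoip_query_name("192.0.2.10")
print(name)
print(" ".join(dig_command("192.0.2.10")))
```

The argument list from dig_command can be handed to subprocess.run on a machine that can reach the DNS server; the sketch itself deliberately makes no network calls.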

Tarball

I've put together a quick tarball with everything I think you'll need: tarball. Some assembly required.
Feel free to write to me if you're having trouble getting it to work.

Have a lot of fun!
Per Jessen