How to access the M-Lab data

All data collected by M-Lab tools is intended to be made publicly available under a Creative Commons Zero license.

The M-Lab data can be accessed in two different ways:

  1. You can download the raw data (i.e. the data as collected by the M-Lab servers).
    • M-Lab raw data are packaged into tarballs, organized by (1) the tool that generated the data, (2) the date the data was collected, and (3) the server that collected the data. Each tarball contains all the data collected during a single day by a single tool running on a single M-Lab server. If one tool on one server collects more than 1GB of data (uncompressed) in a day, the data are split into multiple tarballs of at most 1GB each.
      • For example, the tarball 20090218T000000Z-mlab1-lga01-ndt-0000.tgz contains the first 1GB of data collected by all the NDT tests that were served by the M-Lab server mlab1-lga01 on Feb 18 2009.
    • The M-Lab tarballs are stored on Google Storage. Google Storage is Google's cloud storage service (similar to Amazon S3) and provides different ways to access the M-Lab tarballs. In particular, it provides
  2. You can run SQL queries against the M-Lab data using BigQuery, which is designed to query very large datasets efficiently. Currently, BigQuery holds only a portion of the M-Lab dataset (i.e., the web100 logs collected by NDT and NPAD). More details about the M-Lab data in BigQuery and how to query them can be found here. Note that Google BigQuery is a new service that requires sign-up and approval to access. If you are interested in access to Google BigQuery, please let us know in addition to signing up.
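
The tarball naming scheme described under option 1 can be decoded programmatically. The following is a minimal Python sketch, not official M-Lab code: the function name parse_tarball_name, the TarballName type, and the regular expression are illustrative assumptions inferred from the single example filename above (timestamp, server name with one hyphen, tool name, sequence number, .tgz suffix). Real filenames may deviate from this pattern.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class TarballName(NamedTuple):
    """Fields encoded in an M-Lab tarball filename (hypothetical helper type)."""
    collected: datetime  # day the data was collected, in UTC
    server: str          # e.g. "mlab1-lga01"
    tool: str            # e.g. "ndt"
    sequence: int        # index of this chunk within the day (0 = first chunk)


# Assumed layout, inferred from the one example in the text:
#   <YYYYMMDD>T<HHMMSS>Z-<server with one hyphen>-<tool>-<sequence>.tgz
_PATTERN = re.compile(
    r"^(?P<stamp>\d{8}T\d{6}Z)"
    r"-(?P<server>[^-]+-[^-]+)"
    r"-(?P<tool>.+)"
    r"-(?P<seq>\d+)\.tgz$"
)


def parse_tarball_name(filename: str) -> TarballName:
    """Split a tarball filename into its date, server, tool, and sequence parts."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized tarball name: {filename!r}")
    collected = datetime.strptime(match["stamp"], "%Y%m%dT%H%M%SZ")
    return TarballName(
        collected=collected.replace(tzinfo=timezone.utc),
        server=match["server"],
        tool=match["tool"],
        sequence=int(match["seq"]),
    )


if __name__ == "__main__":
    print(parse_tarball_name("20090218T000000Z-mlab1-lga01-ndt-0000.tgz"))
```

Applied to the example filename, the sketch yields the server mlab1-lga01, the tool ndt, the collection date 2009-02-18, and sequence number 0. Names that do not match the assumed layout raise ValueError rather than being guessed at.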

Both the BigQuery and Google Storage M-Lab datasets are updated once a day.

M-Lab is also open to releasing the data via other tools. If you have alternative suggestions, please contact us.

Each tool logs data in its own format. You can find information about each dataset, and code to parse the raw data, on the tool's page.