How to access the M-Lab data
All data collected by M-Lab tools is intended to be made publicly available under a Creative Commons Zero license.
The M-Lab data can be accessed in two different ways:
- You can download the raw data (i.e. the data as collected by the M-Lab servers).
- All the M-Lab raw data are organized into tarballs by (1) the tool that generated the data, (2) the date when the data was collected, and (3) the server that collected the data. Each tarball contains all the data collected during a single day by a single tool running on a single M-Lab server. If one tool on one server collects more than 1GB of data (uncompressed) in a day, those data are split into multiple tarballs of at most 1GB each.
- For example, the tarball 20090218T000000Z-mlab1-lga01-ndt-0000.tgz contains the first 1GB of data collected by all the NDT tests that were served by the M-Lab server mlab1-lga01 on Feb 18, 2009.
- The M-Lab tarballs are stored on Google Storage, Google's cloud storage service (similar to Amazon S3), which provides several ways to access them. In particular, it provides
- A command line tool (gsutil) to
- List the content of a folder. E.g., gsutil ls -l gs://m-lab/ndt/
- Download one or more tarballs. E.g., gsutil cp gs://m-lab/ndt/2009/02/18/20090218T000000Z-mlab1-lga01-ndt-0000.tgz . (the trailing dot copies the tarball into the current directory).
- A web-based interface to
- Browse the repository at https://storage.cloud.google.com/?arg=m-lab#m-lab.
- Read the complete list of tarballs at https://storage.cloud.google.com/m-lab/list/all_mlab_tarfiles.txt.gz.
- Download each tarball using its public URL https://storage.cloud.google.com/m-lab/<tool name>/<year>/<month>/<day>/<tarball name>.
- For example, the file https://storage.cloud.google.com/m-lab/list/all_mlab_tarfiles.txt.gz contains the row gs://m-lab/ndt/2009/02/23/20090223T000000Z-mlab2-lga01-ndt-0000.tgz, which corresponds to the tarball whose public URL is obtained by applying the pattern above to this path.
- Note that it is not possible to use wget or curl to download the data, as Google Storage requires authentication.
- You can run SQL queries against the M-Lab data using BigQuery, a Google service for efficiently querying very large datasets. Currently, only part of the M-Lab dataset is available in BigQuery (the web100 logs collected by NDT and NPAD). More details about the M-Lab data in BigQuery and how to query them can be found here. Note that Google BigQuery is a new service that requires sign-up and approval to access. If you are interested in accessing Google BigQuery, please sign up and also let us know.
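The tarball naming scheme described above can be decoded mechanically. The Python sketch below assumes every tarball name follows the pattern of the examples on this page, <timestamp>-<server>-<site>-<tool>-<NNNN>.tgz, where the trailing four digits number the 1GB chunks of a day starting at 0000; the exact character classes for server and site names are assumptions inferred from those examples. The helpers parse_tarball_name and public_url are hypothetical, not part of any M-Lab or Google tool; public_url only applies the URL pattern given above to a gs:// path taken from the list file.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Assumed pattern, inferred from names such as
# 20090218T000000Z-mlab1-lga01-ndt-0000.tgz
TARBALL_RE = re.compile(
    r"^(?P<ts>\d{8}T\d{6}Z)-(?P<server>mlab\d+)-(?P<site>[a-z]{3}\d{2})-"
    r"(?P<tool>[\w-]+)-(?P<seq>\d{4})\.tgz$"
)

GS_PREFIX = "gs://m-lab/"
PUBLIC_PREFIX = "https://storage.cloud.google.com/m-lab/"


@dataclass
class TarballName:
    collected: datetime  # start of the collection day (UTC)
    server: str          # e.g. "mlab1-lga01"
    tool: str            # e.g. "ndt"
    part: int            # 0 = first (up to 1GB) chunk of that day


def parse_tarball_name(name: str) -> TarballName:
    """Split a tarball file name into date, server, tool and chunk number."""
    m = TARBALL_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized tarball name: {name}")
    return TarballName(
        collected=datetime.strptime(m["ts"], "%Y%m%dT%H%M%SZ"),
        server=f"{m['server']}-{m['site']}",
        tool=m["tool"],
        part=int(m["seq"]),
    )


def public_url(gs_path: str) -> str:
    """Map gs://m-lab/<tool>/<year>/<month>/<day>/<name> to its public URL."""
    if not gs_path.startswith(GS_PREFIX):
        raise ValueError(f"not an m-lab path: {gs_path}")
    return PUBLIC_PREFIX + gs_path[len(GS_PREFIX):]
```

As noted above, Google Storage requires authentication, so opening the resulting URL may still require signing in with a Google account.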
Both the BigQuery and Google Storage M-Lab datasets are updated once a day.
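Because the datasets are refreshed daily, one way to find the tarballs for a particular tool and day is to filter the complete tarball list. This sketch assumes that all_mlab_tarfiles.txt.gz is gzip-compressed text with one gs:// path per line, as in the example row quoted above, and that paths follow the gs://m-lab/<tool>/<year>/<month>/<day>/ layout. Fetching the list file itself (for example with gsutil cp) is left out, and the helper names are hypothetical.

```python
import gzip
import io
from typing import Iterable, Iterator, List


def iter_tarball_paths(list_gz: bytes) -> Iterator[str]:
    """Yield the non-empty lines (gs:// paths) of a gzip-compressed list file."""
    with gzip.open(io.BytesIO(list_gz), "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield line


def select(paths: Iterable[str], tool: str, day: str) -> List[str]:
    """Keep the paths for one tool and one day; day is 'YYYY/MM/DD'."""
    prefix = f"gs://m-lab/{tool}/{day}/"
    return [p for p in paths if p.startswith(prefix)]
```

For instance, after downloading the list file, select(iter_tarball_paths(data), "ndt", "2009/02/23") would return the NDT tarballs listed for Feb 23, 2009.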
M-Lab is also open to releasing the data via other tools. If you have alternative suggestions, please contact us.
Each tool logs data in its own format. Information about each dataset, along with code to parse the raw data, can be found on the corresponding tool page.
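Since the record formats differ per tool, a generic first step is simply to see what a downloaded tarball contains. This sketch uses only Python's standard tarfile module; it assumes the file is a gzip-compressed tar archive (as the .tgz suffix suggests) and leaves decoding of each file to the tool-specific parsers. The helper names list_members and read_member are hypothetical.

```python
import tarfile
from typing import List


def list_members(tgz_path: str) -> List[str]:
    """Return the names of the regular files inside a .tgz tarball."""
    with tarfile.open(tgz_path, mode="r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def read_member(tgz_path: str, member: str) -> bytes:
    """Return the raw bytes of one file; how to decode them is tool-specific."""
    with tarfile.open(tgz_path, mode="r:gz") as tar:
        fh = tar.extractfile(member)  # raises KeyError if member is absent
        if fh is None:
            raise ValueError(f"{member} is not a regular file")
        return fh.read()
```

Reading members one at a time avoids extracting a whole (up to 1GB uncompressed) tarball to disk just to inspect a few files.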

