Data from M-Lab Tools
All data collected through M-Lab is intended to be made publicly available under a Creative Commons Zero license.
How to access the M-Lab data
The M-Lab data can be accessed in two different ways:
- You can download the raw data (i.e., the data as collected by the M-Lab servers).
  - All the M-Lab raw data are organized into tarballs, where each tarball contains all the data collected during a single day by a single tool running on a single M-Lab server. If the data collected during a day by one tool on one server exceed 1 GB (uncompressed), they are split into multiple tarballs of at most 1 GB each.
    - For example, the tarball 20090218T000000Z-mlab1-lga01-ndt-0000.tgz contains the first 1 GB of data collected by all the NDT tests that were served by the M-Lab server mlab1-lga01 on Feb 18, 2009.
  - The M-Lab tarballs are stored on Google Storage, Google's cloud storage service (similar to Amazon S3), which provides different ways to access the tarballs. In particular, it provides:
    - A command line tool (gsutil) to:
      - List the contents of a folder. E.g., gsutil ls -l gs://m-lab/ndt/
      - Download one or more tarballs. E.g., gsutil cp gs://m-lab/ndt/20090218T000000Z-mlab1-lga01-ndt-0000.tgz . (the final "." is the destination, here the current directory).
    - A web-based interface to:
      - Browse the repository at http://sandbox.google.com/storage/m-lab. NOTE that it only shows the first 100 objects in each folder.
      - Read the complete list of tarballs at http://commondatastorage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz
      - Download each tarball using its public URL http://commondatastorage.googleapis.com/m-lab/<tool name>/<tarball name>.
        - For example, the file http://commondatastorage.googleapis.com/m-lab/list/all_mlab_tarfiles.txt.gz contains the row gs://m-lab/ndt/20090223T000000Z-mlab2-lga01-ndt-0000.tgz, which corresponds to a tarball whose URL follows the pattern above, with ndt as the tool name and 20090223T000000Z-mlab2-lga01-ndt-0000.tgz as the tarball name.
- You can run SQL queries against the M-Lab data using BigQuery, a service designed for efficiently querying very large datasets. Currently, BigQuery holds only a portion of the M-Lab dataset (i.e., the web100 logs collected by NDT and NPAD). More details about the M-Lab data in BigQuery and how to query them can be found here. Note that Google BigQuery is a new service that requires sign-up and approval to access. If you are interested in using it, please let us know in addition to signing up.
Both the BigQuery and Google Storage M-Lab datasets are updated once a month.
M-Lab is also open to releasing the data by other means and more frequently. If you have alternative suggestions or solutions, please contact us.
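The tarball naming scheme and the mapping from a gs:// row in the tarball list to a public download URL can be sketched in a few lines of Python. This is only an illustration, not M-Lab code: the function and class names are hypothetical, and the regular expression assumes that every tarball name follows the layout of the single example given above (UTC timestamp, a server name with exactly one hyphen such as mlab1-lga01, tool name, four-digit index).

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

# Prefixes taken from the text above.
GS_PREFIX = "gs://m-lab/"
PUBLIC_BASE = "http://commondatastorage.googleapis.com/m-lab/"

# ASSUMPTION: layout inferred from one example name,
# <timestamp>-<server>-<tool>-<index>.tgz, where the server part
# contains exactly one hyphen (e.g. "mlab1-lga01").
_NAME_RE = re.compile(
    r"^(?P<stamp>\d{8}T\d{6}Z)"
    r"-(?P<server>[^-]+-[^-]+)"
    r"-(?P<tool>.+)"
    r"-(?P<index>\d{4})\.tgz$"
)


class TarballName(NamedTuple):
    day: datetime   # start of the collection day (UTC)
    server: str     # e.g. "mlab1-lga01"
    tool: str       # e.g. "ndt"
    index: int      # position of this chunk within the day's data


def parse_tarball_name(name: str) -> TarballName:
    """Split a tarball file name into its components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected tarball name: {name!r}")
    day = datetime.strptime(match["stamp"], "%Y%m%dT%H%M%SZ")
    day = day.replace(tzinfo=timezone.utc)
    return TarballName(day, match["server"], match["tool"], int(match["index"]))


def public_url(gs_path: str) -> str:
    """Turn a gs://m-lab/... row into the matching public HTTP URL."""
    if not gs_path.startswith(GS_PREFIX):
        raise ValueError(f"not an M-Lab gs:// path: {gs_path!r}")
    return PUBLIC_BASE + gs_path[len(GS_PREFIX):]
```

A typical use would be to read the rows of all_mlab_tarfiles.txt.gz, keep those whose parsed tool and day match what you need, and pass each remaining row to public_url before downloading.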
Data format
Each tool logs data in its own format. Information about each dataset, along with code to parse the raw data, is available at http://code.google.com/p/m-lab-research/.
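Before turning to the tool-specific parsers, it can help to look at what a downloaded tarball actually contains. The following is a minimal sketch using Python's standard tarfile module; the function name is hypothetical, and it assumes only that the tarballs are ordinary gzip-compressed tar archives, as the .tgz extension suggests. It makes no assumption about the format of the files inside.

```python
import tarfile


def list_tarball_members(path):
    """Return (name, size_in_bytes) for every regular file in a .tgz archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        return [
            (member.name, member.size)
            for member in archive.getmembers()
            if member.isfile()  # skip directories, links, and other entries
        ]
```

For example, calling list_tarball_members on a tarball downloaded with gsutil cp returns the file names and sizes without extracting anything to disk.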
