In our recent roadmap post, we shared a list of milestones that the team has been working on over this quarter and the last. Our Datatype migration and Standardized Columns milestone references the gardener service, which maintains and reprocesses M-Lab data, as well as the UUID annotator, which generates per-connection metadata and saves it as annotations to user-conducted measurements. This post describes in more detail how these services have annotated measurements with geographic and network information, both in the past and today, and expands on the current work mentioned in our roadmap post.
Since May 2017, the M-Lab team has been working on an updated, open source pipeline, which pulls raw data from our servers, saves it to Google Cloud Storage, and then parses it into our BigQuery tables. The team is particularly excited about this update because it means that the pipeline no longer relies on closed source libraries.
M-Lab data is collected from distributed experiments hosted on servers all over the world, processed in a pipeline, and published for free in both raw and parsed (structured) formats. The back-end processing component of that pipeline has served us well for many years, but it’s been showing its age recently. As M-Lab collects an increasing amount of data thanks to new partnerships, we have been concerned that it will become less reliable.