New ETL Pipeline, Transition to New BigQuery Tables
Since May 2017, the M-Lab team has been working on an updated, open source pipeline, which pulls raw data from our servers, saves it to Google Cloud Storage, and then parses it into our BigQuery tables. The team is particularly excited about this update because it means that the pipeline no longer relies on closed source libraries.
Paris Traceroute has a bug, and it causes some bad data
In December 2017, M-Lab was notified of oddities in the Paris Traceroute data. Upon investigation, a bug in the Paris Traceroute code was identified. The bug caused bad measurement data in 2.7% of the traceroutes since July 2016.
Modernizing the M-Lab Platform
When the M-Lab platform was initially launched in 2009, the software and operating system running on our servers used the best available boot management, virtualization, and kernel-level measurement instrumentation available. In the years since M-Lab’s initial launch, the state of system administration has improved dramatically. In 2017, the M-Lab team began work to upgrade the platform to adopt modern and flexible system administration components. This post provides a roadmap of that work.
Monitoring Interconnection Performance Since the Open Internet Order
As a platform committed to producing empirical data for the public, Measurement Lab (M-Lab) has historically supplied regulators and other governmental entities with technical facts pertinent rule-making processes. In our February 2015 submission to the FCC’s Open Internet docket, we committed to research on the state of broadband and performance impact of interconnection in the United States. Earlier this year, the FCC began the process of re-evaluating its authority over broadband Internet services, and opened a Notice of Proposed Rulemaking. This blogpost is a shortened version comments that M-Lab filed in the docket regarding its continued research on the impact of interconnection on consumer broadband. The full filing in the FCC docket includes an elaboration of our research with additional supporting evidence and charts.
Transitioning to a New Backend Pipeline and Data Availability
M-Lab data is collected from distributed experiments hosted on servers all over the world, processed in a pipeline, and published for free in both raw and parsed (structured) formats. The back end processing component for this has served us well for many years, but it’s been showing its age recently. As M-Lab collects an increasing amount of data thanks to new partnerships, we have been concerned that it will not be as reliable.