We deployed the new M-Lab platform to one third of the M-Lab fleet, and now we need to assess whether it constitutes a performance regression relative to the old platform. If we can be sure that it does not, we can roll out the new platform with confidence that we have not made anything worse.
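One simple way to frame such a regression check is to compare a summary statistic between the two groups of servers. The sketch below is illustrative only: the function name `is_regression`, the 5% tolerance, and the sample throughput values are all assumptions made for this example, not M-Lab's actual methodology or data.

```python
from statistics import median


def is_regression(old_mbps, new_mbps, tolerance=0.05):
    """Return True if the new platform's median throughput falls more than
    `tolerance` (as a fraction) below the old platform's median.

    The 5% default is an arbitrary, hypothetical threshold.
    """
    old_median = median(old_mbps)
    new_median = median(new_mbps)
    return new_median < old_median * (1 - tolerance)


# Hypothetical download speeds (Mbps) from each platform.
old_platform = [92.1, 88.4, 95.0, 90.3, 91.7]
new_platform = [93.0, 89.9, 94.2, 91.1, 90.8]

print(is_regression(old_platform, new_platform))
```

A real evaluation would likely compare full distributions per site rather than a single median, but the shape of the question is the same: is the new platform measurably worse than the old one?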
- Stock Linux 4.19 LTS kernels with modern TCP and Cubic congestion control
- Standard instrumentation for all experiments using tcp-info
- Virtualization and container management using Kubernetes and Docker
- Reimplementation of the NDT server
Last year, we outlined our plans to Modernize the M-Lab Platform. This year, we’re bringing them to life. Here’s a summary of why the platform update is so valuable and what you can expect throughout the year.
In December 2017, M-Lab was notified of oddities in the Paris Traceroute data, which we then wrote about in January 2018. Upon investigation, a bug in the Paris Traceroute code was identified. The bug caused bad measurement data in 2.7% of the traceroutes since July 2016.
In December 2017, M-Lab was notified of oddities in the Paris Traceroute data. Upon investigation, a bug in the Paris Traceroute code was identified. The bug caused bad measurement data in 2.7% of the traceroutes since July 2016.
When the M-Lab platform was initially launched in 2009, the software and operating system running on our servers used the best boot management, virtualization, and kernel-level measurement instrumentation available. In the years since M-Lab’s initial launch, the state of system administration has improved dramatically. In 2017, the M-Lab team began work to upgrade the platform to adopt modern and flexible system administration components. This post provides a roadmap of that work.
M-Lab data is collected from distributed experiments hosted on servers all over the world, processed in a pipeline, and published for free in both raw and parsed (structured) formats. The back-end processing component for this has served us well for many years, but it has been showing its age recently. As M-Lab collects an increasing amount of data thanks to new partnerships, we have been concerned that it will not remain as reliable.
In February 2017, M-Lab was notified of issues with the M-Lab data available in BigQuery. Upon investigation, a problem was identified with the Paris Traceroute collection daemon, which resulted in a reduction in Paris Traceroute measurements beginning in June 2016. At the peak of the outage, from the fourth quarter of 2016 through January 2017, approximately 5% of NDT tests had an associated Paris Traceroute test. Additionally, an issue within the data processing pipeline meant that Paris Traceroute data that had been measured and collected was not inserted into the BigQuery tables and was therefore not available for use.
In August 2015, M-Lab was notified by a measurement partner of potential degradation of site performance, based on discrepancies compared to results for their own servers. After a full investigation, these patterns were found to have been caused by the unique confluence of several specific conditions. Interim remediation measures were taken in early October 2015, and the resolution of the degradation was confirmed by the partner and others. Due to these administrative actions, the episode, which we are calling the “switch discard issue,” has not affected testing conducted in the United States (the region impacted by this problem) since October 11, 2015, and thus measurements after this period are not affected by the incident. M-Lab has also conducted an evaluation of data collected during the time period in which the issue occurred, and has taken steps to remove affected measurements from its dataset. As a result, this incident will not affect use of the dataset, past or present.