Updates to Geo Filters

Changes to our geographic annotations have resulted in changes to geographic filters.

Takeaways

  • As of 2022-03-16, M-Lab client geographic annotations use the ISO 3166-2 subdivision standard for all NDT data types.
  • Queries that previously used client.Geo.Region to identify US states should now use client.Geo.Subdivision1ISOCode instead. Queries filtering on client.Geo.Region will return “no data.”
  • If you ran queries prior to 2022-03-17 that used client.Geo.Region for dates between 2020-02 and 2022-03-17, you should rerun them using client.Geo.Subdivision1ISOCode to get results that include ndt7.
  • These changes are part of a long-term effort to normalize and improve our geographic annotations by standardizing on the ISO 3166-2 subdivision codes used by the MaxMind Geo2 format (instead of the FIPS 10-4 region codes used by the MaxMind Geo1 format).
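
The field rename described in the takeaways above can be applied mechanically to saved query text. The following is a minimal Python sketch, not an M-Lab tool: the helper name migrate_region_filter is hypothetical, client.Geo.CountryCode in the sample query is an assumption, and the helper rewrites only the field name. Because FIPS 10-4 and ISO 3166-2 values differ for some locales, the literal values being compared may still need a manual check.

```python
import re
from typing import Tuple

# Match the old field name as a whole token, so a longer identifier such as
# client.Geo.RegionName (a made-up name, for illustration) is left untouched.
_OLD_FIELD = re.compile(r"\bclient\.Geo\.Region\b")
_NEW_FIELD = "client.Geo.Subdivision1ISOCode"


def migrate_region_filter(sql: str) -> Tuple[str, int]:
    """Return the rewritten query text and the number of replacements made."""
    return _OLD_FIELD.subn(_NEW_FIELD, sql)


# Sample filter clause; the CountryCode field is an assumption for this sketch.
old_query = 'WHERE client.Geo.CountryCode = "US" AND client.Geo.Region = "NY"'
new_query, replaced = migrate_region_filter(old_query)
print(new_query)
```

A replacement count of zero means the query never referenced the old field, so it does not need to be rerun for this reason.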

Background

Since we began collecting data, M-Lab has used MaxMind’s free GeoLite GeoIP databases to annotate measurements. We call these client or server annotations, or “geographic annotations”. Until 2017-09, MaxMind published a “Geo1 format” that used the FIPS 10-4 standard for region codes. In 2017-09, MaxMind introduced the “Geo2 format”, which adopted the ISO 3166-2 subdivision standard. The field names in these databases follow the corresponding conventions: Geo1 included a “Region” field (with values from the FIPS standard), and Geo2 included “Subdivision” fields (with values from the ISO 3166-2 standard, e.g. Subdivision1ISOCode and Subdivision2ISOCode). The two standards differ substantially for some locales.

M-Lab’s Geo Annotations

[Image: geo annotations table]

Since M-Lab data spans 2009 to the present, we had to make a choice when developing our Unified Views: should we mix conventions (i.e. use both “.Region” and “.Subdivision1ISOCode”) or standardize on one? We chose to standardize on the Geo2 ISO 3166-2 conventions. However, our published data has used a mixture of these conventions since 2017-09, when the “Geo2 format” was introduced.

The introduction of ndt7, the Unified Views, and this mixture of annotation conventions meant that queries using client.Geo.Region returned only ndt5 data. To ensure results include all relevant data, queries that used client.Geo.Region and were run prior to 2022-03-17 for dates between 2020-02 and 2022-03-17 should be rerun using client.Geo.Subdivision1ISOCode.

On March 16, 2022, we released a version of etl-schema after which both ndt5 and ndt7, and therefore the Unified Views, use the Geo2 format based on ISO 3166-2. As a result, queries that previously used client.Geo.Region to identify US states should now use the filter client.Geo.Subdivision1ISOCode. Queries that use client.Geo.Region will return “no data.”
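
To make the new filter concrete, here is a hedged Python sketch that assembles a query string for one US state. Several details are assumptions not stated in this post: the table name measurement-lab.ndt.unified_downloads, the date column, the client.Geo.CountryCode field, and the use of two-letter values such as "NY" in Subdivision1ISOCode. The function name build_state_query is hypothetical. Check these against the current schema before running the query.

```python
from datetime import date


def build_state_query(state_code: str, start: str, end: str) -> str:
    """Build a query string that counts tests for one US state.

    Assumptions (not from the original post): the table name, the `date`
    column, client.Geo.CountryCode, and two-letter state values such as "NY".
    """
    if not (len(state_code) == 2 and state_code.isascii() and state_code.isalpha()):
        raise ValueError("expected a two-letter state code, e.g. 'NY'")
    # date.fromisoformat raises ValueError on malformed dates.
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if start_d > end_d:
        raise ValueError("start date must not be after end date")
    return (
        "SELECT COUNT(*) AS tests\n"
        "FROM `measurement-lab.ndt.unified_downloads`\n"
        f"WHERE date BETWEEN '{start_d}' AND '{end_d}'\n"
        "  AND client.Geo.CountryCode = 'US'\n"
        # Filter on the Geo2 field rather than client.Geo.Region.
        f"  AND client.Geo.Subdivision1ISOCode = '{state_code.upper()}'"
    )


print(build_state_query("ny", "2020-02-01", "2022-03-17"))
```

Validating the state code and dates before building the string keeps arbitrary input out of the query text. A production version would more likely use the query parameters of its database client.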
