OONI Data Formats

Authors: Arturo Filastò et al.
Version: 0.2.0
Maintainer: Simone Basso

Overview

The output of OONI experiments (also known as nettests or simply tests) consists of a series of JSON documents separated by newline characters, a format also known as JSONL. Every JSON document within the JSONL MUST be a JSON object with a specific toplevel structure, referred to as the base data format. The base data format reserves a place where experiments and test templates can write their own keys. (A test template is a routine that performs functionality common across several OONI experiments, e.g., fetching a web page using HTTP.) Test templates have their own data format, and so do experiments. Thus, the output of any experiment consists of the base data format, plus the data format of zero or more test templates, plus zero or more fields generated by the experiment itself. That is:

{
    "data_format_version": "0.2.0",
    "test_keys": {}
}

Experiments MUST NOT use keys within test_keys that conflict with the keys reserved by the test templates. However, keys starting with x_ are always permitted anywhere. They are experimental and should not be relied upon. As a general rule, data consumers MUST be prepared for any field being null or missing; data producers SHOULD NOT omit fields (or emit nulls) unless this has been explicitly documented in the field description.
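The consumer-side rules above can be sketched in Python. This is a minimal illustration, not part of the specification: the function names iter_measurements, get_test_keys and experimental_keys are hypothetical, and the sketch assumes the JSONL input is available as an iterable of text lines.

```python
import json


def iter_measurements(lines):
    """Yield one dict per non-empty JSONL line; reject non-object documents."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines such as a trailing newline
        doc = json.loads(line)
        if not isinstance(doc, dict):
            # The spec says every document MUST be a JSON object.
            raise ValueError(f"line {lineno}: expected a JSON object")
        yield doc


def get_test_keys(measurement):
    """Return test_keys as a dict, treating a missing or null value as empty."""
    test_keys = measurement.get("test_keys")
    return test_keys if isinstance(test_keys, dict) else {}


def experimental_keys(test_keys):
    """Return the sorted keys starting with x_, which should not be relied upon."""
    return sorted(key for key in test_keys if key.startswith("x_"))
```

Using dict.get with a type check rather than direct indexing is one way to honor the requirement that consumers be prepared for any field being null or missing.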

Data format version

The current data_format_version is 0.2.0. This applies only to the keys in the external envelope. Since 2020-04-06, the extensions top-level key describes the data formats contained inside the test_keys (see below).

Between November 2019 and April 2020, experimental versions of OONI Probe used data format versions ranging from 0.2.1 to 0.4.0. Since 2020-04-06, the version is 0.2.0 again. Because experimental versions of OONI Probe used these larger version numbers, the next major data format version will be 0.5.0.
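A consumer that wants to tell these cases apart could classify data_format_version as in the sketch below. The helper names and the category labels ("current", "experimental", "legacy", "unknown") are assumptions made for illustration; the set of experimental versions is the one listed in the History section of this document.

```python
# Versions used by experimental builds, as listed in the History section.
EXPERIMENTAL_VERSIONS = {"0.2.1", "0.3.0", "0.3.1", "0.3.2", "0.3.3", "0.4.0"}


def parse_version(version):
    """Turn "0.2.0" into (0, 2, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def classify_version(version):
    """Return a hypothetical label for a data_format_version string."""
    if version in EXPERIMENTAL_VERSIONS:
        return "experimental"
    parsed = parse_version(version)
    if parsed == (0, 2, 0):
        return "current"
    if parsed < (0, 2, 0):
        return "legacy"  # e.g. the original 0.1.0 YAML format
    return "unknown"  # anything newer than this document describes
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "0.10.0" sorting before "0.9.0".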

Example

The following is a valid JSON document that was edited for brevity.

{
  "annotations": {
    "platform": "macos"
  },
  "data_format_version": "0.2.0",
  "extensions": {
    "dnst": 0,
    "httpt": 0,
    "tcpconnect": 0
  },
  "input": null,
  "measurement_start_time": "2020-01-10 17:25:19",
  "probe_asn": "AS30722",
  "probe_cc": "IT",
  "probe_ip": "127.0.0.1",
  "report_id": "20200110T172519Z_AS30722_5UdG13d6rEfOVCTHEdMjuXGah8vF6dpShA0jditnrHCmH10o1K",
  "resolver_asn": "AS15169",
  "resolver_ip": "172.217.34.2",
  "resolver_network_name": "Google LLC",
  "software_name": "miniooni",
  "software_version": "0.1.0-dev",
  "test_keys": {
    "agent": "redirect",
    "queries": [
      {
        "answers": [
          {
            "answer_type": "A",
            "ipv4": "149.154.167.99",
            "ttl": null
          }
        ],
        "engine": "system",
        "failure": null,
        "hostname": "web.telegram.org",
        "query_type": "A",
        "resolver_hostname": null,
        "resolver_port": null,
        "resolver_address": ""
      }
    ],
    "requests": [
      {
        "failure": null,
        "request": {
          "body": "",
          "body_is_truncated": false,
          "headers_list": [[
              "Host", "149.154.171.5"
            ], [
              "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
            ], [
              "Content-Length", "0"
            ], [
              "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
            ], [
              "Accept-Language", "en-US;q=0.8,en;q=0.5"
            ]
          ],
          "headers": {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US;q=0.8,en;q=0.5",
            "Content-Length": "0",
            "Host": "149.154.171.5",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
          },
          "method": "POST",
          "tor": {
            "exit_ip": null,
            "exit_name": null,
            "is_tor": false
          },
          "url": "http://149.154.171.5/"
        },
        "response": {
          "body": "<html>\r\n<head><title>501 Not Implemented</title></head>\r\n<body bgcolor=\"white\">\r\n<center><h1>501 Not Implemented</h1></center>\r\n<hr><center>nginx/0.3.33</center>\r\n</body>\r\n</html>\r\n",
          "body_is_truncated": false,
          "code": 501,
          "headers_list": [[
              "Content-Length", "181"
            ], [
              "Server", "nginx/0.3.33"
            ], [
              "Date", "Fri, 10 Jan 2020 17:25:20 GMT"
            ], [
              "Content-Type", "text/html"
            ]
          ],
          "headers": {
            "Content-Length": "181",
            "Content-Type": "text/html",
            "Date": "Fri, 10 Jan 2020 17:25:20 GMT",
            "Server": "nginx/0.3.33"
          }
        }
      }
    ],
    "tcp_connect": [
      {
        "ip": "149.154.171.5",
        "port": 80,
        "status": {
          "failure": null,
          "success": true
        }
      }
    ],
    "telegram_http_blocking": false,
    "telegram_tcp_blocking": false,
    "telegram_web_failure": null,
    "telegram_web_status": "ok"
  },
  "test_name": "telegram",
  "test_runtime": 4.426603178,
  "test_start_time": "2020-01-10 17:25:19",
  "test_version": "0.0.4"
}

In this example:

  • all toplevel keys belong to the base data format.

  • the agent and requests keys within the test_keys belong to the HTTP data format, which is declared as httpt in the extensions map.

  • the queries key within the test_keys belongs to the DNS data format, which is declared as dnst in the extensions map.

  • the tcp_connect key within the test_keys belongs to the TCPConnect data format, which is declared as tcpconnect in the extensions map.

  • all the other keys within test_keys are generated by the telegram experiment.
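The attribution described in this list can be sketched as follows. The EXTENSION_KEYS table is an assumption derived only from the example above (the extension specifications may reserve more keys than shown here), and group_test_keys is a hypothetical helper name.

```python
# Assumption: keys per extension as they appear in the telegram example only;
# the full extension specs may reserve additional keys.
EXTENSION_KEYS = {
    "httpt": {"agent", "requests"},
    "dnst": {"queries"},
    "tcpconnect": {"tcp_connect"},
}


def group_test_keys(measurement):
    """Group the keys in test_keys by the declared extension that owns them.

    Keys not owned by any declared extension are attributed to the experiment.
    """
    extensions = measurement.get("extensions") or {}
    test_keys = measurement.get("test_keys") or {}
    groups = {}
    for key in test_keys:
        owner = "experiment"
        for name, keys in EXTENSION_KEYS.items():
            # Only attribute a key to an extension the measurement declares.
            if name in extensions and key in keys:
                owner = name
                break
        groups.setdefault(owner, []).append(key)
    return {owner: sorted(keys) for owner, keys in groups.items()}
```

Applied to the example measurement, this would place agent and requests under httpt, queries under dnst, tcp_connect under tcpconnect, and the telegram_* keys under the experiment.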

Index

This directory contains the specification of the base data format as well as of the following extensions:

See the nettests directory for the experiments' specs.

History

  • 0.1.0 [2013-02-01]: original YAML format. New code MUST NOT use that.

  • 0.2.0 [2016-01-27]: the new JSON format. Used by OONI Probe CLI v2.x and by OONI Probe Mobile when using Measurement Kit as the measurement engine.

Between 2019-11-11 and 2020-04-06, experimental versions of OONI used the following versions: 0.2.1, 0.3.0, 0.3.1, 0.3.2, 0.3.3, and 0.4.0. Since 2020-04-06, 0.2.0 is again used by both experimental and stable versions of OONI Probe.