HTTP/2 2020 queries #1098

Merged
merged 61 commits into from
Sep 23, 2020
Changes from 59 commits
989e103
SQL
gregorywolf Jul 22, 2020
58fcb45
SQL for 2020 Web Almanac HTTP/2
gregorywolf Jul 22, 2020
6a583a4
2020 SQL
gregorywolf Jul 23, 2020
14655bf
HTTP/2 2020
gregorywolf Jul 26, 2020
ec6719e
Update measure_number_of_tcp_connections_per_site.sql
gregorywolf Jul 29, 2020
5561308
Create new query - TLS 1.3 Adoption for H2
gregorywolf Aug 4, 2020
ac0848e
Updated SQL
gregorywolf Sep 10, 2020
35c7ebc
Update percent_of_sites_affected_by_cdn_prioritization_issues.sql
gregorywolf Sep 11, 2020
d9e573a
Updated SQL Queries
gregorywolf Sep 11, 2020
2de2d93
Merge branch 'main' of https://github.com/gregorywolf/almanac.httparc…
gregorywolf Sep 11, 2020
a52ed08
Updated Queries
gregorywolf Sep 12, 2020
24a27cc
Update sql/2020/22_HTTP_2/adoption_of_http_2_by_site_and_requests.sql
gregorywolf Sep 18, 2020
95a9f43
Update sql/2020/22_HTTP_2/number_of_h2_pushed_resources_and_avg_bytes…
gregorywolf Sep 18, 2020
be82e44
Update sql/2020/22_HTTP_2/measure_of_all_http_versions_for_main_page_…
gregorywolf Sep 18, 2020
acb47de
Update sql/2020/22_HTTP_2/measure_number_of_tcp_connections_per_site.sql
gregorywolf Sep 18, 2020
c07c48f
Update sql/2020/22_HTTP_2/detailed_upgrade_headers.sql
gregorywolf Sep 18, 2020
545a43b
Update sql/2020/22_HTTP_2/detailed_alt_svc_headers.sql
gregorywolf Sep 18, 2020
80c51d8
Update sql/2020/22_HTTP_2/count_of_non_h2_sites_grouped_by_server.sql
gregorywolf Sep 18, 2020
62bd9de
Update sql/2020/22_HTTP_2/count_of_h2_sites_grouped_by_server.sql
gregorywolf Sep 18, 2020
bd0f2ee
Update sql/2020/22_HTTP_2/count_of_preload_http_headers_with_nopush_a…
gregorywolf Sep 18, 2020
5876cb7
Update sql/2020/22_HTTP_2/count_of_h2_sites_using_h2_push.sql
gregorywolf Sep 18, 2020
e34f3a7
Update percent_of_sites_affected_by_cdn_prioritization_issues.sql
gregorywolf Sep 18, 2020
37b4d69
Update percent_of_resources_loaded_over_h2_or_h1_1_per_site.sql
gregorywolf Sep 18, 2020
a4eed40
Update detailed_upgrade_headers.sql
gregorywolf Sep 18, 2020
85baf32
Update number_of_h2_pushed_resources_and_avg_bytes_by_content_type.sql
gregorywolf Sep 20, 2020
2c95cbd
Update number_of_http_sites_returning_upgrade_http_header_containing_…
gregorywolf Sep 20, 2020
f726246
Update number_of_http_sites_returning_upgrade_http_header_containing_…
gregorywolf Sep 20, 2020
f48a281
Rename number_of_http_sites_returning_upgrade_http_header_containing_…
gregorywolf Sep 20, 2020
55a68a1
Update number_of_http_requests_returning_upgrade_http_header_containi…
gregorywolf Sep 20, 2020
9dbce2e
Update and rename number_of_https_sites_using_h2_returning_upgrade_ht…
gregorywolf Sep 20, 2020
9275b06
Update and rename number_of_https_sites_not_using_h2_returning_upgrad…
gregorywolf Sep 20, 2020
28926e7
Update and rename percent_of_resources_loaded_over_h2_or_h1_1_per_sit…
gregorywolf Sep 20, 2020
0b2ec2e
Update percent_of_sites_affected_by_cdn_prioritization_issues.sql
gregorywolf Sep 20, 2020
09dfeef
Update number_of_http_requests_returning_upgrade_http_header_containi…
gregorywolf Sep 20, 2020
d0168e7
Update number_of_https_requests_not_using_h2_returning_upgrade_http_u…
gregorywolf Sep 20, 2020
7a95413
Update number_of_https_requests_using_h2_returning_upgrade_http_heade…
gregorywolf Sep 20, 2020
79c6b89
Update number_of_http_requests_returning_upgrade_http_header_containi…
gregorywolf Sep 21, 2020
c94f12b
Update number_of_https_requests_not_using_h2_returning_upgrade_http_u…
gregorywolf Sep 21, 2020
12087f2
Update and rename tls_1_3_adoption_for_h2.sql to tls_adoption_by_http…
gregorywolf Sep 21, 2020
0914a0b
Update count_of_h2_sites_grouped_by_server.sql
gregorywolf Sep 21, 2020
ef55f96
Update and rename count_of_h2_sites_grouped_by_server.sql to count_of…
gregorywolf Sep 21, 2020
81e1cd8
Update and rename count_of_h2_sites_using_h2_push.sql to count_of_h2_…
gregorywolf Sep 21, 2020
139b2fc
Update and rename count_of_h2_and_h3_sites_using_h2_push.sql to count…
gregorywolf Sep 21, 2020
df8643e
Update count_of_h2_and_h3_sites_using_push.sql
gregorywolf Sep 21, 2020
51c6f4d
Update and rename count_of_non_h2_sites_grouped_by_server.sql to coun…
gregorywolf Sep 21, 2020
30c6bb9
Update and rename number_of_h2_pushed_resources_and_avg_bytes.sql to …
gregorywolf Sep 21, 2020
a3334cb
Update and rename number_of_h2_pushed_resources_and_avg_bytes_by_cont…
gregorywolf Sep 21, 2020
05c918b
Update and rename percent_of_sites_affected_by_cdn_prioritization_iss…
gregorywolf Sep 21, 2020
2fc25e2
Update and rename number_of_https_requests_not_using_h2_returning_upg…
gregorywolf Sep 22, 2020
68f64ef
Update and rename percent_of_resources_loaded_over_HTTP_by_version_pe…
gregorywolf Sep 22, 2020
3f149ed
Update detailed_upgrade_headers.sql
gregorywolf Sep 22, 2020
ab8f7f3
Update detailed_upgrade_headers.sql
gregorywolf Sep 22, 2020
d4ba697
Update number_of_http_requests_returning_upgrade_http_header_containi…
gregorywolf Sep 22, 2020
f3f3ce1
Update tls_adoption_by_http_version.sql
gregorywolf Sep 22, 2020
57e02d0
Update sql/2020/22_HTTP_2/number_of_h2_and_h3_pushed_resources_and_by…
gregorywolf Sep 22, 2020
d54b21e
Update sql/2020/22_HTTP_2/number_of_http_requests_returning_upgrade_h…
gregorywolf Sep 22, 2020
df6bf86
Update sql/2020/22_HTTP_2/number_of_https_requests_not_using_h2_or_h3…
gregorywolf Sep 22, 2020
3d76b85
Update sql/2020/22_HTTP_2/number_of_https_requests_using_h2_returning…
gregorywolf Sep 22, 2020
6e1f281
Update percentage_of_resources_loaded_over_HTTP_by_version_per_site.sql
gregorywolf Sep 22, 2020
9010146
Update sql/2020/22_HTTP_2/number_of_h2_and_h3_pushed_resources_and_by…
gregorywolf Sep 22, 2020
03e9bd1
Update sql/2020/22_HTTP_2/percentage_of_resources_loaded_over_HTTP_by…
gregorywolf Sep 22, 2020
18 changes: 18 additions & 0 deletions sql/2020/22_HTTP_2/adoption_of_http_2_by_site_and_requests.sql
@@ -0,0 +1,18 @@
# standardSQL
# Adoption of HTTP/2 by site and requests
SELECT
client,
firstHtml,
JSON_EXTRACT_SCALAR(payload, '$._protocol') AS http_version,
COUNT(0) AS num_requests,
ROUND(COUNT(0) / SUM(COUNT(0)) OVER (PARTITION BY client), 4) AS pct
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01'
GROUP BY
client,
firstHtml,
http_version
ORDER BY
pct DESC
28 changes: 28 additions & 0 deletions sql/2020/22_HTTP_2/count_of_h2_and_h3_sites_grouped_by_server.sql
@@ -0,0 +1,28 @@
# standardSQL
# Count of H2 and H3 Sites Grouped By Server
SELECT
client,
JSON_EXTRACT_SCALAR(payload, '$._protocol') as http_version,
# Omit server version
NORMALIZE_AND_CASEFOLD(REGEXP_EXTRACT(resp_server, r'\s*([^/]*)\s*')) AS server_header,
COUNT(0) AS num_pages,
SUM(COUNT(0)) OVER (PARTITION BY client) AS total,
ROUND(COUNT(0) / SUM(COUNT(0)) OVER (PARTITION BY client), 4) AS pct
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01' AND
firstHtml AND
(LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "http/2" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "%quic%" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "h3%" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "http/3%"
)
GROUP BY
client,
http_version,
server_header
HAVING
num_pages >= 100
ORDER BY
num_pages DESC
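The four `LIKE` patterns in the `WHERE` clause above decide which first-HTML requests count as H2 or H3. As a rough illustration only, the JavaScript below mirrors those patterns outside BigQuery. `isH2OrH3` is a hypothetical helper that is not part of this PR, and the sample protocol strings are made-up inputs, not crawl data.

```javascript
// Hypothetical helper mirroring the LIKE patterns in the query above.
// LIKE "http/2" has no wildcard, so it behaves as an exact match after LOWER().
function isH2OrH3(protocol) {
  // In SQL a NULL protocol makes every LIKE evaluate to NULL, so the row is filtered out.
  if (protocol == null) return false;
  const p = String(protocol).toLowerCase();
  return (
    p === "http/2" ||        // LIKE "http/2"
    p.includes("quic") ||    // LIKE "%quic%"
    p.startsWith("h3") ||    // LIKE "h3%"
    p.startsWith("http/3")   // LIKE "http/3%"
  );
}

// Illustrative inputs only.
const sampleProtocols = ["HTTP/2", "http/1.1", "h3-29", "http/2+quic/46", null];
for (const s of sampleProtocols) {
  console.log(s, "->", isH2OrH3(s));
}
```

One consequence worth noting: because the first pattern is an exact match, a value such as `http/2+quic/46` is only picked up by the `%quic%` pattern.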
26 changes: 26 additions & 0 deletions sql/2020/22_HTTP_2/count_of_h2_and_h3_sites_using_push.sql
@@ -0,0 +1,26 @@
# standardSQL
# Count of H2 and H3 Sites using Push
SELECT
client,
http_version,
COUNT(DISTINCT IF(was_pushed, page, NULL)) AS num_pages_with_push,
COUNT(DISTINCT page) AS total,
COUNT(DISTINCT IF(was_pushed, page, NULL)) / COUNT(DISTINCT page) AS pct
FROM (
SELECT
client,
page,
JSON_EXTRACT_SCALAR(payload, '$._protocol') as http_version,
JSON_EXTRACT_SCALAR(payload, '$._was_pushed') = '1' AS was_pushed
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01' AND
(LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "http/2" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "%quic%" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "h3%" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "http/3%")
)
GROUP BY
client,
http_version
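The outer query counts a page as using push if at least one of its requests was pushed, relying on `COUNT(DISTINCT ...)` ignoring the `NULL` produced by `IF(was_pushed, page, NULL)`. The sketch below reproduces that counting with `Set` on a handful of invented rows; the URLs and the `wasPushed` field name are assumptions for illustration.

```javascript
// Made-up request rows (page, was_pushed) for illustration only.
const requests = [
  { page: "https://a.example/", wasPushed: true },
  { page: "https://a.example/", wasPushed: true },
  { page: "https://a.example/", wasPushed: false },
  { page: "https://b.example/", wasPushed: false },
  { page: "https://c.example/", wasPushed: false },
  { page: "https://c.example/", wasPushed: true }
];

// COUNT(DISTINCT page)
const allPages = new Set(requests.map(r => r.page));

// COUNT(DISTINCT IF(was_pushed, page, NULL)): rows that are not pushed
// contribute NULL, which COUNT ignores, so they are simply filtered out here.
const pushedPages = new Set(requests.filter(r => r.wasPushed).map(r => r.page));

const pct = pushedPages.size / allPages.size;
console.log(pushedPages.size, allPages.size, pct);
```

Page `a.example` has two pushed requests but is counted once, which is the point of using `DISTINCT` on the page rather than counting pushed rows.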
@@ -0,0 +1,27 @@
# standardSQL
# Count of sites using neither H2 nor H3, grouped by server
SELECT
client,
JSON_EXTRACT_SCALAR(payload, '$._protocol') as http_version,
# Omit server version
NORMALIZE_AND_CASEFOLD(REGEXP_EXTRACT(resp_server, r'\s*([^/]*)\s*')) AS server_header,
COUNT(0) AS num_pages,
SUM(COUNT(0)) OVER (PARTITION BY client) AS total,
ROUND(COUNT(0) / SUM(COUNT(0)) OVER (PARTITION BY client), 4) AS pct
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01' AND
firstHtml AND
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) NOT LIKE "http/2" AND
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) NOT LIKE "%quic%" AND
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) NOT LIKE "h3%" AND
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) NOT LIKE "http/3%"
GROUP BY
client,
http_version,
server_header
HAVING
num_pages >= 100
ORDER BY
num_pages DESC
@@ -0,0 +1,32 @@
# standardSQL
# Count of preload HTTP headers with the nopush attribute set. One-off stat for the last crawl.
CREATE TEMPORARY FUNCTION getLinkHeaders(payload STRING)
RETURNS ARRAY<STRING> LANGUAGE js AS """
try {
var $ = JSON.parse(payload);
var headers = $.response.headers;
return headers.filter(h => h.name.toLowerCase() == 'link').map(h => h.value);
} catch (e) {
return [];
}
""";

SELECT
client,
COUNTIF(link_header LIKE '%nopush%') as num_nopush,
COUNT(0) AS total_preload,
ROUND(COUNTIF(link_header LIKE '%nopush%') / COUNT(0), 4) AS pct_nopush
FROM (
SELECT
client,
getLinkHeaders(payload) AS link_headers
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01' AND
firstHtml),
UNNEST(link_headers) AS link_header
WHERE
link_header LIKE '%preload%'
GROUP BY
client
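To see what the UDF and the two `LIKE` filters above do to a single response, the same function body can be run outside BigQuery. This is a sketch under assumptions: the payload below is an invented HAR-style object with only the fields the UDF reads, and `String.prototype.includes` stands in for the case-sensitive `LIKE '%...%'` checks.

```javascript
// Same logic as the getLinkHeaders UDF above, wrapped as a plain function.
function getLinkHeaders(payload) {
  try {
    const $ = JSON.parse(payload);
    const headers = $.response.headers;
    return headers.filter(h => h.name.toLowerCase() == 'link').map(h => h.value);
  } catch (e) {
    // Malformed JSON or a missing response.headers array yields no link headers.
    return [];
  }
}

// Made-up payload for illustration; not real crawl data.
const samplePayload = JSON.stringify({
  response: {
    headers: [
      { name: "Content-Type", value: "text/html" },
      { name: "Link", value: "</app.css>; rel=preload; as=style; nopush" },
      { name: "link", value: "</app.js>; rel=preload; as=script" },
      { name: "Link", value: "<https://example.com>; rel=preconnect" }
    ]
  }
});

const linkHeaders = getLinkHeaders(samplePayload);
// Mirrors WHERE link_header LIKE '%preload%'
const preload = linkHeaders.filter(v => v.includes("preload"));
// Mirrors COUNTIF(link_header LIKE '%nopush%')
const nopush = preload.filter(v => v.includes("nopush"));

console.log(preload.length, nopush.length); // 2 1
console.log(getLinkHeaders("not json"));    // []
```

Each `Link` header value is one row after `UNNEST`, so a single header that lists several comma-separated links is still counted once.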
36 changes: 36 additions & 0 deletions sql/2020/22_HTTP_2/detailed_alt_svc_headers.sql
@@ -0,0 +1,36 @@
# standardSQL
# Detailed alt-svc headers
CREATE TEMPORARY FUNCTION getAltSvcHeader(payload STRING)
RETURNS STRING LANGUAGE js AS """
try {
var $ = JSON.parse(payload);
var headers = $.response.headers;
// find() returns undefined when there is no alt-svc header; the resulting error is caught below.
return headers.find(h => h.name.toLowerCase() === 'alt-svc').value.trim();
} catch (e) {
return '';
}
""";

SELECT
client,
firstHtml,
JSON_EXTRACT_SCALAR(payload, '$._protocol') AS protocol,
IF(url LIKE 'https://%', 'https', 'http') AS http_or_https,
NORMALIZE_AND_CASEFOLD(getAltSvcHeader(payload)) AS alt_svc,
COUNT(0) AS num_requests,
SUM(COUNT(0)) OVER (PARTITION BY client) AS total,
COUNT(0) / SUM(COUNT(0)) OVER (PARTITION BY client) AS pct
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01'
GROUP BY
client,
firstHtml,
protocol,
http_or_https,
alt_svc
HAVING
num_requests >= 100
ORDER BY
num_requests DESC
25 changes: 25 additions & 0 deletions sql/2020/22_HTTP_2/detailed_upgrade_headers.sql
@@ -0,0 +1,25 @@
# standardSQL
# Detailed upgrade headers for the 2020-08-01 crawl
SELECT
client,
firstHtml,
JSON_EXTRACT_SCALAR(payload, '$._protocol') AS protocol,
IF(url LIKE 'https://%', 'https', 'http') AS http_or_https,
REGEXP_EXTRACT(REGEXP_EXTRACT(respOtherHeaders, r'(?is)Upgrade = (.*)'), r'(?im)^([^=]*?)(?:, [a-z-]+ = .*)') IS NOT NULL AS upgrade,
COUNT(0) AS num_requests,
SUM(COUNT(0)) OVER (PARTITION BY client) AS total,
COUNT(0) / SUM(COUNT(0)) OVER (PARTITION BY client) AS pct
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01'
GROUP BY
client,
firstHtml,
protocol,
http_or_https,
upgrade
HAVING
num_requests >= 100
ORDER BY
num_requests DESC
36 changes: 36 additions & 0 deletions sql/2020/22_HTTP_2/measure_number_of_tcp_connections_per_site.sql
@@ -0,0 +1,36 @@
# standardSQL
# Measure number of TCP Connections per site.
SELECT
percentile,
client,
protocol,
COUNT(0) AS num_pages,
APPROX_QUANTILES(_connections, 1000)[OFFSET(percentile * 10)] AS connections
FROM (
SELECT
client,
page,
JSON_EXTRACT_SCALAR(payload, '$._protocol') AS protocol,
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01' AND
firstHtml)
JOIN (
SELECT
_TABLE_SUFFIX AS client,
url AS page,
_connections,
FROM
`httparchive.summary_pages.2020_08_01_*`)
USING
(client, page),
UNNEST([10, 25, 50, 75, 90]) AS percentile
GROUP BY
percentile,
client,
protocol
ORDER BY
percentile,
client,
protocol
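The expression `APPROX_QUANTILES(_connections, 1000)[OFFSET(percentile * 10)]` asks for 1000 quantiles, which yields an array of 1001 boundaries (minimum first, maximum last), and then indexes it at ten times the percentile. The sketch below shows that indexing with an exact stand-in; `quantiles` is a hypothetical helper, its rounding rule is an assumption, and BigQuery's function is approximate, so real results may differ slightly.

```javascript
// Exact stand-in for APPROX_QUANTILES(values, n): returns n + 1 boundaries,
// with the minimum at index 0 and the maximum at index n.
// The nearest-index rounding used here is an assumption for illustration.
function quantiles(values, n) {
  const sorted = [...values].sort((a, b) => a - b);
  const out = [];
  for (let i = 0; i <= n; i++) {
    const idx = Math.round((i * (sorted.length - 1)) / n);
    out.push(sorted[idx]);
  }
  return out;
}

// Made-up connection counts for 11 pages, for illustration only.
const connections = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100];
const boundaries = quantiles(connections, 1000);

for (const percentile of [10, 25, 50, 75, 90]) {
  // percentile * 10 maps a percentile (0 to 100) onto the 0 to 1000 offset range.
  console.log(percentile, boundaries[percentile * 10]);
}
```

Using 1000 buckets rather than 100 leaves room for fractional percentiles (for example offset 995 for the 99.5th) without changing the query shape.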
@@ -0,0 +1,20 @@
# standardSQL
# Measure of all HTTP versions (0.9, 1.0, 1.1, 2, QUIC) for main page of all sites, and for HTTPS sites. Table for last crawl.
SELECT
client,
JSON_EXTRACT_SCALAR(payload, '$._protocol') AS protocol,
COUNT(0) AS num_pages,
SUM(COUNT(0)) OVER (PARTITION BY client) AS total,
COUNTIF(url LIKE 'https://%') AS num_https_pages,
COUNT(0) / SUM(COUNT(0)) OVER (PARTITION BY client) AS pct_pages,
COUNTIF(url LIKE 'https://%') / SUM(COUNT(0)) OVER (PARTITION BY client) AS pct_https
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01' AND
firstHtml
GROUP BY
client,
protocol
ORDER BY
num_pages / total DESC
@@ -0,0 +1,37 @@
# standardSQL
# Number of H2 and H3 Pushed Resources and bytes transferred
SELECT
percentile,
client,
http_version,
COUNT(DISTINCT page) AS num_pages,
APPROX_QUANTILES(num_requests, 1000)[OFFSET(percentile * 10)] AS pushed_requests,
APPROX_QUANTILES(kb_transferred, 1000)[OFFSET(percentile * 10)] AS kb_transferred,
FROM (
SELECT
client,
page,
JSON_EXTRACT_SCALAR(payload, '$._protocol') AS http_version,
SUM(CAST(JSON_EXTRACT_SCALAR(payload, '$._bytesIn') AS INT64) / 1024) AS kb_transferred,
COUNT(0) AS num_requests
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01' AND
JSON_EXTRACT_SCALAR(payload, '$._was_pushed') = '1' AND
(LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "http/2" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "%quic%" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "h3%" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "http/3%")
GROUP BY
client,
http_version,
page),
UNNEST([10, 25, 50, 75, 90]) AS percentile
GROUP BY
percentile,
client,
http_version
ORDER BY
percentile,
client
@@ -0,0 +1,41 @@
# standardSQL
# Number of H2 and H3 Pushed Resources and Bytes by Content type
SELECT
percentile,
client,
http_version,
content_type,
COUNT(DISTINCT page) AS num_pages,
APPROX_QUANTILES(num_requests, 1000)[OFFSET(percentile * 10)] AS pushed_requests,
APPROX_QUANTILES(kb_transferred, 1000)[OFFSET(percentile * 10)] AS kb_transferred,
FROM (
SELECT
client,
page,
JSON_EXTRACT_SCALAR(payload, '$._protocol') AS http_version,
JSON_EXTRACT_SCALAR(payload, '$._contentType') AS content_type,
SUM(CAST(JSON_EXTRACT_SCALAR(payload, '$._bytesIn') AS INT64) / 1024) AS kb_transferred,
COUNT(0) AS num_requests
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01' AND
JSON_EXTRACT_SCALAR(payload, '$._was_pushed') = '1' AND
(LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "http/2" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "%quic%" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "h3%" OR
LOWER(JSON_EXTRACT_SCALAR(payload, "$._protocol")) LIKE "http/3%")
GROUP BY
client,
http_version,
page,
content_type),
UNNEST([10, 25, 50, 75, 90]) AS percentile
GROUP BY
percentile,
client,
http_version,
content_type
ORDER BY
percentile,
client
@@ -0,0 +1,31 @@
# standardSQL
# Number of HTTP (not HTTPS) requests which return upgrade HTTP header containing h2.
CREATE TEMPORARY FUNCTION getUpgradeHeader(payload STRING)
RETURNS STRING
LANGUAGE js AS """
try {
var $ = JSON.parse(payload);
var headers = $.response.headers;
var st = headers.find(function(e) {
return e['name'].toLowerCase() === 'upgrade'
});
return st['value'];
} catch (e) {
return '';
}
""";
SELECT
client,
firstHtml,
JSON_EXTRACT_SCALAR(payload, '$._protocol') as http_version,
COUNTIF(getUpgradeHeader(payload) LIKE "%h2%") AS num_requests,
COUNT(0) AS total
FROM
`httparchive.almanac.requests`
WHERE
date = '2020-08-01' AND
url LIKE "http://%"
GROUP BY
client,
firstHtml,
http_version
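Two behaviors of the query above are easy to miss: a response with no `Upgrade` header makes `find` return `undefined`, so the property access throws and the UDF falls back to `''`; and `LIKE "%h2%"` also matches values that only advertise `h2c`. The sketch below runs the same function body on invented payloads to show both; `makePayload` and `advertisesH2` are hypothetical helpers for illustration.

```javascript
// Same logic as the getUpgradeHeader UDF above, wrapped as a plain function.
function getUpgradeHeader(payload) {
  try {
    var $ = JSON.parse(payload);
    var headers = $.response.headers;
    var st = headers.find(function(e) {
      return e['name'].toLowerCase() === 'upgrade';
    });
    // st is undefined when no Upgrade header exists; st['value'] then throws.
    return st['value'];
  } catch (e) {
    return '';
  }
}

// Hypothetical helper building a minimal HAR-style payload; data is made up.
const makePayload = headers => JSON.stringify({ response: { headers } });

const withUpgrade = makePayload([{ name: "Upgrade", value: "h2,h2c" }]);
const withoutUpgrade = makePayload([{ name: "Server", value: "nginx" }]);
const cleartextOnly = makePayload([{ name: "upgrade", value: "h2c" }]);

// Mirrors COUNTIF(getUpgradeHeader(payload) LIKE "%h2%")
const advertisesH2 = p => getUpgradeHeader(p).includes("h2");

console.log(advertisesH2(withUpgrade));    // true
console.log(advertisesH2(withoutUpgrade)); // false
console.log(advertisesH2(cleartextOnly));  // true, because "h2c" contains "h2"
```

If only the TLS token `h2` should count, the match would need to be on whole comma-separated tokens rather than a substring; the query as written counts both.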