HTTP/2 2020 queries #1098

Merged
merged 61 commits into from
Sep 23, 2020
Changes from 8 commits
Commits
989e103
SQL
gregorywolf Jul 22, 2020
58fcb45
SQL for 2020 Web Almanac HTTP/2
gregorywolf Jul 22, 2020
6a583a4
2020 SQL
gregorywolf Jul 23, 2020
14655bf
HTTP/2 2020
gregorywolf Jul 26, 2020
ec6719e
Update measure_number_of_tcp_connections_per_site.sql
gregorywolf Jul 29, 2020
5561308
Create new query - TLS 1.3 Adoption for H2
gregorywolf Aug 4, 2020
ac0848e
Updated SQL
gregorywolf Sep 10, 2020
35c7ebc
Update percent_of_sites_affected_by_cdn_prioritization_issues.sql
gregorywolf Sep 11, 2020
d9e573a
Updated SQL Queries
gregorywolf Sep 11, 2020
2de2d93
Merge branch 'main' of https://github.com/gregorywolf/almanac.httparc…
gregorywolf Sep 11, 2020
a52ed08
Updated Queries
gregorywolf Sep 12, 2020
24a27cc
Update sql/2020/22_HTTP_2/adoption_of_http_2_by_site_and_requests.sql
gregorywolf Sep 18, 2020
95a9f43
Update sql/2020/22_HTTP_2/number_of_h2_pushed_resources_and_avg_bytes…
gregorywolf Sep 18, 2020
be82e44
Update sql/2020/22_HTTP_2/measure_of_all_http_versions_for_main_page_…
gregorywolf Sep 18, 2020
acb47de
Update sql/2020/22_HTTP_2/measure_number_of_tcp_connections_per_site.sql
gregorywolf Sep 18, 2020
c07c48f
Update sql/2020/22_HTTP_2/detailed_upgrade_headers.sql
gregorywolf Sep 18, 2020
545a43b
Update sql/2020/22_HTTP_2/detailed_alt_svc_headers.sql
gregorywolf Sep 18, 2020
80c51d8
Update sql/2020/22_HTTP_2/count_of_non_h2_sites_grouped_by_server.sql
gregorywolf Sep 18, 2020
62bd9de
Update sql/2020/22_HTTP_2/count_of_h2_sites_grouped_by_server.sql
gregorywolf Sep 18, 2020
bd0f2ee
Update sql/2020/22_HTTP_2/count_of_preload_http_headers_with_nopush_a…
gregorywolf Sep 18, 2020
5876cb7
Update sql/2020/22_HTTP_2/count_of_h2_sites_using_h2_push.sql
gregorywolf Sep 18, 2020
e34f3a7
Update percent_of_sites_affected_by_cdn_prioritization_issues.sql
gregorywolf Sep 18, 2020
37b4d69
Update percent_of_resources_loaded_over_h2_or_h1_1_per_site.sql
gregorywolf Sep 18, 2020
a4eed40
Update detailed_upgrade_headers.sql
gregorywolf Sep 18, 2020
85baf32
Update number_of_h2_pushed_resources_and_avg_bytes_by_content_type.sql
gregorywolf Sep 20, 2020
2c95cbd
Update number_of_http_sites_returning_upgrade_http_header_containing_…
gregorywolf Sep 20, 2020
f726246
Update number_of_http_sites_returning_upgrade_http_header_containing_…
gregorywolf Sep 20, 2020
f48a281
Rename number_of_http_sites_returning_upgrade_http_header_containing_…
gregorywolf Sep 20, 2020
55a68a1
Update number_of_http_requests_returning_upgrade_http_header_containi…
gregorywolf Sep 20, 2020
9dbce2e
Update and rename number_of_https_sites_using_h2_returning_upgrade_ht…
gregorywolf Sep 20, 2020
9275b06
Update and rename number_of_https_sites_not_using_h2_returning_upgrad…
gregorywolf Sep 20, 2020
28926e7
Update and rename percent_of_resources_loaded_over_h2_or_h1_1_per_sit…
gregorywolf Sep 20, 2020
0b2ec2e
Update percent_of_sites_affected_by_cdn_prioritization_issues.sql
gregorywolf Sep 20, 2020
09dfeef
Update number_of_http_requests_returning_upgrade_http_header_containi…
gregorywolf Sep 20, 2020
d0168e7
Update number_of_https_requests_not_using_h2_returning_upgrade_http_u…
gregorywolf Sep 20, 2020
7a95413
Update number_of_https_requests_using_h2_returning_upgrade_http_heade…
gregorywolf Sep 20, 2020
79c6b89
Update number_of_http_requests_returning_upgrade_http_header_containi…
gregorywolf Sep 21, 2020
c94f12b
Update number_of_https_requests_not_using_h2_returning_upgrade_http_u…
gregorywolf Sep 21, 2020
12087f2
Update and rename tls_1_3_adoption_for_h2.sql to tls_adoption_by_http…
gregorywolf Sep 21, 2020
0914a0b
Update count_of_h2_sites_grouped_by_server.sql
gregorywolf Sep 21, 2020
ef55f96
Update and rename count_of_h2_sites_grouped_by_server.sql to count_of…
gregorywolf Sep 21, 2020
81e1cd8
Update and rename count_of_h2_sites_using_h2_push.sql to count_of_h2_…
gregorywolf Sep 21, 2020
139b2fc
Update and rename count_of_h2_and_h3_sites_using_h2_push.sql to count…
gregorywolf Sep 21, 2020
df8643e
Update count_of_h2_and_h3_sites_using_push.sql
gregorywolf Sep 21, 2020
51c6f4d
Update and rename count_of_non_h2_sites_grouped_by_server.sql to coun…
gregorywolf Sep 21, 2020
30c6bb9
Update and rename number_of_h2_pushed_resources_and_avg_bytes.sql to …
gregorywolf Sep 21, 2020
a3334cb
Update and rename number_of_h2_pushed_resources_and_avg_bytes_by_cont…
gregorywolf Sep 21, 2020
05c918b
Update and rename percent_of_sites_affected_by_cdn_prioritization_iss…
gregorywolf Sep 21, 2020
2fc25e2
Update and rename number_of_https_requests_not_using_h2_returning_upg…
gregorywolf Sep 22, 2020
68f64ef
Update and rename percent_of_resources_loaded_over_HTTP_by_version_pe…
gregorywolf Sep 22, 2020
3f149ed
Update detailed_upgrade_headers.sql
gregorywolf Sep 22, 2020
ab8f7f3
Update detailed_upgrade_headers.sql
gregorywolf Sep 22, 2020
d4ba697
Update number_of_http_requests_returning_upgrade_http_header_containi…
gregorywolf Sep 22, 2020
f3f3ce1
Update tls_adoption_by_http_version.sql
gregorywolf Sep 22, 2020
57e02d0
Update sql/2020/22_HTTP_2/number_of_h2_and_h3_pushed_resources_and_by…
gregorywolf Sep 22, 2020
d54b21e
Update sql/2020/22_HTTP_2/number_of_http_requests_returning_upgrade_h…
gregorywolf Sep 22, 2020
df6bf86
Update sql/2020/22_HTTP_2/number_of_https_requests_not_using_h2_or_h3…
gregorywolf Sep 22, 2020
3d76b85
Update sql/2020/22_HTTP_2/number_of_https_requests_using_h2_returning…
gregorywolf Sep 22, 2020
6e1f281
Update percentage_of_resources_loaded_over_HTTP_by_version_per_site.sql
gregorywolf Sep 22, 2020
9010146
Update sql/2020/22_HTTP_2/number_of_h2_and_h3_pushed_resources_and_by…
gregorywolf Sep 22, 2020
03e9bd1
Update sql/2020/22_HTTP_2/percentage_of_resources_loaded_over_HTTP_by…
gregorywolf Sep 22, 2020
20 changes: 20 additions & 0 deletions sql/2020/22_HTTP_2/adoption_of_http_2_by_site_and_requests.sql
@@ -0,0 +1,20 @@
#standardSQL
# 22_HTTP_2 - Adoption of HTTP/2 by site and requests
SELECT
  client,
  firstHtml,
  JSON_EXTRACT_SCALAR(payload, "$._protocol") AS http_version,
  COUNT(*) AS num_requests,
  ROUND(COUNT(0) * 100 / SUM(COUNT(0)) OVER (PARTITION BY client), 2) AS pct
FROM
  `httparchive.almanac.requests`
WHERE
  date='2020-08-01'
GROUP BY
  client,
  firstHtml,
  http_version
ORDER BY
  client,
  firstHtml,
  http_version
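Review note: the `pct` column above divides each group's count by a window total partitioned by `client` only, so the percentages for `firstHtml` true and false together sum to 100 per client. The following is a minimal JavaScript sketch of that arithmetic, not part of the PR; the row values and the `addPct` helper are hypothetical and only illustrate what `COUNT(0) * 100 / SUM(COUNT(0)) OVER (PARTITION BY client)` computes.

```javascript
// Hypothetical grouped counts, shaped like the query's output before pct is added.
var rows = [
  { client: 'desktop', http_version: 'HTTP/2', num_requests: 60 },
  { client: 'desktop', http_version: 'http/1.1', num_requests: 40 },
  { client: 'mobile', http_version: 'HTTP/2', num_requests: 30 },
  { client: 'mobile', http_version: 'http/1.1', num_requests: 10 }
];

// addPct is an illustrative helper: it emulates the window SUM per client,
// then rounds each share to two decimals like ROUND(..., 2).
function addPct(rows) {
  var totals = {};
  rows.forEach(function (r) {
    totals[r.client] = (totals[r.client] || 0) + r.num_requests;
  });
  return rows.map(function (r) {
    var pct = Math.round((r.num_requests * 100 / totals[r.client]) * 100) / 100;
    return Object.assign({}, r, { pct: pct });
  });
}

var result = addPct(rows);
result.forEach(function (r) {
  console.log(r.client, r.http_version, r.pct);
});
```

If a per-`firstHtml` share were wanted instead, the partition would need to include `firstHtml` as well; whether that is the intent is for the author to confirm.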
@@ -0,0 +1,26 @@
#standardSQL
# 22_HTTP_2 - Average number of HTTP/2 Pushed Resources and Average Bytes
SELECT
  client,
  COUNT(DISTINCT page) AS num_pages,
  ROUND(AVG(num_requests), 2) AS avg_pushed_requests,
  ROUND(AVG(kb_transfered), 2) AS avg_kb_transfered
FROM (

  SELECT
    client,
    page,
    SUM(CAST(JSON_EXTRACT_SCALAR(payload, "$._bytesIn") AS INT64) / 1024) AS kb_transfered,
    COUNT(*) AS num_requests
  FROM
    `httparchive.almanac.requests`
  WHERE
    JSON_EXTRACT_SCALAR(payload, "$._protocol") = "HTTP/2"
    AND JSON_EXTRACT_SCALAR(payload, "$._was_pushed") = "1"
    AND date='2020-08-01'
  GROUP BY
    client,
    page
)
GROUP BY
  client
@@ -0,0 +1,30 @@
#standardSQL
# 22_HTTP_2 - Average number of HTTP/2 Pushed Resources and Average Bytes by Content type
SELECT
  client,
  content_type,
  COUNT(DISTINCT page) AS num_pages,
  ROUND(AVG(num_requests), 2) AS avg_pushed_requests,
  ROUND(AVG(kb_transfered), 2) AS avg_kb_transfered
FROM (

  SELECT
    client,
    page,
    JSON_EXTRACT_SCALAR(payload, "$._contentType") AS content_type,
    SUM(CAST(JSON_EXTRACT_SCALAR(payload, "$._bytesIn") AS INT64) / 1024) AS kb_transfered,
    COUNT(*) AS num_requests
  FROM
    `httparchive.almanac.requests`
  WHERE
    JSON_EXTRACT_SCALAR(payload, "$._protocol") = "HTTP/2"
    AND JSON_EXTRACT_SCALAR(payload, "$._was_pushed") = "1"
    AND date='2020-08-01'
  GROUP BY
    client,
    page,
    content_type
)
GROUP BY
  client,
  content_type
@@ -0,0 +1,25 @@
#standardSQL
# 22_HTTP_2 - Average percentage of resources loaded over HTTP/2 or HTTP/1.1 per site
SELECT
  client,
  ROUND(AVG(http_1_1 / num_requests) * 100, 2) AS avg_pct_http_1_1,
  ROUND(AVG(http_2 / num_requests) * 100, 2) AS avg_pct_http_2
FROM (
  SELECT
    client,
    page,
    COUNT(*) AS num_requests,
    SUM(IF(JSON_EXTRACT_SCALAR(payload, "$._protocol") = "http/0.9", 1, 0)) AS http_0_9,
    SUM(IF(JSON_EXTRACT_SCALAR(payload, "$._protocol") = "http/1.0", 1, 0)) AS http_1_0,
    SUM(IF(JSON_EXTRACT_SCALAR(payload, "$._protocol") = "http/1.1", 1, 0)) AS http_1_1,
    SUM(IF(JSON_EXTRACT_SCALAR(payload, "$._protocol") = "HTTP/2", 1, 0)) AS http_2
  FROM
    `httparchive.almanac.requests`
  WHERE
    date='2020-08-01'
  GROUP BY
    client,
    page
)
GROUP BY
  client
35 changes: 35 additions & 0 deletions sql/2020/22_HTTP_2/count_of_h2_sites_grouped_by_server.sql
@@ -0,0 +1,35 @@
#standardSQL
# 22_HTTP_2 - Count of HTTP/2 Sites Grouped By Server
CREATE TEMPORARY FUNCTION getServerHeader(payload STRING)
RETURNS STRING
LANGUAGE js AS """
  try {
    var $ = JSON.parse(payload);
    var headers = $.response.headers;
    // Find server header
    var st = headers.find(function(e) {
      return e['name'].toLowerCase() === 'server';
    });
    // Remove everything after / in the server header value and return
    return st['value'].split("/")[0];
  } catch (e) {
    return '';
  }
""";

SELECT
  client,
  getServerHeader(payload) AS server_header,
  COUNT(*) AS num_pages,
  ROUND(COUNT(0) * 100 / SUM(COUNT(0)) OVER (PARTITION BY client), 2) AS pct
FROM
  `httparchive.almanac.requests`
WHERE
  firstHtml
  AND JSON_EXTRACT_SCALAR(payload, "$._protocol") = "HTTP/2"
  AND date='2020-08-01'
GROUP BY
  client,
  server_header
ORDER BY
  num_pages DESC
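Review note: the body of `getServerHeader` can be exercised outside BigQuery to see how it treats missing headers and malformed payloads. The sketch below copies the UDF body into a plain JavaScript function; the sample payloads are hypothetical and only assume the `response.headers` array of `{name, value}` objects that the UDF itself reads.

```javascript
// Standalone copy of the UDF body so it can be run outside BigQuery.
function getServerHeader(payload) {
  try {
    var $ = JSON.parse(payload);
    var headers = $.response.headers;
    // Find server header (case-insensitive on the header name)
    var st = headers.find(function (e) {
      return e['name'].toLowerCase() === 'server';
    });
    // Keep only the product name, dropping "/version" and anything after it
    return st['value'].split('/')[0];
  } catch (e) {
    // Reached for malformed JSON, a missing headers array, or no Server header
    return '';
  }
}

// Hypothetical payloads for illustration only.
var withVersion = JSON.stringify({
  response: { headers: [{ name: 'Server', value: 'nginx/1.18.0' }] }
});
var noServer = JSON.stringify({
  response: { headers: [{ name: 'Content-Type', value: 'text/html' }] }
});

console.log(JSON.stringify(getServerHeader(withVersion)));
console.log(JSON.stringify(getServerHeader(noServer)));
console.log(JSON.stringify(getServerHeader('not json')));
```

One consequence worth keeping in mind when reading the results: pages with no `Server` header and pages whose payload failed to parse both land in the same empty-string group.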
19 changes: 19 additions & 0 deletions sql/2020/22_HTTP_2/count_of_h2_sites_using_h2_push.sql
@@ -0,0 +1,19 @@
#standardSQL
# 22_HTTP_2 - Count of HTTP/2 Sites using HTTP/2 Push
SELECT
  client,
  COUNT(DISTINCT page) AS num_pages
FROM (

  SELECT
    client,
    page
  FROM
    `httparchive.almanac.requests`
  WHERE
    JSON_EXTRACT_SCALAR(payload, "$._protocol") = "HTTP/2"
    AND JSON_EXTRACT_SCALAR(payload, "$._was_pushed") = "1"
    AND date='2020-08-01'
)
GROUP BY
  client
35 changes: 35 additions & 0 deletions sql/2020/22_HTTP_2/count_of_non_h2_sites_grouped_by_server.sql
@@ -0,0 +1,35 @@
#standardSQL
# 22_HTTP_2 - Count of non-HTTP/2 Sites Grouped By Server
CREATE TEMPORARY FUNCTION getServerHeader(payload STRING)
RETURNS STRING
LANGUAGE js AS """
  try {
    var $ = JSON.parse(payload);
    var headers = $.response.headers;
    // Find server header
    var st = headers.find(function(e) {
      return e['name'].toLowerCase() === 'server';
    });
    // Remove everything after / in the server header value and return
    return st['value'].split("/")[0];
  } catch (e) {
    return '';
  }
""";

SELECT
  client,
  getServerHeader(payload) AS server_header,
  COUNT(*) AS num_pages,
  ROUND(COUNT(0) * 100 / SUM(COUNT(0)) OVER (PARTITION BY client), 2) AS pct
FROM
  `httparchive.almanac.requests`
WHERE
  firstHtml
  AND JSON_EXTRACT_SCALAR(payload, "$._protocol") != "HTTP/2"
  AND date='2020-08-01'
GROUP BY
  client,
  server_header
ORDER BY
  num_pages DESC
@@ -0,0 +1,40 @@
#standardSQL
# 22_HTTP_2 - Count of preload HTTP Headers with nopush attribute set. Once off stat for last crawl
CREATE TEMPORARY FUNCTION getLinkHeaders(payload STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  var $ = JSON.parse(payload);
  var headers = $.response.headers;
  var preload = [];

  // Collect the value of every Link response header
  for (var i in headers) {
    if (headers[i].name.toLowerCase() === 'link')
      preload.push(headers[i].value);
  }
  return preload;
""";

SELECT
  client,
  firstHtml,
  COUNT(*) AS num_requests,
  ROUND(COUNT(0) * 100 / SUM(COUNT(0)) OVER (PARTITION BY client), 2) AS pct
FROM (
  SELECT
    client,
    firstHtml,
    getLinkHeaders(payload) AS link_headers
  FROM
    `httparchive.almanac.requests`
  WHERE
    date='2020-08-01'
)
CROSS JOIN
  UNNEST(link_headers) AS link_header
WHERE
  link_header LIKE '%preload%'
  AND link_header LIKE '%nopush%'
GROUP BY
  client,
  firstHtml
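Review note: the query above works in two steps: the UDF collects every `Link` header value, then the outer `WHERE` keeps values containing both `preload` and `nopush`. A minimal JavaScript sketch of both steps follows. The `isPreloadNopush` helper and the sample payload are hypothetical; the helper mirrors the two `LIKE` filters as case-sensitive substring checks, which is an assumption about how the comparison behaves on this data.

```javascript
// Standalone version of the header-collection loop from the UDF.
function getLinkHeaders(payload) {
  var $ = JSON.parse(payload);
  var headers = $.response.headers;
  var preload = [];
  for (var i in headers) {
    if (headers[i].name.toLowerCase() === 'link') {
      preload.push(headers[i].value);
    }
  }
  return preload;
}

// Hypothetical helper mirroring: LIKE '%preload%' AND LIKE '%nopush%'
function isPreloadNopush(linkHeader) {
  return linkHeader.includes('preload') && linkHeader.includes('nopush');
}

// Hypothetical payload with one matching and one non-matching Link header.
var payload = JSON.stringify({
  response: {
    headers: [
      { name: 'Link', value: '</app.css>; rel=preload; as=style; nopush' },
      { name: 'link', value: '</app.js>; rel=preload; as=script' },
      { name: 'Content-Type', value: 'text/html' }
    ]
  }
});

var all = getLinkHeaders(payload);
var matches = all.filter(isPreloadNopush);
console.log(all.length, matches.length);
```

Because the filter is a substring match on the whole header value, a single `Link` header that lists several resources counts once even if only one of them carries `nopush`.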
34 changes: 34 additions & 0 deletions sql/2020/22_HTTP_2/detailed_alt_svc_headers.sql
@@ -0,0 +1,34 @@
#standardSQL
# 22_HTTP_2 - Detailed alt-svc headers
CREATE TEMPORARY FUNCTION getAltSvcHeader(payload STRING)
RETURNS STRING
LANGUAGE js AS """
  try {
    var $ = JSON.parse(payload);
    var headers = $.response.headers;
    // Find alt-svc header
    var st = headers.find(function(e) {
      return e['name'].toLowerCase() === 'alt-svc';
    });
    return st['value'];
  } catch (e) {
    return '';
  }
""";

SELECT
  client,
  firstHtml,
  JSON_EXTRACT_SCALAR(payload, "$._protocol") AS protocol,
  IF(url LIKE "https://%", "https", "http") AS http_or_https,
  getAltSvcHeader(payload) AS alt_svc,
  COUNT(*) AS num_requests
FROM
  `httparchive.almanac.requests`
WHERE
  date='2020-08-01'
GROUP BY
  client,
  firstHtml,
  protocol,
  http_or_https,
  alt_svc
34 changes: 34 additions & 0 deletions sql/2020/22_HTTP_2/detailed_upgrade_headers.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
#standardSQL
# 22_HTTP_2 - Detailed upgrade headers for 20.04, 20.05 and 20.06
CREATE TEMPORARY FUNCTION getUpgradeHeader(payload STRING)
RETURNS STRING
LANGUAGE js AS """
try {
var $ = JSON.parse(payload);
var headers = $.response.headers;
var st = headers.find(function(e) {
return e['name'].toLowerCase() === 'upgrade'
});
return st['value'];
} catch (e) {
return '';
}
""";

SELECT
client,
firstHtml,
JSON_EXTRACT_SCALAR(payload, "$._protocol") AS protocol,
IF(url LIKE "https://%","https","http") AS http_or_https,
getUpgradeHeader(payload) AS upgrade,
COUNT(*) AS num_requests
FROM
`httparchive.almanac.requests`
WHERE
date='2020-08-01'
GROUP BY
client,
firstHtml,
protocol,
http_or_https,
upgrade
41 changes: 41 additions & 0 deletions sql/2020/22_HTTP_2/measure_number_of_tcp_connections_per_site.sql
@@ -0,0 +1,41 @@
#standardSQL
# 22_HTTP_2 - Measure number of TCP Connections per site.
SELECT
  "mobile" AS client,
  JSON_EXTRACT_SCALAR(payload, "$._protocol") AS protocol,
  COUNT(*) AS num_pages,
  APPROX_QUANTILES(_connections, 100)[SAFE_ORDINAL(50)] AS median,
  APPROX_QUANTILES(_connections, 100)[SAFE_ORDINAL(75)] AS p75,
  APPROX_QUANTILES(_connections, 100)[SAFE_ORDINAL(95)] AS p95
FROM
  `httparchive.requests.2020_08_01_mobile` AS requests
INNER JOIN
  `httparchive.summary_pages.2020_08_01_mobile` AS summary
ON
  requests.url = summary.url
WHERE
  JSON_EXTRACT_SCALAR(payload, "$._is_base_page") = "true"
GROUP BY
  client,
  protocol

UNION ALL

SELECT
  "desktop" AS client,
  JSON_EXTRACT_SCALAR(payload, "$._protocol") AS protocol,
  COUNT(*) AS num_pages,
  APPROX_QUANTILES(_connections, 100)[SAFE_ORDINAL(50)] AS median,
  APPROX_QUANTILES(_connections, 100)[SAFE_ORDINAL(75)] AS p75,
  APPROX_QUANTILES(_connections, 100)[SAFE_ORDINAL(95)] AS p95
FROM
  `httparchive.requests.2020_08_01_desktop` AS requests
INNER JOIN
  `httparchive.summary_pages.2020_08_01_desktop` AS summary
ON
  requests.url = summary.url
WHERE
  JSON_EXTRACT_SCALAR(payload, "$._is_base_page") = "true"
GROUP BY
  client,
  protocol
@@ -0,0 +1,20 @@
#standardSQL
# 22_HTTP_2 - Measure of all HTTP versions (0.9, 1.0, 1.1, 2, QUIC) for main page of all sites, and for HTTPS sites. Table for last crawl.
SELECT
  client,
  JSON_EXTRACT_SCALAR(payload, "$._protocol") AS protocol,
  COUNT(0) AS num_pages,
  SUM(COUNT(0)) OVER (PARTITION BY client) AS total,
  COUNTIF(url LIKE "https://%") AS num_https_pages,
  ROUND(COUNT(0) * 100 / SUM(COUNT(0)) OVER (PARTITION BY client), 2) AS pct_pages,
  ROUND(COUNTIF(url LIKE "https://%") * 100 / SUM(COUNT(0)) OVER (PARTITION BY client), 2) AS pct_https
FROM
  `httparchive.almanac.requests`
WHERE
  firstHtml
  AND date='2020-08-01'
GROUP BY
  client,
  protocol
ORDER BY
  num_pages / total DESC