Performance 2020 queries #1091

Merged: 41 commits merged on Oct 12, 2020
Changes from 32 commits

Commits (41)
8dd8a92
Selecting LH performance score. WIP
Jul 19, 2020
8892bb1
Selecting LH performance score. WIP
Jul 20, 2020
5ec7860
Selecting LH performance score. WIP
Jul 20, 2020
8e94984
Delta for LH5 and LH6 performance scores
Jul 22, 2020
a67e276
Delta for LH5 and LH6 performance scores
Jul 22, 2020
b111b5a
Delta for LH5 and LH6 performance scores
Jul 23, 2020
38580fa
web vitals aggregated
max-ostapenko Jul 23, 2020
7d007b7
Web Vitals query suggested in the discussion
Jul 24, 2020
aff681b
cwv per device, ect and country
max-ostapenko Jul 24, 2020
61a1093
offline origins
max-ostapenko Jul 24, 2020
4e61180
webvitals distribution by device
max-ostapenko Jul 25, 2020
d46e1b5
lcp score by month
max-ostapenko Jul 25, 2020
83235de
Merge branch 'main' of https://github.com/HTTPArchive/almanac.httparc…
max-ostapenko Jul 25, 2020
8e13885
Removed unused query
Jul 25, 2020
32e6a42
review updates
max-ostapenko Jul 26, 2020
d80737b
query name updated
max-ostapenko Jul 26, 2020
a5cf826
performance observer
max-ostapenko Jul 27, 2020
078c503
Distribution of LH5 and LH6 permormance scores on mobile
dooman87 Aug 9, 2020
605154e
FIX: Using abs() to calculate average change between LH5 and LH6
Aug 13, 2020
acc0b86
Updated tables and dates to use August data
Aug 17, 2020
41c760c
Update sql/2020/09_Performance/lh5_performance_score_distribution.sql
dooman87 Sep 21, 2020
d4ce1af
Update sql/2020/09_Performance/lh5_performance_score_distribution.sql
dooman87 Sep 21, 2020
6f683e4
Removed unused variables, formatting and using September table in que…
dooman87 Sep 21, 2020
818ab01
Formatting
dooman87 Sep 21, 2020
1ae70c0
updated dates
max-ostapenko Oct 2, 2020
0ecea02
median function update
max-ostapenko Oct 2, 2020
f52241f
* Using percentage to compare performance scores between LH5 and LH6.
dooman87 Oct 4, 2020
aeb4f15
Using raw distribution for performance score and percentiles for LH5 …
dooman87 Oct 4, 2020
5b0f2f9
review updates + lighthouse performance audits
max-ostapenko Oct 6, 2020
05b2ce2
year series fix
max-ostapenko Oct 6, 2020
62035e7
fix median precision
max-ostapenko Oct 6, 2020
0b46b3e
Update sql/2020/09_Performance/lighthouse_performace_audits.sql
max-ostapenko Oct 6, 2020
c18af83
Update sql/2020/09_Performance/web_vitals_distribution_by_device.sql
max-ostapenko Oct 6, 2020
9f5529a
use August data from crux tables
max-ostapenko Oct 6, 2020
e1880d1
merge changes from review suggestions
max-ostapenko Oct 6, 2020
706e358
Update sql/2020/09_Performance/lh6_vs_lh5_performance_score_01.sql
max-ostapenko Oct 10, 2020
2df9022
Update sql/2020/09_Performance/lh6_vs_lh5_performance_score_02.sql
max-ostapenko Oct 10, 2020
b84733d
Update sql/2020/09_Performance/lh6_vs_lh5_performance_score_01.sql
max-ostapenko Oct 10, 2020
b6c5aef
Update sql/2020/09_Performance/lh6_vs_lh5_performance_score_01.sql
max-ostapenko Oct 10, 2020
4f6598a
added yesr comparison
max-ostapenko Oct 10, 2020
b91ce5c
linting
max-ostapenko Oct 10, 2020
14 changes: 14 additions & 0 deletions sql/2020/09_Performance/lh5_performance_score_distribution.sql
@@ -0,0 +1,14 @@
#standardSQL
# Distribution of LH5 performance score.

SELECT
JSON_EXTRACT_SCALAR(report, '$.categories.performance.score') AS score,
COUNT(0) AS freq,
SUM(COUNT(0)) OVER () AS total,
COUNT(0) / SUM(COUNT(0)) OVER () AS pct
FROM
`httparchive.lighthouse.2019_07_01_mobile`
GROUP BY
score
ORDER BY
score
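
Since the score is extracted as a JSON string, a bucketed variant can make the distribution easier to chart. The sketch below is not part of this PR; it assumes the performance score is a 0-1 float stored as a string in the report JSON, as the query above implies.

#standardSQL
# Sketch (not in this PR): LH5 performance score distribution in 0.1-wide bins.
SELECT
  ROUND(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.performance.score') AS FLOAT64), 1) AS score_bin,
  COUNT(0) AS freq,
  COUNT(0) / SUM(COUNT(0)) OVER () AS pct
FROM
  `httparchive.lighthouse.2019_07_01_mobile`
GROUP BY
  score_bin
ORDER BY
  score_bin
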
14 changes: 14 additions & 0 deletions sql/2020/09_Performance/lh6_performance_score_distribution.sql
@@ -0,0 +1,14 @@
#standardSQL
# Distribution of LH6 performance score.

SELECT
JSON_EXTRACT_SCALAR(report, '$.categories.performance.score') AS score,
COUNT(0) AS freq,
SUM(COUNT(0)) OVER () AS total,
COUNT(0) / SUM(COUNT(0)) OVER () AS pct
FROM
`httparchive.lighthouse.2020_09_01_mobile`
GROUP BY
score
ORDER BY
score
21 changes: 21 additions & 0 deletions sql/2020/09_Performance/lh6_vs_lh5_performance_score_01.sql
@@ -0,0 +1,21 @@
#standardSQL
# Calculates percentiles of the absolute delta between LH5 and LH6 performance scores on mobile

SELECT
percentile,
APPROX_QUANTILES(abs_perf_score_delta, 1000)[OFFSET(percentile * 10)] AS abs_perf_score_delta
FROM (
SELECT
ABS(perf_score_lh6 - perf_score_lh5) AS abs_perf_score_delta
FROM
(
SELECT
CAST(JSON_EXTRACT(lh6.report, '$.categories.performance.score') AS NUMERIC) AS perf_score_lh6,
CAST(JSON_EXTRACT(lh5.report, '$.categories.performance.score') AS NUMERIC) AS perf_score_lh5
FROM `httparchive.lighthouse.2020_09_01_mobile` lh6
JOIN `httparchive.lighthouse.2020_05_01_mobile` lh5 ON lh5.url=lh6.url
)
),
UNNEST([10, 25, 50, 75, 90]) AS percentile
GROUP BY percentile
ORDER BY percentile
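
The percentile indexing above relies on APPROX_QUANTILES(x, 1000) returning 1001 boundary values, so OFFSET(percentile * 10) approximates the requested percentile. A self-contained sketch of the same pattern on generated data (no HTTP Archive tables needed):

#standardSQL
# Sketch: APPROX_QUANTILES(x, 1000)[OFFSET(p * 10)] approximates the p-th percentile.
SELECT
  p AS percentile,
  APPROX_QUANTILES(x, 1000)[OFFSET(p * 10)] AS approx_value
FROM
  UNNEST(GENERATE_ARRAY(0, 100)) AS x,
  UNNEST([10, 25, 50, 75, 90]) AS p
GROUP BY
  p
ORDER BY
  p
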
30 changes: 30 additions & 0 deletions sql/2020/09_Performance/lh6_vs_lh5_performance_score_02.sql
@@ -0,0 +1,30 @@
#standardSQL

# Calculates the percentage of sites where the performance score changed by a small (< 10 points), medium (10-30 points)
# or big (> 30 points) amount between LH versions 5 and 6.

SELECT
SAFE_DIVIDE(small_change, small_change + mid_change + big_change) AS small,
SAFE_DIVIDE(mid_change, small_change + mid_change + big_change) AS avg,
SAFE_DIVIDE(big_change, small_change + mid_change + big_change) AS big
FROM (
SELECT
COUNTIF(perf_score_delta <= 0.1) AS small_change,
COUNTIF(perf_score_delta > 0.1 AND perf_score_delta <= 0.3) AS mid_change,
COUNTIF(perf_score_delta > 0.3) AS big_change
FROM
(
SELECT
perf_score_lh6,
perf_score_lh5,
(perf_score_lh6 - perf_score_lh5) AS perf_score_delta
FROM
(
SELECT
CAST(JSON_EXTRACT(lh6.report, '$.categories.performance.score') AS NUMERIC) AS perf_score_lh6,
CAST(JSON_EXTRACT(lh5.report, '$.categories.performance.score') AS NUMERIC) AS perf_score_lh5
FROM `httparchive.lighthouse.2020_09_01_mobile` lh6
JOIN `httparchive.lighthouse.2020_05_01_mobile` lh5 ON lh5.url=lh6.url
)
)
)
42 changes: 42 additions & 0 deletions sql/2020/09_Performance/lighthouse_performace_audits.sql
@@ -0,0 +1,42 @@
#standardSQL
# Get a summary of all Lighthouse audit scores for a category
# Note: scores, weightings, groups and descriptions may be off in mixed months when a new version of Lighthouse rolls out

CREATE TEMPORARY FUNCTION getAudits(report STRING, category STRING)
RETURNS ARRAY<STRUCT<id STRING, weight INT64, audit_group STRING, title STRING, description STRING, score INT64>> LANGUAGE js AS '''
var $ = JSON.parse(report);
var auditrefs = $.categories[category].auditRefs;
var audits = $.audits;
$ = null;
var results = [];
for (const auditref of auditrefs) {
results.push({
id: auditref.id,
weight: auditref.weight,
audit_group: auditref.group,
title: audits[auditref.id].title,
description: audits[auditref.id].description,
score: audits[auditref.id].score
});
}
return results;
''';

SELECT
audits.id AS id,
COUNTIF(audits.score > 0) AS num_pages,
COUNT(0) AS total,
COUNTIF(audits.score IS NOT NULL) AS total_applicable,
SAFE_DIVIDE(COUNTIF(audits.score > 0), COUNTIF(audits.score IS NOT NULL)) AS pct,
APPROX_QUANTILES(audits.weight, 1000)[OFFSET(500)] AS median_weight,
MAX(audits.audit_group) AS audit_group,
MAX(audits.description) AS description
FROM
`httparchive.lighthouse.2020_09_01_mobile`,
UNNEST(getAudits(report, "performance")) AS audits
WHERE
LENGTH(report) < 20000000 # necessary to avoid out of memory issues. Excludes 16 very large results
GROUP BY
audits.id
ORDER BY
median_weight DESC,
id
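
To sanity-check the getAudits helper, a small smoke test can be appended to the same script (so the temporary function is still in scope). The report JSON below is a made-up minimal example, not real HTTP Archive data:

# Sketch: smoke test for getAudits, run in the same script as the function above.
SELECT
  audit.*
FROM
  UNNEST(getAudits(
    '{"categories":{"performance":{"auditRefs":[{"id":"uses-http2","weight":0,"group":"load-opportunities"}]}},"audits":{"uses-http2":{"title":"Use HTTP/2","description":"HTTP/2 offers many benefits","score":1}}}',
    "performance"
  )) AS audit
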
16 changes: 16 additions & 0 deletions sql/2020/09_Performance/median_lcp_score_by_month.sql
@@ -0,0 +1,16 @@
#standardSQL
# Median across origins of the 75th percentile Largest Contentful Paint (LCP), by month and device

SELECT
date,
device,
APPROX_QUANTILES(p75_lcp, 1000 RESPECT NULLS)[OFFSET(500)] AS median_p75_lcp,
FROM
`chrome-ux-report.materialized.device_summary`
WHERE
date >= '2019-09-01' AND
date <= '2020-09-01' AND
device IN ('desktop', 'phone')
GROUP BY
date,
device
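
A companion view that is sometimes useful alongside the median is the share of origins meeting the "good" LCP threshold (p75 <= 2500 ms per the Core Web Vitals definitions). A sketch against the same table, assuming p75_lcp is stored in milliseconds:

#standardSQL
# Sketch (not in this PR): share of origins with a good p75 LCP, by date and device.
SELECT
  date,
  device,
  COUNTIF(p75_lcp <= 2500) / COUNT(0) AS pct_good_lcp
FROM
  `chrome-ux-report.materialized.device_summary`
WHERE
  date BETWEEN '2019-09-01' AND '2020-09-01' AND
  device IN ('desktop', 'phone')
GROUP BY
  date,
  device
ORDER BY
  date,
  device
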
16 changes: 16 additions & 0 deletions sql/2020/09_Performance/offline_origins.sql
@@ -0,0 +1,16 @@
#standardSQL
# Offline origins

SELECT
total_origins,
offline_origins,
offline_origins/total_origins AS pct_offline_origins
FROM (
SELECT
COUNT(DISTINCT origin) AS total_origins,
COUNT(DISTINCT IF(offlineDensity > 0, origin, NULL)) AS offline_origins,
FROM
`chrome-ux-report.materialized.metrics_summary`
WHERE
date = '2020-09-01'
)
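
The same metric can also be tracked over time rather than as a single snapshot. A sketch using the same metrics_summary table and columns referenced above:

#standardSQL
# Sketch (not in this PR): offline origins as a share of all origins, by month.
SELECT
  date,
  COUNT(DISTINCT IF(offlineDensity > 0, origin, NULL)) / COUNT(DISTINCT origin) AS pct_offline_origins
FROM
  `chrome-ux-report.materialized.metrics_summary`
WHERE
  date BETWEEN '2019-09-01' AND '2020-09-01'
GROUP BY
  date
ORDER BY
  date
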
13 changes: 13 additions & 0 deletions sql/2020/09_Performance/performance_observer.sql
@@ -0,0 +1,13 @@
#standardSQL
# Percent of pages using PerformanceObserver

SELECT
client,
num_urls,
total_urls,
pct_urls
FROM
`httparchive.blink_features.usage`
WHERE
yyyymmdd = '20200901' AND
feature = 'PerformanceObserverForWindow'
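
If the exact feature name is ever in doubt, the PerformanceObserver-related counters can be listed from the same usage table. A sketch, assuming the feature column holds the Blink use-counter names as in the query above:

#standardSQL
# Sketch: list PerformanceObserver-related Blink feature counters for the same date.
SELECT DISTINCT
  feature
FROM
  `httparchive.blink_features.usage`
WHERE
  yyyymmdd = '20200901' AND
  feature LIKE 'PerformanceObserver%'
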
154 changes: 154 additions & 0 deletions sql/2020/09_Performance/web_vitals_by_country.sql
@@ -0,0 +1,154 @@
#standardSQL
# Core Web Vitals by country

CREATE TEMP FUNCTION IS_GOOD (good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
SAFE_DIVIDE(good, (good + needs_improvement + poor)) >= 0.75
);

CREATE TEMP FUNCTION IS_POOR (good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
SAFE_DIVIDE(poor, (good + needs_improvement + poor)) >= 0.25
);

CREATE TEMP FUNCTION IS_NI (good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
NOT IS_GOOD(good, needs_improvement, poor) AND
NOT IS_POOR(good, needs_improvement, poor)
);

CREATE TEMP FUNCTION IS_NON_ZERO (good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
good + needs_improvement + poor > 0
);

WITH
base AS (
SELECT
  origin,
country_code,

SUM(fast_fid) / SUM(fast_fid + avg_fid + slow_fid) AS fast_fid,
SUM(avg_fid) / SUM(fast_fid + avg_fid + slow_fid) AS avg_fid,
SUM(slow_fid) / SUM(fast_fid + avg_fid + slow_fid) AS slow_fid,

SUM(fast_lcp) / SUM(fast_lcp + avg_lcp + slow_lcp) AS fast_lcp,
SUM(avg_lcp) / SUM(fast_lcp + avg_lcp + slow_lcp) AS avg_lcp,
SUM(slow_lcp) / SUM(fast_lcp + avg_lcp + slow_lcp) AS slow_lcp,

SUM(small_cls) / SUM(small_cls + medium_cls + large_cls) AS small_cls,
SUM(medium_cls) / SUM(small_cls + medium_cls + large_cls) AS medium_cls,
SUM(large_cls) / SUM(small_cls + medium_cls + large_cls) AS large_cls,

SUM(fast_fcp) / SUM(fast_fcp + avg_fcp + slow_fcp) AS fast_fcp,
SUM(avg_fcp) / SUM(fast_fcp + avg_fcp + slow_fcp) AS avg_fcp,
SUM(slow_fcp) / SUM(fast_fcp + avg_fcp + slow_fcp) AS slow_fcp,

SUM(fast_ttfb) / SUM(fast_ttfb + avg_ttfb + slow_ttfb) AS fast_ttfb,
SUM(avg_ttfb) / SUM(fast_ttfb + avg_ttfb + slow_ttfb) AS avg_ttfb,
SUM(slow_ttfb) / SUM(fast_ttfb + avg_ttfb + slow_ttfb) AS slow_ttfb,

FROM
`chrome-ux-report.materialized.country_summary`
WHERE
yyyymm = 202009
Review comment (Member): 202009 doesn't exist yet, so this query returns 0 results. For consistency please use 202008 (August)

Suggested change
yyyymm = 202009
yyyymm = 202008

GROUP BY
origin,
country_code
)

SELECT
  `chrome-ux-report`.experimental.GET_COUNTRY(country_code) AS country,

COUNT(DISTINCT origin) AS total_origins,

SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_GOOD(fast_fid, avg_fid, slow_fid) AND
IS_GOOD(fast_lcp, avg_lcp, slow_lcp) AND
IS_GOOD(small_cls, medium_cls, large_cls), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_fid, avg_fid, slow_fid) AND
IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp) AND
IS_NON_ZERO(small_cls, medium_cls, large_cls), origin, NULL))) AS pct_cwv_good,

SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_GOOD(fast_lcp, avg_lcp, slow_lcp), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp), origin, NULL))) AS pct_lcp_good,
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_NI(fast_lcp, avg_lcp, slow_lcp), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp), origin, NULL))) AS pct_lcp_ni,
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_POOR(fast_lcp, avg_lcp, slow_lcp), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp), origin, NULL))) AS pct_lcp_poor,

SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_GOOD(fast_fid, avg_fid, slow_fid), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_fid, avg_fid, slow_fid), origin, NULL))) AS pct_fid_good,
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_NI(fast_fid, avg_fid, slow_fid), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_fid, avg_fid, slow_fid), origin, NULL))) AS pct_fid_ni,
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_POOR(fast_fid, avg_fid, slow_fid), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_fid, avg_fid, slow_fid), origin, NULL))) AS pct_fid_poor,

SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_GOOD(small_cls, medium_cls, large_cls), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(small_cls, medium_cls, large_cls), origin, NULL))) AS pct_cls_good,
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_NI(small_cls, medium_cls, large_cls), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(small_cls, medium_cls, large_cls), origin, NULL))) AS pct_cls_ni,
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_POOR(small_cls, medium_cls, large_cls), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(small_cls, medium_cls, large_cls), origin, NULL))) AS pct_cls_poor,

SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_GOOD(fast_fcp, avg_fcp, slow_fcp), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_fcp, avg_fcp, slow_fcp), origin, NULL))) AS pct_fcp_good,
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_NI(fast_fcp, avg_fcp, slow_fcp), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_fcp, avg_fcp, slow_fcp), origin, NULL))) AS pct_fcp_ni,
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_POOR(fast_fcp, avg_fcp, slow_fcp), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_fcp, avg_fcp, slow_fcp), origin, NULL))) AS pct_fcp_poor,

SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_GOOD(fast_ttfb, avg_ttfb, slow_ttfb), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_ttfb, avg_ttfb, slow_ttfb), origin, NULL))) AS pct_ttfb_good,
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_NI(fast_ttfb, avg_ttfb, slow_ttfb), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_ttfb, avg_ttfb, slow_ttfb), origin, NULL))) AS pct_ttfb_ni,
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_POOR(fast_ttfb, avg_ttfb, slow_ttfb), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_ttfb, avg_ttfb, slow_ttfb), origin, NULL))) AS pct_ttfb_poor,

FROM
base
GROUP BY
  country
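
Following up on the review note about 202009 vs 202008: a quick way to check which months are actually available before setting yyyymm is to list the distinct values in the source table (a sketch against the same country_summary table):

#standardSQL
# Sketch: list the months available in the CrUX country_summary table.
SELECT DISTINCT
  yyyymm
FROM
  `chrome-ux-report.materialized.country_summary`
ORDER BY
  yyyymm DESC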