
Prepare for migration of report queries #30

Draft · wants to merge 14 commits into base: main
Conversation

@max-ostapenko (Contributor) commented Nov 16, 2024

We want to replace the legacy reports script: https://github.com/HTTPArchive/bigquery/tree/master/sql
This implementation follows the previous discussion.

TODO list:

  • trigger a tag that kicks off report preparation on crawl_complete
  • run aggregated table updates with fresh reports data (reports dataset)
  • create a reports configuration file with one timeseries and one histogram
  • trigger a GCS upload whenever data is updated in BQ

Supported features:

  • Run monthly histogram SQLs when the crawl is finished
  • [?] Run longer-term time-series SQLs when the crawl is finished
  • Run the time series in an incremental fashion
  • [?] Handle different lenses (Top X, WordPress, Drupal, Magento)
  • [?] Handle CrUX reports (monthly histograms and time series) having to run later
  • Upload to Cloud Storage in GCP so reports can be hosted on our CDN
  • Run only the missing reports (histograms) or missing dates (time series)
  • Force a rerun (to override any existing reports)
  • Run a subset of reports
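
The incremental time-series requirement could be sketched with Dataform's JS API, which the repo already uses. This is an illustration only: the table and column names are hypothetical, and `ctx.when` / `ctx.incremental` / `ctx.self` are standard Dataform context helpers.

```js
// Sketch: incremental time-series table (names are illustrative).
publish('timeseries', {
  type: 'incremental',
  schema: 'reports'
}).query(ctx => `
  SELECT date, metric, client, p50
  FROM \`httparchive.crawl.pages\`
  ${ctx.when(ctx.incremental(),
    `WHERE date > (SELECT MAX(date) FROM ${ctx.self()})`)}
`)
```

On the first run the `WHERE` clause is omitted and the full history is built; on subsequent runs only dates newer than the current maximum are appended.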

@max-ostapenko max-ostapenko changed the title Preparing data for reports Prepare for migration of report queries Nov 16, 2024
Comment on lines +110 to +111
bytesTotal: {
name: 'Total Kilobytes',
@max-ostapenko (Contributor, Author):

I found the reports config file - keeping all the configs in one place seems like a good idea (more transparent for future contributors).

I copied it over here (to experiment with) and added the queries.
I couldn't add the queries unless the format supported multiline strings - so for now I saved it as JS.
Actually, it also needs to be readable from Python - YAML?
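
A YAML version would satisfy both constraints: block scalars (`|`) give multiline SQL strings, and the file is trivially readable from Python with `yaml.safe_load`. A hypothetical shape (the metric name and `name` field come from the snippet above; everything else is illustrative):

```yaml
# Sketch of a shared reports config in YAML; fields are illustrative.
bytesTotal:
  name: Total Kilobytes
  type: histogram
  query: |
    SELECT
      client,
      APPROX_QUANTILES(bytesTotal / 1024, 1000) AS quantiles
    FROM pages
    WHERE date = '${date}'
    GROUP BY client
```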

Comment on lines 10 to 14
publish(sql.type, {
type: 'table',
schema: 'reports',
tags: ['crawl_reports']
}).query(ctx => constants.fillTemplate(sql.query, params))
@max-ostapenko (Contributor, Author):

In the reports dataset we could store intermediate aggregated data - it's easier to check for data issues in BQ than in GCS.
A Cloud Function will then pick up the fresh rows and save them to GCS.

I think we could be fine with a table per chart type, e.g. httparchive.reports.timeseries:

  • date (partition)
  • metric (cluster)
  • timestamp
  • client (cluster)
  • p10
  • p25
  • p50
  • p75
  • p90
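
The partition/cluster layout above could be declared directly in the `publish` call via Dataform's `bigquery` options (`partitionBy` and `clusterBy` are standard Dataform options; the rest mirrors the snippet from the diff, so treat this as a sketch):

```js
// Sketch: reports.timeseries partitioned by date, clustered by metric and client.
publish('timeseries', {
  type: 'table',
  schema: 'reports',
  bigquery: {
    partitionBy: 'date',
    clusterBy: ['metric', 'client']
  },
  tags: ['crawl_reports']
}).query(ctx => constants.fillTemplate(sql.query, params))
```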

Comment on lines 2 to 5
const params = {
date: constants.currentMonth,
rankFilter: constants.devRankFilter
}
@max-ostapenko (Contributor, Author):

Query parameters - I found only date.
Please list all the required parameters and add the queries to test them with.
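
For context, a minimal sketch of what a `fillTemplate` helper might do - substitute `${param}` placeholders in a SQL string from a params object. This is a hypothetical implementation; the actual `constants.fillTemplate` in the repo may differ.

```javascript
// Hypothetical sketch: substitute ${key} placeholders from a params object.
function fillTemplate(query, params) {
  return query.replace(/\$\{(\w+)\}/g, (match, key) =>
    key in params ? String(params[key]) : match // leave unknown placeholders untouched
  );
}

const params = { date: '2024-11-01', rankFilter: 'rank <= 1000' };
const sql = 'SELECT * FROM pages WHERE date = "${date}" AND ${rankFilter}';
console.log(fillTemplate(sql, params));
// SELECT * FROM pages WHERE date = "2024-11-01" AND rank <= 1000
```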

@max-ostapenko (Contributor, Author) commented:

@tunetheweb here is a demo version that needs to be discussed.
Once we see that it covers all the requirements and we agree on the feasibility of the 3 topics in the comments above, I'll finalise the part that uploads to GCS.

And I have no idea what to do with the lenses and the 2 other requests (see the description).
