Sustainability 2024: Queries #3736

Merged: 34 commits into HTTPArchive:main on Nov 4, 2024

Conversation

Falafelqueen
Contributor

@Falafelqueen Falafelqueen commented Aug 17, 2024

Progress on #3611

Hosting

  • % of "green hosted" sites (missing updated data?)
  • CDN usage

General

Cache

  • Cache adoption

JS & CSS

  • Compression
  • Minification
  • Unused code

Fonts

  • Requests per page
  • Format adoption

Video

  • Preload
  • Autoplay

Platform Summary

  • CMS
  • eCommerce
  • Jamstack

Scripts

  • JS scripts inline vs external
  • CSS inline vs external

Other

  • User preferences (dark mode)
  • Print stylesheet
  • Obsolete code
  • Unused assets

@tunetheweb tunetheweb added the analysis Querying the dataset label Aug 21, 2024
@tunetheweb tunetheweb added this to the 2024 Analysis milestone Aug 21, 2024
@mgifford
Contributor

We should also check whether there is an overlap with Fonts (#3667).

@mgifford
Contributor

These use the "one byte" carbon emissions model. Could we also do a calculation with the Sustainable Web Design Model (SWDM)?

I don't know how much of the data we'd be able to find here, but wanted to see if it were possible.

  • OPDC – Operational Emissions Data Centers.
  • Green Hosting Factor – The portion of hosting services powered by renewable or zero-carbon energy, between 0 and 1 (see FAQ).
  • EMDC – Embodied Emissions Data Centers.
  • OPN – Operational Emissions Networks.
  • EMN – Embodied Emissions Networks.
  • OPUD – Operational Emissions User Devices.
  • EMUD – Embodied Emissions User Devices.
  • New Visitor Ratio – The portion of first time visitors to a web page, between 0 and 1.
  • Return Visitor Ratio – The portion of returning visitors to a web page, between 0 and 1. (What is this? See FAQ)
  • Data Cache Ratio – The portion of data that is loaded from cache for returning visitors, between 0 and 1. (What is this? See FAQ)
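
To make the question concrete, here is a rough sketch of how the terms above might combine into a per-page-view estimate. It is an assumption about the SWDM formula as I understand it, not something defined in this PR, and should be checked against the model's own documentation. `SegmentEmissions` and `swdm_emissions_per_page_view` are hypothetical names, and every input is caller-supplied (in grams CO2e per full page load) rather than taken from the HTTP Archive dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentEmissions:
    """Grams CO2e for one full, uncached page load, split by segment.

    Hypothetical container: the values are inputs the caller must supply.
    """
    opdc: float  # operational emissions, data centers
    emdc: float  # embodied emissions, data centers
    opn: float   # operational emissions, networks
    emn: float   # embodied emissions, networks
    opud: float  # operational emissions, user devices
    emud: float  # embodied emissions, user devices


def _check_ratio(name: str, value: float) -> None:
    """Reject ratios outside the 0..1 range the list above requires."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")


def swdm_emissions_per_page_view(
    segments: SegmentEmissions,
    green_hosting_factor: float,
    new_visitor_ratio: float,
    return_visitor_ratio: float,
    data_cache_ratio: float,
) -> float:
    """Estimate grams CO2e per page view (assumed SWDM-style combination)."""
    _check_ratio("green_hosting_factor", green_hosting_factor)
    _check_ratio("new_visitor_ratio", new_visitor_ratio)
    _check_ratio("return_visitor_ratio", return_visitor_ratio)
    _check_ratio("data_cache_ratio", data_cache_ratio)

    # Assumption: green hosting only discounts data center operational emissions.
    full_load = (
        segments.opdc * (1.0 - green_hosting_factor)
        + segments.emdc
        + segments.opn
        + segments.emn
        + segments.opud
        + segments.emud
    )
    # New visitors pay for the full load; returning visitors only pay for
    # the share of data that is not served from cache.
    new_part = full_load * new_visitor_ratio
    return_part = full_load * return_visitor_ratio * (1.0 - data_cache_ratio)
    return new_part + return_part


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measured values.
    example = SegmentEmissions(opdc=10, emdc=1, opn=2, emn=3, opud=4, emud=5)
    print(swdm_emissions_per_page_view(example, 0.5, 0.75, 0.25, 0.5))
```

Keeping the six segment values as inputs means the sketch does not bake in any energy or carbon intensity constants, which is the part I am least sure we can source from the dataset.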

FROM
`httparchive.all.requests`
WHERE
date = '2024-06-01'
Contributor

I'm wondering if we should filter by is_root_page = TRUE.
I'm still not sure when we should analyze data at the page level versus the site level. Some of our queries apply that filter to analyze at the site level, but others, like this one, don't.
If we don't want to filter, we might need to rephrase a few things in the Google doc to say "pages" instead of "websites", as was suggested in this Slack message.

Member

As this particular query is over requests, I think this is OK. Resolving.

Adding in a query to track the size of the query.
Updating the docs
@lebreRafael lebreRafael marked this pull request as ready for review October 25, 2024 02:42
Member

@tunetheweb tunetheweb left a comment

Overall LGTM with one comment on ranking and another on 3P date.

Let me know if this is good to merge.

sql/2024/sustainability/green_third_party_requests.sql (outdated, resolved)
sql/2024/sustainability/green_web_hosting.sql (outdated, resolved)
sql/2024/sustainability/green_third_party_requests.sql (outdated, resolved)
Member

@tunetheweb tunetheweb left a comment

Going to merge this. We can add more queries if needed.

@lebreRafael
Contributor

@tunetheweb yes, we are good to merge it

@tunetheweb tunetheweb merged commit 6f4be9c into HTTPArchive:main Nov 4, 2024
1 check passed
Labels
analysis Querying the dataset
5 participants