
CMS 2024 #3608

Open
7 of 10 tasks
nrllh opened this issue Mar 2, 2024 · 23 comments · May be fixed by #3832
Labels
2024 chapter Tracking issue for a 2024 chapter

Comments

@nrllh
Collaborator

nrllh commented Mar 2, 2024

CMS 2024


If you're interested in contributing to the CMS chapter of the 2024 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor. You might be interested in exploring the changes to this year's version here.

Content team

  • Lead: @sirjonathan
  • Authors: @sirjonathan, @LoraRaykova, @niko-kaleev
  • Reviewers: @raewrites, @karmatosed
  • Analysts: @sirjonathan, @nrllh
  • Editors: -
  • Coordinator: @turban1988
More information about each role:
  • The content team lead is the chapter owner and responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
  • Authors are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
  • Reviewers are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
  • Analysts are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
  • Editors are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
  • The section coordinator is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.

Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors.

For an overview of how the roles work together at each phase of the project, see the Chapter Lifecycle doc.

Milestone checklist

0. Form the content team

  • 📆 April 15 Complete program and content committee - 🔑 Organizing committee
    • The content team has at least one author, reviewer, and analyst.

1. Plan content

  • 📆 May 1 First meeting to outline the chapter contents - 🔑 Content team
    • The content team has completed the chapter outline.

2. Gather data

  • 📆 June 1 Custom metrics completed - 🔑 Analysts
  • 📆 June 1 HTTP Archive Crawl - 🔑 HA Team
    • HTTP Archive runs the June crawl.

3. Validate results

  • 📆 August 15 Query Metrics & Save Results - 🔑 Analysts
    • Analysts have queried all metrics and saved the output.

4. Draft content

  • 📆 September 15 First Draft of Chapter - 🔑 Authors
    • Authors have written the chapter.
  • 📆 October 10 Review & Edit Chapter - 🔑 Reviewers & Editors
    • Reviewers and Editors have processed the chapter.

5. Publication

  • 📆 October 15 Chapter Publication (Markdown & PR) - 🔑 Authors
    • Authors have converted the chapter to Markdown and drafted a PR.
  • 📆 November 1 Launch of 2024 Web Almanac 🚀 - 🔑 Organizing committee

6. Virtual conference

  • 📆 November 20 Virtual Conference - 🔑 Content Team

Chapter resources

Refer to these 2024 CMS resources throughout the content creation process:
📄 Google Docs for outlining and drafting content
🔍 SQL files for committing the queries used during analysis
📊 Google Sheets for saving the results of queries
📝 Markdown file for publishing content and managing public metadata
💻 Colab notebook for collaborative coding in Python, if needed
💬 #web-almanac-cms on Slack for team coordination

@nrllh added the help wanted: reviewers, help wanted: analysts, help wanted: coauthors, and 2024 chapter labels on Mar 2, 2024
@sirjonathan

I'm happy that the project is back again! I'd love to return and contribute as either author or co-author of this year's chapter.

@nrllh
Collaborator Author

nrllh commented Apr 9, 2024

Hey @alexdenning @dknauss @alonkochba @honzasladek @csliva - awesome contributors from previous years 🙂 Are you interested in joining us again this year?

@raewrites

I'm interested in reviewing. I'm an experienced writer and editor and have previously worked with @sirjonathan. I'll be away most of September but will be back on the 27th, so it would be good to co-review with someone else.

@karmatosed

I am also interested in reviewing. I have experience in this area and can be a subject-matter reviewer, and having worked with @sirjonathan before should help collaboration.

@nrllh removed the help wanted: reviewers, help wanted: analysts, and help wanted: coauthors labels on Apr 19, 2024
@turban1988

Hi @sirjonathan,
Thank you very much for volunteering to lead the writing of this chapter! Could you please organize a kick-off meeting (example: #3603 (comment)) to plan the writing of the chapter?

Furthermore, it would be helpful if you and all other contributors (@LoraRaykova, @niko-kaleev, @raewrites, @karmatosed) could join the HTTP Archive Slack (https://join.slack.com/t/httparchive/shared_invite/zt-2hfkn28ts-~uXN4UGS0mXsKpzzhtZcow).

Thanks!

@sirjonathan

@turban1988 I've reached out to the team and am planning to hold the kickoff meeting next week.

@sirjonathan self-assigned this on May 16, 2024
@sirjonathan

@LoraRaykova, @niko-kaleev, and I met today. We discussed the previous years' efforts and ideas for improving and expanding this year's chapter, including references to the Speculation Rules API and tracking themes within the WordPress section.

Our plan is to start by pulling over the 2022 outline and expand it with our ideas for this year. @niko-kaleev will take the first pass at that and we'll work on it together async.

We're meeting again on the 28th to finalize the outline, after which I'll follow up on any analyst-related tasks.

/cc @turban1988

@niko-kaleev

As discussed with @sirjonathan and @LoraRaykova during the kick-off meeting, the chapter outline is ready:

https://docs.google.com/document/d/13CxAp7HCcxHHCSuEnXS2rolKskLSlUvLQuqUD6QADYc/edit

@sirjonathan will review it next week, and we'll finalize it on the 28th.

/cc @turban1988

@sirjonathan

@niko-kaleev, @LoraRaykova, and I met on Tuesday for a sync. The outline is in good shape and they're going to start on the parts of the chapter they can while we wait for data.

@nrllh has generously agreed to tackle the analyst work and replicate the analysis from the 2022 edition.

@sirjonathan

@niko-kaleev and I met today for a quick check-in. We discussed next steps and scheduled a follow-up for August 20, once we have results validated.

@dknauss
Contributor

dknauss commented Aug 26, 2024

@sirjonathan Do you still need a hand? I have time to help if you're still working through the reviewing and editing.

@niko-kaleev

You can find the first draft at the bottom of the document: https://docs.google.com/document/d/13CxAp7HCcxHHCSuEnXS2rolKskLSlUvLQuqUD6QADYc/edit

What still needs to be done:

  1. @sirjonathan to write the Page Builders section (under WordPress 2024), as discussed during our kick-off meeting
  2. @sirjonathan to draft and write the conclusion

Keep in mind that a lot of the data presented in 2022 was missing from this year's data export. All the missing data points are highlighted in red or in a comment.

@sirjonathan, would you mind extracting the missing data points and writing a paragraph or two for each? We believe that will make the whole chapter even more valuable to our readers.

@sirjonathan

@niko-kaleev Thank you for the update! I'm planning to set aside some time during WordCamp US next week to work on 1 and 2. Regarding the missing data points, I'll take another look and see what's possible, though I may need technical help.

@sirjonathan

sirjonathan commented Oct 30, 2024

First off, a huge thanks to @niko-kaleev for all the heavy lifting that got us here. Here's a quick summary of what I've completed over the past few days:

  • Added the missing Lighthouse YoY data points, thanks to help from @kevinfarrugia
  • Completed a chapter review, making minor edits and cuts where we didn't have data
  • Incorporated @dknauss and @raewrites's feedback and editing work so far
  • Updated the chapter outline to match content included
  • Wrote a conclusion

Pending Decisions

There are 3 sections that I'd like input on before we finalize:

CMS Adoption share (Google Doc link) - This was something new we tried this year. On closer inspection of the data, I realized that it combines desktop and mobile. While that could be interesting, our methodology notes that "most websites are included in both the mobile and desktop subsets," so there is a lot of overlap in the dataset. Without clarification it could be misleading, and once clarified it doesn't feel that useful.

My recommendation is that we cut this section.

CMS Adoption by geography (Google Doc link) - This section feels a bit messy, but with some cleanup it could be interesting and useful to include.

A few items I noticed:

  • The US and UK are missing from the chart, which I am guessing is simply because of the length of their names as they are both in the actual data sets. Is this something @kevinfarrugia could help us with?
  • If we resolve the missing data, then the section just needs some minor clean-up as it currently attempts to address the absence of the US and UK
  • "by geography" seems confusing, I suggest we update it to "by country"

My recommendation is that if we can get the US and UK added back to the results that we do so and clean up the section as outlined above.

CMS Adoption by rank (Google Doc Link) - I question how useful this section is as a whole. "rank" is subjective and, presumably, is a reference to Google's index, but isn't introduced as such. Also, if I understand the spreadsheet right, "All" actually means the top 10M, not every site in the index, which is confusing. Last, the current analysis focuses on "headless" as the explanation for the jump, whereas I think a more plausible explanation is that our dataset nearly doubled in size from 2022 to 2024.

My recommendation is that unless others feel strongly about this section that we go ahead and cut it.

Next Steps

I'd love feedback on the above items over the next few days. From there, once @raewrites and @dknauss have given their all clear on reviewing, we can move forward to publishing.

@nrllh can you or someone else tackle the markdown generation and prepare the PR?

@LoraRaykova

LoraRaykova commented Oct 30, 2024 via email

@sirjonathan

Hi Lora, of course! My mistake. Thank you for clarifying and for all your work on it.

@tunetheweb
Member

"by geography" seems confusing, I suggest we update it to "by country"

“Country” raises concerns for some people because of certain geopolitical areas (e.g. Taiwan). Hence we say “by geographical region”.

CMS Adoption by rank (Google Doc Link) - I question how useful this section is as a whole. "rank" is subjective and, presumably, is a reference to Google's index, but isn't introduced as such.

Rank refers to popularity by Chrome page views in the previous month, rather than any SEO ranking.

IMHO (and without having read the draft), it is useful to see the split. For example, top-ranked sites tend to be more commercial and can typically afford paid-for CMSs, while the long tail might prefer free ones. So there can be interesting insights here.

Also, if I understand the spreadsheet right, "All" actually means the top 10M, not every site in the index, which is confusing.

Since 2022 we have increased our dataset to 16 million or so sites. So the final category should be the top 100M rather than the top 10M used in 2022. If the query wasn’t changed to include 100M, it should be updated and rerun.

And yes, you are correct that 100M should be labelled “all” rather than “100M” (as we use < rank in our queries).

Last, the current analysis focuses on "headless" as the explanation for the jump, whereas I think a more plausible explanation is that our dataset nearly doubled in size from 2022 to 2024.

Not sure what this refers to, but we typically report the % of pages precisely to avoid issues with changes in the dataset size. That’s not to say that changes in popularity can’t also change percentages.
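To illustrate the cumulative rank buckets and percentage-based reporting described above, here is a minimal Python sketch. It is hypothetical: the bucket values, the `rank <= bucket` comparison, the function names (`bucket_label`, `cms_share_by_rank`), and the sample data are assumptions for illustration, not the actual HTTP Archive query.

```python
# Hypothetical sketch: the bucket values, the rank <= bucket comparison and the
# sample data are illustrative assumptions, not HTTP Archive's actual query.
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Assumed rank-magnitude buckets. The largest one acts as "ALL" because every
# site in a dataset of roughly 16M sites has a rank at or below 100M.
RANK_BUCKETS = [1_000, 10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]


def bucket_label(bucket: int) -> str:
    """Return the display label for a bucket, using "ALL" for the largest."""
    return "ALL" if bucket == RANK_BUCKETS[-1] else f"Top {bucket:,}"


def cms_share_by_rank(
    sites: List[Tuple[int, Optional[str]]],
) -> Dict[str, Dict[str, float]]:
    """Return, per cumulative rank bucket, each CMS's share of pages.

    `sites` holds (rank, cms) pairs; cms is None when no CMS was detected.
    A site counts toward every bucket whose value is >= its rank, so a site
    ranked 50,000,000 appears in ALL but not in the top 10M.
    """
    shares: Dict[str, Dict[str, float]] = {}
    for bucket in RANK_BUCKETS:
        in_bucket = [cms for rank, cms in sites if rank <= bucket]
        if not in_bucket:
            continue  # skip empty buckets to avoid dividing by zero
        counts = Counter(cms for cms in in_bucket if cms is not None)
        # Percentages of pages (not raw counts) keep buckets and crawls of
        # different sizes comparable.
        shares[bucket_label(bucket)] = {
            cms: count / len(in_bucket) for cms, count in counts.most_common()
        }
    return shares


sample = [
    (1_000, "WordPress"),
    (10_000, "Drupal"),
    (10_000_000, "Wix"),
    (50_000_000, "WordPress"),
    (50_000_000, None),
]
for label, share in cms_share_by_rank(sample).items():
    print(label, {cms: round(pct, 2) for cms, pct in share.items()})
```

With this made-up sample, the two sites ranked 50,000,000 only show up in the ALL bucket, which is why a "10,000,000" category would not cover sites beyond the top 10M.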

My recommendation is that unless others feel strongly about this section that we go ahead and cut it.

As I say, I’ve not read it yet, so I’ll leave others to decide on this, but hopefully that helps with some context.

@kevinfarrugia
Contributor

kevinfarrugia commented Oct 31, 2024

The US and UK are missing from the chart, which I am guessing is simply because of the length of their names as they are both in the actual data sets. Is this something @kevinfarrugia could help us with?

@sirjonathan I updated the Google Sheet to include the United States and the United Kingdom of Great Britain and Northern Ireland. I am not sure why they were excluded, but the filter was specifically omitting both countries. The results are sorted by the total number of mobile sites using a CMS, descending.

@kevinfarrugia
Contributor

@sirjonathan I have updated the Google Sheet for top_cms_by_rank and cms_adoption_by_rank to also include sites where rank > 10,000,000.

TBH, I didn't fully understand the difference between the two queries, but top_cms_by_rank seems to be the correct query and results.

@kevinfarrugia
Contributor

@sirjonathan Following up on the above, I have removed cms_adoption_by_rank as the query had a bug and once fixed, it returns the same data as top_cms_by_rank. Let me know if you were looking for something else with that query.

@sirjonathan

Rank refers to popularity by Chrome page views in the previous month, rather than any SEO ranking.

Thank you for clarifying that @tunetheweb. That changes my perspective. Given that and @kevinfarrugia's work, I'll reverse my recommendation and work to include the section.

Thank you for your updates @kevinfarrugia. Per Barry's comment:

Since 2022 we have increased our dataset to 16 million or so sites. So the final category should be the top 100M rather than the top 10M used in 2022. If the query wasn’t changed to include 100M, it should be updated and rerun.

When I looked at the results, the legend says 10,000,000. If I understand correctly, though, the query is including all sites up to and over 10M. If that's the case, could we update the legend to read 10,000,000+?

Also, when I look at top cms by rank it's currently listing XpressEngine, XOOPS, Wuilt, and Woltlab along with WordPress, which appears to be alphabetical. What I would be expecting is probably WordPress, Wix, Joomla, Drupal, and Squarespace.

@tunetheweb
Member

When I looked at the results, the legend says 10,000,000. If I understand correctly, though, the query is including all sites up to and over 10M. If that's the case, could we update the legend to read 10,000,000+?

I don't think that's correct. But there is now a 100M category (also called "ALL") that covers those sites, since @kevinfarrugia added it.

Also, when I look at top cms by rank it's currently listing XpressEngine, XOOPS, Wuilt, and Woltlab along with WordPress, which appears to be alphabetical. What I would be expecting is probably WordPress, Wix, Joomla, Drupal, and Squarespace.

I've changed the sort order for you and it looks correct now.

@kevinfarrugia
Contributor

kevinfarrugia commented Nov 4, 2024

@sirjonathan Following up on the above, I have removed cms_adoption_by_rank as the query had a bug and once fixed, it returns the same data as top_cms_by_rank. Let me know if you were looking for something else with that query.

@sirjonathan I'm not sure if you saw my comment above and whether you're referring to this sheet. I kept the cms_adoption_by_rank results for posterity, but they are incorrect and should not be used. Use top_cms_by_rank instead; it includes ALL, and Barry kindly fixed the sort order. LMK.

@tunetheweb linked a pull request on Nov 7, 2024 that will close this issue
10 participants