Optimize appearance in Google Search #810

Closed
rviscomi opened this issue May 9, 2020 · 5 comments
Labels
SEO SEO related
Milestone

Comments

@rviscomi
Member

rviscomi commented May 9, 2020

image

We're in the top position for "web almanac" queries, which is perfect. The home page result contains a few secondary page results:

  • Table of Contents
    • JavaScript - Performance - CSS - ...
  • Chapter of a third-party
    • Third Parties chapter of the 2019 Web Almanac covering data of ...
  • JavaScript
    • JavaScript is a scripting language that makes it possible to build ...
  • Learn about our Methodology
    • Describes how the 2019 Web Almanac was put together: The ...
  • Performance
    • Performance chapter of the 2019 Web Almanac covering First ...
  • Chapter 10 SEO
    • Search Engine Optimization (SEO) isn't just a hobby or a side ...

A few ways we could make this better:

  • The TOC description should do a better job of explaining what the contents are, e.g. "See all 20 chapters at a glance" or similar.
  • The Third Parties chapter title is being plucked from the body text for some reason. Investigate why this isn't being set by the metadata.
  • "Learn about our Methodology" should be replaced by the page title, just "Methodology".
  • "Chapter 10 SEO" I don't mind having the chapter number in the title, it actually works well. We should aim to make all chapter results consistent in their appearance. The description is also plucked from the body text rather than the metadata.
  • It might make sense for these results to have their publication dates so it's clear they relate to the 2019 edition. I'm not sure how multiple years' worth of results would coexist in this search result UI.

Any other thoughts or ideas?

@rviscomi rviscomi added the SEO SEO related label May 9, 2020
@tunetheweb
Member

tunetheweb commented May 9, 2020

Google doesn't really let you change any of this. Setting the Meta Description is the best you can do, but it will use a quote from the page if it thinks it's more accurate/useful than that.

We do include breadcrumbs structured data from the year page to the section (see example here), but could enhance that to have base page, and language.
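As a rough sketch of that enhancement (not the site's actual template code), the trail could be emitted as schema.org `BreadcrumbList` JSON-LD with the base page and language added ahead of the year and chapter. The helper name `build_breadcrumbs`, the crumb labels, and the chapter slug are assumptions for illustration:

```python
import json

BASE_URL = "https://almanac.httparchive.org"


def build_breadcrumbs(lang, year, chapter_slug, chapter_title):
    """Return schema.org BreadcrumbList JSON-LD as a dict.

    Trail: base page -> language -> year -> chapter.
    The labels and URL layout below are illustrative assumptions.
    """
    crumbs = [
        ("Web Almanac", f"{BASE_URL}/"),
        (lang.upper(), f"{BASE_URL}/{lang}/"),
        (str(year), f"{BASE_URL}/{lang}/{year}/"),
        (chapter_title, f"{BASE_URL}/{lang}/{year}/{chapter_slug}"),
    ]
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(crumbs, start=1)
        ],
    }


if __name__ == "__main__":
    # The output would be embedded in a <script type="application/ld+json"> tag.
    print(json.dumps(build_breadcrumbs("en", 2019, "seo", "SEO"), indent=2))
```

Whether Google actually displays the extra crumbs is up to Google; this only strengthens the signal.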

A bigger concern for me is that Google is being very slow to index some of our translations. See the Coverage section of Google Search Console. The sitemap has 69 pages, but Google only has 56, with the others excluded due to redirects (3 pages, now available and no longer redirecting), pages discovered but not indexed (8), pages not indexed (2, including the French home page!), and pages not selected as canonical (3 recently translated Japanese pages). I tried a few things (manually resubmitting pages and the whole sitemap, and temporarily removing lastmod from the sitemap as it was not accurate, before fixing it properly so I could bring it back), but had no luck nudging Google to index them properly...

@tunetheweb
Member

@ymschaap, @rachellcostello, @AVGP and @AymenLoukil, do you have anything to add to this?

@catalinred
Member

catalinred commented May 10, 2020

It's well known that Google often rewrites meta descriptions in the SERPs in response to the searcher's query, as pointed out in the SEO chapter.
So, as @bazzadp also pointed out, that is why content is being plucked from the body text.


While content updates (title/description/headings) can definitely improve the search appearance for most of these 2019 almanac pages, I find the search appearance strategy for the upcoming 2020 edition more interesting.

So, here are my thoughts on that:

  • The root (https://almanac.httparchive.org/en/) will contain the current Web Almanac, therefore the 2020 content. So, without the year in the URL.

  • All the previous years remain archived, in subdirectories, as it is now (https://almanac.httparchive.org/en/2019/), so no redirects needed at first sight.

  • The 2019 pages will become "secondary/leaf" pages and each chapter will link to the latest (in this case, 2020), but not the other way around. This way, the signal sent to the search engines says which are the main website pages and which ones are less important in the page hierarchy.

@rviscomi rviscomi added this to the 2019 Backlog milestone Jun 16, 2020
@tunetheweb
Member

tunetheweb commented Jun 20, 2020

@catalinred I've been thinking over your proposal, and can see the advantage of that in terms of boosting the signal of the current year.

However, I would have big concerns about breaking any links as soon as we archive the 2020 chapters into https://almanac.httparchive.org/en/2020/ when launching the 2021 edition. The intention is to make this a quotable resource and we've worked hard to make it deep-linkable, so I really would prefer to keep the current year structure and just change https://almanac.httparchive.org to redirect to the best language and year available for that user, as we do now.
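To illustrate the kind of root redirect I mean, here is a minimal sketch, not the site's actual implementation. It assumes the choice is driven by the `Accept-Language` request header; the `AVAILABLE` table and both function names are hypothetical:

```python
# Hypothetical data: which editions exist in which languages.
AVAILABLE = {
    "en": [2019],
    "fr": [2019],
    "ja": [2019],
}
DEFAULT_LANG = "en"


def parse_accept_language(header):
    """Return primary language codes from an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # Keep only the primary subtag, e.g. "fr-CA" -> "fr".
            weighted.append((quality, tag.strip().lower().split("-")[0]))
    # sorted() is stable, so equal weights keep their header order.
    return [lang for _, lang in sorted(weighted, key=lambda pair: -pair[0])]


def root_redirect_target(accept_language, available=AVAILABLE):
    """Pick the /<lang>/<year>/ path the bare domain should redirect to."""
    lang = next(
        (code for code in parse_accept_language(accept_language) if code in available),
        DEFAULT_LANG,
    )
    # Latest edition published in that language.
    return f"/{lang}/{max(available[lang])}/"
```

Because every edition keeps its year in the URL, launching a new edition only changes what the root redirects to; existing deep links stay valid.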

As always, while SEO is important, IMHO it shouldn't be prioritised over users and I fear this proposal may do that long term.

What do you think about that concern?

@tunetheweb
Member

tunetheweb commented Jun 20, 2020

@rviscomi as discussed above, we only have limited influence over how we appear in SERPs, and when I search in an incognito window I get better results than you did previously:

SERPs for term "Web Almanac"

I do think we should add the language and year to the breadcrumbs (see example of how we currently breadcrumb it here). Will raise a separate issue for that (edit - #885).

In regards to the missing chapters that I raised, we still have 11 chapters not indexed:

Google Search Console screenshot showing 0 error pages, 0 warning pages, 61 valid pages and 11 not indexed pages

Google is aware of them but stubbornly doesn't seem to want to index them:

Google Search Console screenshot with 8 pages discovered but not indexed and 3 pages not selected as canonical

The canonical ones are old versions and resubmitting them doesn't seem to update them. Frustrating.

However I did discover that the French SEO chapter (for example) seems to be mapping to the app engine URL despite the canonical tag pointing to the real URL that should be used:

French SEO chapter is indexed under wrong URL

We could 301 redirect requests when the host is wrong, to give Google another signal. Will raise a separate issue for that (edit - #884).
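A minimal sketch of that host check, written as a plain function so it is independent of any web framework. The function name is hypothetical, and the App Engine hostname in the usage example is a placeholder, not the real one:

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "almanac.httparchive.org"


def canonical_redirect(url):
    """Return (301, location) if the URL is served from the wrong host,
    otherwise None so the request is handled normally."""
    parts = urlsplit(url)
    if parts.hostname == CANONICAL_HOST:
        return None
    # Keep the path and query string; only the scheme and host change.
    target = urlunsplit(("https", CANONICAL_HOST, parts.path or "/", parts.query, ""))
    return 301, target


if __name__ == "__main__":
    # Placeholder hostname standing in for the App Engine URL.
    print(canonical_redirect("https://example-project.appspot.com/fr/2019/seo"))
    print(canonical_redirect("https://almanac.httparchive.org/fr/2019/seo"))
```

In practice this would run before normal request handling, and local development hosts would need to be exempted so they are not redirected to production.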

So if we close out the above discussion with @catalinred and raise the other two points as separate issues, then I think we can just close this issue. Let me know if you disagree.


3 participants