Feature request: either work with an existing jekyll sitemap plugin or generate sitemap #29

Closed
scgupta opened this issue Jul 23, 2016 · 9 comments
Labels
has solution to be documented: someone posted a solution in the thread, and it should be documented in a later release

Comments

@scgupta

scgupta commented Jul 23, 2016

I am currently using jekyll-sitemap to generate a sitemap (see example). Polyglot produces a sitemap.xml for the default language and for each other language in its respective lang directory, but the links in every generated sitemap.xml point to the default-language pages. So either there is a setting I have misunderstood and am using incorrectly, or polyglot does not currently handle sitemap.xml generation.

Ideally there would be a single sitemap, as explained in this Google webmaster tip, instead of multiple sitemap.xml files (the sitemap is for crawl bots, not for visitors). But even if separate sitemap.xml files are generated, the links in each of them should be correct.

I see two possibilities to achieve it:

  1. polyglot works with a common jekyll sitemap plugin
  2. polyglot generates the sitemap itself (I am not sure this is feasible, but since it presumably knows about all pages being generated and handles I18n_Headers etc., it might be worth considering).

Thanks @untra for such wonderful support and help that I got from you for the two issues I faced.
+satish

@aensidhe

aensidhe commented Apr 9, 2017

I would like to see a correct sitemap.xml too.

@lukaszolek

Me too

@jerturowetz

I'm pretty sure polyglot runs after jekyll-sitemap and copies the sitemap.xml file to the root of each other language folder without any processing, as if it were any other file.

As this has been a floating issue since 2016, I'm going to resolve it in my project by removing the sitemap plugin and building a sitemap file using polyglot variables. I'll post my example once finished (maybe we could put it in a guide or in the example documents).

If you've already created anything that could give me a head start, feel free to share.

@MPJHorner

Did anything get resolved with this? @jerturowetz, I'm interested to see what you have created.

@jerturowetz

jerturowetz commented Nov 12, 2018

EDIT: Wrapped html comments in liquid comment syntax {% comment %} to avoid messy code

--
@MPJHorner @scgupta check it out!

Just wrapped it up now! I've ditched using a sitemap plugin and just built the sitemap manually.

There are a few items to note:

  • For cleanliness, sitemap.xml is listed in the exclude_from_localization array in _config.yml
  • I did not specify any hreflang attributes in the sitemap, as my posts/pages have hreflang specified in their <head>. I do it manually, but polyglot includes {{ I18n_Headers }}, which builds the appropriate tags for you.
  • You have to include some empty YAML front matter at the top of sitemap.xml in order to get Jekyll to process the file

Here are the contents of my sitemap.xml, which is located in the root of my source folder:

---
layout:
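---
<?xml version="1.0" encoding="UTF-8"?>
{% comment %}<!-- The XML declaration has to be the very first thing in the output, so nothing (not even a blank line) goes between the front matter and it -->{% endcomment %}
{% comment %}<!-- I am using hreflang attributes on a page-by-page basis so no need to include them per url here -->{% endcomment %}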
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
{% for lang in site.languages %}

    {% comment %}<!-- It would be better to use the where_exp filter in the first loop, but I don't think the unless expression is supported -->{% endcomment %}
    {% for node in site.pages %}
        {% comment %}<!-- very lazy check to see if the page is in the exclude list - this means excluded pages will not be in the sitemap at all, write exceptions as necessary -->{% endcomment %}
        {% unless site.exclude_from_localization contains node.path %}
            {% comment %}<!-- I am assuming that a page with no layout assigned should not be included in the sitemap, you may want to change this -->{% endcomment %}
            {% if node.layout %}
                <url>
                    <loc>{% if lang == site.default_lang %}{{ node.url | absolute_url }}{% else %}{{ node.url | prepend: lang | prepend: '/' | absolute_url }}{% endif %}</loc>
                </url>
            {% endif %}
        {% endunless %}
    {% endfor %}

    {% comment %}<!-- This loops through all site collections including posts -->{% endcomment %}
    {% for collection in site.collections %}
        {% for node in site[collection.label] %}
            <url>
                <loc>{% if lang == site.default_lang %}{{ node.url | absolute_url }}{% else %}{{ node.url | prepend: lang | prepend: '/' | absolute_url }}{% endif %}</loc>
            </url>
        {% endfor %}
    {% endfor %}

{% endfor %}
</urlset>
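
For anyone who does want the hreflang alternates inside the sitemap itself rather than in each page's <head>, the following is a minimal, untested sketch of a fragment that could sit inside <urlset>. It relies on the xmlns:xhtml namespace already declared in the sitemap above, and it assumes that every page is reachable under every language prefix (which includes untranslated fallback pages) and that the entries in site.languages are valid hreflang codes. The variable names loc, alt and href are only illustrative, and the fragment covers site.pages only.

```liquid
{% comment %} Sketch only: one <url> per page and language, each listing every language alternate {% endcomment %}
{% for node in site.pages %}
  {% if node.layout %}
    {% for lang in site.languages %}
      {% comment %} Default-language pages live at the root, other languages under /<lang>/ {% endcomment %}
      {% if lang == site.default_lang %}{% assign loc = node.url %}{% else %}{% assign loc = node.url | prepend: lang | prepend: '/' %}{% endif %}
      <url>
        <loc>{{ loc | absolute_url }}</loc>
        {% for alt in site.languages %}
          {% if alt == site.default_lang %}{% assign href = node.url %}{% else %}{% assign href = node.url | prepend: alt | prepend: '/' %}{% endif %}
          <xhtml:link rel="alternate" hreflang="{{ alt }}" href="{{ href | absolute_url }}"/>
        {% endfor %}
      </url>
    {% endfor %}
  {% endif %}
{% endfor %}
```

Each language version gets its own <url> entry that lists all alternates, itself included. The same pattern could be repeated for the collections loop.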

@MPJHorner

@jerturowetz looks ideal. You should put this on the Readme.md

@hacketiwack
Contributor

@jerturowetz, the proposed method works great. However, as the plugin jekyll-sitemap is removed, no robots.txt is generated anymore.
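
One possible way around the missing robots.txt is to add a hand-written one next to sitemap.xml in the source root. The following is a minimal sketch, assuming the sitemap is served at /sitemap.xml and that robots.txt is also added to exclude_from_localization so it is not copied into each language folder. Like the sitemap, it needs empty front matter so that Jekyll processes the Liquid tag.

```liquid
---
layout:
---
Sitemap: {{ "/sitemap.xml" | absolute_url }}
```

The absolute_url filter builds the full address from url and baseurl in _config.yml, so both have to be set correctly for the link to resolve.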

@boamaod

boamaod commented Feb 25, 2024

In my custom sitemap, similar to the one above, I am having a hard time excluding document nodes without a translation, that is, pages that are rendered as untranslated fallback pages. It may not be technically wrong to list them, but since a sitemap is used to index websites, indexing fallback pages without an actual translation is not really useful: they would just be duplicates of the originals, and the metadata stating the language would be incorrect. It is better not to index them at all and to exclude them from the sitemap, even though they show up on the web as a result of the fallback mechanism. This would also be the most coherent approach if sitemap indexes referring to language-specific sitemaps are used.

Currently the only decent option to solve this is to create placeholder pages for documents without a translation, carrying the warning "this document is not translated yet, please refer to the original document". This is viable, and with a special template one could include the original or default-language page with a separate language tag in the HTML, but it seems to overcomplicate the website presentation, and not listing those dummy fallback pages would still make sense.

I think having a variable that lists the available translations, as suggested here, might be useful for solving these kinds of issues. Or is there another way to solve this? I think similar problems also appear when rendering menus and site archives, and in the language switcher, where one might want to indicate whether an actual translation is available.

@untra added the "has solution to be documented" label Feb 26, 2024
@untra
Owner

untra commented Mar 18, 2024

https://github.com/untra/polyglot?tab=readme-ov-file#sitemap-generation
Thank you @jerturowetz for your solution! It has been added to the README.

@untra closed this as completed Mar 18, 2024

8 participants