Zola Builds Broken RSS/atom feed #2024

Closed
whoisYoges opened this issue Nov 15, 2022 · 20 comments
Labels
done in pr, good first issue

Comments

@whoisYoges

Environment

Zola 0.16.1 running in Arch-Based Linux Distribution

Configuration

generate_feed = true
feed_filename = "atom.xml"

Steps

Enable the above configurations in config.toml
Run zola build or zola serve

Current Behavior

When I try to add the generated feed atom.xml to my RSS reader (I am using Thunderbird), it gives an error saying it's not a valid feed.
Validating the feed with the W3C feed validator shows the following errors:

Missing entry element: author
</entry>
title should not be blank
<title></title>

Expected Behavior

The feed should have been validated without any error.

Detected Problem

Zola generates the feed (atom.xml) with an empty title.

...
...
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title></title>
...
...

Remark

Using feed_filename = "rss.xml" instead generates rss.xml with an empty title and description.

...
...
<channel>
<title></title>
<link>http://127.0.0.1:1111</link>
<description></description>
<generator>Zola</generator>
<language>en</language>
...
...

Temporary/Manual Solution

Manually adding the title in atom.xml solved the problem for me. If you're using rss.xml, add both the title and the description manually.

...
...
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title>My Blogs</title>
...
...

Permanent Solution

Zola should detect a default title and description by checking the following sources in priority order (or a similar, appropriate order) and use the first one found:

  1. /content/blog/_index.md
  2. /templates/blog.html
  3. config.toml
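
A minimal sketch of such a priority-ordered lookup, assuming each source has already been read into an optional string. The function name `feed_title` and its parameters are hypothetical and are not part of Zola; the fallback to the site URL is the one suggested later in this thread.

```rust
/// Picks the first non-blank title from a priority-ordered list of sources,
/// falling back to the site's base URL when none of them is set.
/// Hypothetical helper for illustration; not Zola's actual API.
fn feed_title<'a>(candidates: &[Option<&'a str>], base_url: &'a str) -> &'a str {
    candidates
        .iter()
        .copied()
        .flatten() // drop sources that are not set at all
        .map(str::trim)
        .find(|title| !title.is_empty()) // skip blank titles such as ""
        .unwrap_or(base_url)
}
```

Called with the section title first and the config title second, this would return the section title when present and the base URL only when every source is missing or blank.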

Reference Project

https://github.com/whoisYoges/website

@Keats
Collaborator

Keats commented Nov 15, 2022

The title can potentially be fixed (eg use url of the site if we have nothing) but what do we do for the author?

@whoisYoges
Author

The title can potentially be fixed (eg use url of the site if we have nothing) but what do we do for the author?

@Keats Thanks for the quick reply!

After a little research I found out that you'll need to place the following inside each <entry> tag.

<author>
   <name>Blog Author Name</name>
</author>

Currently produced output:

...
...
<entry xml:lang="en">
<title>Create and delete GPG keypair | Introduction to GnuPG </title>
<published>2022-10-30T00:00:00+00:00</published>
<updated>2022-10-30T00:00:00+00:00</updated>
<link rel="alternate" href="https://castorisdead.xyz/blog/create-and-delete-gpg-key-pair/" type="text/html"/>
<id>https://castorisdead.xyz/blog/create-and-delete-gpg-key-pair/</id>
<content type="html">
...
...
</content>
</entry>
...
...

Required Output

...
...
<entry xml:lang="en">
<title>Create and delete GPG keypair | Introduction to GnuPG </title>
<published>2022-10-30T00:00:00+00:00</published>
<updated>2022-10-30T00:00:00+00:00</updated>
<author><name>Castor</name></author>
<link rel="alternate" href="https://castorisdead.xyz/blog/create-and-delete-gpg-key-pair/" type="text/html"/>
<id>https://castorisdead.xyz/blog/create-and-delete-gpg-key-pair/</id>
<content type="html">
...
...
</content>
</entry>
...
...

Conclusion

Possible ways I could think of are below:

  1. Find a way to detect the content of the <meta> tag, i.e. <meta name="author" content="Castor, [email protected]"> (generally in base.html).
  2. Use some variable in each blog post's .md file and fetch it, i.e.
+++
...
...
author = "Blog Author Name"
...
...
+++
  3. Use some variable in config.toml, similar to no. 2 above, and fetch it.
  4. Use the title of the site if nothing is found.

There could be better ways btw

@whoisYoges
Author

Or, simply use
<author>[email protected] (Author Name)</author>
in place of

<author>
   <name>Blog Author Name</name>
</author>

@Keats
Collaborator

Keats commented Nov 15, 2022

The issue is that we don't force the user to put any of that data. I guess we could default to something like "unknown" and let people do their thing with a template if they want to.

@whoisYoges
Author

The issue is that we don't force the user to put any of that data. I guess we could default to something like "unknown" and let people do their thing with a template if they want to.

That's okay. But yeah, there should at least be some default value, and there should also be an option for the user to provide a custom one that overrides the default.

@sangsatori

4. use title of the site if nothing found

Seems a sensible fallback to me. It's preferable to serving a non-working feed.

sangsatori added a commit to sangsatori/helix-editor-website that referenced this issue Dec 3, 2022
* Adjustments found while investigating getzola/zola#2024
* Zola uses both values when generating atom.xml, despite being marked optional.
* Have base.html also use the same metadata.
@Keats
Collaborator

Keats commented Dec 3, 2022

It should work even without the title in most readers I think?

That's a good first issue if someone wants to contribute

archseer pushed a commit to helix-editor/website that referenced this issue Dec 4, 2022
* Adjustments found while investigating getzola/zola#2024
* Zola uses both values when generating atom.xml, despite being marked optional.
* Have base.html also use the same metadata.
@Keats
Collaborator

Keats commented Jan 12, 2023

@whoisYoges / @sangsatori just to be clear, does adding the author field make Thunderbird happy with both the RSS and Atom feeds?

For title/description we could error if someone asks to generate a feed but doesn't fill both.
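
A rough sketch of what such a check could look like, assuming a simplified config struct. `SiteConfig` and `check_feed_config` are hypothetical names used only for illustration; they mirror the `config.toml` keys from this thread but are not Zola's actual types.

```rust
/// Hypothetical subset of a site configuration (not Zola's real Config type).
struct SiteConfig {
    generate_feed: bool,
    title: Option<String>,
    description: Option<String>,
}

/// Returns Ok when no feed is requested or both fields are filled in;
/// otherwise returns the names of the fields that are missing or blank.
fn check_feed_config(config: &SiteConfig) -> Result<(), Vec<&'static str>> {
    if !config.generate_feed {
        return Ok(()); // nothing to validate when no feed is requested
    }
    let is_blank =
        |value: &Option<String>| value.as_deref().map_or(true, |s| s.trim().is_empty());
    let mut missing = Vec::new();
    if is_blank(&config.title) {
        missing.push("title");
    }
    if is_blank(&config.description) {
        missing.push("description");
    }
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

Returning the list of missing field names would let the build error say exactly what needs to be filled in.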

@Keats
Collaborator

Keats commented Jan 12, 2023

There is already an existing issue for this: #1223
Maybe we should have an authors: Vec<String> field to pages that people can fill and an optional author: String in the config for the default author to use if none are present in a page. If not found in those 2 places, we can just show an empty author or "Unknown".
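
The lookup order described here could be sketched roughly as follows. `resolve_authors` is a hypothetical helper, not a function in Zola; the "Unknown" literal is just one of the two fallbacks mentioned above.

```rust
/// Resolves the authors for a page: page-level `authors` first, then the
/// config-level default `author`, then the literal "Unknown".
/// Hypothetical helper for illustration; not Zola's actual API.
fn resolve_authors(page_authors: &[String], config_author: Option<&str>) -> Vec<String> {
    if !page_authors.is_empty() {
        return page_authors.to_vec();
    }
    vec![config_author.unwrap_or("Unknown").to_string()]
}
```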

@whoisYoges
Author

@whoisYoges / @sangsatori just to be clear, does adding the author field make Thunderbird happy with both the RSS and Atom feeds?

For title/description we could error if someone asks to generate a feed but doesn't fill both.

Yeah @Keats, it does, and the feed also passes W3C feed validation for both RSS and Atom.

@Keats
Collaborator

Keats commented Jan 17, 2023

Is anyone interested in implementing #2024 (comment)? Or does anyone have objections to it?

@sethm
Contributor

sethm commented Jan 17, 2023

Is anyone interested in implementing #2024 (comment)? Or does anyone have objections to it?

I ran into this issue myself and worked around it with a custom atom.xml template, so I would love to try to fix it. It would be my first contribution to the project, but I do have Rust experience.

@sethm
Contributor

sethm commented Jan 18, 2023

Do we really want a Vec<String> authors field on pages, or is an Option<String> enough? Neither RSS nor Atom support multiple authors on the same feed item, but maybe we want multiple authors for other reasons.

@Keats
Collaborator

Keats commented Jan 18, 2023

I would go with a Vec<String> because multiple authors is something very common and we shouldn't limit Zola to what is possible with RSS/Atom.

I think in most cases, there will be more author information in the extra section of config.toml as a dict since people will want to be able to modify something only in one place. So the Vec is more likely to contain some kind of key for the dict, like authors = ["vincent"] and then in the template you do config.extra.authors[page.author[0]].full_name etc

I could be wrong though so let's think this through
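
To make the "key into a dict" idea concrete, here is a small sketch of the lookup outside of any template. `AuthorInfo` and `display_names` are hypothetical names; the table stands in for whatever a user keeps under `extra` in config.toml, and passing unknown keys through as plain names is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical author record, shaped like an entry a user might keep in
/// the `extra` section of config.toml. Zola does not impose this structure.
struct AuthorInfo {
    full_name: String,
}

/// Looks up each page-level author key in the site-wide table.
/// Keys with no matching entry are passed through as plain display names.
fn display_names(keys: &[String], table: &HashMap<String, AuthorInfo>) -> Vec<String> {
    keys.iter()
        .map(|key| match table.get(key) {
            Some(info) => info.full_name.clone(),
            None => key.clone(),
        })
        .collect()
}
```

With this shape, a page that lists authors = ["vincent"] resolves to the full name stored once in the table, while a page that lists a literal name still works.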

@sethm
Contributor

sethm commented Jan 18, 2023

It's an interesting question. In Atom, author information is more structured than in RSS 2.0, and they are potentially contradictory.

  • RSS 2.0 has a single <author>...</author> element. The specification says that this should contain the email address of the author, and optionally a name, in the format [email protected] or [email protected] (Author Name).
  • On the other hand, Atom 1.0 has a structured <author>...</author> element that must contain at least <name>...</name>, but may also contain <email>...</email> and <uri>...</uri> elements.

Perhaps Zola authors should be structured in a similar way? For example, in YAML we could permit:

authors:
  - name: Author One
    email: [email protected]
  - name: Author Two
    email: [email protected]
    uri: https://example.com/user2/
  - name: Author Three

equivalent TOML:

[[authors]]
name = "Author One"
email = "[email protected]"

[[authors]]
name = "Author Two"
email = "[email protected]"
uri = "https://example.com/user2"

[[authors]]
name = "Author Three"

One downside is that RSS 2.0 feeds would not validate unless the author contains an email address, and Atom 1.0 feeds would not validate unless the author contains a name. It's a bit messy and complex.
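
The mismatch can be illustrated with a short sketch that renders one structured author into both formats. The `Author` struct and the functions `rss_author`, `atom_author` and `escape_xml` are hypothetical and only follow the name/email/uri shape proposed above; each renderer returns `None` when the field its format requires is missing.

```rust
/// Hypothetical structured author, following the name/email/uri shape above.
struct Author {
    name: Option<String>,
    email: Option<String>,
    uri: Option<String>,
}

/// Escapes the characters that are special in XML text content.
fn escape_xml(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// RSS 2.0 `<author>`: needs an email, with the name optionally in parentheses.
fn rss_author(author: &Author) -> Option<String> {
    let email = author.email.as_deref()?;
    let body = match author.name.as_deref() {
        Some(name) => format!("{} ({})", email, name),
        None => email.to_string(),
    };
    Some(format!("<author>{}</author>", escape_xml(&body)))
}

/// Atom 1.0 `<author>`: needs a name; email and uri are optional children.
fn atom_author(author: &Author) -> Option<String> {
    let name = author.name.as_deref()?;
    let mut xml = format!("<author><name>{}</name>", escape_xml(name));
    if let Some(email) = author.email.as_deref() {
        xml.push_str(&format!("<email>{}</email>", escape_xml(email)));
    }
    if let Some(uri) = author.uri.as_deref() {
        xml.push_str(&format!("<uri>{}</uri>", escape_xml(uri)));
    }
    xml.push_str("</author>");
    Some(xml)
}
```

An author with only a name, like "Author Three" above, renders for Atom but yields `None` for RSS, which is the validation gap described here.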

@Keats
Collaborator

Keats commented Jan 19, 2023

I would rather have something like that in config.toml (but with no way of enforcing it/up to the user):

[extra]

[extra.authors]
[extra.authors.vincent]
name = "Vincent"
email = "..."
avatar = "vincent.jpg"

The table is going to be more flexible. Either way, I don't want to impose anything there.

As for the pages themselves, let's go for a basic Vec where people can put whatever they want. We will use that value for the Atom feed and just put a note for the RSS feed that people will need to implement their own template to provide the email.

@sethm
Contributor

sethm commented Jan 19, 2023

As for the pages themselves, let's go for a basic Vec where people can put whatever they want. We will use that value for the Atom feed and just put a note for the RSS feed that people will need to implement their own template to provide the email.

OK, that makes sense to me. I have a potential solution ready to go in https://github.com/sethm/zola/commit/17c74de2ec8489f5b7f08abe0b2254ccca35a821. It uses no special structuring at all.

I'll open a pull request if you think it's ready for one.

@Keats
Collaborator

Keats commented Jan 19, 2023

It seems ok to me, just need to add some words about it in the docs.

Anyone disagree with the current approach for authors?

@sangsatori

Looks good to me. Is this ready to progress to PR?

@sethm
Contributor

sethm commented Feb 4, 2023

I think so. I've created #2092.

Keats pushed a commit that referenced this issue Feb 11, 2023
The W3C feed validator fails to validate RSS 2.0 and Atom 1.0 feed
elements that do not contain a valid author. This change adds an
`authors: Vec<String>` to pages, as well as an `author: Option<String>`
to Config that will act as a default to use in RSS and Atom templates if
no page-level authors are specified.
Keats added the "done in pr" label Feb 12, 2023
Keats pushed a commit that referenced this issue Feb 16, 2023
The W3C feed validator fails to validate RSS 2.0 and Atom 1.0 feed
elements that do not contain a valid author. This change adds an
`authors: Vec<String>` to pages, as well as an `author: Option<String>`
to Config that will act as a default to use in RSS and Atom templates if
no page-level authors are specified.