Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement full post/item content included in generated @astrojs/rss feed as opt-in feature #5366

Merged
merged 1 commit into from
Dec 6, 2022

Conversation

smithbm2316
Copy link
Contributor

Changes

What does this change?

This PR implements a proof of concept for including the full content of a user's posts/items as a separate XML tag for each <item>. The use case for this is so that RSS readers will be able to get not only the <title> and <description> of each post, but also the full <content> of the article/post/item as HTML that can be rendered directly in the RSS reader itself.

Testing

How was this change tested?

I've updated the rss.test.js file to include 1 extra test that is specific to the new implementation, as well as updated 1 existing test for the default RSS use case where the user just supplies import.meta.glob(...) to the items key of the rss() function to test the new implementation properly.

Docs

Could this affect a user’s behavior?

This will definitely need some updates to the docs, I will be more than happy to contribute a PR to handle the documentation for this feature. I didn't have time today to update the docs, and I didn't really want to take the time to write them until I got some feedback on whether or not this PR would even be considered!

/cc @withastro/maintainers-docs for feedback!

More Details and Explanation of PR

I had assumed that the @astrojs/rss integration had included this feature already, but found after implementing it on my site that it currently only has the ability to generate the title and description by default. After spending a few hours going down the RSS v2.0 spec rabbit hole, trying to shoehorn the <content> into my implementation, and reading a Discord thread and blog post from an another Astro-naut about why including MDX content by default in the description or content is currently blocked, I finally had an implementation that worked for my blog (you can see the final product here). While it wasn't too hard to implement once I wrapped my head around some of the issues with including full content separately from the description, I do think there's room for Astro to potentially consider improvement here.

The main challenge here is that the <content> tag is technically not in the RSS v2.0 spec. So the reality may just be that it shouldn't be Astro's burden to maintain an implementation that is not exactly spec-compliant. However, I'll make the case that this is a pretty common thing to be implemented in a lot of blogs from web developers across the indieweb/front-end space and is a prime spot for Astro to make a big difference in helping developers get their RSS feed to just work™. The nice thing about the RSS v2.0 spec is that it's pretty forgiving like HTML, so while this opt-in implementation I've put together deviates from the spec slightly, it does still conform to a common specification used across many RSS feeds on the internet currently (especially since RSS v2.0 is a very old spec and will not be updated for the modern web anytime soon).

I'll attach some further reading in regards to the <content:encoding> implementation that I've put together below so that the work that I've done can be verified against the RSS best practices guide and W3C RSS feed validator. Thank you for taking the time to read and consider this PR Astro maintainers, and thank you for all the awesome work that you've done on this project thus far!

@smithbm2316 smithbm2316 requested a review from a team as a code owner November 11, 2022 10:03
@changeset-bot
Copy link

changeset-bot bot commented Nov 11, 2022

🦋 Changeset detected

Latest commit: a99affc

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@delucis delucis requested a review from bholmesdev November 11, 2022 14:18
@@ -69,6 +71,7 @@ function mapGlobResult(items: GlobResult): Promise<RSSFeedItem[]> {
title: frontmatter.title,
pubDate: frontmatter.pubDate,
description: frontmatter.description,
content: compiledContent(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @smithbm2316!

Looks like a test is failing because this approach is not compatible with MDX files which don’t provide a compiledContent getter. Not sure we have a good story that would let us resolve this just yet, but @bholmesdev looked at this recently so might be able to say if there’s a path forward here or if another approach is needed.

Copy link
Contributor Author

@smithbm2316 smithbm2316 Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, apologies. I forgot to mention that this definitely is not compatible with MDX. The RSS integration currently only works with MDX files if you keep all of your data for each post inside the frontmatter at the moment if I'm not mistaken? If the Astro team would consider this use case I've added here (with an updated implementation to support MDX), I'd be very interested in helping @bholmesdev explore some options for how to resolve the MDX issues as well! Perhaps I can drop him a message in Discord and see if there's anything I can do to help with that. Thanks for the speedy comment and review @delucis! 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smithbm2316 Yep, spent a pretty long time investigating options yesterday. Funny how timing works out 😅 Supporting compiled content in MDX is a separate issue that will need a couple months to get right. In the meantime, I'm considering a rework for the RSS integration that pulls from the production build of your site instead to get the compiled HTML body of your posts, similar to the sitemap integration. This will bust dev server previews and SSR, but will unlock using post contents in your RSS feeds. I don't feel great about this solution but I'm happy to prototype by Monday.

@bholmesdev
Copy link
Contributor

Thanks for the super thorough issue and PR! As noted above, this solution would be MD-specific, which could be confusing for anyone doing an MDX refactor. I'd like to investigate a more sweeping rework of RSS to pull your built HTML files instead. I'll keep this branch open in the meantime though!

@smithbm2316
Copy link
Contributor Author

smithbm2316 commented Nov 11, 2022

Thanks for the thorough response @bholmesdev. Feel free to close this PR if you'd prefer to. It sounds like it doesn't align with the solution that you want to implement in order to support both markdown and MDX (which I think is a much better end goal anyways). I ended up implementing this on my own site since I'm just using plain old markdown for the time being and figured I'd throw up a PR as I didn't think it would be very time-consuming to implement (it wasn't). If there's anything I can do to help you work on implementing the MDX updates that will allow for this use case, please let me know! I'd love to help bike shed ideas, help with research, or contribute code or docs to an implementation if that would be helpful to you.

For now perhaps I can write up a blog post detailing my solution for folks who have a similar use case to myself with just markdown only. My new site that I re-wrote with Astro still doesn't have any blog posts, so perhaps that could be the first one 😄

Copy link
Member

@natemoo-re natemoo-re left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes total sense! Understandably this doesn't work with MDX, but @astrojs/rss should accept a content prop anyway—it's not the RSS package's job to generate the HTML content, just to add it to the RSS feed.

IMO adding support for MDX is something we need to tackle outside of this, but should not block this PR from landing. I just ran into a case where I needed this feature myself over the weekend.

@smithbm2316
Copy link
Contributor Author

smithbm2316 commented Nov 14, 2022

I think this makes total sense! Understandably this doesn't work with MDX, but @astrojs/rss should accept a content prop anyway—it's not the RSS package's job to generate the HTML content, just to add it to the RSS feed.

IMO adding support for MDX is something we need to tackle outside of this, but should not block this PR from landing. I just ran into a case where I needed this feature myself over the weekend.

I'd be happy to open a PR to the Astro documentation repo later this week to add some additional help surrounding how this PR works, if we are interested in moving forward with this! I think there should definitely be a section on sanitizing the HTML that you pass to the content prop, as I ran into issues myself with the <pre><code>...</code></pre> sections that Astro generates when using the Shiki theme of css-variables, since a lot of the quotes and tags are not properly escaped when just passing the raw output of compiledContent directly to the rss package. Running the output of compiledContent through a package like sanitize-html or ultrahtml's sanitizer works great though and is currently implemented on my work-in-progress personal site. I've actually got a work-in-progress blog post detailing how to replicate my site's setup that I was planning on refining and posting later this week on the topic.

TLDR: I'll plan on opening a PR on the docs repo to add some additional information on how this feature works. I'll also plan on removing the default configuration of generating the content tags if the user passes us the direct result of an import.meta.glob(), making the <content> tags an opt-in feature that we can document if the user wants to use the more advanced configuration setup detailed in option 2 of the docs right now. Should be able to have some time in the next couple of days to work on those in the evenings.

@@ -95,6 +95,8 @@ type RSSFeedItem = {
pubDate: Date;
/** Item description */
description?: string;
/** Full content of the item, should be valid HTML */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On behalf of Astro Docs Maintainers, if nothing else changes in the README except this, we're cool!

I'll look forward to seeing a PR to docs in the future. I know a lot of people will be happy to see full post content in RSS, even if only for Markdown files right now.

@backflip
Copy link
Contributor

My current workaround to insert the content of my MDX-based posts into the feed:

import { URL } from "url";
import { DOMParser, parseHTML } from "linkedom";

/**
 * Insert full post content into `<description>` elements in RSS feed
 *
 * How to use:
 * - Install `linkedom` npm package
 * - Add `insertContentIntoFeed({ feedFileName: 'REPLACE_ME.xml` })` to `integrations` in `astro.config.mjs`
 *
 * @param {Object} options
 * @param {string} options.feedFileName - Where to find the XML file in `dir`
 * @returns {import('astro').AstroIntegration}
 */
function insertContentIntoFeed({ feedFileName } = {}) {
  return {
    name: "insert-content-into-feed",
    hooks: {
      "astro:build:done": async ({ dir, pages }) => {
        if (!feedFileName) {
          console.error(
            `[insertContentIntoFeed] Please specify "feedFileName" inintegration option`
          );

          return;
        }

        // Parse feed
        const feedFilePath = `${dir.pathname}${feedFileName}`;

        if (!existsSync(feedFilePath)) {
          console.error(
            `[insertContentIntoFeed] No feed found in ${feedFilePath}`
          );

          return;
        }

        const feedFile = readFileSync(feedFilePath, "utf-8");
        const feedDocument = new DOMParser().parseFromString(
          feedFile,
          "text/xml"
        );

        // Loop through feed items
        feedDocument.querySelectorAll("item").forEach((item) => {
          // Find page by slug
          const url = new URL(item.querySelector("link").innerHTML);
          const slug = url.pathname.replace(/^\//, "");
          const page = pages.find((page) => page.pathname === slug);

          if (!page) {
            console.error(
              `[insertContentIntoFeed] No page found with pathname "${slug}"`
            );

            return;
          }

          // Parse post
          const postFilePath = `${dir.pathname}${page.pathname}index.html`;

          if (!existsSync(postFilePath)) {
            console.log(
              `[insertContentIntoFeed] No post found in ${postFilePath}`
            );

            return;
          }

          const postFile = readFileSync(postFilePath, "utf-8");
          const { document: postDocument } = parseHTML(postFile);

          // Replace item description with post content
          const postContent = postDocument.querySelector("main");
          const itemDescription = item.querySelector("description");

          itemDescription.innerHTML = `<![CDATA[${postContent.innerHTML}]]>`;
        });

        writeFileSync(feedFilePath, feedDocument.toString());
      },
    },
  };
}

Copy link
Contributor

@bholmesdev bholmesdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this doesn't try to access compiledContent internally, I'm +1 on this! I agree that docs should include a note on passing content with proper sanitization before we merge. Let me know if you need help with this @smithbm2316

@smithbm2316
Copy link
Contributor Author

Sorry for the delay on this, thanks to everyone for the approvals! I meant to write up the docs for this over Thanksgiving break but just added a PR to the Astro docs repo that adds explanations and warnings for this new feature 😃

@sarah11918
Copy link
Member

sarah11918 commented Dec 2, 2022

I'll review https://github.com/withastro/docs/pull/2129/files today, and since it looks like this PR is at least accepted-in-theory and is intended to eventually merge, I'm adding the label merge-on-release to the Docs PR so that documentation doesn't merge before the feature.

Reminder to ping the Docs PR when this feature is intended to be released!

@bholmesdev
Copy link
Contributor

Looking good. Thanks again!

@bholmesdev bholmesdev merged commit 081e0a9 into withastro:main Dec 6, 2022
@astrobot-houston astrobot-houston mentioned this pull request Dec 6, 2022
@patrick91
Copy link

Thanks so much for this feature! Is there an issue for MDX support? or shall I create one :)

@mattstein
Copy link
Contributor

I don’t see an issue for MDX support, @patrick91. If you created one, I’d appreciate it and eagerly subscribe.

@bholmesdev
Copy link
Contributor

@patrick91 @mattstein Alright, spun up an RFC discussion around compiledContent() support! I linked this thread for prior art as well. Give it a follow if this interests you 🚀
withastro/roadmap#419

@patrick91
Copy link

@bholmesdev thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants