
Commit ef1c9e0

nieblesodcambc
authored and committed
fixed indexing of external posts (alshedivat#2983)
This should fix several issues with indexing external posts, including alshedivat#1828. In short, the index builder was receiving 'empty' documents. To fix that, I'm setting the document content to the post content as retrieved from the RSS feed, or to the text extracted from the external page. I've tested with various blog sources and it seems to be working as expected now.
1 parent d5fe00b commit ef1c9e0

1 file changed: +7 -2 lines changed

_plugins/external-posts.rb

Lines changed: 7 additions & 2 deletions
@@ -62,6 +62,7 @@ def create_document(site, source_name, url, content)
   doc.data['description'] = content[:summary]
   doc.data['date'] = content[:published]
   doc.data['redirect'] = url
+  doc.content = content[:content]
   site.collections['posts'].docs << doc
 end
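To illustrate why the added `doc.content` assignment matters, here is a minimal sketch, assuming a toy word index. `FakeDoc` and `build_index` are hypothetical stand-ins, not Jekyll's document class or the theme's real index builder; the point is only that a document whose content is never set contributes nothing to an index built from content.

```ruby
# Hypothetical stand-in for a Jekyll document; only the fields used here.
FakeDoc = Struct.new(:data, :content)

# Toy index builder (an assumption, not the theme's real indexer):
# maps each lowercase word in a document's content to the titles containing it.
def build_index(docs)
  index = {}
  docs.each do |doc|
    # A nil content becomes '' via to_s, so it yields no words at all.
    doc.content.to_s.downcase.scan(/\w+/).uniq.each do |word|
      (index[word] ||= []) << doc.data['title']
    end
  end
  index
end

FEED_ENTRY = { title: 'External post', content: 'Notes on sparse attention' }.freeze

# Before the change: content is never assigned, so it stays nil.
WITHOUT_CONTENT = FakeDoc.new({ 'title' => FEED_ENTRY[:title] }, nil)

# After the change: content mirrors the body taken from the feed entry.
WITH_CONTENT = FakeDoc.new({ 'title' => FEED_ENTRY[:title] }, FEED_ENTRY[:content])

EMPTY_INDEX = build_index([WITHOUT_CONTENT])
FILLED_INDEX = build_index([WITH_CONTENT])
```

In this sketch `EMPTY_INDEX` has no entries, while `FILLED_INDEX` maps each word of the body to the post title, which matches the behavior the commit message describes.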

@@ -90,8 +91,12 @@ def fetch_content_from_url(url)
   parsed_html = Nokogiri::HTML(html)
 
   title = parsed_html.at('head title')&.text.strip || ''
-  description = parsed_html.at('head meta[name="description"]')&.attr('content') || ''
-  body_content = parsed_html.at('body')&.inner_html || ''
+  description = parsed_html.at('head meta[name="description"]')&.attr('content')
+  description ||= parsed_html.at('head meta[name="og:description"]')&.attr('content')
+  description ||= parsed_html.at('head meta[property="og:description"]')&.attr('content')
+
+  body_content = parsed_html.search('p').map { |e| e.text }
+  body_content = body_content.join() || ''
 
   {
     title: title,
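
A few details in this hunk may be worth a follow-up. In Ruby, `&.` guards only the call it is attached to, so `&.text.strip` still raises `NoMethodError` when the `title` node is missing. `Array#join` always returns a string, so the trailing `|| ''` never takes effect, and joining without a separator fuses the last word of one paragraph with the first word of the next. The description also no longer falls back to `''` when all three lookups miss. The sketch below shows one possible nil-safe variant in plain Ruby. `Node`, `safe_title`, `first_description`, and `join_paragraphs` are hypothetical names for illustration, and `Node` merely imitates the two methods used on parsed nodes in the diff.

```ruby
# Hypothetical stand-in for a parsed HTML node; it responds to the
# two calls the sketch needs (`text` and `attr`).
Node = Struct.new(:text, :attrs) do
  def attr(name)
    attrs[name]
  end
end

# Returns the stripped title text, or '' when the node is missing.
# Each step of the chain is guarded with `&.`.
def safe_title(node)
  node&.text&.strip || ''
end

# Returns the first non-nil `content` attribute among the candidate
# nodes, or '' when none of them provides one.
def first_description(*nodes)
  nodes.each do |node|
    value = node&.attr('content')
    return value unless value.nil?
  end
  ''
end

# Joins paragraph texts with a space so that words from adjacent
# paragraphs do not run together; blank paragraphs are dropped.
def join_paragraphs(texts)
  texts.map(&:strip).reject(&:empty?).join(' ')
end
```

With helpers like these, the three `meta` lookups could be passed to `first_description` in priority order, and the mapped paragraph texts to `join_paragraphs`. Whether a space is the right separator depends on how the index tokenizes text, so treat that choice as an assumption.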
