Support of including blog in the website search #2136

tim-schilling · 2025-07-23T22:14:25Z

The blog results should have a property of whether it is included in the search results. We should also limit the blogs that are searchable for a version of Django based on the support end. This will allow us to limit the inclusion of blog posts in the search based on the time the entry was created, keeping the search results relevant to that version of Django.

~~This is a WIP yet. I'm opening the draft PR to solicit early feedback on the approach. I'm thinking we'll re-use the DocumentationCategory.WEBSITE in place of "weblog".~~

tim-schilling · 2025-07-31T12:44:32Z

I was given tacit approval (lack of pushback) from the website WG to use this approach. I plan on working on this at the DjangoCon Africa sprints (Aug 14 & 15), but if anyone wants to complete it before that, please do!

tim-schilling · 2025-09-01T17:49:54Z

I'm going to try to clean up the commits shortly here.

tim-schilling · 2025-09-01T19:15:49Z

Cleaned up the commits. Please let me know if any other changes are necessary.

blog/migrations/0008_alter_entry_is_searchable.py

tim-schilling · 2025-09-03T19:02:46Z

docs/models.py

+                path = reverse_path(url_name, kwargs=kwargs)
+                request = RequestFactory().get(path, HTTP_HOST=www_host)
+                body = resolve(path).func(request).render().text
+                # Need to parse the body element.


Forgot this TODO yet.

@bmispelon do you have any good ideas here?

bmispelon

This is a fantastic PR, thanks for submitting it! It addresses a real issue that's been plaguing the site for a long time, and as far as PR go, it's a really high quality one, kudos 👏🏻

Of course because I'm me I'll always find something to complain about so I've left a few comments ranging from improvement suggestions to open questions.

I'll note that I have not tested the code locally (yet). I'm still working on getting the preview server up and running and I think this PR would be a great candidate for it.

blog/models.py

docs/tests/test_models.py

bmispelon · 2025-09-03T19:22:19Z

docs/tests/test_models.py

+        cls.document_index, cls.document_view, cls.document_detail = (
+            cls.release.documents.order_by("path")
+        )


This will break if/when more views are added to _sync_views_to_db(), right?

I'd suggest breaking the single test_document_url() method into three separate tests then.

Judging from the rest of the tests, it seems like there's an effort to keep them efficient. It may make more sense as two separate tests, one for the old document_url and then a new one for the DocumentationCategory.WEBSITE branch. Let me know if you disagree.

blog/models.py

bmispelon · 2025-09-03T19:50:55Z

docs/models.py

+            ]
+            if not www_hosts or not (www_host := www_hosts[0]):
+                return
+            synced_views = [


I think this should live in a constant at the top level of a module (I'll let you decide which module makes more sense 😁 )

If we're moving this ~~from the code~~ away from the usage logic, then we should use a dataclass so it's not relying on positional arguments. That gets a bit more verbose and likely could go in an entirely separate module.

May be side-stepped due to #2136 (comment)

bmispelon · 2025-09-03T19:59:13Z

docs/models.py

+                path = reverse_path(url_name, kwargs=kwargs)
+                request = RequestFactory().get(path, HTTP_HOST=www_host)
+                body = resolve(path).func(request).render().text


That's a lot of stuff going on in such a few lines: you make it look deceptively simple 😁

I'm curious: did you try other approaches before settling on this? I'm a bit worried that the whole www_host logic could be a bit brittle. If we're calling the view function directly anyway, can we skip setting the host on the request object?

This is definitely for a reason, but let me try once more and better document the weirdness.

Okay, so the simplest we can get it is to use get_template("aggregator/ecosystem.html").render(). However, that does sort of open us to the problem of the view (which is defined in the urls.py) of using a different template or implementation than expected. However, using this approach eliminates the need to get the host which eliminates having the search the ALLOWED_HOSTS setting. So I think it's worth it. Let me know what you think.

docs/models.py

bmispelon

I like the move to a dataclass for SearchableView, nice one 👍🏻

I think we can go further and delegate the nitty-gritty details of getting a "document"'s URL and content to that class.

I've got grand ideas of generalizing this and having a collection of searchable/indexable models (sphinx document, blog, template view, flat page, ...) but that'd make the scope of this PR explode and I'd rather see this shipped. We can always iterate.

Other than that I left a question about the new support_end. I think we should be conservative with adding new fields and would like to explore reusing existing ones, to minimize the combinatorial explosion of possible model states. If we have to keep support_end, would it be possible for the existing DocumentRelease to get a sensible default assigned to them with a data migration (we have 100 of them in production, it'd be a shame if we had to go and back-populate them by hand).

Thanks again for your work on this!

bmispelon · 2025-09-08T12:57:49Z

docs/models.py

+    support_end = models.DateField(
+        null=True,
+        blank=True,
+        help_text="The end of support for this release of Django.",
+    )


Do we need a dedicated field for this, why couldn't we get the information from release.eol_date instead?

This is why we do code reviews. Let me get this switched over.

docs/models.py

docs/search.py

pauloxnet · 2025-10-13T13:11:15Z

Can you review this PR based on this latest change in the search docs #2192

tim-schilling · 2025-10-13T14:56:59Z

@pauloxnet I don't see anything obvious that would cause a problem. As long as #2192 didn't change how the search data needs to be stored, then this PR won't be impacted. The hack that is this PR is to insert search-able data and point it to the blog or ecosystem page.

pauloxnet · 2025-10-13T15:00:32Z

I don't see anything obvious that would cause a problem. As long as #2192 didn't change how the search data needs to be stored, then this PR won't be impacted. The hack that is this PR is to insert search-able data and point it to the blog or ecosystem page.

I was indirectly suggesting using the approach outlined in the documentation, which involves delegating the generation of search vectors to a generated field. I'm sorry I don't have the time to do a more thorough review of this PR (I'm recuperating from my Django sprints). 😅

tim-schilling · 2025-10-13T15:17:18Z

@pauloxnet I think you can agree there's value in shipping functionality rather than perfecting the approach. If that suggestion is more of a, "hey we need to conform to this new standard", I can make the changes. If it's a "it'd be cool if it did this" type suggestion, I'm going to push for shipping something first.

pauloxnet · 2025-10-13T15:25:57Z

I think you can agree there's value in shipping functionality rather than perfecting the approach. If that suggestion is more of a, "hey we need to conform to this new standard", I can make the changes. If it's a "it'd be cool if it did this" type suggestion, I'm going to push for shipping something first.

My comment was, rather, I saw that you started this PR before the docs search improvement with GeneratedField, and, not having time to do a full review of the PR, I tried to point it out in this PR. No intent of perfectionism.

docs/models.py

blog/models.py

The blog results should have a property of whether it is included in the search results. We should also limit the blogs that are searchable for a version of Django based on the support end. This will allow us to limit the inclusion of blog posts in the search based on the time the entry was created, keeping the search results relevant to that version of Django. * Added is_searchable and made published accept datetime cut-off parameter. * Used Entry.get_absolute_url to encapsulate www host. * Extracted get_search_config helper function. Co-authored-by: Baptiste Mispelon <[email protected]>

tim-schilling · 2025-10-15T23:05:58Z

@sarahboyce identified a problem outside of the PR in that the update_docs command is only run periodically which means we wouldn't be introducing any searchable website content until that next run.

sarahboyce · 2025-10-16T09:15:42Z

the update_docs command is only run periodically which means we wouldn't be introducing any searchable website content until that next run.

Want to add that there is a little bit more here. We only update the docs for the search if there are changes to the docs. This means only when the django/django repo has a new commit on the branch for that release (e.g. main = dev, stable/5.2.x = 5.2). This happens quite frequently for main, versions in pre-release and the version in active support, but would take a security release that might happen every other month for versions like 4.2

So we might want to have that the blog/website entries can be created without changes being required on the django/django repo

tim-schilling · 2025-10-16T15:37:57Z

@sarahboyce and I discussed this PR. We arrived at the following conclusions:

We should remove the .published(as_of=) parameter because it's backwards and going to cause more problems than it's worth down the line. We should let users decide what content is relevant for them and eventually we can apply search rankings to the date of the blog post.
We're going to stick with the is_searchable: bool property, though in the future this may change as the site wide search feature is implemented. (the idea of categories was floated for the Entry / blog post model)
This approach has the downside of a blog post being created and not being immediately searchable. However, given that most documentation is rebuilt within a week, this seems acceptable to release this. A follow-up issue and PR should be created to fix this though.

tim-schilling · 2025-10-16T18:36:20Z

This is ready for a hopefully final review.

Can I provide a script to run on production to batch update blog posts?

bmispelon · 2025-10-16T18:39:55Z

Can I provide a script to run on production to batch update blog posts?

You mean a data migration? Sure you can add one to the PR 😁
A custom management command could work too (and is easier to test).
A script would be a last resort option I would say.

I'm hoping to be able to review this tomorrow during Mr Triplett's office hours, so you have time to push a few more commits.

sarahboyce

Thank you for all the work on this Tim!
I did some testing and have reviewed - this looks great ⭐

I created a separate issue around the UI: #2266

It's likely something like update_docs --force --language=en needs to be run in order to have some older versions get the blog entries and ecosystem page (once the blogs are marked as searchable)

bmispelon

What an excellent PR, outstanding work Tim! 💚

I feel you deserve a prize for this 🙃

Anyway, I only found a nothingburger of a nitpick, the PR is good to go as far as I'm concerned.

Let's 🚢 this!

docs/tests/test_models.py

Co-authored-by: Baptiste Mispelon <[email protected]>

tim-schilling · 2025-10-17T16:32:45Z

@bmispelon I don't agree with using a data migration to bulk update the entries. I have primary keys and if we put it in a data migration, we run the risk of updating data in non-production environments unexpectedly causing weird issues that are difficult to debug.

Edit: Considering there are only 22, I can do them manually after it's been deployed.

bmispelon · 2025-10-17T19:57:34Z

Oh, so the script is just:

Entry.objects.filter(pk__in=[805, 804, 801, 796, 792, 789, 788, 783, 781, 774, 773, 744, 739, 738, 721, 707, 702, 655, 650, 559, 451, 442, 418, 391]).update(is_searchable=True)

?
Yeah, I can run that after deploy. I agree that a migration is definitely not the way to go. I thought maybe there was some more complicated logic based on dates and/or content.

tim-schilling · 2025-10-17T20:05:53Z

Oh, no. It was just "Tim's picks". If others want to add more, that's fine, but figured we could start with those.

bmispelon

I've tested this locally and it works great!

I'm happy to merge, would you prefer to squash all the commits, or do you think it makes sense to keep some of them separate?

Also, do you want to add a couple dozen "Co-authored: Tim Schilling" to the commit message for good measure? 😁

tim-schilling · 2025-10-18T00:00:11Z

I was assuming you'd use the squash and merge option. I can squash them in a bit though.

Co-Authored-By Tim

sarahboyce · 2025-10-19T12:50:47Z

Unsure how much of an issue this would be but be aware that blog items that currently exist in https://www.djangoproject.com/sitemap.xml would now also exist in https://docs.djangoproject.com/sitemap-en.xml

Fixing this should be relatively simple 🤔
I think we should exclude from the docs sitemap as this would also duplicate the same urls per version of Django within the sitemap

sarahboyce · 2025-10-22T13:47:20Z

Can cherry-pick 932d8ce onto this PR 👍

tim-schilling force-pushed the searchable-blog branch from 83133b1 to 4f9e4bf Compare July 24, 2025 12:29

tim-schilling force-pushed the searchable-blog branch from 4f9e4bf to 2bf75cd Compare September 1, 2025 16:07

tim-schilling marked this pull request as ready for review September 1, 2025 17:48

tim-schilling force-pushed the searchable-blog branch from a9a4f8a to 837e808 Compare September 1, 2025 19:15

adamzap reviewed Sep 1, 2025

View reviewed changes

blog/migrations/0008_alter_entry_is_searchable.py Outdated Show resolved Hide resolved

tim-schilling commented Sep 3, 2025

View reviewed changes

bmispelon reviewed Sep 3, 2025

View reviewed changes

bmispelon reviewed Sep 8, 2025

View reviewed changes

sarahboyce reviewed Oct 15, 2025

View reviewed changes

docs/models.py Outdated Show resolved Hide resolved

blog/models.py Show resolved Hide resolved

tim-schilling force-pushed the searchable-blog branch from 396050c to ad94998 Compare October 15, 2025 22:54

tim-schilling and others added 2 commits October 15, 2025 17:59

Support dev releases which have no DocumentRelease.release set.

ec0d063

tim-schilling force-pushed the searchable-blog branch from d9d40ab to ec0d063 Compare October 15, 2025 22:59

Remove blog filtering on published date for searchable content.

5a025ca

tim-schilling requested review from bmispelon and sarahboyce October 16, 2025 18:35

sarahboyce approved these changes Oct 17, 2025

View reviewed changes

bmispelon approved these changes Oct 17, 2025

View reviewed changes

docs/tests/test_models.py Outdated Show resolved Hide resolved

Update docs/tests/test_models.py

ee63cfc

Co-authored-by: Baptiste Mispelon <[email protected]>

tim-schilling force-pushed the searchable-blog branch from 2ef9631 to ee63cfc Compare October 17, 2025 16:16

bmispelon approved these changes Oct 17, 2025

View reviewed changes

sarahboyce mentioned this pull request Oct 19, 2025

Added static views to sitemap. #2276

Open

sarahboyce mentioned this pull request Oct 21, 2025

Synced webpages from sitemap to Document model for search. #2284

Open

sarahboyce added blog search labels Oct 21, 2025

Uh oh!

Support of including blog in the website search #2136

Are you sure you want to change the base?

Support of including blog in the website search #2136

Uh oh!

Conversation

tim-schilling commented Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tim-schilling commented Jul 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tim-schilling commented Sep 1, 2025

Uh oh!

tim-schilling commented Sep 1, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bmispelon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tim-schilling Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tim-schilling Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

bmispelon left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

pauloxnet commented Oct 13, 2025

Uh oh!

tim-schilling commented Oct 13, 2025

Uh oh!

pauloxnet commented Oct 13, 2025

Uh oh!

tim-schilling commented Oct 13, 2025

Uh oh!

pauloxnet commented Oct 13, 2025

Uh oh!

Uh oh!

Uh oh!

tim-schilling commented Oct 15, 2025

Uh oh!

sarahboyce commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tim-schilling commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tim-schilling commented Jul 23, 2025 •

edited

Loading

tim-schilling commented Jul 31, 2025 •

edited

Loading

tim-schilling Sep 4, 2025 •

edited

Loading

tim-schilling Sep 4, 2025 •

edited

Loading

sarahboyce commented Oct 16, 2025 •

edited

Loading

tim-schilling commented Oct 16, 2025 •

edited

Loading

sarahboyce left a comment •

edited

Loading

tim-schilling commented Oct 17, 2025 •

edited

Loading

tim-schilling commented Oct 18, 2025 •

edited

Loading

sarahboyce commented Oct 19, 2025 •

edited

Loading