Missing Cachebust when using index_format = "elasticlunr_json", language code is hard coded, and will not work from subdomain #2167
Comments
I am motivated to work on this, just waiting for a little free time. Hopefully within the next week or two.
For the cachebust, how do you see it working? I don't really want people to have to manually update the hash of the imported filename in the template every time they add or touch a post.
First, add a couple of variables to search.js:

var base_url = "";
var sha256 = "";

These variables would be used as part of the fetch. So instead of this:

index = fetch("/search_index.en.json")

we would have this:

index = fetch(base_url + '/search_index.' + lang + '.json?h=' + sha256)

Next, add a field to ...
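A minimal sketch of how that could look wired together in search.js; the variable names follow the proposal above, while the placeholder values (the empty base_url, the "en" fallback) are illustrative assumptions rather than anything Zola produces today:

```js
// Placeholders that Zola (or the theme) would fill in at build time --
// how exactly they get filled in is what the rest of this thread is about.
var base_url = "";   // "" for a site at the domain root, or e.g. "/mysite" for a sub-path
var lang = "en";     // language code of the current page
var sha256 = "";     // hash of the generated search_index.<lang>.json

// Instead of the hard coded fetch("/search_index.en.json"), build the URL from the
// variables so the browser re-downloads the index whenever the hash changes:
var index = fetch(base_url + "/search_index." + lang + ".json?h=" + sha256)
  .then(function (response) { return response.json(); });
```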
Looking at the proposed solution, I think it's almost better to have people copy the hash manually, to be honest. An alternative would be to always cachebust by appending a random string as a query param to the URL loading the index, but that's not super efficient. In practice the index is big enough that you probably only want to use it on small sites, where the index will not be too large.
What about the proposed solution do you not like? Is it that I am proposing we update values in an existing js file? (I don't like the idea of using a random string at all; it wastes bandwidth, and I would never use it if it did that.) Also, I was considering eventually implementing pagefind; I think it might be a good solution for larger sites. Demo: https://pagefind.app/

edit: Apparently pagefind runs after the SSG builds the site, so it would be handled independently of Zola. I am not sure what this will entail in practice, but I am interested in trying it with abridge; once I do I will document the steps. I think for any site under 1,000 posts it's probably better to use elasticlunr or tinysearch, because then the entire index is loaded and search is instantaneous, but once a site gets to a certain size I think pagefind would make a lot of sense.
I thought of another solution. It would involve being able to create a template for a javascript file, and Zola being able to write the javascript file. So basically you would create a search.js template; in that template file, instead of this:

index = fetch("/search_index.en.json")

we would have this:

index = fetch('{{ get_url(path="search_index.en.json", trailing_slash=false, cachebust=true) | safe }}')

This is assuming Zola can create a javascript file from a template as easily as it is able to create an html file from a template... One obvious catch is that Zola would need to build the html pages and json index first, then the js files (at least this is what I am assuming, in order to get the hash of the json index).

edit1: If Zola was able to create js files from templates, then you could also get the hash of other js files for doing on-demand loading. (Currently I have been using npm to update these hashes, which is less than ideal.)

edit2: In a JS template we could check config.toml for feature flags to determine whether or not to include a block of js code in the output js file. It would not be as compact as what you get from uglifyjs, but it would work.
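For illustration, a sketch of what the rendered output of such a template might look like; the get_url call is taken from the comment above, while the domain and hash in the rendered line are made up:

```js
// In the search.js template, processed by Tera:
var index = fetch('{{ get_url(path="search_index.en.json", trailing_slash=false, cachebust=true) | safe }}');

// In the rendered public/search.js, the line would end up something like:
// var index = fetch('https://example.com/search_index.en.json?h=1234abcd');
```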
Bundling JS is out of scope; there's a lot involved in it. We can easily template a JS file though. It's a bit weird to run search on the generated output of a SSG for pagefind; you might want to fine tune what you're including rather than just looking at HTML. I'm still thinking about how to handle that, it's not easy!
👋 Popping in to drop a few thoughts!
One thing I'm working on right now is a Node.js API which can take in raw content or files and build an index, which allows Pagefind to be integrated into the development server of SSGs. It also allows you to pass direct records in, rather than indexing HTML. Since Pagefind is a binary under the hood, this is actually a generic stdio/out communication system that could be re-implemented fairly trivially from any language. But also, Pagefind is Rust-based, so it's totally within reason to expose a lib interface for other Rust packages 👀
Pagefind should ideally expose a ...
That is Awesome! I want to try it out!
That sounds Awesome!
Sounds like a great idea! I hope pagefind will also have a buildAll, so that it builds a single chunk. The reason is that if a site is still reasonably small and you have decided to just fetch the entire index, then in my opinion it is best for it to be a single chunk; this reduces the number of server requests to 1. Basically, if my index is 400kb or less then I would rather just fetch a single index file; if the index ever grows beyond that I would rather chunk it.
We can template it but not right now, since Zola is only loading .html files.
Yes, I understood that; I'm planning to add the ability to load js files, or at least try to, unless you beat me to it.
I just tested this using a .html and .md file; the output was exactly correct other than having a .html file extension. I performed a diff on the original search.js and the one that Zola generated from a template, and it was a perfect match except that it now includes the cachebust hash, which was generated without issue. I am going to look at the Zola code now, find the spot where it loads .html file templates, and see if I can add js as a valid template extension.
That's an easy change but we need to think about templated files like that and see how other SSGs handle it.
It seems Hugo has a feature called js.Build, but it seems a lot more complicated than it needs to be; possibly they are trying to do more than simply creating a file from a template: https://gohugo.io/hugo-pipes/js/ Hugo can create a js file from a template using ExecuteAsTemplate: https://gohugo.io/hugo-pipes/resource-from-template/ (I have never tried it; it has been years since I used Hugo, and back then I did not write much javascript code.)

It looks like Eleventy/11ty can template a js file. I have never used this tool before, so the documentation is not very clear to me, but here is the page: https://www.11ty.dev/docs/languages/javascript/ It appears they name their templates ...
I tried implementing this: https://github.com/Jieiku/zola/commit/995a0d39ac96b7a93ea2a8f52862062c2c8775ce but now I am trying to figure out how to handle the chicken-and-egg dilemma. I set up a couple of test sites on this repo (for more info check the README), but basically the site builds and search.js is created properly with the cachebust... unless I try to reference search.js in a template, eg in the head of my index with get_url: https://github.com/Jieiku/zola-test-sites/blob/18ccc95248fe68bb6b329d14a06e29bb6f4de80f/fails/templates/base.html#L6

I am going to give this more thought; I am not yet sure how best to resolve it. (This is only the second time I have worked with Rust code; the first time was when I made a small change to the tinysearch library. Help from anyone is appreciated if you think you have an idea.)
You'll need to update the paths where ...
I would have multiple uses for this feature beyond the cachebust for search.js. Appreciate the tip on what to look for; I'm going to have time to work on this again tomorrow :)
Can you describe them?
These are just the ones I could immediately make use of; there could also be other benefits to being able to use Tera templates for JS or JSON files that I am not yet thinking of...

First reason: to be able to add the cachebust, baseurl, and language when loading the json search index.

Second reason: to be able to generate json data that can later be consumed by other search engine tools to create their index (tinysearch, stork, etc.). I can allow not only js but also json. Tinysearch builds its search index from a json file; currently, because Zola can output html, I basically dump the json data into an html file: https://github.com/Jieiku/abridge/blob/master/templates/tinysearch_json.html Stork is similar but uses a toml file instead of json, and I do the same there, dumping it into an html file: https://github.com/Jieiku/abridge/blob/master/templates/stork_toml.html This works, but because it is an html file it ends up in the site index.

Third reason: you could then use a facade to delay the loading of certain js features. Let's say you have a fairly heavy javascript feature; it might be a search engine or some other js tool/script. Similar to this: https://github.com/Jieiku/abridge/blob/master/static/search_facade.js Currently, to do anything like that you would have to use 3rd party tools such as npm to build the hashes.

Fourth reason: I can consolidate some of my javascript code into a single file or fewer files, and include blocks of javascript based on config values in the theme's config.toml:

mytheme.js:

{%- if config.extra.themeswitcher %}
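// Theme switcher: clicking #mode toggles the light class on <html> and saves the choice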
document.getElementById('mode').addEventListener('click', () => {
document.documentElement.classList.toggle('light');
localStorage.setItem('theme', document.documentElement.classList.contains('light') ? 'light' : 'dark');
});
{%- endif %}
{%- if config.extra.protect_email %}
(function() {
// Find all the elements on the page that use class="m-protected"
var allElements = document.getElementsByClassName('m-protected');
// Loop through all the elements, and update them
for (var i = 0; i < allElements.length; i++) {
// fetch the hex-encoded string from the href property
var encoded = allElements[i].getAttribute('href');
// decode the email address
var decoded = atob(encoded.substring(1));
// Set the link to be a "mailto:" link
allElements[i].href = 'mailto:' + decoded;
}
})();
{%- endif %}
You NAILED it! With that small change it is now building. (I added a file_templates path for search_for_file.)

It is now saying the hash does not match, and I see why: it creates the hashes for search.js before Zola modifies it to update the fetch line. It creates a hash while the file still looks like this:

var fetchURL = '{{ get_url(path="search_index." ~ lang ~ ".json", trailing_slash=false, cachebust=true) | safe }}';

I verified this by checking the hash of the template before it is processed. After Zola processes the template, it becomes this:

var fetchURL = 'https://abridge.netlify.app/search_index.en.json?h=a6e47a0d153131488e74';

and it should have had the hash of that processed file. It is kind of like the processing for the hashes related to search.js (which get inserted into the HEAD of base.html) needs to be delayed until search.js is in the output folder, at which point it would be fully built. Another solution would be that whenever get_url and the other hash functions are used on files in the file_templates directory, they should be processed with Tera before creating the hash... I am not sure what would be best; I think parsing them with Tera before grabbing the hash would be simplest.

I live in WA, USA (LOTS OF RAIN!) and have been waiting for dry weather for some work that I need to get done outdoors, so I will probably only be able to work on this for a couple hours at the end of the day for the next week. (finally dry weather)
I am just about finished with adding multi-lingual support to abridge. I need to find somebody fluent in French to check my translations here: Jieiku/abridge#108

It occurred to me that we should absolutely grab the language code from the DOM instead of during site creation. Otherwise there is no way to use a single js file for the search; you would need one js file per language. So that is what I did here; it now fetches the correct index, search_index.en.json for English and search_index.fr.json for French.

I also found a bug with the new json index feature and opened an issue for it: #2193
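A minimal sketch of that DOM-based approach, assuming the theme renders a lang attribute on the html element; the "en" fallback and the stripping of a regional suffix are illustrative choices, not necessarily the exact abridge code:

```js
// Read the language from <html lang="..."> so a single search.js works for every language.
var lang = (document.documentElement.lang || "en").split("-")[0]; // "en-gb" -> "en"

// Fetch the matching index, e.g. search_index.en.json or search_index.fr.json.
var index = fetch("/search_index." + lang + ".json")
  .then(function (response) { return response.json(); });
```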
Bug Report
A new json index_format was added: #1998
When using the new json format for the search index, index_format = "elasticlunr_json", the cachebust is missing. If you add new posts, repeat visitors may not have those posts in their index if the browser still has the old index cached.
The relevant line is 149 (lines 147 to 158 at commit 8ae4c62): https://github.com/getzola/zola/blob/master/docs/static/search.js#L149
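The embedded snippet is not reproduced above; based on the fetch line quoted earlier in this thread, the relevant code is roughly the following (a reconstruction for illustration, not an exact copy of lines 147 to 158):

```js
var index; // cached promise for the loaded index; elasticlunr is assumed to be loaded by the page

function initIndex() {
  if (index === undefined) {
    // Hard coded root path and language code, and no cachebust query parameter:
    index = fetch("/search_index.en.json")
      .then(async function (response) {
        return elasticlunr.Index.load(await response.json());
      });
  }
  return index;
}
```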
Because the index is fetched from the search.js file, Zola would need to write to this search.js file to add the hash cachebust to the fetch line. I can think of some fairly simple ways to do this with regex. Zola would need to know ahead of time which js file handles the search; for me this is always search.js at the root level.

Another issue I thought about is that the language code is hard coded. One possible solution would be to have the search.js file check the language code from the page source, <html lang="en-gb">, and then fetch the corresponding search index. I can probably submit a pull request for this later today if this sounds like a good solution.

Another issue is that this fetch line grabs the json index from the root, which will be a problem for sites that reside in a subdomain, eg github.io/mysite: the fetch will try to grab the resource from github.io/search_index.en.json when it should grab it from github.io/mysite/search_index.en.json.
One way of resolving this is for the site to set the base meta tag and have the search.js file check this tag while forming the fetch url. I do exactly that for the old js index+search bundle: https://github.com/Jieiku/abridge/blob/master/static/search_facade.js (the same principle could be applied here).
It would require less js DOM access if we simply used the base_url defined in config.toml to form the fetch url; this would resolve the issue of using subdomains, and we could do the same with the language code. (Meaning don't do this in js; just have Zola handle these values in addition to the cachebust.)
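For comparison, a sketch of the base-tag variant from the previous paragraph; it assumes the theme emits a <base href="..."> element, which is not something Zola does by default:

```js
// If the page declares e.g. <base href="https://example.github.io/mysite/">,
// use it as the prefix so a site served from a sub-path still finds its index.
var baseEl = document.querySelector("base");
var prefix = baseEl ? baseEl.getAttribute("href").replace(/\/+$/, "") : "";
var lang = (document.documentElement.lang || "en").split("-")[0];

var index = fetch(prefix + "/search_index." + lang + ".json")
  .then(function (response) { return response.json(); });
```

Templating base_url and the language directly into the file, as suggested above, would avoid these DOM lookups entirely.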
Environment
Zola version: 0.17.1
Expected Behavior
Cachebust hash added, and a way to facilitate more than one language code.
Current Behavior
No cachebust, and a hard coded language code.
Steps to reproduce
The search here can be used to reproduce: https://www.getzola.org/documentation/getting-started/overview/
I am also currently refactoring abridge and have it implemented there (refactor branch is messy, still work in progress): https://github.com/Jieiku/abridge/tree/refactor