Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load rustdoc's JS search index on-demand. #82310

Merged
merged 1 commit into from
Mar 4, 2021

Conversation

jsha
Copy link
Contributor

@jsha jsha commented Feb 20, 2021

Instead of being loaded on every page, the JS search index is now loaded when either (a) there is a ?search= param, or (b) the search input is focused.

This saves both CPU and bandwidth. As of Feb 2021, https://doc.rust-lang.org/search-index1.50.0.js is 273,838 bytes gzipped or 2,544,939 bytes uncompressed. Evaluating it takes 445 ms of CPU time in Chrome 88 on a i7-10710U CPU (out of a total ~2,100 ms page reload).

Tested on Firefox and Chrome.

New:
https://jacob.hoffman-andrews.com/rust/search-on-demand/std/primitive.slice.html
https://jacob.hoffman-andrews.com/rust/search-on-demand/std/primitive.slice.html?search=fn

Old:
https://jacob.hoffman-andrews.com/rust/search-on-load/std/primitive.slice.html
https://jacob.hoffman-andrews.com/rust/search-on-load/std/primitive.slice.html?search=fn

@rust-highfive
Copy link
Collaborator

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @GuillaumeGomez (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 20, 2021
@jsha jsha changed the title Load JS search index on-demand. Load rustdoc's JS search index on-demand. Feb 20, 2021
@GuillaumeGomez
Copy link
Member

I have two problems with this change:

  1. The crates list for the search isn't loaded anymore until you focus on the search input.
  2. Your browser is heavily using caching, so once it loaded the search index once, it's never loaded anymore until the hash/URL of the URL changes.

@jsha
Copy link
Contributor Author

jsha commented Feb 20, 2021

The crates list for the search isn't loaded anymore until you focus on the search input.

Pushed a fix. Thanks for spotting. I've also updated the demo pages at https://jacob.hoffman-andrews.com/rust/search-on-demand/std/primitive.slice.html.

Your browser is heavily using caching, so once it loaded the search index once, it's never loaded anymore until the hash/URL of the URL changes.

In my tests, the browser is spending significant time parsing the JS (and the JSON inside). That's time that blocks other JS work, like building the toggles, which in turn blocks the overall page load from completing. This technique also reduces the memory footprint of doc pages in the common case where someone hasn't used search.

@GuillaumeGomez
Copy link
Member

When I pick a crate to filter the search results, the picked crate won't show up when I reload the page until I load the search index.

@GuillaumeGomez
Copy link
Member

Another thing which bothers me is the fact that the crate list arrives quite a while after clicking on the menu. It's really bad from a UX perspective...

@jsha
Copy link
Contributor Author

jsha commented Feb 20, 2021

When I pick a crate to filter the search results, the picked crate won't show up when I reload the page until I load the search index.

Good catch, fixed (and updated the demo pages).

Another thing which bothers me is the fact that the crate list arrives quite a while after clicking on the menu. It's really bad from a UX perspective...

Agreed, this also bothers me. Perhaps it's possible to generate the crate list statically rather than in JS? I'll take a look.

By the way, this noticeable delay is a good demonstration that loading the search JSON takes non-negligible time even when the JS is cached.

@GuillaumeGomez
Copy link
Member

By the way, this noticeable delay is a good demonstration that loading the search JSON takes non-negligible time even when the JS is cached.

That's actually a great argument in favor for this change!

Another thing which bothers me is the fact that the crate list arrives quite a while after clicking on the menu. It's really bad from a UX perspective...
Agreed, this also bothers me. Perhaps it's possible to generate the crate list statically rather than in JS? I'll take a look.

It'd require another JS file to store the crate list but it's possible.

In the last version on your website, it seems like the search-index is now loaded every time (even with an empty local storage).

@jsha
Copy link
Contributor Author

jsha commented Feb 20, 2021

It'd require another JS file to store the crate list but it's possible.

Is the crate list not known at generation time for any particular page? Also, are there situations where the crate list is quite large, or is the situation on doc.rust-lang.org representative (5 crates)?

In the last version on your website, it seems like the search-index is now loaded every time (even with an empty local storage).

Yep. The issue here is that if you pick a crate, then go back to "All crates", there's still a storage entry. Updated the code to ignore "All crates" (and pushed demo).

@GuillaumeGomez
Copy link
Member

Is the crate list not known at generation time for any particular page? Also, are there situations where the crate list is quite large, or is the situation on doc.rust-lang.org representative (5 crates)?

It's actually very common to have more than 5 crates. To be exact, it depends on how many dependencies you have (so I'll let you do the maths :p).

Yep. The issue here is that if you pick a crate, then go back to "All crates", there's still a storage entry. Updated the code to ignore "All crates" (and pushed demo).

I precised that my local storage was empty, which meant that I removed the local storage "rustdoc-saved-filter-crate" entry completely through the web dev tools. ;)

@GuillaumeGomez
Copy link
Member

Sorry, forgot to answer to one of your questions:

Is the crate list not known at generation time for any particular page?

No, and the reason is actually a bit more complex than what it seems: you might add more crates to your local documentation but the search index has to work in any case and not simply be overwritten. So to do so, if the search-index already has this crate, it overwrites this entry specifically, otherwise it happens it.

@jsha
Copy link
Contributor Author

jsha commented Feb 21, 2021

I pushed a new version (and updated demo) that generates a separate, smaller, crates.js listing the crates. This is used for building the dropdown and also for adding sidebar items depending on the page. There's still some issue: my crates.js has alloc, core, proc_macro, and std. But if I look on doc.rust-lang.org, the list should also include test.

In the last version on your website, it seems like the search-index is now loaded every time (even with an empty local storage).
I precised that my local storage was empty, which meant that I removed the local storage "rustdoc-saved-filter-crate" entry completely through the web dev tools. ;)

Found the issue. Looks like a difference between Firefox and Chrome. savedCrate was undefined in Chrome, but null in Firefox.


let crate_list_dst = cx.dst.join(&format!("crates{}.js", cx.shared.resource_suffix));
let crate_list =
format!("const crates = [{}];", krates.iter().map(|k| format!("\"{}\"", k)).join(","));
Copy link
Member

@GuillaumeGomez GuillaumeGomez Feb 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const isn't part of ES5, please use instead: window.CRATES = [{}]. Also: this update is problematic because you remove crates that aren't listed, which is why you don't have test listed in the crates. Please update the list instead of rewriting it completely.

EDIT: I wrote CRATES capitalized but it's simply to avoid name conflicts, because "crates" seems way too common. If you want to go for another name (longer at least), don't hesitate to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you wonder: it's why we update the search index like that and have a crate define on each line separately.

}

// Assumes `crates{version}.js` has already been loaded, setting the global const `crates`.
addSearchOptions(crates);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like much to "assume" things are there in an async world. Instead, please call the function to init the crates lists from the crates{version}.js file directly.

Copy link
Member

@GuillaumeGomez GuillaumeGomez Feb 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok nevermind, crates{version}.js will always be loaded before. It's fine as is. However, please use window.crates or equivalent. I don't like the usage of global with such common names, it makes the readibility a lot more problematic.

// Assumes `crates{version}.js` has already been loaded, setting the global const `crates`.
addSearchOptions(crates);
addSidebarCrates(crates);
var crateSearchDropDown = document.getElementById("crate-search");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put all this in a function too so we can avoid creating variables in the "global" scope of this function please?

@GuillaumeGomez
Copy link
Member

I think we're getting close to something good! Not having to load the search-index when unneeded is a good idea. I'm still worried about the big delay we have on the first search though... I'll think about that.

@jsha
Copy link
Contributor Author

jsha commented Feb 21, 2021

A couple of thoughts on reducing the feeling of delay on the first search:

Right now, there's no reaction on the rest of the page when you highlight the search box. It would be nice to insert some text below the search box like "Getting ready to search." Alternately: right now the search box says "Click or press 'S' to search, '?' for more options." And it continues to say that even after you focus the search box. We could make it so focusing the search box changes this to say "Type in your search here." That would indicate to the user that their input was received.

Similarly, when the user starts typing, we should be responsive to that typing even if the index is not yet loaded. For instance, We could add text under the search box "Searching for 'yourquery'." One problem: the index JS load is one big event, not interruptible by other events. We could probably work around that by breaking up the JSON.parse into multiple calls (one for each crate), and adding setTimeout between. Similarly, I think there's a fair amount of work happening to turn the parsed JSON into in-memory structures; we could similarly add setTimeout during that work to make it possible to update the page in between.

@rust-log-analyzer

This comment has been minimized.

@jsha
Copy link
Contributor Author

jsha commented Feb 24, 2021

By the way, I was looking at the delay when searching and realized we have an artificial delay of 500ms before any search gets executed:

searchTimeout = setTimeout(search, 500);
.

@GuillaumeGomez
Copy link
Member

Yep, but in here we outrun this timeout apparently. ;)

@jsha
Copy link
Contributor Author

jsha commented Feb 24, 2021

I did a little test: I added some console.log lines for various events, cleared cache, loaded a page, then hit 's' and typed "string" as fast as I could.

I just took a short typing test, and I type 475 characters in 60 seconds, or 1 character every 126ms. The JS code triggers on both keydown and keyup, so I'd expect to see 2 events every 126ms.

.9890 loading search-index
.0589 search!
.0775 search-index loaded
.0857 input
.0858 input
.0860 keyup
.0860 input
.0861 input
.0862 keyup
.0873 keyup
.0880 input
.0966 keyup
.1468 search!
.5727 search!

This shows that no events fire during load of the search-index. Some time after the index is fully loaded, we see a series of events come in impossibly fast. Then 500ms later, the search is actually executed. Due to the single-threaded nature of JS, no events can fire while the search-index is loading.

Note that you can bypass the 500ms timeout and search immediately by hitting enter. This might be a good way to test whether things feel slow independent of that 500ms timeout.

If my theory above is correct, adding setTimeouts during the search-index load should allow events to fire and generally make the page feel more responsive.

@jsha
Copy link
Contributor Author

jsha commented Feb 25, 2021

I tested the setTimeout concept. Unfortunately it didn't seem to help. None of the keyboard input events fired between loading, parsing, and buildIndex.

I did find that search was running twice. Once, right after buildIndex, it would run with whatever was in the search box, even if I was still typing. Since the search itself is a bit expensive (250ms on my machine), doing it twice slows everything down. I fixed that so that "right after buildIndex" search only fires when there's a search in the URL bar.

Overall, I think this is the best we're likely to do. If we wanted to get really fancy, we could use a Web Worker that always has a copy of the index loaded. That would make searches blazing fast, but it wouldn't work for local files so we'd need two code paths. That's probably too much complexity.

}
}
crates_text.sort(function(a, b) {
crates.sort(function(a, b) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just thought: now that the crate list is separated from the search index, it might be nice to sort them when generating the crate list in the rust code instead of doing it client side every time. What do you think? If you think it's too much, please add a FIXME comment so it can be taken care of later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'd be happy to do this!

@bors
Copy link
Contributor

bors commented Feb 27, 2021

☔ The latest upstream changes (presumably #82511) made this pull request unmergeable. Please resolve the merge conflicts.

let dst = cx.dst.join(&format!("search-index{}.js", cx.shared.resource_suffix));
let (mut all_indexes, mut krates) = try_err!(collect_json(&dst, &krate.name.as_str()), &dst);
all_indexes.push(search_index);
krates.push(krate.name.to_string());
krates.sort();
krates.dedup();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering: is that case even possible? Normally you can only have each crate once. Did you found a case where there was a duplicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was moving this up from here: https://github.com/rust-lang/rust/pull/82310/files#diff-40a0eb025da61717b3b765ceb7fab21d91af3012360e90b9f46e15a4047946faL1081. I do think it's possible that krates.push(krate.name.to_string()) adds a crate name that is a duplicate of a crate already in the list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the code of collect_json, this more precisely:

// ...
let prefix = format!("\"{}\"", krate);
// ...
    if line.starts_with(&prefix) {
        continue;
    }
// ...

If the crate is already in the search index, we don't add it to the crates list returned. So the dedup is unnecessary here.

@GuillaumeGomez
Copy link
Member

I went through the changes once again and I think we're getting close to the approval. Just two comments (one nit and one question about the dedup). Thanks for going through this review process! :)

Instead of being loaded on every page, the JS search index is now
loaded when either (a) there is a `?search=` param, or (b) the search
input is focused.

This saves both CPU and bandwidth. As of Feb 2021,
https://doc.rust-lang.org/search-index1.50.0.js is 273,838 bytes
gzipped or 2,544,939 bytes uncompressed. Evaluating it takes 445 ms
of CPU time in Chrome 88 on a i7-10710U CPU (out of a total ~2,100
ms page reload).

Generate separate JS file with crate names.

This is much smaller than the full search index, and is used in the "hot
path" to draw the page. In particular it's used to crate the dropdown
for the search bar, and to append a list of crates to the sidebar (on
some pages).

Skip early search that can bypass 500ms timeout.

This was occurring when someone had typed some text during the load of
search-index.js. Their query was usually not ready to execute, and the
search itself is fairly expensive, delaying the overall load, which
delayed the input / keyup events, which delayed eventually executing the
query.
@jsha
Copy link
Contributor Author

jsha commented Mar 2, 2021

Updated, thanks!

@GuillaumeGomez
Copy link
Member

Thanks!

@bors: r+

@bors
Copy link
Contributor

bors commented Mar 3, 2021

📌 Commit 768d5e9 has been approved by GuillaumeGomez

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 3, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Mar 4, 2021
…laumeGomez

Load rustdoc's JS search index on-demand.

Instead of being loaded on every page, the JS search index is now loaded when either (a) there is a `?search=` param, or (b) the search input is focused.

This saves both CPU and bandwidth. As of Feb 2021, https://doc.rust-lang.org/search-index1.50.0.js is 273,838 bytes gzipped or 2,544,939 bytes uncompressed. Evaluating it takes 445 ms of CPU time in Chrome 88 on a i7-10710U CPU (out of a total ~2,100 ms page reload).

Tested on Firefox and Chrome.

New:
https://jacob.hoffman-andrews.com/rust/search-on-demand/std/primitive.slice.html
https://jacob.hoffman-andrews.com/rust/search-on-demand/std/primitive.slice.html?search=fn

Old:
https://jacob.hoffman-andrews.com/rust/search-on-load/std/primitive.slice.html
https://jacob.hoffman-andrews.com/rust/search-on-load/std/primitive.slice.html?search=fn
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Mar 4, 2021
…laumeGomez

Load rustdoc's JS search index on-demand.

Instead of being loaded on every page, the JS search index is now loaded when either (a) there is a `?search=` param, or (b) the search input is focused.

This saves both CPU and bandwidth. As of Feb 2021, https://doc.rust-lang.org/search-index1.50.0.js is 273,838 bytes gzipped or 2,544,939 bytes uncompressed. Evaluating it takes 445 ms of CPU time in Chrome 88 on a i7-10710U CPU (out of a total ~2,100 ms page reload).

Tested on Firefox and Chrome.

New:
https://jacob.hoffman-andrews.com/rust/search-on-demand/std/primitive.slice.html
https://jacob.hoffman-andrews.com/rust/search-on-demand/std/primitive.slice.html?search=fn

Old:
https://jacob.hoffman-andrews.com/rust/search-on-load/std/primitive.slice.html
https://jacob.hoffman-andrews.com/rust/search-on-load/std/primitive.slice.html?search=fn
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 4, 2021
Rollup of 8 pull requests

Successful merges:

 - rust-lang#80527 (Make rustdoc lints a tool lint instead of built-in)
 - rust-lang#82310 (Load rustdoc's JS search index on-demand.)
 - rust-lang#82315 (Improve page load performance in rustdoc)
 - rust-lang#82564 (Revert `Vec::spare_capacity_mut` impl to prevent pointers invalidation)
 - rust-lang#82697 (Fix stabilization version of move_ref_pattern)
 - rust-lang#82717 (Account for macros when suggesting adding lifetime)
 - rust-lang#82740 (Fix commit detected when using `download-rustc`)
 - rust-lang#82744 (Pass `CrateNum` by value instead of by reference)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 36b7bef into rust-lang:master Mar 4, 2021
@rustbot rustbot added this to the 1.52.0 milestone Mar 4, 2021
@camelid camelid added A-rustdoc-ui Area: Rustdoc UI (generated HTML) T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Mar 6, 2021
@jsha jsha deleted the rustdoc-search-onfocus branch March 7, 2021 19:56
Folyd added a commit to huhu/rust-search-extension that referenced this pull request Mar 17, 2021
rustdoc merged a new PR [#82310](rust-lang/rust#82310) a few days ago, this cause breaking changes to this extension. This commit aims to fix this incompatibility.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rustdoc-ui Area: Rustdoc UI (generated HTML) S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants