-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[api-major] Rework the find controller for unit testing #10099
Conversation
2bee9f4
to
3a1de55
Compare
/botio unittest |
3a1de55
to
40bc8ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the how you're making use of the EventBus
to break apart a number of the existing component dependencies in these patches :-)
While I'm liking the overall approach, please note that I've not yet had time to really test this. However, I've left a couple of initial comments based on a cursory look at the code. Thank you for doing this refactoring!
Unrelated to the rest of the PR, but something that I just happened to notice: Should this._linkService.pagesCount
be used instead at
pdf.js/web/pdf_find_controller.js
Line 466 in 54d6c24
const numPages = this._extractTextPromises.length; |
web/pdf_find_controller.js
Outdated
@@ -312,7 +333,14 @@ class PDFFindController { | |||
this._extractTextPromises[i] = extractTextCapability.promise; | |||
|
|||
promise = promise.then(() => { | |||
return this._pdfViewer.getPageTextContent(i).then((textContent) => { | |||
const textContentPromise = this._pdfDocument.getPage(i + 1).then( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the temporary variable here, since it's just as easy to do e.g.
promise = promise.then(() => {
return this._pdfDocument.getPage(i + 1).then((pdfPage) => {
return pdfPage.getTextContent({
normalizeWhitespace: true,
});
}).then((textContent) => {
const textItems = textContent.items;
...
Furthermore, with the only caller of BaseViewer.getPageTextContent
now removed you can probably remove the method itself as well; i.e. this code:
Lines 916 to 923 in 54d6c24
getPageTextContent(pageIndex) { | |
return this.pdfDocument.getPage(pageIndex + 1).then(function(page) { | |
return page.getTextContent({ | |
normalizeWhitespace: true, | |
}); | |
}); | |
} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit series, together with the numPages
comment above.
web/app.js
Outdated
@@ -343,7 +343,7 @@ let PDFViewerApplication = { | |||
pdfLinkService.setHistory(this.pdfHistory); | |||
|
|||
this.findController = new PDFFindController({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not strictly necessary here, but I'd like to know what you think:
With PDFFindController
instances no longer (directly) depending on BaseViewer
instances, I wonder if it would make sense to move this initialization such that a findController
option could be passed in when initializing PDFViewer
above, since this method could then be removed:
Lines 966 to 969 in 54d6c24
setFindController(findController) { | |
this.findController = findController; | |
} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit series (in a separate commit) since I think it's a good idea and it's more consistent with how other components are passed in.
web/pdf_find_controller.js
Outdated
const matchesCount = this._requestMatchesCount(); | ||
this.onUpdateResultsCount(matchesCount); | ||
this._eventBus.dispatch('updatefindbarresultscount', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is bikeshedding, but I do find some of the existing method names somewhat strange/inconsistent (e.g. _updateUIResultsCount
with the current position now being included too) and do slightly prefer the ones used in the ExternalServices
and the Firefox integration (having recently spent some time working with those).
With that in mind, how about using the name updatefindmatchescount
instead here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit series.
web/pdf_find_controller.js
Outdated
this.onUpdateResultsCount(matchesCount); | ||
this._eventBus.dispatch('updatefindbarresultscount', { | ||
source: this, | ||
matchesCount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: You could just do matchesCount: this._requestMatchesCount(),
and get rid of the temporary variable now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit series.
web/pdf_find_controller.js
Outdated
const matchesCount = this._requestMatchesCount(); | ||
this.onUpdateState(state, previous, matchesCount); | ||
this._eventBus.dispatch('updatefindbarstate', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, how about using the name updatefindcontrolstate
instead here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit series.
web/app.js
Outdated
@@ -1976,6 +1963,30 @@ function webViewerFindFromUrlHash(evt) { | |||
}); | |||
} | |||
|
|||
function webViewerUpdateFindBarResultsCount(evt) { | |||
const matchesCount = evt.matchesCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this variable, since the function definition could be written as:
function webViewerUpdateFindMatchesCount({ matchesCount, }) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit series.
web/app.js
Outdated
} | ||
|
||
function webViewerUpdateFindBarState(evt) { | ||
const { state, previous, matchesCount, } = evt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this variable, since the function definition could be written as:
function webViewerUpdateFindControlState({ state, previous, matchesCount, }) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit series.
@@ -25,8 +25,9 @@ | |||
"network_utils_spec.js", | |||
"node_stream_spec.js", | |||
"parser_spec.js", | |||
"pdf_find_utils.js", | |||
"pdf_history.js", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, thanks for fixing this :-)
import { getDocument } from '../../src/display/api'; | ||
import { PDFFindController } from '../../web/pdf_find_controller'; | ||
|
||
class MockLinkService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit unfortunate that the SimpleLinkService
cannot be used instead, but I suppose extending that one with the below functionality wouldn't really be a great idea in general (which I'm assuming is why you went with using a mock here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, that's why I chose a mock here. I initially used the SimpleLinkService
class, but quickly realized that it was a bit too simple for the testing purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more tiny nit here: How about importing SimpleLinkService
above and do class MockLinkService extends SimpleLinkService
here, since that will first of all guarantee that all of the expected methods are present and second of all will also help "document" the MockLinkService
w.r.t. the expected interface!?
pdfFindController.executeCommand('find', { query: 'Dynamic', }); | ||
|
||
const totalPages = 14; | ||
const totalMatches = 33; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please calculate this from the matchesPerPage
array instead, to avoid accidentally introducing inconsistent state (especially since this pattern will probably be re-used for additional tests)!?
To easily find the sum you could use e.g. Array.prototype.reduce()
like this:
const totalMatches = matchesPerPage.reduce((a, b) => { return a + b; });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit, but split over multiple lines because ESLint complained about it being on one line. I also changed totalPages
to be the length of the matchesPerPage
array instead of a hardcoded constant.
web/pdf_find_controller.js
Outdated
if (page.textLayer) { | ||
page.textLayer.updateMatches(); | ||
} | ||
this._eventBus.dispatch('updatematches', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not hurt to use a slightly more specific name here, so how about e.g. updatetextlayermatches
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit series.
examples/components/simpleviewer.js
Outdated
@@ -46,7 +46,7 @@ pdfLinkService.setViewer(pdfViewer); | |||
|
|||
// (Optionally) enable find controller. | |||
var pdfFindController = new pdfjsViewer.PDFFindController({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, isn't searching now broken in all of the components
examples, since you're missing a pdfFindController.setDocument(pdfDocument);
line below (i.e. after the pdfLinkService.setDocument(pdfDocument, null);
line).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh! I forgot that there is no search query by default, so I totally missed this. I fixed this in the new commit series and verified that it works with a query now.
Please note that while I've not tried to implement it, the "only" thing preventing that issue from being solved is that there's (currently) no way of knowing, i.e. no Edit: As expected, implementing this is actually pretty easy; see https://gist.github.com/Snuffleupagus/d0167df8f612d9bef50561dd4bada215 and master...Snuffleupagus:findbarclose |
40bc8ab
to
40a6d17
Compare
Thank you for the helpful review comments! I have addressed them in the new commit series. |
/botio unittest |
/botio-linux preview |
web/text_layer_builder.js
Outdated
@@ -56,6 +56,14 @@ class TextLayerBuilder { | |||
this.textLayerRenderTask = null; | |||
this.enhanceTextSelection = enhanceTextSelection; | |||
|
|||
// Listen for events about updating find matches for this page. | |||
this.eventBus.on('updatetextlayermatches', (evt) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just realized that this PR, and my recent find highlight one, both suffer from the same very serious bug. The EventBus
listeners that are bound in this way are never actually being removed, causing the EventBus._listeners['updatetextlayermatches'] = [...]
array to grow without bounds as new TextLayerBuilder
instances are created :-(
Simply put, just because the this.textLayer
property on PDFPageView
is cleared here that does not imply that the TextLayerBuilder
instance actually goes away unfortunately.
Since it's less than fifteen minutes since I found this problem, I'm still not sure if there exists a generally viable solution here (i.e. one not requiring ugly hacks). Also, it's now becoming clear to me why the current implementation (i.e. the one in master
) looks the way it does, given these recently discovered problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use this.eventBus.off
in
pdf.js/web/text_layer_builder.js
Lines 125 to 130 in 40a6d17
cancel() { | |
if (this.textLayerRenderTask) { | |
this.textLayerRenderTask.cancel(); | |
this.textLayerRenderTask = null; | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did considered that, but quite frankly it's really not at all a good idea since cancelling the renderTextLayer
task itself doesn't necessarily mean that you'll also want to subsequently "destroy" the TextLayerBuilder
instance too; in particular consider what would happen with multiple TextLayerBuilder.render
calls with such a change:
pdf.js/web/text_layer_builder.js
Lines 87 to 91 in 54d6c24
render(timeout = 0) { | |
if (!(this.textContent || this.textContentStream) || this.renderingDone) { | |
return; | |
} | |
this.cancel(); |
The best idea I'm able to come up with on short notice, is the first commit that's now included in PR #10100. Please note that I'm not saying that it's a great solution, but it seems to work reasonably well and having an event fire when rendering is cancelled could be generally useful too (as a complement to 'pagerendered').
Pull request mozilla#10100 removed the last usage of the find controller from the find bar, so we can drop the dependency now.
…page This makes use of the event bus instead of requiring the PDF viewer instance to get the page view for a page and calling `updateMatches` on it.
Now it follows the same pattern as e.g., the document properties component, which allows us to have one instance of the find controller and set a new document to search upon switching documents. Moreover, this allows us to get rid of the dependency on `pdfViewer` in order to fetch the text content for a page. This is working towards getting rid of the `pdfViewer` dependency upon initializing the component entirely in future commits. Finally, we make the `reset` method private since it's not supposed to be used from the outside anymore now that `setDocument` takes care of this, similar to other components.
This removes the dependency on a `PDFViewer` instance from the find controller, which makes it more similar to other components and makes it easier to unit test with a mock link service. Finally, we remove the search capabilities from the SVG example since it doesn't work there because there is no separate text layer.
…event bus This makes it more similar to how other components update the viewer UI and avoids the need to have extra member variables and checks.
With `PDFFindController` instances no longer (directly) depending on `BaseViewer` instances, we can pass a single `findController` when initializing a viewer, similar to other components.
40a6d17
to
8e0418f
Compare
/botio-linux preview |
From: Bot.io (Linux m4)ReceivedCommand cmd_preview from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/a49caf4d1159b56/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/a49caf4d1159b56/output.txt Total script time: 2.93 mins Published |
/botio unittest |
From: Bot.io (Linux m4)ReceivedCommand cmd_unittest from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/6edff0464f88c4a/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_unittest from @timvandermeij received. Current queue size: 0 Live output at: http://54.215.176.217:8877/9037a5e824b5120/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/6edff0464f88c4a/output.txt Total script time: 3.80 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/9037a5e824b5120/output.txt Total script time: 5.29 mins
|
I have rebased the commit series onto the current master and built the commits on top of the work from #10100. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, and seem to work just fine; thank you!
import { getDocument } from '../../src/display/api'; | ||
import { PDFFindController } from '../../web/pdf_find_controller'; | ||
|
||
class MockLinkService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more tiny nit here: How about importing SimpleLinkService
above and do class MockLinkService extends SimpleLinkService
here, since that will first of all guarantee that all of the expected methods are present and second of all will also help "document" the MockLinkService
w.r.t. the expected interface!?
This commit shows that we can now unit test the find controller and that executing regular queries works. Note that this is only a first step and not a complete suite of unit tests for all possible options of the find controller. While writing this unit test, I found two smaller issues that I addressed directly. The first one is that in the previous find controller refactoring I forgot to rename some occurrences of a now private member variable. Fortunately this did not cause any bugs since we did have a public getter and the fetched value may be changed by reference, but it's nevertheless good to fix. The second issue is that some entries in the `test/unit/clitests.json` file were not correct, resulting in these tests not being executed on e.g., Travis CI.
8e0418f
to
1b40299
Compare
…lder` This implements the nice suggestion from mozilla#10099 (comment), which I at the time didn't think would work. I'll probably have to plead temporary insanity here, since it *should* have been totally obvious to me how this could be implemented. By simply not registering the event until the `textLayer` is actually rendered, removing the event on `cancel` works just fine. This patch also removes the `pagecancelled` event, given that it's no longer used anywhere in the code-base and that its implemention was flawed since it wasn't guaranteed to be called in *every* situation where rendering was actually cancelled.
…lder` This implements the nice suggestion from mozilla#10099 (comment), which I at the time didn't think would work. I'll probably have to plead temporary insanity here, since it *should* have been totally obvious to me how this could be implemented. By simply not registering the event until the `textLayer` is actually rendered, removing the event on `cancel` works just fine. This patch also removes the `pagecancelled` event, given that it's no longer used anywhere in the code-base and that its implemention was flawed since it wasn't guaranteed to be called in *every* situation where rendering was actually cancelled.
…lder` This implements the nice suggestion from mozilla#10099 (comment), which I at the time didn't think would work. I'll probably have to plead temporary insanity here, since it *should* have been totally obvious to me how this could be implemented. By simply not registering the event until the `textLayer` is actually rendered, removing the event on `cancel` works just fine. This patch also removes the `pagecancelled` event, given that it's no longer used anywhere in the code-base and that its implemention was flawed since it wasn't guaranteed to be called in *every* situation where rendering was actually cancelled.
…lder` This implements the nice suggestion from mozilla#10099 (comment), which I at the time didn't think would work. I'll probably have to plead temporary insanity here, since it *should* have been totally obvious to me how this could be implemented. By simply not registering the event until the `textLayer` is actually rendered, removing the event on `cancel` works just fine. This patch also removes the `pagecancelled` event, given that it's no longer used anywhere in the code-base and that its implemention was flawed since it wasn't guaranteed to be called in *every* situation where rendering was actually cancelled.
This commit series works towards a resolution for #7356 by removing the dependency on
PDFViewer
(using the event bus) and writing a basic unit test for the find controller to have a framework to build more unit tests upon.I think #7356 can be closed once more unit tests are written after this commit series lands, in particular unit tests that exercise the different options and commands for the find controller such as phrase search, case sensitive search, whole words search, et cetera.