Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-major] Rework the find controller for unit testing #10099

Merged
merged 7 commits into from
Sep 30, 2018

Conversation

timvandermeij
Copy link
Contributor

@timvandermeij timvandermeij commented Sep 21, 2018

This commit series works towards a resolution for #7356 by removing the dependency on PDFViewer (using the event bus) and writing a basic unit test for the find controller to have a framework to build more unit tests upon.

I think #7356 can be closed once more unit tests are written after this commit series lands, in particular unit tests that exercise the different options and commands for the find controller such as phrase search, case sensitive search, whole words search, et cetera.

@timvandermeij timvandermeij changed the title Rework the find controller to allow writing unit tests for it Rework the find controller for unit testing Sep 21, 2018
@timvandermeij
Copy link
Contributor Author

/botio unittest

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the how you're making use of the EventBus to break apart a number of the existing component dependencies in these patches :-)

While I'm liking the overall approach, please note that I've not yet had time to really test this. However, I've left a couple of initial comments based on a cursory look at the code. Thank you for doing this refactoring!


Unrelated to the rest of the PR, but something that I just happened to notice: Should this._linkService.pagesCount be used instead at

const numPages = this._extractTextPromises.length;

@@ -312,7 +333,14 @@ class PDFFindController {
this._extractTextPromises[i] = extractTextCapability.promise;

promise = promise.then(() => {
return this._pdfViewer.getPageTextContent(i).then((textContent) => {
const textContentPromise = this._pdfDocument.getPage(i + 1).then(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the temporary variable here, since it's just as easy to do e.g.

promise = promise.then(() => {
  return this._pdfDocument.getPage(i + 1).then((pdfPage) => {
    return pdfPage.getTextContent({
      normalizeWhitespace: true,
    });
  }).then((textContent) => {
    const textItems = textContent.items;
    ...

Furthermore, with the only caller of BaseViewer.getPageTextContent now removed you can probably remove the method itself as well; i.e. this code:

pdf.js/web/base_viewer.js

Lines 916 to 923 in 54d6c24

getPageTextContent(pageIndex) {
return this.pdfDocument.getPage(pageIndex + 1).then(function(page) {
return page.getTextContent({
normalizeWhitespace: true,
});
});
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new commit series, together with the numPages comment above.

web/app.js Outdated
@@ -343,7 +343,7 @@ let PDFViewerApplication = {
pdfLinkService.setHistory(this.pdfHistory);

this.findController = new PDFFindController({
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly necessary here, but I'd like to know what you think:
With PDFFindController instances no longer (directly) depending on BaseViewer instances, I wonder if it would make sense to move this initialization such that a findController option could be passed in when initializing PDFViewer above, since this method could then be removed:

pdf.js/web/base_viewer.js

Lines 966 to 969 in 54d6c24

setFindController(findController) {
this.findController = findController;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new commit series (in a separate commit) since I think it's a good idea and it's more consistent with how other components are passed in.

const matchesCount = this._requestMatchesCount();
this.onUpdateResultsCount(matchesCount);
this._eventBus.dispatch('updatefindbarresultscount', {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is bikeshedding, but I do find some of the existing method names somewhat strange/inconsistent (e.g. _updateUIResultsCount with the current position now being included too) and do slightly prefer the ones used in the ExternalServices and the Firefox integration (having recently spent some time working with those).

With that in mind, how about using the name updatefindmatchescount instead here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new commit series.

this.onUpdateResultsCount(matchesCount);
this._eventBus.dispatch('updatefindbarresultscount', {
source: this,
matchesCount,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You could just do matchesCount: this._requestMatchesCount(), and get rid of the temporary variable now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new commit series.

const matchesCount = this._requestMatchesCount();
this.onUpdateState(state, previous, matchesCount);
this._eventBus.dispatch('updatefindbarstate', {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, how about using the name updatefindcontrolstate instead here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new commit series.

web/app.js Outdated
@@ -1976,6 +1963,30 @@ function webViewerFindFromUrlHash(evt) {
});
}

function webViewerUpdateFindBarResultsCount(evt) {
const matchesCount = evt.matchesCount;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this variable, since the function definition could be written as:

function webViewerUpdateFindMatchesCount({ matchesCount, }) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new commit series.

web/app.js Outdated
}

function webViewerUpdateFindBarState(evt) {
const { state, previous, matchesCount, } = evt;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this variable, since the function definition could be written as:

function webViewerUpdateFindControlState({ state, previous, matchesCount, }) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new commit series.

@@ -25,8 +25,9 @@
"network_utils_spec.js",
"node_stream_spec.js",
"parser_spec.js",
"pdf_find_utils.js",
"pdf_history.js",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, thanks for fixing this :-)

import { getDocument } from '../../src/display/api';
import { PDFFindController } from '../../web/pdf_find_controller';

class MockLinkService {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit unfortunate that the SimpleLinkService cannot be used instead, but I suppose extending that one with the below functionality wouldn't really be a great idea in general (which I'm assuming is why you went with using a mock here).

Copy link
Contributor Author

@timvandermeij timvandermeij Sep 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, that's why I chose a mock here. I initially used the SimpleLinkService class, but quickly realized that it was a bit too simple for the testing purposes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more tiny nit here: How about importing SimpleLinkService above and do class MockLinkService extends SimpleLinkService here, since that will first of all guarantee that all of the expected methods are present and second of all will also help "document" the MockLinkService w.r.t. the expected interface!?

pdfFindController.executeCommand('find', { query: 'Dynamic', });

const totalPages = 14;
const totalMatches = 33;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please calculate this from the matchesPerPage array instead, to avoid accidentally introducing inconsistent state (especially since this pattern will probably be re-used for additional tests)!?

To easily find the sum you could use e.g. Array.prototype.reduce() like this:

const totalMatches = matchesPerPage.reduce((a, b) => { return a + b; });

Copy link
Contributor Author

@timvandermeij timvandermeij Sep 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new commit, but split over multiple lines because ESLint complained about it being on one line. I also changed totalPages to be the length of the matchesPerPage array instead of a hardcoded constant.

if (page.textLayer) {
page.textLayer.updateMatches();
}
this._eventBus.dispatch('updatematches', {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not hurt to use a slightly more specific name here, so how about e.g. updatetextlayermatches instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new commit series.

@@ -46,7 +46,7 @@ pdfLinkService.setViewer(pdfViewer);

// (Optionally) enable find controller.
var pdfFindController = new pdfjsViewer.PDFFindController({
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Sep 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, isn't searching now broken in all of the components examples, since you're missing a pdfFindController.setDocument(pdfDocument); line below (i.e. after the pdfLinkService.setDocument(pdfDocument, null); line).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh! I forgot that there is no search query by default, so I totally missed this. I fixed this in the new commit series and verified that it works with a query now.

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Sep 22, 2018

it should be implemented properly as tracked in issue #7468 and not partially and inconsistently as in the current state.

Please note that while I've not tried to implement it, the "only" thing preventing that issue from being solved is that there's (currently) no way of knowing, i.e. no findbarclose event fired, when the native Firefox findbar is closed (which thus prevents a consistent implementation).

Edit: As expected, implementing this is actually pretty easy; see https://gist.github.com/Snuffleupagus/d0167df8f612d9bef50561dd4bada215 and master...Snuffleupagus:findbarclose
The only possible caveat would be convincing a mozilla-central reviewer that adding a new findbarclose event to the browser findbar is necessary.

@timvandermeij timvandermeij changed the title Rework the find controller for unit testing [api-major] Rework the find controller for unit testing Sep 22, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 22, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 22, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 22, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 22, 2018
@timvandermeij
Copy link
Contributor Author

Thank you for the helpful review comments! I have addressed them in the new commit series.

@timvandermeij
Copy link
Contributor Author

/botio unittest

@timvandermeij
Copy link
Contributor Author

/botio-linux preview

@@ -56,6 +56,14 @@ class TextLayerBuilder {
this.textLayerRenderTask = null;
this.enhanceTextSelection = enhanceTextSelection;

// Listen for events about updating find matches for this page.
this.eventBus.on('updatetextlayermatches', (evt) => {
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Sep 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just realized that this PR, and my recent find highlight one, both suffer from the same very serious bug. The EventBus listeners that are bound in this way are never actually being removed, causing the EventBus._listeners['updatetextlayermatches'] = [...] array to grow without bounds as new TextLayerBuilder instances are created :-(

Simply put, just because the this.textLayer property on PDFPageView is cleared here that does not imply that the TextLayerBuilder instance actually goes away unfortunately.

Since it's less than fifteen minutes since I found this problem, I'm still not sure if there exists a generally viable solution here (i.e. one not requiring ugly hacks). Also, it's now becoming clear to me why the current implementation (i.e. the one in master) looks the way it does, given these recently discovered problems.

Copy link
Contributor Author

@timvandermeij timvandermeij Sep 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use this.eventBus.off in

cancel() {
if (this.textLayerRenderTask) {
this.textLayerRenderTask.cancel();
this.textLayerRenderTask = null;
}
}
to remove the listener once rendering is being cancelled, or would that not be enough?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did considered that, but quite frankly it's really not at all a good idea since cancelling the renderTextLayer task itself doesn't necessarily mean that you'll also want to subsequently "destroy" the TextLayerBuilder instance too; in particular consider what would happen with multiple TextLayerBuilder.render calls with such a change:

render(timeout = 0) {
if (!(this.textContent || this.textContentStream) || this.renderingDone) {
return;
}
this.cancel();

The best idea I'm able to come up with on short notice, is the first commit that's now included in PR #10100. Please note that I'm not saying that it's a great solution, but it seems to work reasonably well and having an event fire when rendering is cancelled could be generally useful too (as a complement to 'pagerendered').

Pull request mozilla#10100 removed the last usage of the find controller from
the find bar, so we can drop the dependency now.
…page

This makes use of the event bus instead of requiring the PDF viewer
instance to get the page view for a page and calling `updateMatches` on
it.
Now it follows the same pattern as e.g., the document properties
component, which allows us to have one instance of the find controller
and set a new document to search upon switching documents.

Moreover, this allows us to get rid of the dependency on `pdfViewer` in
order to fetch the text content for a page. This is working towards
getting rid of the `pdfViewer` dependency upon initializing the
component entirely in future commits.

Finally, we make the `reset` method private since it's not supposed to
be used from the outside anymore now that `setDocument` takes care of
this, similar to other components.
This removes the dependency on a `PDFViewer` instance from the find
controller, which makes it more similar to other components and makes it
easier to unit test with a mock link service.

Finally, we remove the search capabilities from the SVG example since it
doesn't work there because there is no separate text layer.
…event bus

This makes it more similar to how other components update the viewer UI
and avoids the need to have extra member variables and checks.
With `PDFFindController` instances no longer (directly) depending on
`BaseViewer` instances, we can pass a single `findController` when
initializing a viewer, similar to other components.
@timvandermeij
Copy link
Contributor Author

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/a49caf4d1159b56/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/a49caf4d1159b56/output.txt

Total script time: 2.93 mins

Published

@mozilla mozilla deleted a comment from pdfjsbot Sep 30, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 30, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 30, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 30, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 30, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 30, 2018
@timvandermeij
Copy link
Contributor Author

/botio unittest

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/6edff0464f88c4a/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_unittest from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/9037a5e824b5120/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/6edff0464f88c4a/output.txt

Total script time: 3.80 mins

  • Unit Tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/9037a5e824b5120/output.txt

Total script time: 5.29 mins

  • Unit Tests: Passed

@timvandermeij
Copy link
Contributor Author

I have rebased the commit series onto the current master and built the commits on top of the work from #10100.

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and seem to work just fine; thank you!

import { getDocument } from '../../src/display/api';
import { PDFFindController } from '../../web/pdf_find_controller';

class MockLinkService {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more tiny nit here: How about importing SimpleLinkService above and do class MockLinkService extends SimpleLinkService here, since that will first of all guarantee that all of the expected methods are present and second of all will also help "document" the MockLinkService w.r.t. the expected interface!?

This commit shows that we can now unit test the find controller and
that executing regular queries works. Note that this is only a first
step and not a complete suite of unit tests for all possible options
of the find controller.

While writing this unit test, I found two smaller issues that I
addressed directly. The first one is that in the previous find
controller refactoring I forgot to rename some occurrences of a now
private member variable. Fortunately this did not cause any bugs since
we did have a public getter and the fetched value may be changed by
reference, but it's nevertheless good to fix. The second issue is that
some entries in the `test/unit/clitests.json` file were not correct,
resulting in these tests not being executed on e.g., Travis CI.
@timvandermeij timvandermeij merged commit ec10cae into mozilla:master Sep 30, 2018
@timvandermeij timvandermeij deleted the find-controller branch September 30, 2018 16:36
Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Feb 15, 2019
…lder`

This implements the nice suggestion from mozilla#10099 (comment), which I at the time didn't think would work.
I'll probably have to plead temporary insanity here, since it *should* have been totally obvious to me how this could be implemented. By simply not registering the event until the `textLayer` is actually rendered, removing the event on `cancel` works just fine.

This patch also removes the `pagecancelled` event, given that it's no longer used anywhere in the code-base and that its implemention was flawed since it wasn't guaranteed to be called in *every* situation where rendering was actually cancelled.
Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Feb 15, 2019
…lder`

This implements the nice suggestion from mozilla#10099 (comment), which I at the time didn't think would work.
I'll probably have to plead temporary insanity here, since it *should* have been totally obvious to me how this could be implemented. By simply not registering the event until the `textLayer` is actually rendered, removing the event on `cancel` works just fine.

This patch also removes the `pagecancelled` event, given that it's no longer used anywhere in the code-base and that its implemention was flawed since it wasn't guaranteed to be called in *every* situation where rendering was actually cancelled.
Snuffleupagus added a commit to Snuffleupagus/pdf.js that referenced this pull request Feb 16, 2019
…lder`

This implements the nice suggestion from mozilla#10099 (comment), which I at the time didn't think would work.
I'll probably have to plead temporary insanity here, since it *should* have been totally obvious to me how this could be implemented. By simply not registering the event until the `textLayer` is actually rendered, removing the event on `cancel` works just fine.

This patch also removes the `pagecancelled` event, given that it's no longer used anywhere in the code-base and that its implemention was flawed since it wasn't guaranteed to be called in *every* situation where rendering was actually cancelled.
lpillonel pushed a commit to lpillonel/pdf.js that referenced this pull request Feb 22, 2019
…lder`

This implements the nice suggestion from mozilla#10099 (comment), which I at the time didn't think would work.
I'll probably have to plead temporary insanity here, since it *should* have been totally obvious to me how this could be implemented. By simply not registering the event until the `textLayer` is actually rendered, removing the event on `cancel` works just fine.

This patch also removes the `pagecancelled` event, given that it's no longer used anywhere in the code-base and that its implemention was flawed since it wasn't guaranteed to be called in *every* situation where rendering was actually cancelled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants