Wait for lazy loaded elements when using the fullPage option #40
…ugh Promises~ subt., and declaring window as global would probably work
Just saw the other issue ... #1 ... Pending some fixes then (for endless scrollers and also expanding horizontally)
This await wait(100); is not good. You should actually wait for images and other snippets to load, rather than giving them a fixed 100 ms.
@Vasile-Peste
Bump :)
Thanks for the follow-up. I haven't had time to investigate further and commit the necessary changes. If it is of interest, I'm also working on a little app of my own of a similar nature.
@sindresorhus
Do you suppose this would be a cheap workaround or an actual solution? I don't think this has been implemented yet. As well as this:
From: #1 (comment)
Yes, those situations need to be handled too.
That sounds like an OK solution.
Sure
index.js
Outdated
@@ -7,6 +7,7 @@ const puppeteer = require('puppeteer');
const devices = require('puppeteer/DeviceDescriptors');
const toughCookie = require('tough-cookie');

const sleep = promisify(setTimeout);
const sleep = promisify(setTimeout);
const delay = promisify(setTimeout);
The rename to delay is still missing where the function is actually used. I'm going to commit that separately.
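For reference, a minimal sketch of how the renamed helper would be used. The pauseBriefly wrapper is a made-up name for illustration only; it is not part of this PR.

```javascript
const {promisify} = require('util');

// util.promisify supports setTimeout out of the box, so `delay(ms)`
// returns a promise that resolves after roughly `ms` milliseconds.
const delay = promisify(setTimeout);

// Hypothetical wrapper, only here to show the call site after the rename.
async function pauseBriefly(milliseconds = 100) {
	const start = Date.now();
	await delay(milliseconds);
	return Date.now() - start; // Elapsed time, for illustration
}
```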
Note: I decided to remove this and use waitForNavigation instead. Let me know when we can run some tests to see whether it works adequately.
});

// Some extra delay to let images load
await page.waitForFunction(imagesHaveLoaded, {timeout: 60});
I don't think the user would expect it to take up to one minute. If we keep this, it needs to be documented.
It could also be exposed as an argument, with the default timeout set to 60. Should we document this, or implement something better? Maybe it's also worth looking into alternative functions, such as the waitForNavigation Vasile mentioned earlier.
This await wait(100); is not good. You should actually wait for images and other snippets to load, rather than giving them a fixed 100 ms.
The waitForNavigation method, waiting for network idle, should be used instead.
In case it might be useful for documentation purposes; I retrieved part of the solution from https://stackoverflow.com/questions/51651830/how-to-wait-for-all-images-to-load-from-page-evaluate-function-in-puppeteer-when
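A rough sketch of the idea from that Stack Overflow answer, with the timeout exposed as an argument. Note that, as far as I know, Puppeteer's waitForFunction takes its timeout in milliseconds, so a one-minute limit would be 60000 rather than 60. The waitForImages name and its options object are assumptions made for illustration, not existing capture-website API.

```javascript
// Runs inside the page. HTMLImageElement.complete is true once an image
// has no pending load (it finished loading, failed, or has no source).
function imagesHaveLoaded() {
	return Array.from(document.images).every(image => image.complete);
}

// `waitForImages` and its `timeout` option are hypothetical names.
// `page` is expected to be a Puppeteer Page; the timeout is in milliseconds.
async function waitForImages(page, {timeout = 60000} = {}) {
	await page.waitForFunction(imagesHaveLoaded, {timeout});
}
```

A caller that finds one minute too long could pass something like waitForImages(page, {timeout: 5000}).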
I would look into something better.
Using page.waitForNavigation, as pointed out in #40 (review)
This will need some automated tests.
Co-Authored-By: Sindre Sorhus <[email protected]>
viewportIncr to elongated viewportIncrement Co-Authored-By: Sindre Sorhus <[email protected]>
I found two websites with appropriate content which could be used as guidelines for the test cases:
Perhaps it would be wise to contact the webmasters of those pages and ask whether they can be used for the test scenarios. Alternatively, we could ship a static HTML page inside the capture-website repo, or use some other setup. For the life of me, I couldn't find a legally scrapable website like http://testing-ground.scraping.pro/ for this purpose.
P.S.: If there is spare time for brainstorming, we should probably also reconsider the eslint-disable/eslint-enable pairs for no-await-in-loop and similar rules.
Hoping to hear back from you. Thanks.
Edit: also, please let me know if you have any clue why the last build failed to run under Node v8.
It would be better to have something locally that is not affected by network conditions or whether the website in question is up. I think you can just put together a minimal HTML page that lazy loads some images from Unsplash. Lots of examples out there for that.
Not sure I understand?
Just ignore Node.js 8. I plan to drop support for it.
The eslint part:
I will put the HTML page together and upload it shortly. Hope everything works out, thanks.
index.js
Outdated
const viewportHeight = viewportOptions.height;
let viewportIncrement = 0;
while (viewportIncrement + viewportHeight < bodyBoundingHeight) {
	const navigationPromise = page.waitForNavigation();
Shouldn't this line wait for networkidle instead of the default load? I assume this promise will be awaited to let the images load, right?
I probably agree with you, but care to clarify?
Waiting for network idle allows lazy elements to load (elements loaded after the "load" event of the document). I suggest doing as follows:
- Wait "load" event of the document.
- Scroll the page to its maximum Y (this will load lazy elements like lazy loaded images).
- Wait for network idle.
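The three steps above could be sketched roughly as follows. This is only an illustration: captureAfterLazyLoad is a made-up name, and the sketch assumes a Puppeteer version that provides page.waitForNetworkIdle (older releases would need waitForNavigation with a networkidle option instead).

```javascript
// `captureAfterLazyLoad` is a hypothetical helper, not capture-website API.
// `page` is expected to be a Puppeteer Page.
async function captureAfterLazyLoad(page, url) {
	// 1. Wait for the "load" event of the document.
	await page.goto(url, {waitUntil: 'load'});

	// 2. Scroll to the maximum Y so that lazy elements start loading.
	await page.evaluate(() => {
		window.scrollTo(0, document.documentElement.scrollHeight);
	});

	// 3. Wait until the network has been idle for a short while.
	await page.waitForNetworkIdle({idleTime: 500});

	return page.screenshot({fullPage: true});
}
```

Jumping straight to the bottom may skip lazy elements in the middle of a long page, which is presumably why the diff scrolls one viewport at a time.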
It's also worth keeping in mind that some (old) websites may never satisfy the network idle condition, for example because of frequent chat requests.
In the end, the proposed flow above should work just fine in most cases.
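One possible way to cope with pages that never go idle is to treat the timeout as "good enough" and carry on. A minimal sketch, assuming that Puppeteer rejects with an error named TimeoutError when the wait expires and that page.waitForNetworkIdle is available; waitForIdleOrGiveUp is a hypothetical name.

```javascript
// `waitForIdleOrGiveUp` is a hypothetical helper, not capture-website API.
// Resolves to true if the network went idle, false if the wait timed out.
async function waitForIdleOrGiveUp(page, timeout = 10000) {
	try {
		await page.waitForNetworkIdle({idleTime: 500, timeout});
		return true;
	} catch (error) {
		// Assumption: timeouts surface as an error named "TimeoutError".
		if (error.name === 'TimeoutError') {
			return false; // Never idle (e.g. chat polling), continue anyway
		}

		throw error; // Anything else is a real failure
	}
}
```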
* Fix setting an expiring cookie Fixes sindresorhus/capture-website-cli#21 * 0.8.1 Co-authored-by: Sindre Sorhus <[email protected]>
test.js
Outdated
for(let i=0;i<numItemsToGenerate;i++){
let randomImageIndex = Math.floor(Math.random() * numImagesAvailable);
renderGalleryItem(randomImageIndex);
}
Can you clean up the code style here? It's a bit messy. Use async/await, template literals, etc.
Also move the string to the top-level.
Gotcha. Will learn a few things here and there and proceed as such. Thanks for the feedback
package.json
Outdated
@@ -1,6 +1,6 @@
{
"name": "capture-website",
"version": "0.8.0",
"version": "0.8.1",
Don't bump this.
Dunno why that'd get bumped. Gotcha.
Bump
This seems pretty much complete already. All that is left for me to check is #40 (comment) ... I will look into that and deliver a final result.
I looked into what you suggested, and since the test setup isn't that complex, I came up with two approaches:
server.get('/', async (request, response) => {
response.end(`
<body>
<div style="display: grid; grid-template-columns: repeat( auto-fill, minmax(150px, 1fr) );" id="grid">
<img src="https://picsum.photos/150/150?random=1" loading="lazy" />
<img src="https://picsum.photos/150/150?random=2" loading="lazy" />
<img src="https://picsum.photos/150/150?random=3" loading="lazy" />
<img src="https://picsum.photos/150/150?random=4" loading="lazy" />
<img src="https://picsum.photos/150/150?random=5" loading="lazy" />
<img src="https://picsum.photos/150/150?random=6" loading="lazy" />
<img src="https://picsum.photos/150/150?random=7" loading="lazy" />
<img src="https://picsum.photos/150/150?random=8" loading="lazy" />
<img src="https://picsum.photos/150/150?random=9" loading="lazy" />
<img src="https://picsum.photos/150/150?random=10" loading="lazy" />
<img src="https://picsum.photos/150/150?random=11" loading="lazy" />
<img src="https://picsum.photos/150/150?random=12" loading="lazy" />
<img src="https://picsum.photos/150/150?random=13" loading="lazy" />
<img src="https://picsum.photos/150/150?random=14" loading="lazy" />
<img src="https://picsum.photos/150/150?random=15" loading="lazy" />
</div>
</body>
`);
});
const imageUrl = 'https://picsum.photos/150/150';
const imageHolder = document.getElementById('grid');

// Append 250 lazy-loaded images to the grid.
const renderGallery = () => {
	for (let i = 0; i < 250; i++) {
		const galleryItem = document.createElement('div');
		galleryItem.innerHTML = `<img src="${imageUrl}?random=${i}" loading="lazy" />`;
		imageHolder.appendChild(galleryItem);
	}
};

renderGallery();

Oh yeah, I switched the API to picsum, which is essentially still Unsplash, but it's less chunky and the cache doesn't overcomplicate things. Also pending: #40 (comment)
Those two things are still missing. I'm waiting for your input before I proceed with them. I have no idea how to deal with endless scrollers, though. Perhaps another waitForNavigation, catching a timeout whose default is specified in your docs; it could be up to 5-10 minutes, I guess, if the page keeps on scrolling. This would be simpler than getting into a whole formula of connection speed * page content, etc.
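To make the timeout idea concrete, here is a rough sketch of a scroll loop that gives up after a fixed budget. All names and defaults (scrollUntilSettled, maxDuration, pause) are assumptions for illustration, and the sequential await in the loop is intentional, which is where no-await-in-loop would need disabling.

```javascript
// `scrollUntilSettled` and its options are hypothetical names.
// Scrolls one viewport at a time. Resolves to true once the bottom is
// reached and the page height has stopped growing, or to false when
// `maxDuration` milliseconds pass first (e.g. on an endless scroller).
async function scrollUntilSettled(page, {maxDuration = 30000, pause = 100} = {}) {
	const deadline = Date.now() + maxDuration;
	let previousHeight = 0;

	while (Date.now() < deadline) {
		// Runs inside the page: scroll down and report the current state.
		const {height, atBottom} = await page.evaluate(() => {
			window.scrollBy(0, window.innerHeight);
			const height = document.documentElement.scrollHeight;
			return {height, atBottom: window.scrollY + window.innerHeight >= height};
		});

		// Give lazy content a moment to load and possibly extend the page.
		await new Promise(resolve => setTimeout(resolve, pause));

		if (atBottom && height === previousHeight) {
			return true;
		}

		previousHeight = height;
	}

	return false;
}
```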
Seems like the simplest solution. You can just do a for loop to generate the
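Assuming the suggestion refers to the repeated image tags in the first approach, a loop-based version of that markup might look like the sketch below. galleryHtml is a made-up helper name; the URL and attributes are copied from the snippet above.

```javascript
// `galleryHtml` is a hypothetical helper for the test fixture.
// Builds the same grid as the hand-written markup, with `imageCount` images.
function galleryHtml(imageCount) {
	const images = [];
	for (let index = 1; index <= imageCount; index++) {
		images.push(`<img src="https://picsum.photos/150/150?random=${index}" loading="lazy" />`);
	}

	return `<body>
	<div style="display: grid; grid-template-columns: repeat(auto-fill, minmax(150px, 1fr));" id="grid">
		${images.join('\n\t\t')}
	</div>
</body>`;
}
```

The route handler could then respond with galleryHtml(15) instead of the literal string.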
I don't have any good suggestions on how to resolve those issues.
Cool. Merge pulls, please. Let's see how I can go about claiming this bounty 👍
@netrules This seems to fail randomly on Travis: https://travis-ci.org/github/sindresorhus/capture-website/jobs/716595910
@sindresorhus Perhaps because we are loading external resources, which is network-expensive. Would you like to store them as blobs in a local cache, separate from the code, or maybe as part of the library, as template/skeleton files?
Edit: maybe it's even possible to find a solution that generates random image blobs on the fly while inserting lazy elements into the full page. It's a bit outside the scope of the current bug fix, but I can find a workaround since it is the test I designed.
Edit 2: I also found this: https://stackoverflow.com/questions/53187941/why-is-puppeteer-failing-simple-tests-with-waiting-for-function-failed-timeou
How do you recommend I go about testing? My own repo and a Travis pipeline?
Edit 3: this might be useful? master...netrules:patch-1 ... Thanks a lot for the bounty, by the way.
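As a sketch of the "generate images on the fly" idea, the test server could serve its own placeholder images so that nothing depends on an external host. Everything here (placeholderSvg, handleRequest, createFixtureServer, the /image route) is hypothetical and only uses Node's built-in http module.

```javascript
const http = require('http');

// Hypothetical fixture: a tiny SVG whose colour depends on the index,
// so each lazy image is a distinct, locally generated resource.
function placeholderSvg(index) {
	const hue = (index * 37) % 360;
	return `<svg xmlns="http://www.w3.org/2000/svg" width="150" height="150"><rect width="150" height="150" fill="hsl(${hue}, 60%, 70%)"/></svg>`;
}

// Serves /image?i=N as an SVG and any other path as the gallery page.
function handleRequest(request, response) {
	const url = new URL(request.url, 'http://localhost');

	if (url.pathname === '/image') {
		response.writeHead(200, {'Content-Type': 'image/svg+xml'});
		response.end(placeholderSvg(Number(url.searchParams.get('i')) || 0));
		return;
	}

	const images = Array.from({length: 15}, (_, index) =>
		`<img src="/image?i=${index}" width="150" height="150" loading="lazy">`);
	response.writeHead(200, {'Content-Type': 'text/html'});
	response.end(`<body>${images.join('')}</body>`);
}

// Usage: createFixtureServer().listen(0) picks a free local port.
const createFixtureServer = () => http.createServer(handleRequest);
```

Since the images never leave localhost, random failures caused by network conditions on CI should be less likely, though that would need to be confirmed on Travis.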
Can you do a PR? That will run Travis and check if it passes.
Fixes #28
Based on the following walkthrough initially documented by a now deprecated ("Note: Service has been discontinued, this website remains for archival purposes") platform. Thanks to its wiki's author for documenting.