Improve Page Navigation Timeout Handling #70

Merged on Aug 18, 2023 (40 commits)
Commits
5804440
Returning Partial Data After Timeout
bachdumpling Jul 21, 2023
dd27755
Possible fix for timeout 1
bachdumpling Jul 27, 2023
e511c3c
Possible fix for timeout 2
bachdumpling Jul 27, 2023
f6499e9
Possible fix for timeout 2 with extra case handling
bachdumpling Jul 28, 2023
90351a7
fix Nodejs package
bachdumpling Jul 28, 2023
cb0158d
first page is loaded before loading other pages with the timeout guard
bachdumpling Aug 1, 2023
d3077a8
If the first page fails to load, try again with waitUntil: domcontent…
bachdumpling Aug 1, 2023
8f9ce64
Fixes an issue with parsing URL query params
bachdumpling Aug 2, 2023
e5fd552
Update code to main's version
bachdumpling Aug 2, 2023
68d39b9
Merge remote-tracking branch 'origin/main' into fix-timeout
bachdumpling Aug 2, 2023
7b2ef45
Revert last commit
bachdumpling Aug 2, 2023
a481bb7
Merge branch 'main' of github.com:the-markup/blacklight-collector int…
dphiffer Aug 10, 2023
78118de
add more console.logs
dphiffer Aug 10, 2023
c44664b
Fix count in fillForms. Allow it to increment
bachdumpling Aug 10, 2023
d7984e1
Move count+=1 to avoid recounting when continue
bachdumpling Aug 10, 2023
10c366e
Set timeouts for fillForms
bachdumpling Aug 14, 2023
c07c479
refactor fillForms timeout
bachdumpling Aug 14, 2023
463969d
Improved fillForms error handling to prevent unexpected browser closu…
bachdumpling Aug 14, 2023
eca74c2
add tests back to the repo
bachdumpling Aug 14, 2023
21979bf
Comment out console logs
bachdumpling Aug 15, 2023
8de412a
revert example.ts
bachdumpling Aug 15, 2023
e03fd44
restore DEFAULT_INPUT_VALUES
dphiffer Aug 15, 2023
0470c0e
use navigateWithTimeout function for first and subsequent requests
dphiffer Aug 15, 2023
b665f1f
restore test-data
dphiffer Aug 15, 2023
56730c9
restore js-instrument test page
dphiffer Aug 15, 2023
f6e1009
comment out console.logs
dphiffer Aug 15, 2023
13ad69d
shorter protocol error message
dphiffer Aug 16, 2023
16a3a83
make example.ts slightly more flexible
dphiffer Aug 16, 2023
adb6e39
use promise.all on page interactions, isInteracting -> isDone
dphiffer Aug 16, 2023
f283d9b
don't log in the timeout
dphiffer Aug 16, 2023
be7ee9e
make example.ts default to headless
dphiffer Aug 18, 2023
106e404
don't rely on page.goto to complete
dphiffer Aug 18, 2023
53c4149
use 30sec threshold
dphiffer Aug 18, 2023
9d8776d
try/catch loading additional links
dphiffer Aug 18, 2023
2c8a816
Revert "try/catch loading additional links"
dphiffer Aug 18, 2023
6011ee6
Revert "use 30sec threshold"
dphiffer Aug 18, 2023
44db316
Revert "don't rely on page.goto to complete"
dphiffer Aug 18, 2023
3ab395f
Revert "make example.ts default to headless"
dphiffer Aug 18, 2023
09fdff4
Revert "don't log in the timeout"
dphiffer Aug 18, 2023
620d324
Revert "use promise.all on page interactions, isInteracting -> isDone"
dphiffer Aug 18, 2023
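
Several of the commits above mention guarding page loads and form filling with a timeout. As a rough illustration only, the general pattern can be sketched by racing the work against a timer. This is not the code from this PR: `guardWithTimeout` and `GuardResult` are hypothetical names invented for the sketch, and the PR's own `navigateWithTimeout` may be structured quite differently.

```typescript
// Hypothetical sketch of a timeout guard; names are assumptions, not the PR's API.
type GuardResult<T> =
  | { status: "ok"; value: T }
  | { status: "timeout" }
  | { status: "error"; error: unknown };

async function guardWithTimeout<T>(
  task: () => Promise<T>,
  ms: number
): Promise<GuardResult<T>> {
  // Replaced below with a function that clears the timer once it exists.
  let cancelTimer: () => void = () => {};

  // Resolves (never rejects) with a "timeout" marker after `ms` milliseconds.
  const timeout = new Promise<GuardResult<T>>((resolve) => {
    const timer = setTimeout(() => resolve({ status: "timeout" }), ms);
    cancelTimer = () => clearTimeout(timer);
  });

  // Run the task and convert both success and failure into plain values,
  // so the caller can branch on `status` instead of catching exceptions.
  const work = Promise.resolve()
    .then(task)
    .then(
      (value): GuardResult<T> => ({ status: "ok", value }),
      (error: unknown): GuardResult<T> => ({ status: "error", error })
    );

  try {
    // Whichever settles first wins.
    return await Promise.race([work, timeout]);
  } finally {
    // Avoid leaving a pending timer behind when the task finishes first.
    cancelTimer();
  }
}
```

In a collector like this one, the task would presumably wrap a call such as `page.goto(url, ...)`, and a `"timeout"` result would let the caller carry on with whatever data was gathered so far. Note that the sketch does not cancel the underlying task when the timer wins; the task keeps running in the background.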
6 changes: 6 additions & 0 deletions .gitignore
@@ -73,9 +73,15 @@ build
**/*/*.html
**/*/*.jpeg
**/*/*.zip
!__tests__/test-data/britishmuseum/*
!__tests__/test-data/canvas-fingerprinting/*
!__tests__/test-data/collector/*
!__tests__/test-data/fingerprintjs/*
!__tests__/test-data/propublica.org-no-browser-cookies/*
!__tests__/test-data/propublica.org/*
!__tests__/test-data/kohls-new.com/*
!__tests__/test-data/veteransunited/*
!__tests__/test-data/veteransunitedsession/*
!__tests__/test-data/veteransunited-1.0.3/*
!__tests__/test-pages/*
!__tests__/test-pages/js-instrument/*
12 changes: 6 additions & 6 deletions example.ts
@@ -1,21 +1,21 @@
-import { KnownDevices } from "puppeteer";
-import { CollectorOptions, collect } from "./src";
+import { KnownDevices } from 'puppeteer';
+import { CollectorOptions, collect } from './src';
 import { join } from 'path';

 (async () => {
-    const URL = 'example.com';
+    const URL = process.argv.length > 2 ? process.argv[2] : 'example.com';
     const EMULATE_DEVICE = 'iPhone 13 Mini';

     const config: CollectorOptions = {
-        numPages: 3,
+        numPages: 1,
         headless: false,
         emulateDevice: KnownDevices[EMULATE_DEVICE],
         // Uncomment to run with desktop/laptop browser
         // emulateDevice: {
-        // viewport: {height: 1440, width: 800},
+        // viewport: {height: 1440, width: 800},
         // userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
         // },
-        outDir: join(__dirname, 'demo-dir'),
+        outDir: join(__dirname, 'demo-dir')
     };

     console.log(`Beginning scan of ${URL}`);