[Feature]: Support Firebase IndexedDB for automated login #32300

jslegers · 2024-08-23T15:01:45Z

🚀 Feature Request

1. Use case

I'm trying to extract all content I produced @ https://legacy.mage.space/u/johnslegers before it disappears for good in less than 10 days.

Since manually downloading 14K images with their corresponding prompts is quite insane, I figured I'd write a crawler for it instead.

2. Login issues

Since some of my content is set to private and can't be set to public, I wanted to log in with my user so I could scrape all of my content. Since I couldn't get the Google Auth to work in Chromium for some reason, I figured it was best to just inject my Firebase user after the page was loaded.

I found some JS code at #11164 that allowed me to achieve what I wanted after some modifications, but IMO this approach is way too hacky, and I'd expect out-of-the-box support for this in Playwright.

I'll probably create a demo repo of my finished project in the very near future, after I've cleaned up my code, but for the time being here are some snippets of the code that allowed me to log in correctly on Chromium.

3. Snippets

3.1 Dump Firebase user to JSON

First, I extracted my Firebase user by copy-pasting the following script into the console of the website I'm trying to scrape:

// Source https://gist.github.com/Matt-Jensen/d7c52c51b2a2ac7af7e0f7f1c31ef31d

(() => {
    const asyncForEach = (array, callback, done) => {
        const runAndWait = i => {
            if (i === array.length) return done();
            return callback(array[i], () => runAndWait(i + 1));
        };
        return runAndWait(0);
    };

    const dump = {};
    const dbRequest = window.indexedDB.open("firebaseLocalStorageDb");
    dbRequest.onsuccess = () => {
        const db = dbRequest.result;
        const stores = ['firebaseLocalStorage'];

        const tx = db.transaction(stores);
        asyncForEach(
            stores,
            (store, next) => {
                const req = tx.objectStore(store).getAll();
                req.onsuccess = () => {
                    dump[store] = req.result;
                    next();
                };
            },
            () => {
                console.log(JSON.stringify(dump));
            }
        );
    };
})();
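The console script above prints a single JSON object keyed by store name. A minimal Python sketch for picking the auth record out of that dump could look like the following. It assumes the dump was copied out as text and that the relevant record's `fbase_key` starts with `firebase:authUser:`, as in the snippet in 3.2; `find_auth_user` is a hypothetical helper name and the sample data is a placeholder, not a real dump.

```python
import json


def find_auth_user(dump_text: str, store: str = "firebaseLocalStorage") -> dict:
    """Return the first record whose fbase_key looks like a Firebase auth user.

    Assumption: the dump has the shape {store_name: [records...]}, which is
    what the console script prints via JSON.stringify(dump).
    """
    dump = json.loads(dump_text)
    for record in dump.get(store, []):
        if str(record.get("fbase_key", "")).startswith("firebase:authUser:"):
            return record
    raise LookupError(f"no firebase:authUser entry found in store {store!r}")


# Placeholder data standing in for the text copied from the console
sample = json.dumps({
    "firebaseLocalStorage": [
        {"fbase_key": "firebase:host:example", "value": "x"},
        {"fbase_key": "firebase:authUser:KEY:[DEFAULT]", "value": {"uid": "abc"}},
    ]
})
user = find_auth_user(sample)
```

The returned record already has the `fbase_key` / `value` shape that the insert script in 3.2 expects, so it could be saved as the .json login data as-is.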

3.2 Adding the correct user info after first removing the wrong info

This is the JavaScript code that's injected when the page is loaded in my request handler in __main__.py. For the time being it's just a test string in __main__.py, but it will be moved into a separate .js file with the code and a .json file with the login data.

It's an adaptation of the code from the previous comment by OVO-Josh.

// Adaptation of the code by OVO-Josh

(function adduser() {
    function insertUser(db, user) {
        const txn = db.transaction('firebaseLocalStorage', 'readwrite');
        const store = txn.objectStore('firebaseLocalStorage');
        store.delete(user.fbase_key);
        store.add(user);
        txn.oncomplete = function(ev) {
            db.close();
        };
        txn.onerror = function(ev) {
            console.error(ev.target.error.message)
            db.close();
        };
        return txn;
    }
    const request = window.indexedDB.open('firebaseLocalStorageDb');
    request.onerror = function(ev) {
        console.error(ev.target.error.message);
    };
    request.onsuccess = function(ev) {
        const db = request.result;
        return insertUser(db, {
            "fbase_key": "firebase:authUser:_____:[DEFAULT]",
            "value": {
                "uid": _____,
                "email": _____,
                "emailVerified": true,
                "displayName": _____,
                "isAnonymous": false,
                "photoURL": _____,
                "providerData": [{
                        "providerId": "google.com",
                        "uid": _____,
                        "displayName": _____,
                        "email": _____,
                        "phoneNumber": null,
                        "photoURL": _____
                    },
                    {
                        "providerId": "password",
                        "uid": _____,
                        "displayName": _____,
                        "email": _____,
                        "phoneNumber": null,
                        "photoURL": _____
                    }
                ],
                "stsTokenManager": {
                    "refreshToken": _____,
                    "accessToken": _____,
                    "expirationTime": _____
                },
                "createdAt": _____,
                "lastLoginAt": _____,
                "apiKey": _____,
                "appName": "[DEFAULT]"
            }
        });
    };
    return request;
})()
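Splitting the script into a .js file and a .json file, as mentioned above, means the two have to be recombined before injection. A rough sketch of one way to do that on the Python side is shown below. The placeholder token `__FIREBASE_USER__`, the helper `build_add_user_script` and the inline template are all assumptions made for illustration; in practice the template and the user record would be read from the two files.

```python
import json

# Hypothetical token that marks where the user record goes in the JS template
USER_PLACEHOLDER = "__FIREBASE_USER__"

# Stand-in for the contents of the separate .js file (condensed version of the
# insert script, with the user literal replaced by the placeholder)
JS_TEMPLATE = """
(function addUser() {
    const user = __FIREBASE_USER__;
    const request = window.indexedDB.open('firebaseLocalStorageDb');
    request.onerror = () => console.error(request.error);
    request.onsuccess = () => {
        const db = request.result;
        const txn = db.transaction('firebaseLocalStorage', 'readwrite');
        const store = txn.objectStore('firebaseLocalStorage');
        store.delete(user.fbase_key);
        store.add(user);
        txn.oncomplete = () => db.close();
    };
})()
"""


def build_add_user_script(template: str, user: dict) -> str:
    """Embed the user record into the template as a JSON literal."""
    if USER_PLACEHOLDER not in template:
        raise ValueError(f"template does not contain {USER_PLACEHOLDER}")
    # json.dumps escapes quotes and non-ASCII characters, so the result is
    # safe to paste into the script as an object literal
    return template.replace(USER_PLACEHOLDER, json.dumps(user))


# Placeholder record; real values would come from the dumped .json file
placeholder_user = {
    "fbase_key": "firebase:authUser:KEY:[DEFAULT]",
    "value": {"uid": "abc", "appName": "[DEFAULT]"},
}
add_user = build_add_user_script(JS_TEMPLATE, placeholder_user)
```

Keeping the credentials out of the script source this way also makes it easier to keep the .json file out of version control.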

3.3 Load page from Python

Here's my Python request handler in __main__.py, where I actually inject the JS after first waiting until the content has loaded:

async def request_handler(context: PlaywrightCrawlingContext) -> None:
    context.log.info(f"Processing {context.request.url} ...")
    page = context.page

    # Wait until the content I'm interested in is loaded
    await page.wait_for_selector(selector)
    # Update the users in the Firebase DB with the correct value
    # add_user is the above JS code that's injected
    await page.evaluate(add_user)

        _____
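One caveat with the handler above is that the injected script returns before the IndexedDB transaction has finished. A possible variant, sketched here under assumptions, wraps the write in a Promise and passes the user record as the argument of `page.evaluate`, so the `await` only returns once the transaction completes. `ADD_USER_JS` and `inject_firebase_user` are hypothetical names, and `page` is assumed to be a Playwright page like `context.page`.

```python
from typing import Any

# JS function expression; Playwright calls it with the argument given to
# page.evaluate and waits for the returned Promise to settle
ADD_USER_JS = """
(user) => new Promise((resolve, reject) => {
    const request = window.indexedDB.open('firebaseLocalStorageDb');
    request.onerror = () => reject(request.error);
    request.onsuccess = () => {
        const db = request.result;
        const txn = db.transaction('firebaseLocalStorage', 'readwrite');
        const store = txn.objectStore('firebaseLocalStorage');
        // Same delete-then-add sequence as the original snippet
        store.delete(user.fbase_key);
        store.add(user);
        txn.oncomplete = () => { db.close(); resolve(true); };
        txn.onerror = () => { db.close(); reject(txn.error); };
    };
})
"""


async def inject_firebase_user(page: Any, user: dict) -> None:
    """Write the user record into the page's Firebase IndexedDB store."""
    # The record is serialized by Playwright, so no string templating is needed
    await page.evaluate(ADD_USER_JS, user)
```

Inside the request handler this would be called as `await inject_firebase_user(page, firebase_user)`, where `firebase_user` is the record loaded from the dump.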

Example

async def request_handler(context: PlaywrightCrawlingContext) -> None:
    context.log.info(f"Processing {context.request.url} ...")
    page = context.page

    # Wait until the content I'm interested in is loaded
    await page.wait_for_selector(selector)
    # Update the user in the Firebase DB with the correct value
    await page.authenticate(firebase_user)

Motivation

Since I'm sure other users of Playwright struggle with similar issues (see #11164), it makes sense for this behavior to be supported out of the box.


yury-s · 2024-08-26T19:44:28Z

Folding into #11164

jslegers mentioned this issue Aug 23, 2024

[Feature] Support IndexedDB for shared auth use cases #11164

Open

jslegers changed the title ~~[Feature]: Support IndexedDB for shared auth use cases~~ [Feature]: Support IndexedDB automated login Aug 23, 2024

jslegers changed the title ~~[Feature]: Support IndexedDB automated login~~ [Feature]: Support Firebase IndexedDB for automated login Aug 23, 2024

mxschmitt transferred this issue from microsoft/playwright-python Aug 23, 2024

yury-s closed this as completed Aug 26, 2024
