[Feature]: Support Firebase IndexedDB for automated login #32300

Closed
jslegers opened this issue Aug 23, 2024 · 1 comment


jslegers commented Aug 23, 2024

🚀 Feature Request

1. Use case

I'm trying to extract all the content I produced at https://legacy.mage.space/u/johnslegers before it disappears for good in less than 10 days.

Since downloading 14K images and their prompts by hand would be quite insane, I figured I'd write a crawler for it instead.


2. Login issues

Some of my content is set to private and can't be made public, so I wanted to log in as my own user to scrape all of it. I couldn't get Google Auth to work in Chromium for some reason, so I figured it was best to inject my Firebase user into the page's IndexedDB after the page had loaded.

I found some JS code in #11164 that, after some modifications, let me achieve what I wanted, but IMO this approach is way too hacky, and I'd expect Playwright to support this out of the box.

I'll probably publish a demo repo of my finished project in the very near future, once I've cleaned up my code. For the time being, here are some snippets that got me correctly logged in on Chromium.


3. Snippets

3.1 Dump the Firebase user to JSON

First, I extracted my Firebase user by pasting the following script into the browser console on the website I'm trying to scrape:

// Source https://gist.github.com/Matt-Jensen/d7c52c51b2a2ac7af7e0f7f1c31ef31d

(() => {
    const asyncForEach = (array, callback, done) => {
        const runAndWait = i => {
            if (i === array.length) return done();
            return callback(array[i], () => runAndWait(i + 1));
        };
        return runAndWait(0);
    };

    const dump = {};
    const dbRequest = window.indexedDB.open("firebaseLocalStorageDb");
    dbRequest.onsuccess = () => {
        const db = dbRequest.result;
        const stores = ['firebaseLocalStorage'];

        const tx = db.transaction(stores);
        asyncForEach(
            stores,
            (store, next) => {
                const req = tx.objectStore(store).getAll();
                req.onsuccess = () => {
                    dump[store] = req.result;
                    next();
                };
            },
            () => {
                console.log(JSON.stringify(dump));
            }
        );
    };
})();
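The script above logs one JSON object of the form {"firebaseLocalStorage": [...]}, where each entry has an fbase_key and a value (the same shape the injection code in section 3.2 writes back). Instead of copying the user out by hand, a small Python helper could pick out the auth-user entry and save it to a file. This is only a sketch: extract_auth_user and save_auth_user are hypothetical names, and it assumes the auth user is the entry whose fbase_key starts with firebase:authUser:, as the key used in section 3.2 suggests.

```python
import json
from pathlib import Path

# Assumed prefix of the persisted auth-user key, e.g.
# "firebase:authUser:<apiKey>:[DEFAULT]" (compare fbase_key in section 3.2)
AUTH_KEY_PREFIX = "firebase:authUser:"


def extract_auth_user(dump_json: str) -> dict:
    """Return the first auth-user record from the console dump.

    Assumes the shape {"firebaseLocalStorage": [{"fbase_key": ..., "value": ...}, ...]},
    which is what the dump script logs.
    """
    dump = json.loads(dump_json)
    for record in dump.get("firebaseLocalStorage", []):
        if record.get("fbase_key", "").startswith(AUTH_KEY_PREFIX):
            return record
    raise ValueError("no Firebase auth user found in dump")


def save_auth_user(dump_json: str, path: Path) -> dict:
    """Extract the auth-user record and write it to a JSON file for later injection."""
    record = extract_auth_user(dump_json)
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```

Usage would be something like save_auth_user(pasted_console_output, Path("firebase_user.json")), where both the variable and the file name are placeholders.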

3.2 Add the correct user info after removing the wrong info

This is the JavaScript code that my request handler in __main__.py injects once the page has loaded. For the time being it's just a test string in __main__.py, but I'll move it into a separate .js file for the code and a .json file for the login data.

It's an adaptation of the code from OVO-Josh's comment in #11164.

// Adaptation of the code by OVO-Josh

(function adduser() {
    function insertUser(db, user) {
        const txn = db.transaction('firebaseLocalStorage', 'readwrite');
        const store = txn.objectStore('firebaseLocalStorage');
        store.delete(user.fbase_key);
        store.add(user);
        txn.oncomplete = function(ev) {
            db.close();
        };
        txn.onerror = function(ev) {
            console.error(ev.target.error.message)
            db.close();
        };
        return txn;
    }
    const request = window.indexedDB.open('firebaseLocalStorageDb');
    request.onerror = function(ev) {
        console.error(ev.target.error.message);
    };
    request.onsuccess = function(ev) {
        const db = request.result;
        return insertUser(db, {
            "fbase_key": "firebase:authUser:_____:[DEFAULT]",
            "value": {
                "uid": _____,
                "email": _____,
                "emailVerified": true,
                "displayName": _____,
                "isAnonymous": false,
                "photoURL": _____,
                "providerData": [{
                        "providerId": "google.com",
                        "uid": _____,
                        "displayName": _____,
                        "email": _____,
                        "phoneNumber": null,
                        "photoURL": _____
                    },
                    {
                        "providerId": "password",
                        "uid": _____,
                        "displayName": _____,
                        "email": _____,
                        "phoneNumber": null,
                        "photoURL": _____
                    }
                ],
                "stsTokenManager": {
                    "refreshToken": _____,
                    "accessToken": _____,
                    "expirationTime": _____
                },
                "createdAt": _____,
                "lastLoginAt": _____,
                "apiKey": _____,
                "appName": "[DEFAULT]"
            }
        });
    };
    return request;
})()
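Since the login data is meant to move into a separate .json file, it could be loaded and sanity-checked before being injected. The sketch below is hypothetical (the function names and the single-record file layout are my assumptions). It also flags a stale dump by looking at stsTokenManager.expirationTime, which it assumes is a Unix timestamp in milliseconds.

```python
import json
import time
from pathlib import Path
from typing import Optional

# Fields under "value" that the injection snippet in section 3.2 relies on
REQUIRED_VALUE_FIELDS = ("uid", "apiKey", "stsTokenManager")


def load_firebase_user(path: Path) -> dict:
    """Load a saved auth-user record and check it has the expected shape."""
    record = json.loads(path.read_text(encoding="utf-8"))
    if not record.get("fbase_key", "").startswith("firebase:authUser:"):
        raise ValueError("fbase_key does not look like a Firebase auth-user key")
    value = record.get("value", {})
    missing = [f for f in REQUIRED_VALUE_FIELDS if f not in value]
    if missing:
        raise ValueError(f"missing fields in value: {missing}")
    return record


def access_token_expired(record: dict, now_ms: Optional[int] = None) -> bool:
    """Return True if the stored access token's expirationTime has passed.

    Assumes expirationTime is a Unix timestamp in milliseconds.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return record["value"]["stsTokenManager"]["expirationTime"] <= now_ms
```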

3.3 Load the page from Python

Here's my Python request handler in __main__.py, where I actually inject the JS after first waiting until the content I'm interested in has loaded:

async def request_handler(context: PlaywrightCrawlingContext) -> None:
    context.log.info(f"Processing {context.request.url} ...")
    page = context.page

    # Wait until the content I'm interested in is loaded
    await page.wait_for_selector(selector)
    # Update the user in the Firebase DB with the correct value
    # add_user is the above JS code that's injected
    await page.evaluate(add_user)

    _____

Example

async def request_handler(context: PlaywrightCrawlingContext) -> None:
    context.log.info(f"Processing {context.request.url} ...")
    page = context.page

    # Wait until the content I'm interested in is loaded
    await page.wait_for_selector(selector)
    # Proposed API (Playwright has no page.authenticate today):
    # update the user in the Firebase DB with the correct value
    await page.authenticate(firebase_user)
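Until something like this exists, the proposed page.authenticate(firebase_user) can be approximated with Playwright's existing page.evaluate(expression, arg). The sketch below differs from the snippet in section 3.2 in two ways. First, the user record is passed as the evaluate argument instead of being pasted into the JS source. Second, the JS returns a Promise that resolves only once the IndexedDB transaction has committed, so page.evaluate doesn't return while the write is still pending. authenticate is a hypothetical helper name. It uses put(), which replaces any record with the same fbase_key, in place of delete() followed by add(). It also assumes the app has already created firebaseLocalStorageDb, which should be the case once the page's content has loaded.

```python
from typing import Any

# JS run in the page: receives the user record as its argument and resolves
# only after the IndexedDB transaction has completed.
ADD_USER_JS = """
(user) => new Promise((resolve, reject) => {
    const request = indexedDB.open('firebaseLocalStorageDb');
    request.onerror = () => reject(request.error);
    request.onsuccess = () => {
        const db = request.result;
        let txn;
        try {
            txn = db.transaction('firebaseLocalStorage', 'readwrite');
        } catch (err) {
            // e.g. NotFoundError if the app hasn't created the store yet
            db.close();
            reject(err);
            return;
        }
        txn.oncomplete = () => { db.close(); resolve(true); };
        txn.onerror = () => { db.close(); reject(txn.error); };
        // put() inserts or replaces the record keyed by user.fbase_key
        txn.objectStore('firebaseLocalStorage').put(user);
    };
})
"""


async def authenticate(page: Any, firebase_user: dict) -> bool:
    """Hypothetical stand-in for the proposed page.authenticate().

    `page` is a Playwright Page; evaluate() awaits the returned Promise.
    """
    return await page.evaluate(ADD_USER_JS, firebase_user)
```

In the request handler, await authenticate(page, firebase_user) would take the place of await page.evaluate(add_user), with firebase_user being the record loaded from the .json file.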

Motivation

Since I'm sure other Playwright users struggle with similar issues (see #11164), it makes sense for this behavior to be supported out of the box.

@jslegers jslegers changed the title [Feature]: Support IndexedDB for shared auth use cases [Feature]: Support IndexedDB automated login Aug 23, 2024
@jslegers jslegers changed the title [Feature]: Support IndexedDB automated login [Feature]: Support Firebase IndexedDB for automated login Aug 23, 2024
@mxschmitt mxschmitt transferred this issue from microsoft/playwright-python Aug 23, 2024
yury-s (member) commented Aug 26, 2024

Folding into #11164

@yury-s yury-s closed this as completed Aug 26, 2024