[Deviantart] Download sta.sh links in story/html posts #2620

Open
Scripter17 opened this issue May 24, 2022 · 7 comments
Comments

@Scripter17
Contributor

Ideally this'd be included in "extra":true

@mikf
Owner

mikf commented May 24, 2022

This should already be a thing: 41d0316
Is this not working anymore?

@rautamiekka
Contributor

Works perfectly fine for me.

Config=

{
    "extractor": {
        "base-directory": "REDACTED",
        "parent-directory": false,
        "archive": "archive.sqlite3",
        "cookies-update": true,
        "skip": true,

        "postprocessors": [
            {
                "name": "metadata",
                "mode": "custom",
                "content-format": "< folders -->\n{folders}\n<-- folders >\n< tags -->\n{tags}\n<-- tags >\n< description -->\n{description}\n<-- description >",
                "extension-format": "descr.txt"
            },

            {
                "name": "compare",
                "action": "enumerate"
            },

            {
                "name": "metadata",
                "mode": "post",
                "extension-format": "post.json"
            }
        ],

        "retries": 20,
        "timeout": 30.0,
        "verify": true,
        "chapter-unique": true,
        "image-unique": true,

        "sleep": 0,
        "sleep-request": 0,
        "sleep-extractor": 0,

        "category-transfer": false,

        "deviantart": {
            "username": "REDACTED",
            "password": "REDACTED",
            "client-id": "REDACTED",
            "client-secret": "REDACTED",
            "include": "gallery,scraps,journal",
            "extra": true,
            "mature": true,
            "original": true,
            "folders": false,
            "filename": "{category}_{author[username]}_{index}_{date:%Y-%m-%d_%H_%M_%S}_{title}.{extension}",

            "gallery": {
                "folders": false
            },

            "favorite": {
                "folders": false
            },
            
            "journals": "html",
            "metadata": true,
            "cookies": "REDACTED",
            "quality": 100,
            "wait-min": 0,
            "flat": true
        },

        "oauth": {
            "browser": false,
            "cache": true
        }
    },

    "downloader": {
        "mtime": true,
        "part": true,
        "part-directory": null,
        "rate": null,
        "retries": 20,
        "timeout": 30.0,
        "verify": true,
        "progress": 0.1,

        "http": {
            "adjust-extensions": true,
            "headers": null
        },

        "ytdl": {
            "outtmpl": "%(uploader_id)s/%(title)s %(resolution)s #%(id)s#.%(ext)s",
            "config-file": "config.txt",
            "forward-cookies": true
        }
    },

    "output":
    {
        "mode": "auto",

        "log": {
            "level": "debug",

            "format": {
                "debug"  : "\u001b[0;37m{name}: {message}\u001b[0m",
                "info"   : "\u001b[1;37m{name}: {message}\u001b[0m",
                "warning": "\u001b[1;33m{name}: {message}\u001b[0m",
                "error"  : "\u001b[1;31m{name}: {message}\u001b[0m"
            }
        },

        "logfile": {
            "path": "log.txt",
            "mode": "w",
            "level": "debug"
        },

        "unsupportedfile": {
            "path": "unsupported.txt",
            "mode": "a",
            "format": "{asctime} {message}",
            "format-date": "%Y-%m-%d_%H-%M-%S"
        },

        "shorten": false
    },

    "cache": {
        "file": "cache.sqlite3"
    },

    "netrc": true
}

All SFW links:

I couldn't find any status posts with a sta.sh link, though.

@Scripter17
Contributor Author

Scripter17 commented May 28, 2022

I did some testing, and it seems the post I'm having issues with (which I won't link since it's NSFW) has a link at the bottom that gallery-dl doesn't catch at all.

The DeviantartExtractor.items method just doesn't get whatever metadata that link is stored in.

I'll see if I can bodge in a solution, but don't expect it to be clean.

Edit: Yeah whatever this guy did there just isn't an endpoint for it. So unless mikf is okay with (probably) breaking DA's TOS and using internal APIs/webscraping this issue is currently unresolvable

Edit 2: It could be the ?edit=1 at the end of the URL breaking things

@mikf
Owner

mikf commented May 30, 2022

> Edit: Yeah whatever this guy did there just isn't an endpoint for it. So unless mikf is okay with (probably) breaking DA's TOS and using internal APIs/webscraping this issue is currently unresolvable

That's fine, the current code already uses several internal API endpoints:

class DeviantartEclipseAPI():

> Edit 2: It could be the ?edit=1 at the end of the URL breaking things

The regex pattern for sta.sh links also matches those links:

$ gallery-dl "https://sta.sh/022c83odnaxc?edit=1&foo=bar#baz"
/tmp/deviantart/justatest235723/deviantart_778297656_01.png
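
To illustrate why a trailing ?edit=1 should not matter, here is a minimal sketch of URL matching that ignores the query string and fragment. The pattern and the stash_id helper are hypothetical and written only for this example; they are not copied from gallery-dl.

```python
import re

# Hypothetical pattern for illustration only; not gallery-dl's own.
# The optional tail swallows any "?query" or "#fragment" after the ID.
STASH_URL = re.compile(
    r"(?:https?://)?sta\.sh/([a-z0-9]+)(?:[?#].*)?", re.IGNORECASE)


def stash_id(url):
    """Return the sta.sh ID in url, or None if url is not a sta.sh link."""
    match = STASH_URL.fullmatch(url)
    return match.group(1) if match else None


print(stash_id("https://sta.sh/022c83odnaxc?edit=1&foo=bar#baz"))
print(stash_id("https://sta.sh/022c83odnaxc"))
print(stash_id("https://example.org/022c83odnaxc"))
```

With a pattern of this shape, both sta.sh URLs yield the same ID, so the extra query parameters alone would not explain a missed link.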

@Scripter17
Contributor Author

Near the bottom of each deviation's page source (view-source) is a line that starts with window.__INITIAL_STATE__ = JSON.parse(. It contains the sta.sh link that gallery-dl doesn't normally get.

It may be worth poking around it to see what can be grabbed from there without needing to do weird regex stuff

This really jank code should grab the sta.sh links:

import json
import re

source = "<view-source contents>"
id = "<the number ID in the URL. Yes as a string>"
# The JSON.parse() argument is a JS string literal, so one level of backslash
# escaping is stripped first (this unescape step is an assumption).
state = re.search(r"window\.__INITIAL_STATE__ = JSON\.parse\(\"(.+)\"\)", source)[1]
data = json.loads(re.sub(r"\\(.)", r"\1", state))
data2 = json.loads(data["@@entities"]["deviation"][id]["textContent"]["html"]["markup"])
for entity in data2["entityMap"]:
    print(data2["entityMap"][entity]["data"]["url"])

Even though I won't post the problem link I imagine this'd work on any page with sta.sh links in it, so it can be tested
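
For testing without a real page, the entity walk can be sketched on its own against made-up data. This assumes the markup is a JSON string whose "entityMap" is an object mapping keys to entities with an optional data.url, as the snippet above implies. The stash_urls helper and the sample are hypothetical; unlike the snippet above, this version skips entities without a URL and keeps only sta.sh hosts.

```python
import json
from urllib.parse import urlsplit


def stash_urls(markup):
    """Collect sta.sh URLs from a JSON markup string with an "entityMap"."""
    doc = json.loads(markup)
    urls = []
    for entity in doc.get("entityMap", {}).values():
        url = (entity.get("data") or {}).get("url")
        # Skip entities without a URL and links to other hosts.
        if isinstance(url, str) and urlsplit(url).hostname == "sta.sh":
            urls.append(url)
    return urls


# Made-up markup for illustration; real pages may be structured differently.
sample = json.dumps({"entityMap": {
    "0": {"type": "LINK", "data": {"url": "https://sta.sh/022c83odnaxc?edit=1"}},
    "1": {"type": "LINK", "data": {"url": "https://example.org/"}},
    "2": {"type": "IMAGE", "data": {}},
}})
print(stash_urls(sample))
```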

@mikf
Owner

mikf commented Dec 17, 2022

I think this issue is resolved with the changes from #3366.

@ClosedPort22
Contributor

ClosedPort22 commented Dec 17, 2022

> I think this issue is resolved with the changes from #3366.

It's possible that deviation["text_content"]["body"]["markup"] contains more sta.sh links. Although the documentation mentions the markup field, no information on how it's used is available. I couldn't find any posts with it to inspect the content.
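
Since the format of that markup field is unknown, one cautious option would be to scan it as plain text rather than parse it. The sketch below assumes the field is a string; the pattern and the find_stash_links helper are hypothetical and not part of gallery-dl.

```python
import re

# Hypothetical pattern for illustration; not gallery-dl's own.
STASH_LINK = re.compile(r"https?://sta\.sh/[a-z0-9]+", re.IGNORECASE)


def find_stash_links(markup):
    """Return unique sta.sh links found in markup, in order of appearance."""
    # Undo JSON-style "\/" escaping so escaped URLs are matched as well.
    text = markup.replace("\\/", "/")
    # dict.fromkeys() removes duplicates while preserving order.
    return list(dict.fromkeys(STASH_LINK.findall(text)))


# Made-up sample mixing an HTML link, an escaped JSON URL and a bare URL.
sample = ('<a href="https://sta.sh/022c83odnaxc">a</a> '
          '{"url":"https:\\/\\/sta.sh\\/022c83odnaxc"} https://sta.sh/0abc')
print(find_stash_links(sample))
```

A text scan like this trades precision for robustness: it needs no knowledge of the markup structure, but it would also pick up links that appear in places a parser would ignore.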


4 participants