Instagram extractor error #391

Closed
debagos opened this issue Aug 19, 2019 · 11 comments

Comments

@debagos

debagos commented Aug 19, 2019

It looks like Facebook has changed the Instagram profile page. I get a graphql KeyError every time...

[gallery-dl][debug] Version 1.10.1
[gallery-dl][debug] Python 3.6.8 - Linux-4.18.0-25-generic-x86_64-with-Ubuntu-18.10-cosmic
[gallery-dl][debug] requests 2.22.0 - urllib3 1.22
[1/3] https://www.instagram.com/REDACTED/
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/REDACTED/'
[gallery-dl][debug] Updating urllib3 ciphers
[instagram][debug] Using InstagramUserExtractor for 'https://www.instagram.com/REDACTED/'
[instagram][info] Logging in as REDACTED
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 9511
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /web/__mid/ HTTP/1.1" 200 28
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 200 296
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /REDACTED/ HTTP/1.1" 200 None
[instagram][error] An unexpected error occurred: KeyError - 'graphql'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[instagram][debug] 
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gallery_dl/job.py", line 47, in run
    for msg in self.extractor:
  File "/usr/local/lib/python3.6/dist-packages/gallery_dl/extractor/instagram.py", line 36, in items
    for data in self.instagrams():
  File "/usr/local/lib/python3.6/dist-packages/gallery_dl/extractor/instagram.py", line 205, in _extract_profilepage
    yield from self._extract_page(url, 'ProfilePage')
  File "/usr/local/lib/python3.6/dist-packages/gallery_dl/extractor/instagram.py", line 169, in _extract_page
    base_shared_data = shared_data['entry_data'][page_type][0]['graphql']
KeyError: 'graphql'
[2/3] [...]

Thank you for fixing, wish you a great day, yours sincerely.

@iamleot
Contributor

iamleot commented Aug 19, 2019 via email

@iamleot
Contributor

iamleot commented Aug 19, 2019 via email

@debagos
Author

debagos commented Aug 19, 2019

Actually it doesn't matter if public or private profile...
I started the same downloads again without authentication towards Instagram and it worked. So maybe the extractor isn't causing the problem here.
I reckon that the problem is caused by my password.
It contains an apostrophe, and it was easier to use a config file containing the username and password than to escape the apostrophe successfully. That's why I don't use the -u <your_username> -p <your_password> method. I use --config <path> instead.
My method worked fine for weeks, but now it seems like I'm not logged in anymore through Gallery-DL...
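For anyone running into the same apostrophe problem, here is a minimal sketch of why the config-file route is easier. It assumes the usual nested JSON layout for gallery-dl config files (extractor, then instagram, then username/password); the credentials are placeholders, so check the layout against your own config.

```python
import json
import shlex

# Placeholder credentials; the password deliberately contains an apostrophe.
config = {
    "extractor": {
        "instagram": {
            "username": "your_username",
            "password": "pass'word",
        }
    }
}

# JSON strings are delimited by double quotes, so the apostrophe is
# written as-is and needs no escaping in the config file.
text = json.dumps(config, indent=4)

# On a POSIX shell command line the same value has to be quoted carefully.
quoted = shlex.quote("pass'word")

print(text)
print(quoted)
```

The JSON text can be saved to a file and passed with --config <path>, as described above.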

Edit:
Is there a way to save a copy of the fetched document? Maybe that can tell us more about what's going on here...

@iamleot
Contributor

iamleot commented Aug 19, 2019 via email

@debagos
Author

debagos commented Aug 19, 2019

I created a local copy of this repo and now I'm fiddling around, trying to find the cause...
I'm definitely logged in, but the extractor fails at

if 'entry_data' in shared_data:
    base_shared_data = shared_data['entry_data'][psdf['page']][0]['graphql']

in extractor/instagram.py
I will report back if I can fix it.

@mikf
Owner

mikf commented Aug 27, 2019

@debagos Did you manage to find anything? Does this error still exist?

If it does, could you add

from .. import util
util.dump_json(shared_data)
exit()

after

shared_data = self._extract_shared_data(page)

and post the output here? (Maybe use pastebin or similar if it's too long.)
The contents of page might also be interesting.
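If editing the extractor in place is inconvenient, a standard-library-only sketch like the one below could write both the page and the parsed data to files instead of printing them. The helper name save_debug_copy and the file names are made up for illustration and are not part of gallery-dl.

```python
import json
from pathlib import Path


def save_debug_copy(page, shared_data, directory="."):
    """Write the raw page and the parsed shared data to files for inspection."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)

    page_path = out / "page.html"          # hypothetical file name
    data_path = out / "shared_data.json"   # hypothetical file name

    page_path.write_text(page, encoding="utf-8")
    data_path.write_text(
        json.dumps(shared_data, indent=4, sort_keys=True, ensure_ascii=False),
        encoding="utf-8",
    )
    return page_path, data_path
```

Both files may contain personal data from a logged-in session, so review them before sharing.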

@debagos
Author

debagos commented Sep 1, 2019

Sorry, I'm pretty busy at the moment...
The problem still persists (v.1.10.3) and I did what you suggested @mikf. Thank you.

  • page contains the whole Instagram site and I am logged in. Good.
  • shared_data contains the inner html from the window._sharedData javascript. Good.
  • shared_data also contains the entry_data JSON field, which is what the following statement checks: if 'entry_data' in shared_data. Good.
  • entry_data only contains this: "ProfilePage": [{}] !
  • So logically base_shared_data = shared_data['entry_data'][psdf['page']][0]['graphql'] throws an error. Extractor fails at this point because entry_data has no graphql field. Bad.
  • I searched through page and found a graphql field inside a window.__additionalDataLoaded JavaScript call, which seems to hold all the relevant information the extractor needs.
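
The failure described in the list above can be reproduced with a small stand-in dictionary. This is only a sketch: the real window._sharedData object has many more fields, and get_graphql is a hypothetical helper, not a gallery-dl function.

```python
# Minimal stand-in for the structure described above.
shared_data = {"entry_data": {"ProfilePage": [{}]}}


def get_graphql(shared_data, page_type):
    """Return the 'graphql' dict for page_type, or None if it is missing."""
    pages = shared_data.get("entry_data", {}).get(page_type) or [{}]
    return pages[0].get("graphql")


# Direct indexing fails exactly like the traceback in the first post.
try:
    shared_data["entry_data"]["ProfilePage"][0]["graphql"]
except KeyError as exc:
    error = "KeyError: {}".format(exc)

# A tolerant lookup returns None instead of raising.
missing = get_graphql(shared_data, "ProfilePage")

print(error)
print(missing)
```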

My assumption is that I am part of a canary/experimental group which gets a newer Instagram layout. My knowledge about python (or programming in general) is very low, so I am not able to resolve this problem by myself. Even if I post my page content here, what about the other people with that old Instagram layout? I think the extractor will get pretty complex... What do you guys think, do you want to investigate further into this very specific problem or should we just wait and drink tea?

@mikf
Owner

mikf commented Sep 3, 2019

Thank you for the detailed response!

what about the other people with that old Instagram layout?

This would be handled by first checking if it's the "old" layout, i.e. if there is a graphql field in the initial shared_data, and otherwise it would switch to grabbing the data from window.__additionalDataLoaded or something like that. Shouldn't be very complicated.
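
A rough sketch of that check-then-fall-back idea, for illustration only. It assumes the new layout embeds the data as window.__additionalDataLoaded('<path>', {...}); in the page source, which has not been confirmed in this thread, and extract_graphql is a hypothetical name rather than the extractor's actual method.

```python
import json
import re


def extract_graphql(page, shared_data, page_type):
    """Prefer the "old" layout; otherwise look for __additionalDataLoaded."""
    pages = shared_data.get("entry_data", {}).get(page_type) or [{}]
    if "graphql" in pages[0]:
        return pages[0]["graphql"]  # "old" layout

    # Assumed call shape: window.__additionalDataLoaded('<path>', {...});
    match = re.search(r"__additionalDataLoaded\(\s*'[^']*'\s*,\s*", page)
    if match is None:
        return None

    # raw_decode parses one JSON value starting at the given index and
    # leaves the trailing ");" of the function call untouched.
    data, _ = json.JSONDecoder().raw_decode(page, match.end())
    return data.get("graphql") if isinstance(data, dict) else None
```

Checking the old location first means accounts that still get the old layout keep working unchanged.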

What do you guys think, do you want to investigate further into this very specific problem

Yes, I would really like to see how your Instagram (data) layout differs from a "normal" one, so this can hopefully be fixed. You also don't have to post the contents of page with your personal data out in the open. Sending an email or a PM on Gitter is a possibility as well.

@github-userx

Maybe related: instaloader/instaloader#394

@ghost

ghost commented Oct 29, 2019

In the last few days I have started getting this graphql error. Normally I can download public and private profiles while logged in, but it seems Instagram has changed something on their end. Perhaps it is related to the rollout of the new dark theme in their app? I'm not the best with coding, so I'm not sure what went wrong, but I have preconfigured the .conf file with the correct details in my /etc directory, as per the defaults.

Commands typed in to the terminal

gallery-dl --sleep 02 https://www.instagram.com/REDACTED/ 

Below is the output.

[gallery-dl][debug] Version 1.10.6
[gallery-dl][debug] Python 3.5.2 - Linux-5.0.0-32-generic-x86_64-with-Ubuntu-18.04-bionic
[gallery-dl][debug] requests 2.22.0 - urllib3 1.25.6
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/REDACTED/'
[gallery-dl][debug] Updating urllib3 ciphers
[instagram][debug] Using InstagramUserExtractor for 'https://www.instagram.com/REDACTED/'
[instagram][info] Logging in as REDACTED
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 9969
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /web/__mid/ HTTP/1.1" 200 28
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 200 412
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /REDACTED/ HTTP/1.1" 200 18597
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /p/REDACTED/ HTTP/1.1" 200 None
[instagram][error] An unexpected error occurred: KeyError - 'graphql'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[instagram][debug] 
Traceback (most recent call last):
  File "/snap/gallery-dl/865/lib/python3.5/site-packages/gallery_dl/job.py", line 47, in run
    for msg in self.extractor:
  File "/snap/gallery-dl/865/lib/python3.5/site-packages/gallery_dl/extractor/instagram.py", line 35, in items
    for data in self.instagrams():
  File "/snap/gallery-dl/865/lib/python3.5/site-packages/gallery_dl/extractor/instagram.py", line 427, in instagrams
    'query_hash': 'f2405b236d85e8296cf30347c9f08c2a',
  File "/snap/gallery-dl/865/lib/python3.5/site-packages/gallery_dl/extractor/instagram.py", line 269, in _extract_page
    yield from self._extract_postpage(url)
  File "/snap/gallery-dl/865/lib/python3.5/site-packages/gallery_dl/extractor/instagram.py", line 109, in _extract_postpage
    media = shared_data['entry_data']['PostPage'][0]['graphql']['shortcode_media']
KeyError: 'graphql'

mikf added a commit that referenced this issue Oct 29, 2019
The '_sharedData' of Post pages is missing its 'graphql' part for
logged in users. This data is now included in the parameters of a
function call to '__additionalDataLoaded(...)'

And, of course, video extraction with youtube-dl broke because of
this change as well.
@mikf
Owner

mikf commented Oct 29, 2019

My own account now also has the new "layout" for Post pages it seems, and I've managed to implement a fix (5fa6ff0). But, as the commit message says, video downloads when logged in no longer work. Disabling downloader.ytdl.forward-cookies works around that for public videos, but private videos aren't downloadable any more.
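
For reference, a small sketch of how the dotted option name downloader.ytdl.forward-cookies could be turned into a config fragment. It assumes that dotted names map onto nested JSON objects in the config file; set_option is a hypothetical helper written for this example.

```python
import json


def set_option(config, dotted_key, value):
    """Store value under a dotted option path, creating nested dicts as needed."""
    *parents, last = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[last] = value
    return config


config = set_option({}, "downloader.ytdl.forward-cookies", False)
text = json.dumps(config, indent=4)
print(text)
```

The printed fragment would need to be merged into an existing config file rather than replace it.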


4 participants