Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hanging track search #937

Closed
Rsalganik1123 opened this issue Feb 1, 2023 · 22 comments
Closed

Hanging track search #937

Rsalganik1123 opened this issue Feb 1, 2023 · 22 comments
Labels

Comments

@Rsalganik1123
Copy link

Describe the bug
Sometimes, the sp.track() just hangs without an error and without a return but at the same time, sp.audio_features() works properly.

Your code

import os
import time
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from utils.private import * 

os.environ['SPOTIPY_CLIENT_ID'] = spotify_client_id
os.environ['SPOTIPY_CLIENT_SECRET'] = spotify_client_secret


def refresh_spotify():
    print('refreshing spotify token')
    auth_manager = SpotifyClientCredentials()
    sp = spotipy.Spotify(auth_manager=auth_manager, client_credentials_manager=auth_manager, requests_timeout=2)
    return sp


id= '4wCmqSrbyCgxEXROQE6vtV'
uri = 'spotify:track:4wCmqSrbyCgxEXROQE6vtV' 
sp = refresh_spotify()
print("ready")
results = sp.audio_features(uri)
results = sp.track(uri)

And when I call sp.audio_features(uri) it responds immediately but when I call sp.track(uri) it can hang up to more than 3 minutes before I ctrl+C out of the code even though I set a timeout value.

After reading a few GitHub issues (see #913 , #862, #844, etc) it seems that there may be an issue with the spotify token going stale but I am not sure how to fix this since I am following the methodology of the README for initializing my credentials. I know that the URI is valid because it was working for about 30 minutes and then it stopped. Since I doing an augmentation for a large dataset, I need to be able to refresh my token.

Expected behavior
Throwing either an error or returning the search query.

Output
None and I'm not sure how to debug for one.

Environment:

  • OS: [Linux]
  • Python version [3.9]
  • spotipy version [2.22]
  • your IDE (if using any) [VSCode]

Additional context
Add any other context about the problem here.

@stephanebruckert
Copy link
Member

I'm not sure why you have

def refresh_spotify():
    print('refreshing spotify token')

because spotipy is supposed to refresh the token automatically without it.

It would be good to have a complete code that we could run to show the problem. Even if's doing a loop over the same track and it takes 30 minutes. Any chance you can provide that?

@Rsalganik1123
Copy link
Author

Rsalganik1123 commented Feb 1, 2023

Yes, of course! I was trying to give you a simple MVP. But if you'd like my full code, here it is.

Given a csv with spotify uris I load the albums and artists associated with them and then also scrape info about those albums/artists.
I separated the code into a few files.

In org_info.py:

import os
import pickle
import requests
import ipdb 
from tqdm import tqdm
from utils.scraping_utils import chunks
from utils.spotify_connector import refresh_spotify, load_track_data

def scrape_org_info_from_track_uri(track_uri, output_path): 
    all_data = []
    # ipdb.set_trace() 
    batches = list(chunks(track_uri, 100))
    missed_tracks = []
    sp = refresh_spotify()
    for batch in tqdm(batches):
        try:
            results = load_track_data(sp, batch)
        except Exception as e:
            print("error:", e)
            missed_tracks = missed_tracks + batch
        else:
            for uri, track in zip(batch, results['tracks']):
                try:
                    album = track['album']
                    data = {
                        'track_uri': uri, 
                        'track_name': track['name'], 
                        'album_uri': album['uri'], 
                    }
                    all_data.append(data)
                except:
                    missed_tracks.append(uri)    
    pickle.dump(all_data, open(output_path, 'wb'))
    return all_data, missed_tracks

in utils/spotify_connector.py:

import os
import time
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from utils.private import * 

os.environ['SPOTIPY_CLIENT_ID'] = spotify_client_id
os.environ['SPOTIPY_CLIENT_SECRET'] = spotify_client_secret

def refresh_spotify():
    print('refreshing spotify token')
    auth_manager = SpotifyClientCredentials()
    token = auth_manager.get_access_token()
    
    sp = spotipy.Spotify(auth_manager=auth_manager, client_credentials_manager=auth_manager, requests_timeout=2)
    return sp

def load_track_data(sp, batch, retry_num=10):
    if retry_num < 0:
        raise Exception('Error')

    retry_num -= 1
    try:
        results = sp.tracks(tracks=batch)
        return results
    except Exception as e:
        print(e)
        sp = refresh_spotify()
        time.sleep(10)

    return load_track_data(sp, batch, retry_num)

And finally in launch.py:

import pandas as pd 
import time 
import ipdb 

from utils.private import * 
from utils.spotify_connector import refresh_spotify
from scraping.music_features import scrape_music_feature_data 
from scraping.album_artwork import scrape_albums_data_from_track_uri
from scraping.org_info import scrape_org_info_from_track_uri

def test_one(uris): 
    id= '4wCmqSrbyCgxEXROQE6vtV'
    uri = 'spotify:track:4wCmqSrbyCgxEXROQE6vtV' 
    sp = refresh_spotify()
    print("ready")
    # results = sp.audio_features(uri)
    results = sp.track(uri)
   print(results) 

def load_org_info(file_path, output_path): 
    path = file_path
    b = time.time()
    data = pd.read_csv(path, delimiter = '\t', quoting = 3)
    a = time.time() 
    print("LOADING TOOK:{}".format(a-b))
    output_path =output_path
    uris = data.uri.to_list() 
    scrape_org_info_from_track_uri(uris, output_path)

@stephanebruckert
Copy link
Member

Thanks! Would you have a sample csv too please?

@Rsalganik1123
Copy link
Author

Yes! It can be downloaded from this link. Select the option spotify-uris.tsv.bz2.

The loop itself runs for about 30 minutes and then cuts out. As evidenced by the output here:

 21%|██        | 9921/47563 [35:52<2:06:41,  4.95it/s]
 21%|██        | 9922/47563 [35:52<2:07:19,  4.93it/s]
 21%|██        | 9923/47563 [35:52<2:07:48,  4.91it/s]
 21%|██        | 9924/47563 [35:52<2:05:17,  5.01it/s]
 21%|██        | 9925/47563 [35:53<2:06:36,  4.95it/s]
 21%|██        | 9926/47563 [35:53<2:12:19,  4.74it/s]
 21%|██        | 9927/47563 [35:53<2:07:48,  4.91it/s]
 21%|██        | 9928/47563 [35:53<2:10:03,  4.82it/s]
 21%|██        | 9929/47563 [35:53<2:09:34,  4.84it/s]
 21%|██        | 9930/47563 [35:54<2:04:35,  5.03it/s]
 21%|██        | 9931/47563 [35:54<2:17:14,  4.57it/s]
 21%|██        | 9932/47563 [35:54<2:16:12,  4.60it/s]
 21%|██        | 9933/47563 [35:54<2:18:55,  4.51it/s]
 21%|██        | 9934/47563 [35:54<2:12:09,  4.75it/s]
 21%|██        | 9935/47563 [35:55<2:25:06,  4.32it/s]
 21%|██        | 9936/47563 [35:55<2:13:25,  4.70it/s]
 21%|██        | 9937/47563 [35:55<2:19:06,  4.51it/s]
 21%|██        | 9938/47563 [35:55<2:24:45,  4.33it/s]
 21%|██        | 9939/47563 [35:56<2:15:24,  4.63it/s]
 21%|██        | 9940/47563 [35:56<2:11:21,  4.77it/s]
 21%|██        | 9941/47563 [35:56<2:08:40,  4.87it/s]
 21%|██        | 9942/47563 [35:56<2:12:49,  4.72it/s]

The code doesn't exit, just stays hanging.

Thank you for your help!

@stephanebruckert
Copy link
Member

stephanebruckert commented Feb 1, 2023

@Rsalganik1123 sorry this is a bit complex to run. There is no main function.

Please can you provide a minimal reproducible example?

For example something that would be as simple as this and would reproduce the issue for you:

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from pprint import pprint
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

i = 0
while True:
    res = sp.track('spotify:track:3HPHPoXFupNZXfnFXmiJI5')
    print(res['uri'], str(i))
    i += 1

Btw I have been running this ^ for 1.5 hours and it's still running well. Anything that would change on your side?

Also, your code contains refresh_spotify(). As mentionned above, you shouldn't manually refresh it as it is done automatically by spotipy.

@Rsalganik1123
Copy link
Author

Rsalganik1123 commented Feb 1, 2023

Hello, using the MVE that you suggested, I will place the code below, I am able to reproduce the error.

import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials
    from pprint import pprint

    os.environ['SPOTIPY_CLIENT_ID'] = spotify_client_id
    os.environ['SPOTIPY_CLIENT_SECRET'] = spotify_client_secret
    print("Starting Credential Check ")
    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
    print("Completed Credential Check")
    i = 0
    while True:
        res = sp.track('spotify:track:3HPHPoXFupNZXfnFXmiJI5')
        print(res['uri'], str(i))
        i += 1

the output is:

Starting Credential Check 
Completed Credential Check
spotify:track:3HPHPoXFupNZXfnFXmiJI5 0
spotify:track:3HPHPoXFupNZXfnFXmiJI5 1

And now it's hanging. Could it be because I'm running the code from a distributed server?

Thank you!

@stephanebruckert
Copy link
Member

Interesting! What do you mean exactly by distributed server?

@Rsalganik1123
Copy link
Author

Rsalganik1123 commented Feb 1, 2023

I'm running on an academic server which uses a SLURM scheduler to dispatch jobs to various compute nodes. To be honest, I'm not super familiar with the appropriate terminology to describe the architecture.

@stephanebruckert
Copy link
Member

stephanebruckert commented Feb 1, 2023

Really not sure, it still should be one job running on one node with internet access. Perhaps there are some firewalls on the way or the requests are blocked. It would be great if you could try your code outside this server (for example locally) and/or with another internet connection.

It's really strange that it stops after 2 requests only? Did you say the same was not happening with sp.audio_features(uri)?

We can try to show debug logs, just add this to the top of the file:

import logging
logging.basicConfig(level='DEBUG')

It should show something like the following (just make sure to remove your access token after 'Authorization': 'Bearer...):

DEBUG:spotipy.client:Sending GET to https://api.spotify.com/v1/tracks/3HPHPoXFupNZXfnFXmiJI5 with Params: {'market': None} Headers: {'Authorization': 'Bearer ', 'Content-Type': 'application/json'} and Body: None
DEBUG:urllib3.connectionpool:https://api.spotify.com:443 "GET /v1/tracks/3HPHPoXFupNZXfnFXmiJI5 HTTP/1.1" 200 None
DEBUG:spotipy.client:RESULTS: {'albu....
spotify:track:3HPHPoXFupNZXfnFXmiJI5 23

Also, when you cancel it with Ctrl+C does it show a stack trace? If yes please share it.

@Rsalganik1123
Copy link
Author

Rsalganik1123 commented Feb 1, 2023

I can try it locally and report back later today.
In the meantime, regarding all your other suggestions:

  1. Having inserted the logger, here is what I get when I run the code:
Starting Credential Check 
Completed Credential Check
DEBUG:spotipy.oauth2:sending POST request to https://accounts.spotify.com/api/token with Headers: {'Authorization': 'Basic MTU3ZDAyNzYwZjIzNDcxYWE1N2I2YTcwMGE3OGM4NjM6Mjc1NTJjMjc1MTAzNGNmODljN2RmZTcxN2I4YzkyYzQ='} and Body: {'grant_type': 'client_credentials'}
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): accounts.spotify.com:443
DEBUG:urllib3.connectionpool:https://accounts.spotify.com:443 "POST /api/token HTTP/1.1" 200 None
DEBUG:spotipy.client:Sending GET to https://api.spotify.com/v1/tracks/3HPHPoXFupNZXfnFXmiJI5 with Params: {'market': None} Headers: {'Authorization': 'Bearer BQC7sQ5JNp5yTnPiT2N1UKnyuPO-yyBiZjMa-2O608ifh1r6t7X9_-xENIdzPumuGgnNMkpv8bhbAIhAGe4i1d1tgGdfLvJgjD0n0C3Dt3Qvw8ps2sry', 'Content-Type': 'application/json'} and Body: None 
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.spotify.com:443
DEBUG:urllib3.connectionpool:https://api.spotify.com:443 "GET /v1/tracks/3HPHPoXFupNZXfnFXmiJI5 HTTP/1.1" 429 None
DEBUG:urllib3.util.retry:Incremented Retry for (url='/v1/tracks/3HPHPoXFupNZXfnFXmiJI5'): Retry(total=2, connect=None, read=False, redirect=None, status=2)

And then it remains hanging.

  1. If I run it using sp.audio_features() there is no problem - like the output is as expected - and I stop it after i=100

  2. When I cancel with Ctrl + C the stack trace is:

Traceback (most recent call last):
  File "/home/mila/r/rebecca.salganik/Projects/MSD_Extended/runner.py", line 134, in <module>
    test_one_new() 
  File "/home/mila/r/rebecca.salganik/Projects/MSD_Extended/runner.py", line 130, in test_one_new
    sp.track('spotify:track:3HPHPoXFupNZXfnFXmiJI5')
  File "/home/mila/r/rebecca.salganik/.conda/envs/nsv4/lib/python3.9/site-packages/spotipy/client.py", line 369, in track
    return self._get("tracks/" + trid, market=market)
  File "/home/mila/r/rebecca.salganik/.conda/envs/nsv4/lib/python3.9/site-packages/spotipy/client.py", line 321, in _get
    return self._internal_call("GET", url, payload, kwargs)
  File "/home/mila/r/rebecca.salganik/.conda/envs/nsv4/lib/python3.9/site-packages/spotipy/client.py", line 264, in _internal_call
    response = self._session.request(
  File "/home/mila/r/rebecca.salganik/.conda/envs/nsv4/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/mila/r/rebecca.salganik/.conda/envs/nsv4/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/home/mila/r/rebecca.salganik/.conda/envs/nsv4/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/home/mila/r/rebecca.salganik/.conda/envs/nsv4/lib/python3.9/site-packages/urllib3/connectionpool.py", line 876, in urlopen
    retries.sleep(response)
  File "/home/mila/r/rebecca.salganik/.conda/envs/nsv4/lib/python3.9/site-packages/urllib3/util/retry.py", line 428, in sleep
    slept = self.sleep_for_retry(response)
  File "/home/mila/r/rebecca.salganik/.conda/envs/nsv4/lib/python3.9/site-packages/urllib3/util/retry.py", line 407, in sleep_for_retry
    time.sleep(retry_after)
KeyboardInterrupt

@stephanebruckert
Copy link
Member

stephanebruckert commented Feb 1, 2023

Ha! It sounds like this is related #892

Seeing it stopped at sleep_for_retry(), the library is definitely sleeping/waiting and the reason could be a timezone difference. We are reading the token value "expires_at" which is a unix timestamp (UTC). My guess is that your servers likely don't use UTC, so the time that was given to the library doesn't make sense. Either it could be waiting a few extra hours for the time difference, or indefinitely depending on the timezone (+/-UTC) and the type of bug.

I'll have a better look to try and confirm this.

@Rsalganik1123
Copy link
Author

Interesting! But then why would that differ from sp.audio_features?
Thank you for your help!

@stephanebruckert
Copy link
Member

stephanebruckert commented Feb 1, 2023

Actually that was a false lead. It's hanging while trying to retry which is unrelated to the token expiry time.

This discussion is quite interesting #766. It mentions the possibility that the Spotify API can request to retry in 3600 seconds. I'm not sure why it would be so high, maybe a bug on their side.

In the meantime you could try to use this answer #766 (comment) to force a failure, but still wait 1 sec before trying again.

import requests
import urllib3
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy.exceptions import SpotifyException
import time

session = requests.Session()
retry = urllib3.Retry(
    respect_retry_after_header=False
)
adapter = requests.adapters.HTTPAdapter(max_retries=retry)

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(),
                     requests_session=session)

i = 0
while True:
    try:
        res = sp.track('spotify:track:3HPHPoXFupNZXfnFXmiJI5')
    except SpotifyException as e:
        if e.http_status == 429:
            print("'retry-after' value:", e.headers['retry-after'])
            time.sleep(1)
        else:
            break
    print(res['uri'], str(i))
    i += 1

Notice it prints the "retry-after" value given by the API. What value do you get?


But then why would that differ from sp.audio_features?

Perhaps for audio_features, the Spotify API doesn't return a "retry_after" value that is so high. Can you try it and check if the "retry-after" value looks different?

@Rsalganik1123
Copy link
Author

Yes! Thank you, I will try it later today and report back.

@Rsalganik1123
Copy link
Author

This works.
A sample of the output for running sp.tracks() (with the exact copy of the code above):

spotify:track:3HPHPoXFupNZXfnFXmiJI5 0
spotify:track:3HPHPoXFupNZXfnFXmiJI5 1
spotify:track:3HPHPoXFupNZXfnFXmiJI5 2
HTTP Error for GET to https://api.spotify.com/v1/tracks/3HPHPoXFupNZXfnFXmiJI5 with Params: {'market': None} returned 429 due to Too many requests
'retry-after' value: 763
spotify:track:3HPHPoXFupNZXfnFXmiJI5 3
HTTP Error for GET to https://api.spotify.com/v1/tracks/3HPHPoXFupNZXfnFXmiJI5 with Params: {'market': None} returned 429 due to Too many requests
'retry-after' value: 762
spotify:track:3HPHPoXFupNZXfnFXmiJI5 4
HTTP Error for GET to https://api.spotify.com/v1/tracks/3HPHPoXFupNZXfnFXmiJI5 with Params: {'market': None} returned 429 due to Too many requests
'retry-after' value: 761
spotify:track:3HPHPoXFupNZXfnFXmiJI5 5
HTTP Error for GET to https://api.spotify.com/v1/tracks/3HPHPoXFupNZXfnFXmiJI5 with Params: {'market': None} returned 429 due to Too many requests
'retry-after' value: 760
spotify:track:3HPHPoXFupNZXfnFXmiJI5 6

Trying the same thing by replacing sp.tracks() with sp.audio_features() doesn't cause any exceptions so it never prints the retry-value.

@Rsalganik1123
Copy link
Author

Also, running with my local machine doesn't have any issues with latency or hanging. So I'm not sure why but it's something about running on the nodes that's causing problems. Could it be because I have an allocation of 2 cpus and there is some sort of multi-threading happening that causes the time out?

@stephanebruckert
Copy link
Member

No I don't think anything that elaborate is happening.

Your retry-after header is telling you to wait about 12 minutes, so it's not a bug but just spotify telling you that your token has been / is being over-used. I would try to start fresh from a new token. On your node/server, try to delete the .cache file in the same folder as your code (it contains your token). Just do rm .cache from a terminal.

Trying the same thing by replacing sp.tracks() with sp.audio_features() doesn't cause any exceptions so it never prints the retry-value.

Make sense!

@Rsalganik1123
Copy link
Author

I see. So for now the only solution is to: A. clear the cache before scraping, B. manually set the retry-after to 1 second ? Is that correct?

@Rsalganik1123
Copy link
Author

Unfortunately, the error is back. Despite cleaning my cache I am getting a 429 error and a 'retry-after' value: 63397

@stephanebruckert
Copy link
Member

stephanebruckert commented Feb 3, 2023

At this point it's not really a spotipy error/bug anymore as you are just being rate-limited by Spotify. Why such a high retry-after value I am not sure but it could be related to the node. Here are other ideas you could experiment with:

  • try with a different Spotify app,
  • try to have multiple apps and rotate through them,
  • try to request an extended quota limit for your app,
  • try to find if something else might be consuming all your quota on that node,
  • try to time.sleep() a few milliseconds before each spotify call, to see if helps reducing the retry-after.

Let us know what your findings are or if you have any other questions, but for now I think it's fair to close this as there is not much we can do inside spotipy.

@BAndriuss
Copy link

I have a function that retrieves me all songs from a playlist and it does it succesfully when called. But when try to use other function that includes sp.album() or sp.album_tracks() the program gets stuck whenever executing album() or album_tracks(). It cant be due to rate-limit cause everything works when not using album() or album_tracks() nor with token/keys.

@stephanebruckert
Copy link
Member

Let's continue this discussion in #913

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants