Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random source ids of 'youtube' #175

Closed
EndOfLine369 opened this issue Aug 21, 2018 · 5 comments
Closed

Random source ids of 'youtube' #175

EndOfLine369 opened this issue Aug 21, 2018 · 5 comments

Comments

@EndOfLine369
Copy link
Collaborator

EndOfLine369 commented Aug 21, 2018

Hey @ZeroQI

Seems to be random listings of 'youtube' being listed as a found forced id.

What is the reasoning behind the 'source' name being optional now?
The vagueness of this id name requirement will also potentially cause mismatches.

  • (IE: "asdf [qwe] [anidb-123]" --> will catch the [qwe] instead of the valid id entry)
>>> SOURCE_IDS                = '\[((?P<source>(anidb|anidb2|tvdb|tvdb2|tvdb3|tvdb4|tvdb5|tmdb|tsdb|imdb|youtube|youtube2))-)?(?P<id>[^\[\]]*)\]'
>>> re.search(SOURCE_IDS, "asdf [qwe] [anidb-123]", re.IGNORECASE).groupdict()
{'id': 'qwe', 'source': None}
>>> SOURCE_IDS                = '\[(?P<source>(anidb|anidb2|tvdb|tvdb2|tvdb3|tvdb4|tvdb5|tmdb|tsdb|imdb|youtube|youtube2))-(?P<id>[^\[\]]*)\]'
>>> re.search(SOURCE_IDS, "asdf [qwe] [anidb-123]", re.IGNORECASE).groupdict()
{'id': '123', 'source': 'anidb'}

There is even a secondary test where you have a blank case for source. (|youtube)
You also have a test of if source or id that for what I can tell will never be set before this test.

...
SOURCE_IDS                = '\[((?P<source>(anidb|anidb2|tvdb|tvdb2|tvdb3|tvdb4|tvdb5|tmdb|tsdb|imdb|youtube|youtube2))-)?(?P<id>[^\[\]]*)\]'                                       #
...
    match = re.search(SOURCE_IDS, folder_show, re.IGNORECASE)
    if not match and len(reverse_path)>1:  match = re.search('(.* )?\[((?P<source>(|youtube))-)?(?P<id>.*)\]', reverse_path[1], re.IGNORECASE)
    if source or id:
      Log.info("Forced ID in season folder: '{}' with id '{}' in series folder".format(source, id))
    elif match:
      id     = match.group('id'    ) if match.groupdict().has_key('id'    ) and match.group('id'    ) else '' 
      source = match.group('source') if match.groupdict().has_key('source') and match.group('source') else 'youtube' 
      Log.info("Forced ID in series folder: '{}' with id '{}'".format(source, id))
    else:
...

Shouldn't it be something like the below?

...
SOURCE_IDS                = '\[(?P<source>(anidb|anidb2|tvdb|tvdb2|tvdb3|tvdb4|tvdb5|tmdb|tsdb|imdb|youtube|youtube2))-(?P<id>[^\[\]]*)\]'                                       #
...
    match = re.search(SOURCE_IDS, folder_show, re.IGNORECASE)
    if not match and len(reverse_path)>1:
      match = re.search(SOURCE_IDS, reverse_path[1], re.IGNORECASE)
      if match:  Log.info("Forced ID sourced from series folder")
    if match:
      id     = match.group('id'    ) if match.groupdict().has_key('id'    ) and match.group('id'    ) else '' 
      source = match.group('source') if match.groupdict().has_key('source') and match.group('source') else ''
      Log.info("Forced ID: '{}' with id '{}'".format(source, id))
    else:
...
ZeroQI added a commit that referenced this issue Aug 21, 2018
#175
#172 (comment)

source name is optional for youtube playlist and channels as already super long
youtube id can be in subfolder (series folder inside grouping folder)
@ZeroQI
Copy link
Owner

ZeroQI commented Aug 21, 2018

source name optional for youtube playlist and channels
the following was used for youtube ids on season folder, which I removed support for by commenting higher the other part of the suppoting code

if not match and len(reverse_path)>1:  match = re.search('(.* )?\[((?P<source>(|youtube))-)?(?P<id>.*)\]', reverse_path[1], re.IGNORECASE)
    if source or id:
      Log.info("Forced ID in season folder: '{}' with id '{}' in series folder".format(source, id))

@purposelycryptic
Copy link

Well, I guess that explains where this line came from in the log for the series folder "Owarimonogatari (2017) [Dmon] [BD, 720p]":

Forced ID in series folder: 'youtube' with id 'Dmon'

Didn't seem to cause any harm, but it did seem rather odd - completely forgot that ASS is used by ZeroQI's YouTube-Agent too.

@EndOfLine369
Copy link
Collaborator Author

EndOfLine369 commented Aug 22, 2018

Hey @ZeroQI,
That commit really adds in a lot of regex searches and additional regex strings that I don't see as required. Which still did not resolve the random matches that were seen like '[Dmon]'.

This should be easily simplified by a single regex.

...
SOURCE_IDS                = '\[((?P<source>(anidb(|2)|tvdb(|[2-5])|tmdb|tsdb|imdb|youtube(|2)))-(?P<id>[^\[\]]*)|(?P<yt>(PL[^\[\]]{16}|UC[^\[\]]{22})))\]'
...
    match = re.search(SOURCE_IDS, folder_show, re.IGNORECASE)
    if not match and len(reverse_path)>1:
      match = re.search(SOURCE_IDS, reverse_path[1], re.IGNORECASE)
      if match:  Log.info("Forced ID sourced from series folder")
    if match:
      if match.groupdict().has_key('yt') and match.group('yt'):
        source, id = 'youtube', match.group('yt')
      else:
        id     = match.group('id'    ) if match.groupdict().has_key('id'    ) and match.group('id'    ) else '' 
        source = match.group('source') if match.groupdict().has_key('source') and match.group('source') else ''
      Log.info("Forced ID: '{}' with id '{}'".format(source, id))
    else:
      for file in SOURCE_ID_FILES:
...

EX:

>>> SOURCE_IDS                = '\[((?P<source>(anidb(|2)|tvdb(|[2-5])|tmdb|tsdb|imdb|youtube(|2)))-(?P<id>[^\[\]]*)|(?P<yt>(PL[^\[\]]{16}|UC[^\[\]]{22})))\]'
>>> qwe = 'asd [qwe] [anidb-123]'                         #Will properly ignore '[qwe]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'anidb', 'yt': None, 'id': '123'}
>>> qwe = 'zxcasd [PL1234567890123456]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': None, 'yt': 'PL1234567890123456', 'id': None}
>>> qwe = 'zxcasd [anidb-123] [PL1234567890123456]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'anidb', 'yt': None, 'id': '123'}
>>> qwe = 'zxcasd [PL1234567890123456] [anidb-123]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': None, 'yt': 'PL1234567890123456', 'id': None}
>>> qwe = 'zxcasd [PL12345678901234567890] [anidb-123]'   #To many numbers after 'PL'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'anidb', 'yt': None, 'id': '123'}
>>> qwe = 'asd [youtube-654321]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'youtube', 'yt': None, 'id': '654321'}
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'youtube2', 'yt': None, 'id': '654321'}
>>> qwe = 'asd [tvdb3-321]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'tvdb3', 'yt': None, 'id': '321'}

(Edit: Condensed the predefined 'SOURCE_IDS' ids)

EndOfLine369 added a commit that referenced this issue Aug 22, 2018
@ZeroQI
Copy link
Owner

ZeroQI commented Aug 22, 2018

@EndOfLine369 Condensed 'SOURCE_IDS' is indeed clearer and shorter.
Reduced coding further but it is way clearer now thanks to you, so thank you.

@EndOfLine369
Copy link
Collaborator Author

Indeed even looks more cleaner 😄 Team effort.
That should be it then for this then so closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants