Random source ids of 'youtube' #175

EndOfLine369 · 2018-08-21T03:54:47Z

Seems to be random listings of 'youtube' being listed as a found forced id.

What is the reasoning behind the 'source' name being optional now?
The vagueness of this id name requirement will also potentially cause mismatches.

(IE: "asdf [qwe] [anidb-123]" --> will catch the [qwe] instead of the valid id entry)

>>> SOURCE_IDS                = '\[((?P<source>(anidb|anidb2|tvdb|tvdb2|tvdb3|tvdb4|tvdb5|tmdb|tsdb|imdb|youtube|youtube2))-)?(?P<id>[^\[\]]*)\]'
>>> re.search(SOURCE_IDS, "asdf [qwe] [anidb-123]", re.IGNORECASE).groupdict()
{'id': 'qwe', 'source': None}
>>> SOURCE_IDS                = '\[(?P<source>(anidb|anidb2|tvdb|tvdb2|tvdb3|tvdb4|tvdb5|tmdb|tsdb|imdb|youtube|youtube2))-(?P<id>[^\[\]]*)\]'
>>> re.search(SOURCE_IDS, "asdf [qwe] [anidb-123]", re.IGNORECASE).groupdict()
{'id': '123', 'source': 'anidb'}

There is even a secondary test where you have a blank case for source. (|youtube)
You also have a test of if source or id that for what I can tell will never be set before this test.

...
SOURCE_IDS                = '\[((?P<source>(anidb|anidb2|tvdb|tvdb2|tvdb3|tvdb4|tvdb5|tmdb|tsdb|imdb|youtube|youtube2))-)?(?P<id>[^\[\]]*)\]'                                       #
...
    match = re.search(SOURCE_IDS, folder_show, re.IGNORECASE)
    if not match and len(reverse_path)>1:  match = re.search('(.* )?\[((?P<source>(|youtube))-)?(?P<id>.*)\]', reverse_path[1], re.IGNORECASE)
    if source or id:
      Log.info("Forced ID in season folder: '{}' with id '{}' in series folder".format(source, id))
    elif match:
      id     = match.group('id'    ) if match.groupdict().has_key('id'    ) and match.group('id'    ) else '' 
      source = match.group('source') if match.groupdict().has_key('source') and match.group('source') else 'youtube' 
      Log.info("Forced ID in series folder: '{}' with id '{}'".format(source, id))
    else:
...

Shouldn't it be something like the below?

...
SOURCE_IDS                = '\[(?P<source>(anidb|anidb2|tvdb|tvdb2|tvdb3|tvdb4|tvdb5|tmdb|tsdb|imdb|youtube|youtube2))-(?P<id>[^\[\]]*)\]'                                       #
...
    match = re.search(SOURCE_IDS, folder_show, re.IGNORECASE)
    if not match and len(reverse_path)>1:
      match = re.search(SOURCE_IDS, reverse_path[1], re.IGNORECASE)
      if match:  Log.info("Forced ID sourced from series folder")
    if match:
      id     = match.group('id'    ) if match.groupdict().has_key('id'    ) and match.group('id'    ) else '' 
      source = match.group('source') if match.groupdict().has_key('source') and match.group('source') else ''
      Log.info("Forced ID: '{}' with id '{}'".format(source, id))
    else:
...

The text was updated successfully, but these errors were encountered:

#175 #172 (comment) source name is optional for youtube playlist and channels as already super long youtube id can be in subfolder (series folder inside grouping folder)

ZeroQI · 2018-08-21T07:53:59Z

source name optional for youtube playlist and channels
the following was used for youtube ids on season folder, which I removed support for by commenting higher the other part of the suppoting code

if not match and len(reverse_path)>1:  match = re.search('(.* )?\[((?P<source>(|youtube))-)?(?P<id>.*)\]', reverse_path[1], re.IGNORECASE)
    if source or id:
      Log.info("Forced ID in season folder: '{}' with id '{}' in series folder".format(source, id))

purposelycryptic · 2018-08-21T16:45:54Z

Well, I guess that explains where this line came from in the log for the series folder "Owarimonogatari (2017) [Dmon] [BD, 720p]":

Forced ID in series folder: 'youtube' with id 'Dmon'

Didn't seem to cause any harm, but it did seem rather odd - completely forgot that ASS is used by ZeroQI's YouTube-Agent too.

EndOfLine369 · 2018-08-22T03:16:51Z

Hey @ZeroQI,
That commit really adds in a lot of regex searches and additional regex strings that I don't see as required. Which still did not resolve the random matches that were seen like '[Dmon]'.

This should be easily simplified by a single regex.

...
SOURCE_IDS                = '\[((?P<source>(anidb(|2)|tvdb(|[2-5])|tmdb|tsdb|imdb|youtube(|2)))-(?P<id>[^\[\]]*)|(?P<yt>(PL[^\[\]]{16}|UC[^\[\]]{22})))\]'
...
    match = re.search(SOURCE_IDS, folder_show, re.IGNORECASE)
    if not match and len(reverse_path)>1:
      match = re.search(SOURCE_IDS, reverse_path[1], re.IGNORECASE)
      if match:  Log.info("Forced ID sourced from series folder")
    if match:
      if match.groupdict().has_key('yt') and match.group('yt'):
        source, id = 'youtube', match.group('yt')
      else:
        id     = match.group('id'    ) if match.groupdict().has_key('id'    ) and match.group('id'    ) else '' 
        source = match.group('source') if match.groupdict().has_key('source') and match.group('source') else ''
      Log.info("Forced ID: '{}' with id '{}'".format(source, id))
    else:
      for file in SOURCE_ID_FILES:
...

EX:

>>> SOURCE_IDS                = '\[((?P<source>(anidb(|2)|tvdb(|[2-5])|tmdb|tsdb|imdb|youtube(|2)))-(?P<id>[^\[\]]*)|(?P<yt>(PL[^\[\]]{16}|UC[^\[\]]{22})))\]'
>>> qwe = 'asd [qwe] [anidb-123]'                         #Will properly ignore '[qwe]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'anidb', 'yt': None, 'id': '123'}
>>> qwe = 'zxcasd [PL1234567890123456]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': None, 'yt': 'PL1234567890123456', 'id': None}
>>> qwe = 'zxcasd [anidb-123] [PL1234567890123456]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'anidb', 'yt': None, 'id': '123'}
>>> qwe = 'zxcasd [PL1234567890123456] [anidb-123]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': None, 'yt': 'PL1234567890123456', 'id': None}
>>> qwe = 'zxcasd [PL12345678901234567890] [anidb-123]'   #To many numbers after 'PL'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'anidb', 'yt': None, 'id': '123'}
>>> qwe = 'asd [youtube-654321]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'youtube', 'yt': None, 'id': '654321'}
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'youtube2', 'yt': None, 'id': '654321'}
>>> qwe = 'asd [tvdb3-321]'
>>> re.search(SOURCE_IDS, qwe, re.IGNORECASE).groupdict()
{'source': 'tvdb3', 'yt': None, 'id': '321'}

(Edit: Condensed the predefined 'SOURCE_IDS' ids)

#175

ZeroQI · 2018-08-22T07:39:05Z

@EndOfLine369 Condensed 'SOURCE_IDS' is indeed clearer and shorter.
Reduced coding further but it is way clearer now thanks to you, so thank you.

EndOfLine369 · 2018-08-22T07:46:38Z

Indeed even looks more cleaner 😄 Team effort.
That should be it then for this then so closing.

ZeroQI added a commit that referenced this issue Aug 21, 2018

Stricter enforcement of Youtube ids

779b909

#175 #172 (comment) source name is optional for youtube playlist and channels as already super long youtube id can be in subfolder (series folder inside grouping folder)

EndOfLine369 added a commit that referenced this issue Aug 22, 2018

Update to 'SOURCE_ID' parsing

4dffc21

#175

EndOfLine369 closed this as completed Aug 22, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Random source ids of 'youtube' #175

Random source ids of 'youtube' #175

EndOfLine369 commented Aug 21, 2018 •

edited

Loading

ZeroQI commented Aug 21, 2018

purposelycryptic commented Aug 21, 2018

EndOfLine369 commented Aug 22, 2018 •

edited

Loading

ZeroQI commented Aug 22, 2018

EndOfLine369 commented Aug 22, 2018

Random source ids of 'youtube' #175

Random source ids of 'youtube' #175

Comments

EndOfLine369 commented Aug 21, 2018 • edited Loading

ZeroQI commented Aug 21, 2018

purposelycryptic commented Aug 21, 2018

EndOfLine369 commented Aug 22, 2018 • edited Loading

ZeroQI commented Aug 22, 2018

EndOfLine369 commented Aug 22, 2018

EndOfLine369 commented Aug 21, 2018 •

edited

Loading

EndOfLine369 commented Aug 22, 2018 •

edited

Loading