This repository has been archived by the owner on Jun 15, 2021. It is now read-only.

To do with pres but best not to say #311

Open
ukharley opened this issue Aug 6, 2017 · 7 comments

Comments

@ukharley

ukharley commented Aug 6, 2017

Here's what I've managed on my own. I leave it up to those with better programming skills to do something useful with it and update the repo.

import datetime
import hashlib

import pytz

from pynab.db import db_session, Pre, Release


def get_hash(pre_time, title, req):
    """Print the first release whose original_name is an MD5 or SHA-1 hash of the pre title."""
    title = title.encode('utf-8')
    req = req.encode('utf-8')
    with db_session() as db:
        # try the title as-is, lowercased and uppercased
        for c in (title, title.lower(), title.upper()):
            # try the bare title, then with the request id appended once and twice
            for a in (c, c + req, c + req + req):
                # check the MD5 digest first, then the SHA-1 digest
                for digest in (hashlib.md5(a).hexdigest(), hashlib.sha1(a).hexdigest()):
                    if db.query(Release).filter(Release.original_name == digest).first():
                        # decode so the title prints as text rather than as b'...'
                        print("{} {:100s} {:^48s}".format(pre_time, a.decode('utf-8'), digest))
                        return

def get_releases(days):

    with db_session() as db:
        # filter set to just lookup a.b.moovee
        p_query = db.query(Pre).filter((Pre.pretime >= (datetime.datetime.now(pytz.utc) - datetime.timedelta(days=days))) & (Pre.requestgroup == "alt.binaries.moovee"))
        p_query = p_query.order_by(Pre.pretime.desc())
        for pre in p_query.all():
            get_hash(pre.pretime, pre.name, str(pre.requestid))

if __name__ == '__main__':
    get_releases(90) # set to search 90 days of pre's

@gkoh
Contributor

gkoh commented Sep 28, 2017

I note this is targeted at moovee; would it also apply to teevee?

@ctero

ctero commented Sep 28, 2017

I believe it would only be needed for the moovee group. I read that the moovee posting behavior changed somewhat recently.

@gkoh
Contributor

gkoh commented Sep 29, 2017

Right, I found a Reddit post explaining those changes, then, only a bit later, indicating that they had been reverted.

I ask only because I've noticed a number of hashed titles in teevee, but they don't match anything I can find or generate.

@ukharley
Author

Try this group: alt.binaries.solar-xl
Quite a few of the hashed movie releases are ending up in it. Rar and the other processing find the name most of the time. When they don't, the above code will.

@brookesy2
Collaborator

@gkoh @ukharley So did these changes actually happen? Does this need to be incorporated?

@ukharley Are you currently using this in postproc or at time of scan?

@gkoh
Contributor

gkoh commented Oct 3, 2017

I gave this a shot and ended up with a CPU being soaked for 24 hours, seemingly without end (I killed it).
I can see what is trying to be done here and had a look at how nzedb does it.

There they generate the comparison hashes during pre entry, so the pre table has a few extra columns. This is then compared and handled during postprocess.

This effectively trades DB space for postprocess CPU time.
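
To illustrate that trade-off, here is a minimal sketch that precomputes every hash variant the script above tries and keeps them in a lookup table, so each hashed release name costs one lookup instead of a series of queries. The function names and the sample title are hypothetical, and a dict stands in for the extra pre table columns; this is not how nzedb or pynab actually implement it.

```python
import hashlib


def candidate_hashes(title, req):
    """Yield (digest, source) pairs for every variant the script above tries."""
    raw = title.encode("utf-8")
    suffix = req.encode("utf-8")
    for base in (raw, raw.lower(), raw.upper()):
        for data in (base, base + suffix, base + suffix + suffix):
            for algo in (hashlib.md5, hashlib.sha1):
                yield algo(data).hexdigest(), data.decode("utf-8")


def build_hash_index(pres):
    """Map each candidate digest to the pre title it was derived from.

    `pres` is an iterable of (title, request_id) pairs. In a real setup the
    digests would be stored when the pre is inserted (assumption).
    """
    index = {}
    for title, req in pres:
        for digest, _source in candidate_hashes(title, req):
            # keep the first pre seen for a given digest
            index.setdefault(digest, title)
    return index


# hypothetical sample data, not taken from any real pre feed
pres = [("Some.Movie.2017.1080p.BluRay.x264-GRP", "12345")]
index = build_hash_index(pres)

# a release whose name is the MD5 of the lowercased title
hashed_name = hashlib.md5(b"some.movie.2017.1080p.bluray.x264-grp").hexdigest()
print(index.get(hashed_name))
```

Each pre contributes 18 digests (3 case variants, 3 request id suffixes, 2 hash functions), which is the space cost paid up front.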

Default PostgreSQL has an MD5 function built in, and we could get more hash functions if we require pgcrypto. This might speed up the hash compute time; I assume the Python ones are a bit less efficient.
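
A rough sketch of what pushing the hashing into the database might look like, kept as SQL strings in Python. The table and column names (pres.name, pres.pretime, releases.original_name) are assumptions about the schema, the request id variants are left out for brevity, and whether this is actually faster would need measuring. It also assumes a UTF-8 database so that `md5()` hashes the same bytes as the Python code.

```python
# Assumed table and column names; adjust to the actual schema.
# md5(text) is built into PostgreSQL and returns a hex string.
MD5_MATCH_SQL = """
SELECT p.name, r.original_name
FROM pres AS p
JOIN releases AS r
  ON r.original_name IN (
       md5(p.name),
       md5(lower(p.name)),
       md5(upper(p.name))
     )
WHERE p.pretime >= now() - interval '90 days'
"""

# With pgcrypto installed (CREATE EXTENSION pgcrypto), a SHA-1 hex digest
# can be computed with digest() and encode():
SHA1_EXPR = "encode(digest(p.name, 'sha1'), 'hex')"

print(MD5_MATCH_SQL)
```

The statement could be run through the existing session, for example with SQLAlchemy's `text()` construct.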

@ukharley
Author

ukharley commented Oct 3, 2017

@brookesy2
Proof of concept only. I run it manually if and when needed.
