This repository has been archived by the owner on Jun 15, 2021. It is now read-only.

Release sizes are sometimes wrong #217

Closed
gkoh opened this issue Oct 7, 2015 · 12 comments

@gkoh
Contributor

gkoh commented Oct 7, 2015

In quite a few releases on my installation, the size in the database does not match the size in the nzb.
I'm not sure why.

I've had quite a few releases report ~700MB, but actually turn out to be 10GB+ releases.

If I zero the size in the release table, then run 'fill_sizes_from_nzb.py' it gets the right size.

I haven't traced through the code to work out how the size is initially populated, but it doesn't seem to come from the nzb.
If you could tell me where to start looking, I might get some time to dig around.

@gkoh
Contributor Author

gkoh commented Oct 7, 2015

I should note that I've only really noticed this on larger releases, e.g. those that are recorded at several hundred MB but are really several GB.
I haven't noticed smaller releases (e.g. less than 1GB) having this problem.

@jamesmeneghello
Owner

Size is calculated twice: when building the NZB (by taking the size of each rar/rarpiece) and then again during post-processing. The NZB-based size isn't accurate (because it's compressed and includes extras such as samples, pars, etc). Post-processing goes into the release's rars and reads the sizes from the rar headers, which should theoretically be correct. I haven't looked at the math in the rar size checking in a long time and it's probably missing some edge cases, but that's where you should look: pynab/rars.py.

@gkoh
Contributor Author

gkoh commented Oct 7, 2015

Thanks for that.
After following the bouncing ball, there's a slightly concerning TODO at lib/rar.py:234

fileinfo.file_size = unp_size   #TODO: What about >2GiB files? (Zip64 equivalent?)

I'm not sure what the real impact of that note is.

@jamesmeneghello
Owner

Me neither. That was a library written by someone else (the author's probably named at the top) that I used because it was the only pure-python RAR implementation that would correctly read the headers from an incomplete RAR file. Most others (including the standard RAR library) would simply fail when they didn't have the full file. That said, I definitely do have releases with a post-processed size over 2GB, so it can't make too much difference.

@gkoh
Contributor Author

gkoh commented Oct 7, 2015

Yep, found it here:
http://ssokolow.com/scripts/
in particular here:
http://ssokolow.com/scripts/rar.py

No updates or additional comments since the version committed in pynab, so it could be a red herring.

@gkoh
Contributor Author

gkoh commented Oct 7, 2015

Looks like there's a problem in the rar library decoding some rars.
I grabbed the first rar file of one of the erroneous releases and dumped the metadata.
The command line shows:

Details: RAR 4, volume
 Attributes      Size     Date    Time   Name
----------- ---------  ---------- -----  ----
    ..A.... 9384847541  2015-09-28 09:47 <something totally legit>

Whereas dumping similar data using the library:

>>> info[0].file_size
794912949
>>> info[0].filename
''
>>> info[0].date_time
time.struct_time(tm_year=2007, tm_mon=11, tm_mday=15, tm_hour=13, tm_min=47, tm_sec=16, tm_wday=3, tm_yday=319, tm_isdst=0)

i.e. none of the content data is properly parsed.

@gkoh
Contributor Author

gkoh commented Oct 7, 2015

Found it, there's a bug in the rar library handling releases greater than 4GB in size.
I've fixed it in my sandbox for now, it needs some testing.

@jamesmeneghello
Owner

Nice catch!

@gkoh
Contributor Author

gkoh commented Oct 8, 2015

While the fix is being tested on my production instance, it raises a few questions:

  1. The bug results in 32-bit truncation, so sizes wrap at ~4.3GB. I have had some releases recorded as larger than that, but I don't understand how.
  2. There will be a number of existing releases with bad sizes, I don't know how to find and fix these without zeroing everything and running 'fill_sizes_from_nzbs.py'
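
For what it's worth, the two sizes dumped earlier in this thread fit a 32-bit truncation exactly. A quick check, using only the numbers already posted (the helper names are just for illustration):

```python
# Sizes quoted earlier in this thread.
cli_size = 9384847541      # size shown by the command-line tool
library_size = 794912949   # size returned by info[0].file_size

MASK_32 = 0xFFFFFFFF       # 2**32 - 1; 2**32 bytes is 4294967296, i.e. ~4.3GB


def low_32_bits(n):
    """Return what is left of n if only the low 32 bits are kept."""
    return n & MASK_32


def high_32_bits(n):
    """Return the part of n that a 32-bit field cannot hold."""
    return n >> 32


# The library's size is exactly the low 32 bits of the real size.
print(low_32_bits(cli_size) == library_size)

# The dropped high part, in units of 2**32 bytes.
print(high_32_bits(cli_size))
```

So the recorded size is the real size minus a whole multiple of 2**32 bytes, which is why a 9.4GB release shows up as roughly 795MB.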

@jamesmeneghello
Owner

  1. I suspect this is because there are multiple files within the releases, and the 4.3gb limit may only apply to a single file?

  2. Setting the password status of a release to UNKNOWN should trigger a full re-processing of RARs. Of course, that'd take a long time.

@gkoh
Contributor Author

gkoh commented Oct 8, 2015

On Wed, 2015-10-07 at 22:24 -0700, James Meneghello wrote:

  1. I suspect this is because there are multiple files within the
    releases, and the 4.3gb limit may only apply to a single file?

Yeah, that's quite possible.
The 4.3GB limit applies to a single file header within the rar.
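
To illustrate the per-header limit: as I understand the RAR 4 header layout, each file header stores the packed and unpacked sizes as 32-bit fields, and a flag (0x0100) signals that two extra 32-bit "high" fields follow the attributes. The sketch below is my own reading of that layout, not the patch from the pull request, and `parse_file_header` and `build_example_header` are hypothetical names:

```python
import struct

LHD_LARGE = 0x0100   # header flag: HIGH_PACK_SIZE / HIGH_UNP_SIZE are present
FILE_HEAD = 0x74     # HEAD_TYPE value for a file header

# Little-endian, no padding.
COMMON = struct.Struct("<HBHH")          # HEAD_CRC, HEAD_TYPE, HEAD_FLAGS, HEAD_SIZE
FILE_FIXED = struct.Struct("<IIBIIBBHI") # PACK_SIZE, UNP_SIZE, HOST_OS, FILE_CRC,
                                         # FTIME, UNP_VER, METHOD, NAME_SIZE, ATTR
HIGH = struct.Struct("<II")              # HIGH_PACK_SIZE, HIGH_UNP_SIZE


def parse_file_header(buf):
    """Parse one file header, combining the low and high 32-bit size halves."""
    _crc, _head_type, flags, _head_size = COMMON.unpack_from(buf, 0)
    offset = COMMON.size

    (pack_size, unp_size, _host_os, _file_crc, _ftime,
     _unp_ver, _method, name_size, _attr) = FILE_FIXED.unpack_from(buf, offset)
    offset += FILE_FIXED.size

    if flags & LHD_LARGE:
        # Without this branch the sizes wrap at 2**32 and the file name
        # would be read starting 8 bytes too early.
        high_pack, high_unp = HIGH.unpack_from(buf, offset)
        offset += HIGH.size
        pack_size |= high_pack << 32
        unp_size |= high_unp << 32

    name = buf[offset:offset + name_size]
    return {"pack_size": pack_size, "unp_size": unp_size, "name": name}


def build_example_header(unp_size, name):
    """Build a synthetic large-file header for testing (CRCs and times zeroed)."""
    head_size = COMMON.size + FILE_FIXED.size + HIGH.size + len(name)
    return (
        COMMON.pack(0, FILE_HEAD, LHD_LARGE, head_size)
        + FILE_FIXED.pack(0, unp_size & 0xFFFFFFFF, 0, 0, 0, 29, 0x30, len(name), 0)
        + HIGH.pack(0, unp_size >> 32)
        + name
    )


header = build_example_header(9384847541, b"example.bin")
info = parse_file_header(header)
print(info["unp_size"], info["name"])
```

Since the limit is per file header, a release made of several files that are each under 2**32 bytes would still add up correctly, which fits the explanation above.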

  2. Setting the password status of a release to UNKNOWN should
    trigger a full re-processing of RARs. Of course, that'd take a long
    time.

I've submitted the pull request.
At the moment I'm satisfied with future releases having the right size.
Correcting the older releases can wait for another day.

jamesmeneghello added a commit that referenced this issue Oct 9, 2015
Fix #217, large releases have wrong sizes.
@jamesmeneghello
Owner

I'll include the correction of old releases with the next alembic upgrade, which I suspect will deal with TVRage (whatever happens with that).
