feat: implement filesize validation for movies and episodes #869

davidemarcoli · 2024-11-06T19:32:39Z

Don't remove it again @Gaisberg! 😡

Pull Request Check List

Resolves: #issue-number-here

Added tests for changed code.
Updated documentation for changed code.

Description:

Summary by CodeRabbit

New Features
- Enhanced file validation for movies and shows based on acceptable file sizes.
- New functions introduced to check file size validity for movies and shows.
- Introduced enumerations for media types to improve type safety and clarity.
Bug Fixes
- Improved error handling for rate limits in request processing.
Refactor
- Simplified file size validation logic by transitioning from instance variables to global functions and variables.
- Updated minimum file size constraints to enforce non-negative limits for movies and episodes.
- Ensured consistency in service results by converting keys to lowercase in the scraping process.

coderabbitai · 2024-11-06T19:32:46Z

Walkthrough

The changes introduce a new exception class, InvalidFileSizeException, and a new result class, DownloadCachedStreamResult, to the Downloader class in src/program/services/downloaders/__init__.py. The download_cached_stream method now returns an instance of this class. The run method has been updated to validate file sizes using the new validate_filesize method. Additionally, the FileFinder class in src/program/services/downloaders/shared.py has been refactored to use global variables for file size limits and modularize validation logic. Default values for minimum file sizes in DownloadersModel have been updated to enforce non-negative constraints.

Changes

File Path	Change Summary
`src/program/services/downloaders/__init__.py`	Added `InvalidFileSizeException` and `DownloadCachedStreamResult`. Updated `download_cached_stream` to return `DownloadCachedStreamResult`. Introduced `validate_filesize` and `_get_item_media_type` methods. Modified `update_item_attributes` for media type validation.
`src/program/services/downloaders/shared.py`	Refactored `FileFinder` class to use global variables for file size limits. Added functions for file size validation and centralized checks.
`src/program/settings/models.py`	Updated default values for `movie_filesize_mb_min` and `episode_filesize_mb_min` from `-1` to `0` in `DownloadersModel`.
`src/program/media/__init__.py`	Expanded imports to include `ShowMediaType`, `MovieMediaType`, and `MediaType`.
`src/program/media/item.py`	Introduced enumerations `ShowMediaType`, `MovieMediaType`, and `MediaType`. Updated `MediaItem` subclasses to use these enumerations instead of strings.
`src/program/utils/request.py`	Removed logging for rate limits and request failures in `_request`. Updated `handle_response` to raise `RateLimitExceeded` exception and added type hints.
`src/program/services/scrapers/__init__.py`	Modified `scrape` method to convert keys in `service_results` to lowercase for consistency.

Possibly related PRs

feat: integrate dependency injection with kink library #859: The main PR introduces a new exception class and modifies the Downloader class, while the retrieved PR focuses on integrating dependency injection across various services, including the Downloader. This connection suggests that the changes in the main PR may benefit from the dependency injection framework established in the retrieved PR.

Suggested reviewers

Gaisberg

Poem

In the burrow where files do play,
New checks for sizes come to stay.
Movies and shows, now fit just right,
Validations shine, oh what a sight!
A hop, a skip, through code we glide,
With size checks strong, we take great pride! 🐇✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (2)

src/program/services/downloaders/__init__.py (1)

Line range hint 94-108: Consider architectural improvements for file validation

The current implementation could benefit from the following improvements:

Add logging for rejected files to help debug issues

Consider implementing a retry mechanism for borderline cases

Extract the validation logic into a separate validator class for better separation of concerns

Example structure:
class MediaFileValidator:
    def __init__(self, file_finder):
        self.file_finder = file_finder
        
    def validate_file(self, file, media_type):
        try:
            filesize = file[self.file_finder.filesize_attr]
            if media_type == "movie":
                return self._validate_movie(file, filesize)
            return self._validate_show(file, filesize)
        except KeyError as e:
            logger.warning(f"Validation failed: {str(e)}")
            return False

src/program/services/downloaders/shared.py (1)

121-125: Consider encapsulating filesize configuration.

The global variables could benefit from:

Type hints for better code maintainability
Documentation explaining their purpose and units (MB)
Encapsulation in a configuration class

Consider refactoring to:

from dataclasses import dataclass

@dataclass
class FileSizeConfig:
    """Configuration for file size validation in megabytes."""
    min_movie_filesize: int
    max_movie_filesize: int
    min_episode_filesize: int
    max_episode_filesize: int
    
    @classmethod
    def from_settings(cls, settings):
        return cls(
            min_movie_filesize=settings.downloaders.movie_filesize_mb_min,
            max_movie_filesize=settings.downloaders.movie_filesize_mb_max,
            min_episode_filesize=settings.downloaders.episode_filesize_mb_min,
            max_episode_filesize=settings.downloaders.episode_filesize_mb_max
        )

filesize_config = FileSizeConfig.from_settings(settings_manager.settings)
are_filesizes_valid = False

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3ad6cae and 95e0cba.

📒 Files selected for processing (2)

src/program/services/downloaders/__init__.py (3 hunks)
src/program/services/downloaders/shared.py (1 hunks)

🔇 Additional comments (3)

src/program/services/downloaders/__init__.py (2)

9-9: LGTM: Clean import of shared validation functions

The import is well-organized and follows good practices by keeping validation logic in a shared module.

94-94: Add error handling for file size attribute access

The file size validation is correctly integrated, but consider adding error handling for cases where the file size attribute might be missing or invalid.

-            if item.type == "movie" and self.service.file_finder.container_file_matches_movie(file) and filesize_is_acceptable_movie(file[self.service.file_finder.filesize_attr]):
+            try:
+                filesize = file[self.service.file_finder.filesize_attr]
+                if item.type == "movie" and self.service.file_finder.container_file_matches_movie(file) and filesize_is_acceptable_movie(filesize):
+            except KeyError:
+                logger.warning(f"Missing file size information for {file.get(self.service.file_finder.filename_attr, 'unknown file')}")
+                continue

Let's verify the file size attribute usage across the codebase:

src/program/services/downloaders/shared.py (1)

144-164: Add unit tests for filesize validation.

The PR checklist indicates that tests haven't been added yet. Please add unit tests to verify:

Validation of configuration values
Filesize validation for both movies and shows
Edge cases (min/max boundaries, invalid settings)
Error logging scenarios

Would you like me to help generate the unit test suite for these functions?

src/program/services/downloaders/__init__.py

src/program/services/downloaders/shared.py

coderabbitai · 2024-11-06T19:36:27Z

src/program/services/downloaders/shared.py

+def filesize_is_acceptable_movie(filesize):
+    if not are_filesizes_valid:
+        logger.error("Filesize settings are invalid, movie file sizes will not be checked.")
+        return True
+    min_size = settings_manager.settings.downloaders.movie_filesize_mb_min * 1_000_000
+    max_size = settings_manager.settings.downloaders.movie_filesize_mb_max * 1_000_000 if settings_manager.settings.downloaders.movie_filesize_mb_max != -1 else float("inf")
+    is_acceptable = min_size <= filesize <= max_size
+    if not is_acceptable:
+        logger.debug(f"Filesize {filesize} is not within acceptable range {min_size} - {max_size}")
+    return is_acceptable
+


🛠️ Refactor suggestion

Add type hints and extract common validation logic.

The function needs:

Type hints for parameters and return value

A docstring explaining the validation rules

Extracted common validation logic shared with filesize_is_acceptable_show

Consider refactoring to:

BYTES_PER_MB = 1_000_000 def _convert_to_bytes(size_mb: int) -> int: """Convert size from megabytes to bytes.""" return size_mb * BYTES_PER_MB def filesize_is_acceptable_movie(filesize: int) -> bool: """ Check if a movie file size is within acceptable range. Args: filesize: Size in bytes to validate Returns: bool: True if size is within configured range, False otherwise """ if not are_filesizes_valid: logger.error("Filesize settings are invalid, movie file sizes will not be checked.") return True min_size = _convert_to_bytes(settings_manager.settings.downloaders.movie_filesize_mb_min) max_size = (float("inf") if settings_manager.settings.downloaders.movie_filesize_mb_max == -1 else _convert_to_bytes(settings_manager.settings.downloaders.movie_filesize_mb_max)) is_acceptable = min_size <= filesize <= max_size if not is_acceptable: logger.debug(f"Movie filesize {filesize} bytes is not within acceptable range {min_size} - {max_size} bytes") return is_acceptable

src/program/services/downloaders/shared.py

coderabbitai

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (2)

src/program/services/downloaders/shared.py (1)

128-192: Add unit tests for the validation functions.

The implementation looks good, but as noted in the PR description, tests are missing. Consider adding tests for:

Validation of configuration values

Size conversion logic

Edge cases (e.g., -1 values, boundary conditions)

Invalid input handling

Would you like me to help generate a test suite for these validation functions?
src/program/settings/models.py (1)
65-68: Consider adding custom types for filesize fields.

To improve type safety and make the units more explicit, consider creating a custom type for filesizes.
from pydantic import conint
from typing import Literal, Union

FileSizeMB = conint(ge=0)
NoLimit = Literal[-1]
FileSizeConstraint = Union[FileSizeMB, NoLimit]

# Then use in the model:
movie_filesize_mb_min: FileSizeMB
movie_filesize_mb_max: FileSizeConstraint
episode_filesize_mb_min: FileSizeMB
episode_filesize_mb_max: FileSizeConstraint

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 95e0cba and dfcd589.

📒 Files selected for processing (3)

src/program/services/downloaders/__init__.py (3 hunks)
src/program/services/downloaders/shared.py (2 hunks)
src/program/settings/models.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/program/services/downloaders/init.py

🔇 Additional comments (4)

src/program/services/downloaders/shared.py (3)

3-3: LGTM: Clean initialization of configuration variables.

The global variables are properly initialized from settings and follow a consistent naming pattern.

Also applies to: 121-126

128-146: LGTM: Well-implemented validation helpers.

The implementation follows the suggested patterns from previous reviews.

188-192: 🛠️ Refactor suggestion

Update public functions to use MediaType enum.

After implementing the MediaType enum, update these functions accordingly:

 def filesize_is_acceptable_movie(filesize: int) -> bool:
-    return _validate_filesize(filesize, "movie")
+    return _validate_filesize(filesize, MediaType.MOVIE)

 def filesize_is_acceptable_show(filesize: int) -> bool:
-    return _validate_filesize(filesize, "show")
+    return _validate_filesize(filesize, MediaType.SHOW)

Likely invalid or redundant comment.

src/program/settings/models.py (1)

65-68: Verify the impact on existing configurations.

The change from -1 to 0 for minimum sizes might affect existing configurations.

src/program/services/downloaders/shared.py

src/program/settings/models.py

src/program/services/downloaders/__init__.py

src/program/services/downloaders/shared.py

…gging and delete invalid torrents

- Added `ShowMediaType`, `MovieMediaType`, and `MediaType` enums to categorize media types. - Refactored `filesize_is_acceptable` function to use media type enums for validation. - Updated `Downloader` class to utilize the new media type enums for file validation. - Consolidated movie and show filesize validation into a single function. - Enhanced code readability and maintainability by using enums and reducing redundancy.

coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (9)

src/program/media/__init__.py (1)
1-1: LGTM! Consider adding module docstring.

The addition of media type enums improves type safety by replacing string literals. While the import line is long, keeping related imports together is preferred over splitting them.

Consider adding a module docstring to document the purpose of these media types and their relationships:
"""
Media type definitions and base classes for the Riven media system.

This module provides the core media item types (Movie, Show, Episode, etc.) along with
their associated media type enumerations for type-safe media classification.
"""
src/program/services/downloaders/shared.py (3)
8-8: Remove unused import.

The ShowMediaType import is not used in this file.
-from program.media import MovieMediaType, ShowMediaType
+from program.media import MovieMediaType
🧰 Tools

🪛 Ruff

8-8: program.media.ShowMediaType imported but unused

Remove unused import: program.media.ShowMediaType

(F401)

123-126: Consider using a settings accessor class instead of global variables.

Global variables for file size limits could lead to stale values if settings are updated at runtime. Consider encapsulating these in a class that can handle dynamic updates and validation.
class FileSizeLimits:
    @property
    def movie_min(self) -> int:
        return settings_manager.settings.downloaders.movie_filesize_mb_min

    @property
    def movie_max(self) -> int:
        return settings_manager.settings.downloaders.movie_filesize_mb_max

    @property
    def episode_min(self) -> int:
        return settings_manager.settings.downloaders.episode_filesize_mb_min

    @property
    def episode_max(self) -> int:
        return settings_manager.settings.downloaders.episode_filesize_mb_max

filesize_limits = FileSizeLimits()
188-189: Add documentation to public function.

The public interface should have proper documentation explaining its usage and parameters.
 def filesize_is_acceptable(filesize: int, media_type: str) -> bool:
+    """
+    Check if a file size is within acceptable limits for the given media type.
+    
+    Args:
+        filesize: Size in bytes to validate
+        media_type: Media type string (use MovieMediaType.Movie.value or similar)
+        
+    Returns:
+        bool: True if size is within configured range, False otherwise
+    """
     return _validate_filesize(filesize, media_type)
src/program/services/downloaders/__init__.py (2)
Line range hint 103-122: Consider simplifying the episode matching logic.

The nested conditionals and multiple parent traversals make the code harder to follow. Consider extracting this logic into a helper method.
-if item.type in (media_type.value for media_type in ShowMediaType):
-    show = item
-    if item.type == ShowMediaType.Season.value:
-        show = item.parent
-    elif item.type == ShowMediaType.Episode.value:
-        show = item.parent.parent
+def _resolve_show_from_item(item):
+    if item.type == ShowMediaType.Episode.value:
+        return item.parent.parent
+    if item.type == ShowMediaType.Season.value:
+        return item.parent
+    return item
+
+if item.type in (media_type.value for media_type in ShowMediaType):
+    show = _resolve_show_from_item(item)
124-134: Enhance logging and fix f-string.

The validation logic is sound, but there are two improvements to consider:
-            logger.warning(f"Couldn't find torrent id to delete")
+            logger.warning("Couldn't find torrent id to delete")
+            logger.debug(f"File size validation failed for {item.log_string}: size={file[self.service.file_finder.filesize_attr]}")
🧰 Tools

🪛 Ruff

130-130: f-string without any placeholders

Remove extraneous f prefix

(F541)
src/program/media/item.py (3)
21-36: Consider refactoring enum structure to avoid value duplication.

While the introduction of enums improves type safety, the current structure duplicates values in MediaType. Consider these alternatives:

Use a single enum with a hierarchical structure:
class MediaType(Enum):
    """Media types with hierarchical structure"""
    class Show:
        SHOW = "show"
        SEASON = "season"
        EPISODE = "episode"
    
    class Movie:
        MOVIE = "movie"
Or use class inheritance:
class BaseMediaType(Enum):
    """Base media type enum"""
    pass

class ShowMediaType(BaseMediaType):
    """Show media types"""
    SHOW = "show"
    SEASON = "season"
    EPISODE = "episode"

class MovieMediaType(BaseMediaType):
    """Movie media types"""
    MOVIE = "movie"
4-4: Add tests for the new enum implementations.

The TODO comment indicates missing tests. With the introduction of new enums and changes to type assignments, it's crucial to add tests that verify:

Correct enum value assignments

Type consistency across the media item hierarchy

State transitions with the new type system

Would you like me to help create a test suite for these changes?

Line range hint 563-599: Consider simplifying state determination logic.

The _determine_state method in the Season class contains complex nested conditions. Consider refactoring to use a more maintainable approach:
def _determine_state(self):
    if not self.episodes:
        return States.Unreleased
        
    state_counts = Counter(episode.state for episode in self.episodes)
    
    if state_counts[States.Completed] == len(self.episodes):
        return States.Completed
    if States.Unreleased in state_counts:
        return States.Ongoing if len(state_counts) > 1 else States.Unreleased
    if States.Completed in state_counts:
        return States.PartiallyCompleted
    
    # Continue with other state checks...

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between dfcd589 and ad40587.

📒 Files selected for processing (5)

src/program/media/__init__.py (1 hunks)
src/program/media/item.py (6 hunks)
src/program/services/downloaders/__init__.py (3 hunks)
src/program/services/downloaders/shared.py (2 hunks)
src/program/settings/models.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/program/settings/models.py

🧰 Additional context used

🪛 Ruff

src/program/services/downloaders/__init__.py

130-130: f-string without any placeholders

Remove extraneous f prefix

(F541)

src/program/services/downloaders/shared.py

8-8: program.media.ShowMediaType imported but unused

Remove unused import: program.media.ShowMediaType

(F401)

🔇 Additional comments (4)

src/program/services/downloaders/__init__.py (4)

7-7: LGTM: Import changes are appropriate.

The new imports support the filesize validation functionality and media type enums.

Also applies to: 11-11

94-96: LGTM: File size validation is well integrated.

The validation is correctly placed before processing files, preventing unnecessary processing of invalid files.

136-140: LGTM: Clean media type resolution.

The static method provides a clear and reusable way to determine media types.

124-140: ⚠️ Potential issue

Add test coverage for the file size validation feature.

This is a critical feature that requires test coverage. Please add tests for:

File size validation for movies
File size validation for different show types (season/episode)
Edge cases (boundary values, invalid sizes)
Cleanup behavior when validation fails

Note: This reinforces the previous review comment about missing tests.

🧰 Tools

🪛 Ruff

130-130: f-string without any placeholders

Remove extraneous f prefix

(F541)

src/program/services/downloaders/shared.py

src/program/media/item.py

Caller should log on exception.

…crapers.

- Introduced `InvalidFileSizeException` for better error handling. - Added `DownloadCachedStreamResult` class to encapsulate download results. - Refactored `validate_filesize` to raise exceptions and log invalid file sizes. - Enhanced logging with `get_invalid_filesize_log_string` for clearer messages. - Improved structure and readability of the downloader service.

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (7)

src/program/services/scrapers/__init__.py (1)

76-81: LGTM! Consider these improvements for better efficiency and type safety.

The lowercase normalization is a good practice for ensuring consistent info hash format. However, there are a few potential improvements:

The isinstance check is redundant as service_results was already validated.

The implementation could be more efficient using dict comprehension.

Consider this more efficient implementation:
-            # ensure that info hash is lower case in each result
-            if isinstance(service_results, dict):
-                for infohash in list(service_results.keys()):
-                    if infohash.lower() != infohash:
-                        service_results[infohash.lower()] = service_results.pop(infohash)
+            # Ensure info hashes are lowercase
+            service_results = {
+                infohash.lower(): result 
+                for infohash, result in service_results.items()
+            }
Also, consider adding a type hint for service_results to improve type safety:
def run_service(service, item) -> None:
    nonlocal total_results
    service_results: Dict[str, str] = service.run(item)

src/program/services/downloaders/shared.py (3)

8-8: Remove unused import ShowMediaType.

The ShowMediaType import is not used in the code. Consider removing it to keep the imports clean.
-from program.media import MovieMediaType, ShowMediaType
+from program.media import MovieMediaType
🧰 Tools

🪛 Ruff

8-8: program.media.ShowMediaType imported but unused

Remove unused import: program.media.ShowMediaType

(F401)

123-126: Add type hints to global variables and consider encapsulation.

The global variables storing filesize limits would benefit from:

Type hints for better code clarity and IDE support

Encapsulation within a configuration class for better maintainability and testability
-min_movie_filesize = settings_manager.settings.downloaders.movie_filesize_mb_min
-max_movie_filesize = settings_manager.settings.downloaders.movie_filesize_mb_max
-min_episode_filesize = settings_manager.settings.downloaders.episode_filesize_mb_min
-max_episode_filesize = settings_manager.settings.downloaders.episode_filesize_mb_max
+min_movie_filesize: int = settings_manager.settings.downloaders.movie_filesize_mb_min
+max_movie_filesize: int = settings_manager.settings.downloaders.movie_filesize_mb_max
+min_episode_filesize: int = settings_manager.settings.downloaders.episode_filesize_mb_min
+max_episode_filesize: int = settings_manager.settings.downloaders.episode_filesize_mb_max
188-191: Improve error message formatting in get_invalid_filesize_log_string.

The error message could be more user-friendly by:

Handling the case when max_mb is -1 (unlimited)

Using consistent decimal places for file sizes
 def get_invalid_filesize_log_string(filesize: int, media_type: str) -> str:
     min_mb, max_mb = _get_size_limits(media_type)
     friendly_filesize = round(filesize / BYTES_PER_MB, 2)
-    return f"{friendly_filesize} MB is not within acceptable range of [{min_mb}MB] to [{max_mb}MB]"
+    max_size_str = "unlimited" if max_mb == -1 else f"{max_mb}MB"
+    return f"{friendly_filesize:.2f} MB is not within acceptable range of [{min_mb}MB] to [{max_size_str}]"

src/program/services/downloaders/__init__.py (3)

16-21: Consider using @DataClass and type hints.

The class could be more concise and type-safe using Python's dataclass decorator.

Here's a suggested improvement:

+from dataclasses import dataclass
+from typing import Optional, Dict, Any

+@dataclass
 class DownloadCachedStreamResult:
-    def __init__(self, container=None, torrent_id=None, info=None, info_hash=None):
-        self.container = container
-        self.torrent_id = torrent_id
-        self.info = info
-        self.info_hash = info_hash
+    container: Optional[Dict[str, Any]] = None
+    torrent_id: Optional[int] = None
+    info: Optional[Dict[str, Any]] = None
+    info_hash: Optional[str] = None

Line range hint 66-77: Add error handling for external service calls.

The method makes several external service calls (get_instant_availability, add_torrent, etc.) without specific error handling. This could lead to unclear error messages or incomplete cleanup.

Consider wrapping each service call with specific error handling:

 def download_cached_stream(self, item: MediaItem, stream: Stream) -> DownloadCachedStreamResult:
-    cached_containers = self.get_instant_availability([stream.infohash]).get(stream.infohash, None)
-    if not cached_containers:
-        raise Exception("Not cached!")
-    the_container = cached_containers[0]
-    torrent_id = self.add_torrent(stream.infohash)
-    info = self.get_torrent_info(torrent_id)
+    try:
+        cached_containers = self.get_instant_availability([stream.infohash]).get(stream.infohash, None)
+        if not cached_containers:
+            raise Exception(f"Stream {stream.infohash} is not cached")
+        the_container = cached_containers[0]
+    except Exception as e:
+        logger.error(f"Failed to check availability: {e}")
+        raise
+
+    try:
+        torrent_id = self.add_torrent(stream.infohash)
+        info = self.get_torrent_info(torrent_id)
+    except Exception as e:
+        logger.error(f"Failed to add/get torrent info: {e}")
+        raise

Line range hint 100-126: Consider extracting show-specific logic to improve maintainability.

The method has complex nested conditionals for show/episode handling that could be simplified.

Consider extracting the show-specific logic:

+    def _update_show_attributes(self, item: MediaItem, show: MediaItem, file: dict, info: dict) -> bool:
+        file_season, file_episodes = self.service.file_finder.container_file_matches_episode(file)
+        if not (file_season and file_episodes):
+            return False
+
+        season = next((season for season in show.seasons if season.number == file_season), None)
+        found = False
+        for file_episode in file_episodes:
+            episode = next((episode for episode in season.episodes if episode.number == file_episode), None)
+            if episode and episode.state not in [States.Completed, States.Symlinked, States.Downloaded]:
+                self._set_episode_attributes(episode, file, info)
+                if item.type != ShowMediaType.Episode.value or (item.type == ShowMediaType.Episode.value and episode.number == item.number):
+                    found = True
+        return found
+
+    def _set_episode_attributes(self, episode: MediaItem, file: dict, info: dict):
+        episode.file = file[self.service.file_finder.filename_attr]
+        episode.folder = info["filename"]
+        episode.alternative_folder = info["original_filename"]
+        episode.active_stream = {"infohash": info["hash"], "id": info["id"]}

     def update_item_attributes(self, item: MediaItem, info, container) -> bool:
         found = False
-        item = item
-        container = container
         for file in container.values():
             if item.type == MovieMediaType.Movie.value and self.service.file_finder.container_file_matches_movie(file):
                 item.file = file[self.service.file_finder.filename_attr]
                 item.folder = info["filename"]
                 item.alternative_folder = info["original_filename"]
                 item.active_stream = {"infohash": info["hash"], "id": info["id"]}
                 found = True
                 break
             if item.type in (media_type.value for media_type in ShowMediaType):
                 show = item
                 if item.type == ShowMediaType.Season.value:
                     show = item.parent
                 elif item.type == ShowMediaType.Episode.value:
                     show = item.parent.parent
-                file_season, file_episodes = self.service.file_finder.container_file_matches_episode(file)
-                if file_season and file_episodes:
-                    season = next((season for season in show.seasons if season.number == file_season), None)
-                    for file_episode in file_episodes:
-                        episode = next((episode for episode in season.episodes if episode.number == file_episode), None)
-                        if episode and episode.state not in [States.Completed, States.Symlinked, States.Downloaded]:
-                            episode.file = file[self.service.file_finder.filename_attr]
-                            episode.folder = info["filename"]
-                            episode.alternative_folder = info["original_filename"]
-                            episode.active_stream = {"infohash": info["hash"], "id": info["id"]}
-                            if item.type != ShowMediaType.Episode.value or (item.type == ShowMediaType.Episode.value and episode.number == item.number):
-                                found = True
+                found = self._update_show_attributes(item, show, file, info)
         return found

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6c6d7c0 and 2d3200e.

📒 Files selected for processing (3)

src/program/services/downloaders/__init__.py (5 hunks)
src/program/services/downloaders/shared.py (2 hunks)
src/program/services/scrapers/__init__.py (1 hunks)

🧰 Additional context used

🪛 Ruff

src/program/services/downloaders/shared.py

8-8: program.media.ShowMediaType imported but unused

Remove unused import: program.media.ShowMediaType

(F401)

🔇 Additional comments (7)

src/program/services/downloaders/shared.py (4)

128-146: LGTM! Well-structured validation functions.

The validation helper functions are well-implemented with:

Clear docstrings
Proper type hints
Good error messages
Modular design

148-148: Add error handling for validation initialization.

This issue was raised in a previous review. Please refer to the existing comment about adding try-except block for validation initialization.

163-182: Optimize validation logic for unlimited max size.

This issue was raised in a previous review. Please refer to the existing comment about optimizing the validation logic when max_mb is -1.

128-191: Add unit tests for the validation functionality.

This issue was raised in a previous review. Please refer to the existing comment about adding comprehensive unit tests for all validation functions.

src/program/services/downloaders/__init__.py (3)

3-14: LGTM: Clean imports and well-structured exception class.

The new imports and exception class are well-organized and follow Python conventions.

52-61: LGTM: Robust error handling and cleanup.

The changes properly handle download results, validate file sizes, and clean up resources on failure. The error logging provides good debugging context.

128-139: Add tests for filesize validation logic.

The validation logic is well-implemented, but as noted in previous reviews, it lacks test coverage.

Let's verify if any tests have been added since the last review:

#!/bin/bash
# Check for any new test files related to filesize validation
fd "test.*downloaders.*\.py$"

# Check for any filesize validation tests in existing test files
rg -i "test.*filesize.*validation"

src/program/services/scrapers/__init__.py

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)

src/program/services/downloaders/__init__.py (4)

16-21: Consider enhancing the DownloadCachedStreamResult class with type hints and documentation.

The class structure is good, but could be improved for better maintainability and type safety.

Consider this enhancement:

+from dataclasses import dataclass
+from typing import Optional, Dict, Any

+@dataclass
 class DownloadCachedStreamResult:
+    """
+    Encapsulates the result of a cached stream download operation.
+    
+    Attributes:
+        container: Dictionary containing file information
+        torrent_id: ID of the torrent in the debrid service
+        info: Additional torrent information
+        info_hash: Hash of the torrent
+    """
-    def __init__(self, container=None, torrent_id=None, info=None, info_hash=None):
-        self.container = container
-        self.torrent_id = torrent_id
-        self.info = info
-        self.info_hash = info_hash
+    container: Optional[Dict[str, Any]] = None
+    torrent_id: Optional[int] = None
+    info: Optional[Dict[str, Any]] = None
+    info_hash: Optional[str] = None

Line range hint 66-77: Consider adding error handling for service operations.

While the method works correctly, it could benefit from more robust error handling for service operations.

Consider wrapping service calls with appropriate error handling:

 def download_cached_stream(self, item: MediaItem, stream: Stream) -> DownloadCachedStreamResult:
     cached_containers = self.get_instant_availability([stream.infohash]).get(stream.infohash, None)
     if not cached_containers:
         raise Exception("Not cached!")
     the_container = cached_containers[0]
-    torrent_id = self.add_torrent(stream.infohash)
-    info = self.get_torrent_info(torrent_id)
+    try:
+        torrent_id = self.add_torrent(stream.infohash)
+        info = self.get_torrent_info(torrent_id)
+    except Exception as e:
+        logger.error(f"Failed to add or get torrent info: {e}")
+        raise

Line range hint 100-126: Consider simplifying the complex conditional logic.

The nested conditions, especially for episode handling, could be made more readable.

Consider extracting the episode validation logic:

+    def _is_valid_episode(self, item: MediaItem, episode: MediaItem) -> bool:
+        """Check if the episode matches the item requirements."""
+        if item.type != ShowMediaType.Episode.value:
+            return True
+        return episode.number == item.number

     def update_item_attributes(self, item: MediaItem, info, container) -> bool:
         # ... existing code ...
-        if item.type != ShowMediaType.Episode.value or (item.type == ShowMediaType.Episode.value and episode.number == item.number):
+        if self._is_valid_episode(item, episode):
             found = True

135-139: Consider adding unit tests for media type determination.

The static helper method is well-implemented but needs test coverage.

Would you like me to help create unit tests for this functionality? I can generate test cases covering different media types and edge cases.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2d3200e and 1257bfe.

📒 Files selected for processing (1)

src/program/services/downloaders/__init__.py (5 hunks)

🔇 Additional comments (2)

src/program/services/downloaders/__init__.py (2)

3-7: LGTM! Clean imports and well-structured exception class.

The new imports and exception class are well-organized and follow Python conventions.

Also applies to: 13-14

52-61: LGTM! Well-structured error handling with proper cleanup.

The changes properly handle download results and cleanup on failure, with good separation of concerns between downloading and validation.

src/program/services/downloaders/__init__.py

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)

src/program/services/downloaders/__init__.py (4)

16-21: LGTM! Well-structured result class.

The DownloadCachedStreamResult provides good encapsulation of download-related data.

Consider adding docstring to document the purpose and parameters of this class.

68-77: Enhance logging for better debugging.

The method correctly returns the new result type, but the log message could be more informative.

Consider enhancing the log message to include file size information:

-logger.log("DEBRID", f"Downloaded {item.log_string} from '{stream.raw_title}' [{stream.infohash}]")
+logger.log("DEBRID", f"Downloaded {item.log_string} from '{stream.raw_title}' [{stream.infohash}] (Size: {the_container.get(list(the_container.keys())[0]).get('size', 'unknown')} bytes)")

Line range hint 94-132: Consider refactoring for better maintainability.

While the logic is correct, the method is quite long and has complex nesting, especially for show/episode handling.

Consider extracting show-specific logic into a separate method:

 def update_item_attributes(self, item: MediaItem, download_result: DownloadCachedStreamResult) -> bool:
     """Update the item attributes with the downloaded files and active stream"""
     found = False
-    item = item
     info_hash = download_result.info.get("hash", None)
     id = download_result.info.get("id", None)
     original_filename = download_result.info.get("original_filename", None)
     filename = download_result.info.get("filename", None)
     if not info_hash or not id or not original_filename or not filename:
         return False
     container = download_result.container
     for file in container.values():
         if item.type == MovieMediaType.Movie.value and self.service.file_finder.container_file_matches_movie(file):
             item.file = file[self.service.file_finder.filename_attr]
             item.folder = filename
             item.alternative_folder = original_filename
             item.active_stream = {"infohash": info_hash, "id": id}
             found = True
             break
         if item.type in (ShowMediaType.Show.value, ShowMediaType.Season.value, ShowMediaType.Episode.value):
-            show = item
-            if item.type == ShowMediaType.Season.value:
-                show = item.parent
-            elif item.type == ShowMediaType.Episode.value:
-                show = item.parent.parent
-            file_season, file_episodes = self.service.file_finder.container_file_matches_episode(file)
-            if file_season and file_episodes:
-                season = next((season for season in show.seasons if season.number == file_season), None)
-                for file_episode in file_episodes:
-                    episode = next((episode for episode in season.episodes if episode.number == file_episode), None)
-                    if episode and episode.state not in [States.Completed, States.Symlinked, States.Downloaded]:
-                        episode.file = file[self.service.file_finder.filename_attr]
-                        episode.folder = filename
-                        episode.alternative_folder = original_filename
-                        episode.active_stream = {"infohash": info_hash, "id": id}
-                        if item.type != ShowMediaType.Episode.value or (item.type == ShowMediaType.Episode.value and episode.number == item.number):
-                            found = True
+            found = self._update_show_attributes(item, file, info_hash, id, filename, original_filename)
     return found

+def _update_show_attributes(self, item: MediaItem, file: dict, info_hash: str, id: str, filename: str, original_filename: str) -> bool:
+    show = self._get_show_root(item)
+    file_season, file_episodes = self.service.file_finder.container_file_matches_episode(file)
+    if not (file_season and file_episodes):
+        return False
+
+    season = next((season for season in show.seasons if season.number == file_season), None)
+    for file_episode in file_episodes:
+        episode = next((episode for episode in season.episodes if episode.number == file_episode), None)
+        if not (episode and episode.state not in [States.Completed, States.Symlinked, States.Downloaded]):
+            continue
+
+        episode.file = file[self.service.file_finder.filename_attr]
+        episode.folder = filename
+        episode.alternative_folder = original_filename
+        episode.active_stream = {"infohash": info_hash, "id": id}
+
+        if item.type != ShowMediaType.Episode.value or (item.type == ShowMediaType.Episode.value and episode.number == item.number):
+            return True
+    return False
+
+def _get_show_root(self, item: MediaItem) -> MediaItem:
+    if item.type == ShowMediaType.Season.value:
+        return item.parent
+    elif item.type == ShowMediaType.Episode.value:
+        return item.parent.parent
+    return item

142-146: LGTM! Clean media type helper.

The method provides a clean way to determine media types using enums.

Consider adding a docstring to explain the return values.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1257bfe and 2c6a11f.

📒 Files selected for processing (1)

src/program/services/downloaders/__init__.py (4 hunks)

🔇 Additional comments (4)

src/program/services/downloaders/__init__.py (4)

3-3: LGTM! Clean import organization.

The imports are well-organized, clearly separating media types and shared validation functions.

Also applies to: 7-7

13-14: LGTM! Exception class follows conventions.

The InvalidFileSizeException is appropriately defined as a custom exception.

134-140: LGTM! Well-implemented file size validation.

The validation logic is clean and provides informative error messages.

Let's verify the validation logic coverage:

#!/bin/bash
# Check if there are tests covering the file size validation
rg -l "filesize_is_acceptable.*test" --type py

52-63: LGTM! Improved error handling with file size validation.

The changes properly integrate file size validation and handle errors appropriately. The code maintains the torrent cleanup on failure.

Let's verify the error handling coverage:

coderabbitai · 2024-11-07T15:37:00Z

src/program/services/downloaders/__init__.py

+        for file in download_result.container.values():
+            item_media_type = self._get_item_media_type(item)
+            if not filesize_is_acceptable(file[self.service.file_finder.filesize_attr], item_media_type):
+
+                raise InvalidFileSizeException(f"File '{file[self.service.file_finder.filename_attr]}' is invalid: {get_invalid_filesize_log_string(file[self.service.file_finder.filesize_attr], item_media_type)}")
+        logger.debug(f"All files for {download_result.info_hash} are of an acceptable size")
+
+    @staticmethod
+    def _get_item_media_type(item):
+        if item.type in (media_type.value for media_type in ShowMediaType):
+            return ShowMediaType.Show.value
+        return MovieMediaType.Movie.value


⚠️ Potential issue

Add test coverage for file size validation.

As mentioned in the PR objectives and past reviews, tests are missing for the file size validation functionality. Please add tests to cover:

Movie file size validation

Show file size validation

Invalid file size handling

Edge cases (e.g., boundary conditions)

Would you like me to help create a test suite for the file size validation functionality?

dreulavelle · 2024-11-09T05:24:12Z

src/program/media/item.py

+    Show = ShowMediaType.Show.value
+    Season = ShowMediaType.Season.value
+    Episode = ShowMediaType.Episode.value
+    Movie = MovieMediaType.Movie.value


we dont need these classes lol, no need to make it more complicated than it has to be! ;)

Gaisberg

LGTM! Props for the type classes

davidemarcoli requested a review from Gaisberg November 6, 2024 19:32

coderabbitai bot reviewed Nov 6, 2024

View reviewed changes

Gaisberg requested changes Nov 7, 2024

View reviewed changes

src/program/services/downloaders/__init__.py Outdated Show resolved Hide resolved

src/program/services/downloaders/__init__.py Outdated Show resolved Hide resolved

src/program/services/downloaders/shared.py Outdated Show resolved Hide resolved

davidemarcoli and others added 4 commits November 7, 2024 12:28

feat: implement filesize validation for movies and episodes

2dae1b5

feat: enhance filesize validation for movies and episodes, improve lo…

794a968

…gging and delete invalid torrents

fix: fix invalid log string property accessor

d30bece

iPromKnight force-pushed the feature/filesize-filter-readded branch from dfcd589 to ad40587 Compare November 7, 2024 13:09

coderabbitai bot reviewed Nov 7, 2024

View reviewed changes

src/program/services/downloaders/shared.py Outdated Show resolved Hide resolved

src/program/services/downloaders/shared.py Outdated Show resolved Hide resolved

src/program/media/item.py Show resolved Hide resolved

iPromKnight added 2 commits November 7, 2024 13:21

fix: ordering of assignment after super call

2428c2b

fix: ordering of assignment before super call

d0b6352

iPromKnight force-pushed the feature/filesize-filter-readded branch from b1f749a to d0b6352 Compare November 7, 2024 13:37

iPromKnight added 3 commits November 7, 2024 13:39

fix: remove duplicate request logging

6c6d7c0

Caller should log on exception.

fix: remove possible duplicate hashes from uppercase, and lowercase s…

a882ab2

…crapers.

coderabbitai bot reviewed Nov 7, 2024

View reviewed changes

src/program/services/scrapers/__init__.py Show resolved Hide resolved

iPromKnight requested a review from Gaisberg November 7, 2024 15:08

fix: less cpu cycles

1257bfe

coderabbitai bot reviewed Nov 7, 2024

View reviewed changes

src/program/services/downloaders/__init__.py Outdated Show resolved Hide resolved

fix: ensure item attributes are updated after filesize validation

2c6a11f

coderabbitai bot reviewed Nov 7, 2024

View reviewed changes

dreulavelle reviewed Nov 9, 2024

View reviewed changes

Gaisberg approved these changes Nov 11, 2024

View reviewed changes

Gaisberg merged commit d1041db into main Nov 13, 2024
2 checks passed

Gaisberg deleted the feature/filesize-filter-readded branch November 13, 2024 16:27

dreulavelle mentioned this pull request Nov 13, 2024

chore(main): release 0.19.0 #868

Merged

coderabbitai bot mentioned this pull request Nov 17, 2024

feat: add denied reasoning when trashing torrents and added adult parsing #888

Merged

2 tasks

coderabbitai bot mentioned this pull request Nov 26, 2024

refactor: downloaders rework #903

Merged

2 tasks

This was referenced Nov 30, 2024

[Fix]#921 #925 Manual scrapping integration fixed. #924

Open

fix: api manual scraping fixes #915

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: implement filesize validation for movies and episodes #869

feat: implement filesize validation for movies and episodes #869

davidemarcoli commented Nov 6, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 6, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Nov 6, 2024

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot Nov 7, 2024

dreulavelle Nov 9, 2024

Gaisberg left a comment

feat: implement filesize validation for movies and episodes #869

feat: implement filesize validation for movies and episodes #869

Conversation

davidemarcoli commented Nov 6, 2024 • edited by coderabbitai bot Loading

Pull Request Check List

Description:

Summary by CodeRabbit

coderabbitai bot commented Nov 6, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 6, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 7, 2024

Choose a reason for hiding this comment

dreulavelle Nov 9, 2024

Choose a reason for hiding this comment

Gaisberg left a comment

Choose a reason for hiding this comment

davidemarcoli commented Nov 6, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 6, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)