-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(clp-package): Add support for deleting archives that are exclusively within a time range. #594
Conversation
WalkthroughThe changes in this pull request introduce a new enumeration value Changes
Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (10)
💤 Files with no reviewable changes (3)
🚧 Files skipped from review as they are similar to previous changes (7)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (7)
components/package-template/src/sbin/del-archives.sh (1)
1-9
: Add script documentation and usage informationThe script would benefit from a header comment explaining its purpose and usage.
Add this documentation at the start of the script:
#!/usr/bin/env bash +# del-archives.sh - Delete archives based on timestamp +# +# This script executes the del_archives Python module which handles the deletion +# of outdated archives based on specified time ranges and configurations. +# +# Usage: ./del-archives.sh [options] +# For detailed options, run: ./del-archives.sh --help + script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"components/clp-package-utils/clp_package_utils/scripts/del_archives.py (2)
67-73
: Add debug logging for timestamp values.Consider adding debug logging to show the actual timestamp values being used. This will help with troubleshooting issues.
Add logging before validation:
begin_ts = parsed_args.begin_ts end_ts = parsed_args.end_ts +logger.debug(f"Using time range: begin_ts={begin_ts}, end_ts={end_ts}") if end_ts < 0 or begin_ts < 0:
100-101
: Add error handling for cleanup operation.The cleanup step could fail silently. Consider adding error handling to ensure cleanup issues are logged.
Add error handling for cleanup:
-# Remove generated files -generated_config_path_on_host.unlink() +# Remove generated files +try: + generated_config_path_on_host.unlink() +except Exception as e: + logger.warning(f"Failed to clean up generated config file: {e}")components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py (4)
20-20
: Consider using__name__
instead of__file__
for the logger name.Using
__file__
may include the full path to the script, which can vary between environments and potentially expose sensitive directory structures. It's a common practice to use__name__
, which provides a consistent and more secure logger name.Apply this diff to update the logger initialization:
-logger = logging.getLogger(__file__) +logger = logging.getLogger(__name__)
29-29
: Revise the comment for clarity and grammar.The comment
# Consider deduplicate this
could be rephrased for better clarity and grammatical correctness.Consider updating it as follows:
-# Consider deduplicate this +# Consider deduplicating this function
110-110
: Handle exceptions during directory deletion to ensure robustness.Deleting directories may fail due to permissions or other issues. It's advisable to handle exceptions to prevent the script from crashing and to log any errors encountered during deletion.
Wrap the deletion in a try-except block:
try: shutil.rmtree(archive_path) except Exception as e: logger.error(f"Failed to delete archive {archive}: {e}")
131-137
: Correct and simplify the help texts forbegin_ts
andend_ts
arguments.The help texts contain unnecessary string concatenations, and the
end_ts
help text incorrectly refers to "lower-bound" instead of "upper-bound".Apply this diff to correct and simplify the help texts:
args_parser.add_argument( "begin_ts", type=int, - help="Time range filter lower-bound (inclusive) as milliseconds" " from the UNIX epoch.", + help="Time range filter lower-bound (inclusive) as milliseconds from the UNIX epoch.", ) args_parser.add_argument( "end_ts", type=int, - help="Time range filter lower-bound (inclusive) as milliseconds" " from the UNIX epoch.", + help="Time range filter upper-bound (inclusive) as milliseconds from the UNIX epoch.", )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
components/clp-package-utils/clp_package_utils/general.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/del_archives.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
(1 hunks)components/package-template/src/sbin/del-archives.sh
(1 hunks)
🔇 Additional comments (5)
components/package-template/src/sbin/del-archives.sh (1)
8-8
: Verify Python module and its dependencies
Let's ensure the Python module and its dependencies are properly set up in the repository.
✅ Verification successful
Python module and dependencies are properly configured
The verification confirms that:
- The Python module exists at
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
- The module correctly imports all required dependencies
- The module is properly referenced in the shell script, as it uses the native implementation at
clp_package_utils.scripts.native.del_archives
- The implementation follows a container-based execution pattern with appropriate configuration handling
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify Python module existence and its dependencies
# Check if the module exists
echo "Checking for del_archives module..."
fd "del_archives.py$" --type f
# Check module imports and dependencies
echo -e "\nChecking module dependencies..."
rg -l "^import|^from.*import" --type py | grep -i "del_archives"
Length of output: 555
Script:
#!/bin/bash
# Let's check the content of both Python files to understand their relationship and dependencies
echo "Content of components/clp-package-utils/clp_package_utils/scripts/del_archives.py:"
cat components/clp-package-utils/clp_package_utils/scripts/del_archives.py
echo -e "\nContent of components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:"
cat components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
# Check if the module is imported or used elsewhere
echo -e "\nChecking usage of del_archives module across the codebase:"
rg "del_archives" -l
Length of output: 8785
components/clp-package-utils/clp_package_utils/scripts/del_archives.py (3)
1-28
: LGTM! Well-structured imports and logging setup.
The imports are appropriate, and the logging configuration follows best practices with proper formatting and handlers.
106-107
: LGTM! Proper main guard implementation.
The main entry point follows Python best practices with proper exit code handling.
89-89
: Verify the native del_archives module exists.
This script depends on the native module "clp_package_utils.scripts.native.del_archives". Let's verify its existence and implementation.
✅ Verification successful
Native del_archives module is properly implemented and integrated
The native module exists at the correct location with a complete implementation that:
- Accepts required command-line arguments (config file path, begin timestamp, end timestamp)
- Handles archive deletion from both database and storage
- Provides proper logging and error handling
- Maintains consistent interface with the calling script
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence and implementation of the native del_archives module
# Check if the module exists
fd -t f "del_archives.py" components/clp-package-utils/clp_package_utils/scripts/native/
# If it exists, check its interface
rg -A 5 "def main" components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Length of output: 508
Script:
#!/bin/bash
# Let's verify the complete implementation and interface
cat components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Length of output: 4826
components/clp-package-utils/clp_package_utils/general.py (1)
54-54
: LGTM! The new job type follows the existing pattern.
The addition of DEL_ARCHIVE
to the JobType
enum is consistent with the PR's objective of supporting archive deletion based on timestamp.
Let's verify the usage of this new job type:
✅ Verification successful
The new DEL_ARCHIVE
job type is properly integrated and used
The verification confirms that:
- The enum is correctly added to
JobType
class alongside other job types - It's properly used in
del_archives.py
withgenerate_container_name()
function, following the same pattern as other job types
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for references to the new DEL_ARCHIVE job type
# Expected: References in the new del_archives.py scripts mentioned in the AI summary
# Search for direct string references
rg -l "DEL_ARCHIVE"
# Search for container name generation with the new job type
ast-grep --pattern 'generate_container_name($x)' -l
Length of output: 308
Script:
#!/bin/bash
# Let's check how DEL_ARCHIVE is used in both files
rg "DEL_ARCHIVE" -A 3 -B 3
# Check container name generation with proper language
ast-grep --pattern 'generate_container_name($x)' --lang python
Length of output: 1445
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a brand new script, let's use our latest conventions and best practices.
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Outdated
Show resolved
Hide resolved
# Note, the error message doesn't output the value of archives_dir because it is a mounted | ||
# path. It could be confusing for user because the path will not exist in their file system. | ||
if not archives_dir.exists(): | ||
logger.error(f"archive directory doesn't exist. abort deletion") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.error(f"archive directory doesn't exist. abort deletion") | |
# TODO: Figure out a way to log `archives_dir` relative to the user's file system rather | |
# than the container's file system. | |
logger.error(f"archive directory doesn't exist. abort deletion") |
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
Outdated
Show resolved
Hide resolved
|
Co-authored-by: kirkrodrigues <[email protected]>
Co-authored-by: kirkrodrigues <[email protected]>
Co-authored-by: kirkrodrigues <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py (2)
29-76
: Refactor main function to improve separation of concernsThe main function currently handles both argument parsing and configuration loading. Consider separating these responsibilities for better maintainability.
Consider refactoring like this:
def _load_and_validate_config(config_path: pathlib.Path, default_config_path: pathlib.Path, clp_home: pathlib.Path): try: clp_config = load_config_file(config_path, default_config_path, clp_home) clp_config.validate_logs_dir() return clp_config except Exception as e: logger.exception(f"Failed to load config: {str(e)}") raise def main(argv): clp_home = get_clp_home() default_config_file_path = clp_home / CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH args_parser = argparse.ArgumentParser( description="Deletes archives that fall within the specified time range." ) # ... argument parsing code ... parsed_args = args_parser.parse_args(argv[1:]) try: config_file_path = pathlib.Path(parsed_args.config) clp_config = _load_and_validate_config( config_file_path, default_config_file_path, clp_home ) except Exception: return -1 if not clp_config.archive_output.directory.exists(): logger.error("`archive_output.directory` doesn't exist.") return -1 return _delete_archives( clp_config.archive_output.directory, clp_config.database, parsed_args.begin_ts, parsed_args.end_ts, )
78-92
: Enhance function documentationThe docstring could be improved by adding type hints and more detailed parameter descriptions.
Consider updating the docstring:
def _delete_archives( archives_dir: pathlib.Path, database_config: Database, begin_ts: int, end_ts: int, ) -> int: """ Deletes all archives where `begin_ts <= archive.begin_timestamp` and `archive.end_timestamp <= end_ts` from both the metadata database and disk. Args: archives_dir (pathlib.Path): Directory containing the archives database_config (Database): Database configuration object begin_ts (int): Lower bound timestamp in milliseconds (inclusive) end_ts (int): Upper bound timestamp in milliseconds (inclusive) Returns: int: 0 on success, -1 on failure Note: This operation is not atomic. If it fails after deleting from the database but before completing disk cleanup, manual intervention may be required. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
(1 hunks)components/package-template/src/sbin/admin_tools/del-archives.sh
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/clp-package-utils/clp_package_utils/scripts/del_archives.py
🧰 Additional context used
📓 Learnings (1)
components/package-template/src/sbin/admin_tools/del-archives.sh (1)
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the `components/package-template/src/sbin/` directory, keep them simple and avoid adding additional validation code.
🔇 Additional comments (3)
components/package-template/src/sbin/admin_tools/del-archives.sh (2)
1-5
: LGTM! Clean and effective directory setup.
The script correctly determines its location and the package root using robust path resolution.
6-9
: Verify Python module path exists in the package.
The implementation looks good and follows the principle of keeping wrapper scripts simple. Let's verify the Python module path.
✅ Verification successful
Python module path and package structure are correctly implemented
The verification confirms that:
- The Python module
del_archives.py
exists and is properly structured - The package hierarchy is complete with
__init__.py
files in all required directories - The module imports and uses the expected package structure (
clp_package_utils.scripts.native.del_archives
) - The package is located in the correct component (
components/clp-package-utils
)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence of the del_archives module and its package structure
# Check if the module exists in the expected location
fd --type f "del_archives.py" --exec echo "Module found: {}"
# Verify the package structure
fd --type f "__init__.py" --parent-dir "clp_package_utils" --exec echo "Package structure found: {}"
Length of output: 609
Script:
#!/bin/bash
# Let's verify the package structure more accurately
# First, check the content of the Python module
echo "=== Checking del_archives.py content ==="
cat ./components/clp-package-utils/clp_package_utils/scripts/del_archives.py
# Then verify the package structure by checking for __init__.py files
echo -e "\n=== Checking package structure ==="
fd "__init__.py" ./components/clp-package-utils/clp_package_utils
Length of output: 3867
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py (1)
145-146
: LGTM!
The main guard implementation follows Python best practices.
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
Outdated
Show resolved
Hide resolved
components/package-template/src/sbin/admin_tools/del-archives.sh
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/del_archives.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (3)
components/clp-package-utils/clp_package_utils/scripts/compress.py (1)
Line range hint
1-150
: Consider reorganizing script location as suggested in PR comments.As per kirkrodrigues's feedback, consider moving this script to an appropriate directory based on its role (admin vs. user-facing) when implementing the suggested
scripts/admin_tools
reorganization.components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py (2)
56-58
: Validate config file path before loadingAdd explicit path validation before attempting to load the config file.
Apply this diff:
config_file_path = Path(parsed_args.config) + if not config_file_path.is_file(): + logger.error(f"Config file {config_file_path} does not exist or is not a file.") + return -1 try: clp_config = load_config_file(config_file_path, default_config_file_path, clp_home)
84-92
: Enhance function documentationThe docstring should include type hints and better parameter descriptions.
Apply this diff:
""" Deletes all archives where `begin_ts <= archive.begin_timestamp` and `archive.end_timestamp <= end_ts` from both the metadata database and disk. - :param archives_dir: - :param database_config: - :param begin_ts: - :param end_ts: + :param archives_dir: Path to the directory containing archives + :param database_config: Database configuration object + :param begin_ts: Start timestamp in milliseconds since epoch (inclusive) + :param end_ts: End timestamp in milliseconds since epoch (inclusive) :return: 0 on success, -1 otherwise. + :raises: Exception if database operations fail """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
components/clp-package-utils/clp_package_utils/general.py
(2 hunks)components/clp-package-utils/clp_package_utils/scripts/compress.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/decompress.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/del_archives.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/search.py
(1 hunks)components/package-template/src/sbin/admin-tools/del-archives.sh
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- components/package-template/src/sbin/admin-tools/del-archives.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- components/clp-package-utils/clp_package_utils/general.py
- components/clp-package-utils/clp_package_utils/scripts/del_archives.py
🔇 Additional comments (1)
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py (1)
145-146
: LGTM!
The entry point is correctly implemented using the standard Python idiom.
db_cursor.execute( | ||
f""" | ||
DELETE FROM `{table_prefix}archives` WHERE | ||
begin_timestamp >= {begin_ts} AND end_timestamp <= {end_ts} | ||
RETURNING id | ||
""" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolve timestamp handling in SQL query
The current implementation has an unresolved issue with large integer timestamps. Consider alternative approaches for handling large integers.
Here are potential solutions:
- Use string representation:
- f"""
- DELETE FROM `{table_prefix}archives` WHERE
- begin_timestamp >= {begin_ts} AND end_timestamp <= {end_ts}
- RETURNING id
- """
+ f"DELETE FROM `{table_prefix}archives` WHERE begin_timestamp >= %s AND end_timestamp <= %s RETURNING id",
+ (str(begin_ts), str(end_ts))
- Use DECIMAL type:
from decimal import Decimal
db_cursor.execute(
f"DELETE FROM `{table_prefix}archives` WHERE begin_timestamp >= %s AND end_timestamp <= %s RETURNING id",
(Decimal(begin_ts), Decimal(end_ts))
)
Please test these solutions to determine which one resolves the integer overflow issue in your environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, managed to resolve it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (3)
Line range hint
1-1000
: Consider adding type hints consistently across all functionsWhile type hints have been added to
add_num_workers_argument
andmain
, other functions likecontainer_exists
,update_settings_object
, etc., still lack type hints. Consider adding type hints consistently across all functions to improve code maintainability and enable better static type checking.
Line range hint
1-1000
: Consider improving error handling and configuration managementSeveral areas could be enhanced:
- Error handling in subprocess calls could be more robust by capturing and logging specific error messages
- Container startup timeouts are hardcoded (e.g., 180s for DB, 60s for queue) - consider making these configurable
- Configuration validation is scattered across multiple functions - consider centralizing validation logic
Example of improved error handling:
- subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True) + try: + result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True) + except subprocess.CalledProcessError as e: + logger.error(f"Failed to start container: {e.stderr.decode()}") + raise
Line range hint
1-1000
: Address potential security concernsSeveral security considerations should be addressed:
- Temporary files containing credentials should be securely deleted (e.g., using
shred
or similar secure deletion)- Error messages should be sanitized to prevent potential exposure of sensitive information
- Consider implementing credential rotation mechanisms for long-running containers
Example of secure file deletion:
+def secure_delete(file_path: pathlib.Path): + """Securely delete a file containing sensitive information.""" + subprocess.run(['shred', '-u', '-z', str(file_path)], check=True) - db_config_file_path.unlink() + secure_delete(db_config_file_path)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
components/clp-package-utils/clp_package_utils/__init__.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/compress.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/decompress.py
(3 hunks)components/clp-package-utils/clp_package_utils/scripts/del_archives.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/compress.py
(0 hunks)components/clp-package-utils/clp_package_utils/scripts/native/decompress.py
(0 hunks)components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/search.py
(0 hunks)components/clp-package-utils/clp_package_utils/scripts/search.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/start_clp.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/stop_clp.py
(1 hunks)
💤 Files with no reviewable changes (3)
- components/clp-package-utils/clp_package_utils/scripts/native/compress.py
- components/clp-package-utils/clp_package_utils/scripts/native/decompress.py
- components/clp-package-utils/clp_package_utils/scripts/native/search.py
✅ Files skipped from review due to trivial changes (1)
- components/clp-package-utils/clp_package_utils/init.py
🚧 Files skipped from review as they are similar to previous changes (5)
- components/clp-package-utils/clp_package_utils/scripts/compress.py
- components/clp-package-utils/clp_package_utils/scripts/decompress.py
- components/clp-package-utils/clp_package_utils/scripts/del_archives.py
- components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
- components/clp-package-utils/clp_package_utils/scripts/search.py
🔇 Additional comments (1)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
63-63
: LGTM: Logger initialization follows best practices
Using __file__
as the logger name provides better traceability and consistency across modules.
@@ -18,15 +18,8 @@ | |||
validate_and_load_db_credentials_file, | |||
) | |||
|
|||
# Setup logging | |||
# Create logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Create logger |
Don't think this comment is necessary. Can delete it in all the refactored files.
Co-authored-by: kirkrodrigues <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
feat(clp-package): Add support for deleting archives that are exclusively within a time range.
…vely within a time range. (y-scope#594) Co-authored-by: kirkrodrigues <[email protected]>
…vely within a time range. (y-scope#594) Co-authored-by: kirkrodrigues <[email protected]>
Description
Our current package implementation doesn't support deleting compressed logs. As a result, after packages are used for a period of time, the size of the logs could be huge and cause storage issues, for example, #585. It would be desirable to support something similar to retension period where logs can be deleted based on their timestamp.
There are two possible implementations to prune compressed logs based on their timestamp.
Given a timestamp range,
begin_ts
andend_ts
.begin_ts < msg.ts < end_ts
. This requires a non-trivial implementation as it requires to modify the archive data.begin_ts < archive.begin_ts < archive_end_ts < end_ts
). Compared to method 1, this is only an approximation because for case likearchive.begin_ts < begin_ts < archive_end_ts < end_ts
. this method will skip the entire archive.In this PR, we use the implementation for simplicity. We introduce a new package script, which is similar to the existing
decompress.py
. The script doesn't create any new job for scheduler, but directly operate on archives and databases. The native script takes user parametersbegin_ts
andend_ts
and performs the following:archive.begin_ts < begin_ts < archive_end_ts < end_ts
. returns the IDs.Validation performed
Manually compressed different archives with different begin & end ts. Ran the script with different ts parameters and verified that archives are deleted correctly.
Summary by CodeRabbit
New Features
Changes
DEL_ARCHIVE
.These enhancements improve the flexibility and usability of the archive management features.