Skip to content

Conversation

@qa-cea
Copy link
Collaborator

@qa-cea qa-cea commented Aug 27, 2025

  • Switch from MD5 to SHA3-256 to avoid any risk of collision. MD5 is still supported for retrieving from the annex, pushing is only using SHA3-256
  • Switch to %c date format to a UNIX timestamp to allow collaboration with people with different date format / timezome. The old format is supported for reading, the push only UNIX timestamp

@qa-cea qa-cea requested a review from valeriyoann August 27, 2025 12:45
Comment on lines +127 to 128
# MD5
if meta.st_size == 32:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is there no other way to know the type of hash being used ? Feels really fragile

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MD 5 hashes will always be this size, some other algorithms may have similar hash size but they are rare.
In Rift case, this method as always be used so we can always expect hash this size being MD5.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also add that there is no point of supporting mutliple hashes : it will either be MD5 or SHA256

Comment on lines 128 to +139
if meta.st_size == 32:
logging.warning("Using deprecated hash algorithm (MD5)")
with open(filepath, encoding='utf-8') as fh:
identifier = fh.read(32)
return all(byte in string.hexdigits for byte in identifier)

# SHA3 256
elif meta.st_size == 64:
with open(filepath, encoding='utf-8') as fh:
identifier = fh.read(64)
return all(byte in string.hexdigits for byte in identifier)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: merge the two conditions

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

raise ValueError(f"Invalid date format in metadata: {insertion_time}")

# UNIX timestamp
elif isinstance(insertion_time, (int, float)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is the float type really possible ? Seems unlikely for a timestamp

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

elif isinstance(insertion_time, (int, float)):
insertion_time = insertion_time
else:
raise ValueError("Invalid date format in metadata")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest: print the type of the object

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


# UNIX timestamp
elif isinstance(insertion_time, (int, float)):
insertion_time = insertion_time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

major: don't think this line is really necessary

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me there is 3 options :

  • old format
  • new format
  • unknown format

if unknown format happen and it not handled, Rift will crash

insertion time.
"""
for filename in os.listdir(self.path):
if not filename.endswith('.info'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest: change the condition to check it ends with a .info and continue if true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 281 to +286
if os.path.exists(destpath):
destinfo = os.stat(destpath)
if destinfo and destinfo.st_size == originfo.st_size and \
filename in metadata.get('filenames', {}):
logging.debug('%s is already into annex, skipping it', filename)

else:
# Update them and write them back
fileset = metadata.setdefault('filenames', {})
fileset.setdefault(filename, {})
fileset[filename]['date'] = time.strftime("%c")
self._save_metadata(digest, metadata)

# Move binary file to annex
logging.debug('Importing %s into annex (%s)', filepath, digest)
shutil.copyfile(filepath, destpath)
os.chmod(destpath, self.WMODE)
if destinfo and destinfo.st_size == originfo.st_size and \
filename in metadata.get('filenames', {}):
logging.debug('%s is already into annex, skipping it', filename)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: shouldn't we return an error or at least continue if os.path.exists(destpath) is false ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment noted for the next MR (S3 annex)

@qa-cea qa-cea closed this by deleting the head repository Oct 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants