Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifying file in place corrupts directory records #122

Open
Goshin opened this issue Dec 19, 2023 · 3 comments
Open

Modifying file in place corrupts directory records #122

Goshin opened this issue Dec 19, 2023 · 3 comments

Comments

@Goshin
Copy link

Goshin commented Dec 19, 2023

The bug was recorded back in #71 (comment). After some debugging, I figured out the cause.

When modifying the file in place, the library will try to update the file length in its directory record:

pycdlib/pycdlib/pycdlib.py

Lines 4501 to 4528 in bc5187a

# Finally write out the directory record entry.
# This is a little tricky because of what things mean. First of all,
# child.extents_to_here represents the total number of extents up to
# this child in the parent. Thus, to get the absolute extent offset,
# we start with the parent's extent location, add on the number of
# extents to here, and remove 1 (since our offset will be zero-based).
# Second, child.offset_to_here is the *last* byte that the child uses,
# so to get the start of it we subtract off the length of the child.
# Then we can multiply the extent location by the logical block size,
# add on the offset, and get to the absolute location in the file.
first_joliet = True
for record, is_pvd_unused in child.inode.linked_records:
if isinstance(record, dr.DirectoryRecord):
if self.joliet_vd is not None and id(record.vd) == id(self.joliet_vd) and first_joliet:
first_joliet = False
self.joliet_vd.remove_from_space_size(record.get_data_length())
self.joliet_vd.add_to_space_size(length)
if record.parent is None:
raise pycdlibexception.PyCdlibInternalError('Modifying file with empty parent')
abs_extent_loc = record.parent.extent_location() + record.extents_to_here - 1
offset = record.offset_to_here - record.dr_len
abs_offset = abs_extent_loc * self.logical_block_size + offset
elif isinstance(record, udfmod.UDFFileEntry):
abs_offset = record.extent_location() * self.logical_block_size
record.set_data_length(length)
self._cdfp.seek(abs_offset)
self._cdfp.write(record.record())

The absolute offset of the directory record is calculated based on the parent's extent location and the offset within the parent's directory records:

pycdlib/pycdlib/pycdlib.py

Lines 4520 to 4522 in bc5187a

abs_extent_loc = record.parent.extent_location() + record.extents_to_here - 1
offset = record.offset_to_here - record.dr_len
abs_offset = abs_extent_loc * self.logical_block_size + offset

Here, the record.offset_to_here value is calculated by summing up previous siblings' lengths up to itself:

pycdlib/pycdlib/dr.py

Lines 703 to 735 in bc5187a

def _recalculate_extents_and_offsets(self, index, logical_block_size):
# type: (int, int) -> Tuple[int, int]
"""
Internal method to recalculate the extents and offsets associated with
children of this directory record.
Parameters:
index - The index at which to start the recalculation.
logical_block_size - The block size to use for comparisons.
Returns:
A tuple where the first element is the total number of extents required
by the children and where the second element is the offset into the
last extent currently being used.
"""
if index == 0:
dirrecord_offset = 0
num_extents = 1
else:
dirrecord_offset = self.children[index - 1].offset_to_here
num_extents = self.children[index - 1].extents_to_here
for i in range(index, len(self.children)):
c = self.children[i]
dirrecord_len = c.dr_len
if (dirrecord_offset + dirrecord_len) > logical_block_size:
num_extents += 1
dirrecord_offset = 0
dirrecord_offset += dirrecord_len
c.extents_to_here = num_extents
c.offset_to_here = dirrecord_offset
c.index_in_parent = i
return num_extents, dirrecord_offset

However, Pycdlib maintains the children list in sorted order when walking directory records and adding children:

pycdlib/pycdlib/dr.py

Lines 760 to 775 in bc5187a

# First ensure that this is not a duplicate. For speed purposes, we
# recognize that bisect_left will always choose an index to the *left*
# of a duplicate child. Thus, to check for duplicates we only need to
# see if the child to be added is a duplicate with the entry that
# bisect_left returned.
index = bisect.bisect_left(self.children, child)
if index != len(self.children) and self.children[index].file_ident == child.file_ident:
if not self.children[index].is_associated_file() and not child.is_associated_file():
if not (self.rock_ridge is not None and self.file_identifier() == b'RR_MOVED'):
if not allow_duplicate:
raise pycdlibexception.PyCdlibInvalidInput('Failed adding duplicate name to parent')
self.children[index].data_continuation = child
self.children[index].file_flags |= (1 << self.FILE_FLAG_MULTI_EXTENT_BIT)
index += 1
self.children.insert(index, child)

In this case, the index of a directory record will likely differ from its actual index in the ISO file. Therefore, the directory record offset will be wrong when writing the length back to the ISO file, and some random sibling records will be corrupted.


The solution I can come up with is to rewrite the parent's directory records together. As a temporary fix, one can simply invoke self._write_directory_records() to override all the directory records.

@Goshin
Copy link
Author

Goshin commented Dec 19, 2023

A fix by rewriting all the directory records of the parent.

diff --git a/pycdlib/pycdlib.py b/pycdlib/pycdlib.py
index d9aa951..f868de2 100644
--- a/pycdlib/pycdlib.py
+++ b/pycdlib/pycdlib.py
@@ -4517,15 +4517,21 @@ class PyCdlib(object):
                     self.joliet_vd.add_to_space_size(length)
                 if record.parent is None:
                     raise pycdlibexception.PyCdlibInternalError('Modifying file with empty parent')
-                abs_extent_loc = record.parent.extent_location() + record.extents_to_here - 1
-                offset = record.offset_to_here - record.dr_len
-                abs_offset = abs_extent_loc * self.logical_block_size + offset
+                record.set_data_length(length)
+                abs_extent_loc = record.parent.extent_location()
+                offset = 0
+                for sibling in record.parent.children:
+                    if (offset + sibling.dr_len) > self.logical_block_size:
+                        abs_extent_loc += 1
+                        offset = 0
+                    self._cdfp.seek(abs_extent_loc * self.logical_block_size + offset)
+                    self._cdfp.write(sibling.record())
+                    offset += sibling.dr_len
             elif isinstance(record, udfmod.UDFFileEntry):
                 abs_offset = record.extent_location() * self.logical_block_size
+                record.set_data_length(length)
+                self._cdfp.seek(abs_offset)
+                self._cdfp.write(record.record())
-
-            record.set_data_length(length)
-            self._cdfp.seek(abs_offset)
-            self._cdfp.write(record.record())

     def add_hard_link(self, **kwargs):
         # type: (str) -> None

@Goshin
Copy link
Author

Goshin commented Dec 19, 2023

Just found another related bug in DirectoryRecord.record(). Some ISO files use fixed-length padding for directory records. The current implementation returns record byte arrays shorter than expected. Here is a fix padding the output to self.dr_len.

diff --git a/pycdlib/dr.py b/pycdlib/dr.py
index cdb78d9..cdc421e 100644
--- a/pycdlib/dr.py
+++ b/pycdlib/dr.py
@@ -1097,7 +1097,7 @@ class DirectoryRecord(object):
                                self.seqnum, utils.swab_16bit(self.seqnum),
                                self.len_fi) + self.file_ident + padstr + xa_rec + rr_rec]
 
-        outlist.append(b'\x00' * (len(outlist[0]) % 2))
+        outlist.append(b'\x00' * (self.dr_len - len(outlist[0])))
 
         return b''.join(outlist)

@Goshin
Copy link
Author

Goshin commented Dec 19, 2023

Another fix is to walk the parent and find the record we want to update. This might be better because we can make minimal changes to the ISO file. I am not sure how to handle duplicate names in this case though.

diff --git a/pycdlib/pycdlib.py b/pycdlib/pycdlib.py
index d9aa951..6261289 100644
--- a/pycdlib/pycdlib.py
+++ b/pycdlib/pycdlib.py
@@ -4517,15 +4517,30 @@ class PyCdlib(object):
                     self.joliet_vd.add_to_space_size(length)
                 if record.parent is None:
                     raise pycdlibexception.PyCdlibInternalError('Modifying file with empty parent')
-                abs_extent_loc = record.parent.extent_location() + record.extents_to_here - 1
-                offset = record.offset_to_here - record.dr_len
-                abs_offset = abs_extent_loc * self.logical_block_size + offset
+                record.set_data_length(length)
+                self._cdfp.seek(record.parent.extent_location() * self.logical_block_size)
+                data = self._cdfp.read(record.parent.get_data_length())
+                offset = 0
+                while offset < record.parent.get_data_length():
+                    lenbyte = bytearray([data[offset]])[0]
+                    if lenbyte == 0:
+                        padsize = self.logical_block_size - (offset % self.logical_block_size)
+                        if data[offset:offset + padsize] != b'\x00' * padsize:
+                            raise pycdlibexception.PyCdlibInvalidISO('Invalid padding on ISO')
+                        offset = offset + padsize
+                        continue
+                    sibling = dr.DirectoryRecord()
+                    sibling.parse(self.pvd, data[offset:offset + lenbyte], record.parent)
+                    if sibling.file_ident == record.file_ident:
+                        self._cdfp.seek(record.parent.extent_location() * self.logical_block_size + offset)
+                        self._cdfp.write(record.record())
+                        break
+                    offset += lenbyte
             elif isinstance(record, udfmod.UDFFileEntry):
                 abs_offset = record.extent_location() * self.logical_block_size
+                record.set_data_length(length)
+                self._cdfp.seek(abs_offset)
+                self._cdfp.write(record.record())
-
-            record.set_data_length(length)
-            self._cdfp.seek(abs_offset)
-            self._cdfp.write(record.record())

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant