Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change diff to evaluate PKs for FK relationships #420

Merged
merged 3 commits into from
Dec 28, 2022

Conversation

sum-rock
Copy link
Contributor

@sum-rock sum-rock commented Aug 25, 2022

I am going to write up an issue to relate to this PR in the next day or two. At the moment, I'm focused on getting this fix merged into our fork to resolve some bloating on our LogEntry table. (Done, see attached issue)

Take a look and let me know what you think of this fix. I did look through the current issues and couldn't find anyone else complaining about this so maybe we've got some anti-patterns going on in our code. The issue is only happening in some of our more complex systems but I'm sure we're not that unique. I'll highlight the test case that shows the fix.

After letting this sit for an hour or two I realized this change will fix another more insidious bug. Currently, in cases where there are more than one object with the same string representation (e.g., two or more "John Smith" objects in the Customer model) and a related object is updated to a different object with the same string representation (e.g., an update to a PhoneNumber object "+1555-555-5555" to belong to a different "John Smith" Customer object) auditlog is not able to capture that update. This is because because the diff method is comparing object strings rather than primary keys and there is no guarantee that the __str__ method is returning a unique value. Therefore, a portion of updates to ForeignKey fields are likely not being captured by auditlog on current installations.

In this commit you can see a test case that demonstrates the flaw and how it is resolved by this change.


The diff method now evaluates the primary keys for changes to determine
if a new LogEntry should be created. Previously, the diff method was
evaluating the string representation of the object. This was flawed
because cases can occur when a parent object has in memory changes to
its string string representation and the child related object is saved
prior to these in memory changes being persisted. In these cases a new
LogEntry object would be created erroneously. This cases is asserted
with a test and a regression test will verify the bug.

The consequence of these updates is that the LogEntry.changes field
now stores primary keys rather than string representations for related
objects. To keep the changes dictionary display unaffected by this
update, a method was added to the LogEntry model. This method looks
up the object display string from the stored foreign key. Exceptions
were written to handle backwards compatibility.

Issue

Comment on lines +70 to +80
elif (field.one_to_one or field.many_to_one) and hasattr(field, "rel_class"):
value = smart_str(
getattr(obj, field.get_attname(), None), strings_only=True
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the primary fix.

@codecov
Copy link

codecov bot commented Aug 25, 2022

Codecov Report

Merging #420 (39e0be9) into master (c65c38e) will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #420      +/-   ##
==========================================
+ Coverage   93.55%   93.66%   +0.10%     
==========================================
  Files          29       29              
  Lines         869      884      +15     
==========================================
+ Hits          813      828      +15     
  Misses         56       56              
Impacted Files Coverage Δ
auditlog/diff.py 97.22% <100.00%> (+0.07%) ⬆️
auditlog/models.py 88.58% <100.00%> (+0.61%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Comment on lines +1644 to +1778
def test_no_log_entry_created_on_related_object_string_update(self):
t1 = self.test_date
with freezegun.freeze_time(t1):
simple = SimpleModel.objects.create(text="Test Foo")
one_simple = SimpleModel.objects.create(text="Test Bar")
instance = RelatedModel.objects.create(
one_to_one=simple, related=one_simple
)

t2 = self.test_date + datetime.timedelta(days=20)
with freezegun.freeze_time(t2):
# Order is important. Without special FK handling, the arbitrary in memory
# changes to the (same) related object's signature result in a perceived
# update where no update has occurred.
one_simple.text = "Test Baz"
instance.save()
one_simple.save()

# Assert that only one log for the instance was created
self.assertEqual(instance.history.all().count(), 1)
# Assert that two logs were created for the parent object
self.assertEqual(one_simple.history.all().count(), 2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the primary fix, this test case fails. Specifically, instance will have two LogEntry records: a create and an update. The update is false though because there was no actual change on the record. The only change that occurred was on the SimpleModel.text field.

Comment on lines 496 to 537
def _get_changes_display_for_fk_field(
self, field: Union[models.ForeignKey, models.OneToOneField], value: Any
) -> str:
"""
:return: An object display string for a given FK value and the field to which it belongs
"""
# Return "None" if the FK value is "None".
if value == "None":
return value

# Attempt to convert given value to the PK type for the related model
try:
pk_value = field.related_model._meta.pk.to_python(value)
# ValidationError will handle legacy values where string representations were
# stored rather than PKs. This will also handle cases where the PK type is
# changed between the time the LogEntry is created and this method is called.
except ValidationError:
return value
# Attempt to return the string representation of the object
try:
return smart_str(field.related_model.objects.get(pk=pk_value))
# ObjectDoesNotExist will be raised if the object was deleted.
except ObjectDoesNotExist:
return f"Deleted '{field.related_model.__name__}' ({value})"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One consequence of this is that the related objects' str representation is based on the CURRENT state of that object, not the historical state. This is not... great when looking at the audit log of object A, it should not tell you the lifecycle of object B.

There are probably ways around this by using the serialized data feature, but this has quite a few drawbacks as serialized data may not be compatible with the current code.

One possible solution is storing a tuple in the changes dict (pk, str) but this changes the shape in other ways and represents a more complex breaking change.

or maybe having a datetime in a str method is just an antipattern (the root issue, on why we are here)

Copy link
Contributor Author

@sum-rock sum-rock Aug 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String representations are usually used as a way to have a human readable display for the object. I do not believe that __str__ is intended to show state. Therefore showing the human readable form of which object was attached to this field on this object may not be so problematic. One could also argue that displaying the current version of the string representation for both the old and new relationships is a feature add because it's easier to compare. That said, I don't think people are likely to change the format of the returned __str__ value that often. (Although losing the string representation when the object is deleted is unfortunate.)

Also... A tuple where we store the PK for comparison and the string representation for display here is the best case scenario. I'd be happy to implement that if we are ok with a (potentially) breaking change.

Comment on lines +1667 to +1800
def test_log_entry_created_if_obj_strings_are_same_for_two_objs(self):
"""FK changes trigger update when the string representation is the same."""
t1 = self.test_date
with freezegun.freeze_time(t1):
simple = SimpleModel.objects.create(text="Test Foo")
one_simple = SimpleModel.objects.create(text="Twinsies", boolean=True)
two_simple = SimpleModel.objects.create(text="Twinsies", boolean=False)
instance = RelatedModel.objects.create(
one_to_one=simple, related=one_simple
)

t2 = self.test_date + datetime.timedelta(days=20)
with freezegun.freeze_time(t2):
instance.related = two_simple
instance.save()

self.assertEqual(instance.history.all().count(), 2)
log_create = instance.history.filter(timestamp=t1).first()
log_update = instance.history.filter(timestamp=t2).first()
self.assertEqual(int(log_create.changes_dict["related"][1]), one_simple.id)
self.assertEqual(int(log_update.changes_dict["related"][1]), two_simple.id)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the primary fix, this test case also fails. This seems pretty likely to be happening in other installations.

self, field: Union[models.ForeignKey, models.OneToOneField], value: Any
) -> str:
"""
:return: An object display string for a given FK value and the field to which it belongs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function does not return an object

@hramezani
Copy link
Member

I think this change is a breaking change as the patch is changing the representation of the related field. we need to add changelog for that and explain the change and the breaking change.

Also, It would be great if @alieh-rymasheuski can take a look.

@sum-rock
Copy link
Contributor Author

@hramezani Sorry for the delay in returning to this. I've added a statement in the change log and updated the docstring as requested. Let me know if there is anything else I can do to help get this patch in the 2.2 release.

If it is critical to preserve the object representation in the changes field (instead of the FK), there are options but they involve more complexity and my estimation is that the complexity isn't worth it.

@hramezani
Copy link
Member

Thanks @sum-rock for addressing the comments. As I mentioned in #420 (comment) it is a breaking change and we can not include it in 2.2.
we will include it in 3.0

@aleh-rymasheuski aleh-rymasheuski added this to the 3.0 milestone Nov 14, 2022
@hramezani
Copy link
Member

@sum-rock we are preparing v3.0. Could you please refresh this PR?

The diff method now evaluates the primary keys for changes to determine
if a new LogEntry should be created. Previously, the diff method was
evaluating the string representation of the object. This was flawed
because cases can occur when a parent object has in memory changes to
its string string representation and the child related object is saved
prior to these in memory changes being persisted. In these cases a new
LogEntry object would be created erroneously. This cases is asserted
with a test and a regression test will verify the bug.

The consequence of these updates is that the ``LogEntry.changes`` field
now stores primary keys rather than string representations for related
objects. To keep the changes dictionary display unaffected by this
update, a method was added to the ``LogEntry`` model. This method looks
up the object display string from the stored foreign key. Exceptions
were written to handle backwards compatibility.
Because the string representation is not unique for every object, relying
on it to determine FK diffs may not capture all changes. This test case
shows another type of scenario that is fixed by comparing primary keys
rather than object string representations. This is likely occurring
fairly regularly but is hard to spot because it is an error of omission.
@sum-rock
Copy link
Contributor Author

@hramezani This branch is now refreshed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants