Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add serialized object field #412

Merged

Conversation

sum-rock
Copy link
Contributor

@sum-rock sum-rock commented Aug 8, 2022

This work creates the basic implementation of the LogEntry.serialized_object field. This field tracks the state of an object following a create, update, or delete action. A manager method is also added to retrieve the value of an object's field at a given date.

Efforts were taken to make sure this addition is not disruptive to current installations of django-auditlog. Specifically, this is an opt-in feature per model. Users are able to opt in by adding the serialize_data=True kwarg to the model registration.

This relates to issue 410.

@codecov
Copy link

codecov bot commented Aug 8, 2022

Codecov Report

Merging #412 (509d0ee) into master (a00d2c2) will increase coverage by 0.44%.
The diff coverage is 94.91%.

@@            Coverage Diff             @@
##           master     #412      +/-   ##
==========================================
+ Coverage   91.79%   92.24%   +0.44%     
==========================================
  Files          23       24       +1     
  Lines         646      709      +63     
==========================================
+ Hits          593      654      +61     
- Misses         53       55       +2     
Impacted Files Coverage Δ
auditlog/models.py 87.12% <93.61%> (+1.64%) ⬆️
...ditlog/migrations/0011_logentry_serialized_data.py 100.00% <100.00%> (ø)
auditlog/registry.py 98.51% <100.00%> (+0.09%) ⬆️
auditlog/middleware.py 100.00% <0.00%> (+7.14%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@sum-rock
Copy link
Contributor Author

sum-rock commented Aug 8, 2022

This PR contains a migration. If the wonderful maintainers of django-auditlog would help my team out tremendously by adding this to their project, we would need to be mindful of any other PR's with migrations that might also make it into the release. For example, #407 also contains a migration and we'd need to coordinate that with this one so that they don't have shared dependencies.

I'm also aware that I'd need to add something to the change log here. Let me know what you all think of this PR and if you're feeling positive about it, I'll write something up.

@hramezani
Copy link
Member

Thanks @sum-rock for the PR 👍

I am still not sure about the idea of the feature.
I will back to the PR later. Unfortunately, I don't have time to check it now.

@alieh-rymasheuski @Linkid It would be great if you have time to take a look at this PR.

A field was added to the LogEntry model to store a serialized dictionary
of key value pairs representing the state of the object after the
create, update, or delete action. This field is populated optionally
using an `auditlog.regester` kwarg.
@sum-rock sum-rock force-pushed the add-serialized-object-field branch 2 times, most recently from e0641e5 to d0ec818 Compare August 10, 2022 01:17
auditlog/models.py Outdated Show resolved Hide resolved
@sum-rock
Copy link
Contributor Author

Given some of the feedback here, I've made a few adjustments. I'm still optimistic that you all may still be favorable to including the serialized_data field in the auditlog project. When I think about the typical use cases of an audit log, I think that having an option to access a full account of an objects state, either prior to or following a change, would be helpful in understanding not only what changes were made, but also why they might have been made. This is a concept that is embraced by other audit logging systems such as Papertrail. However, there isn't a good option for Django that we are aware of. Django auditlog is so darn close to having everything that anyone would ever need in an audit log! We just want to help put the cherry on top of the Sunday here.

I can understand if the contents of the HistoricalStateLookupManagerMixin is too heavy handed (although I think people would use it to avoid boilerplate code and more easily find the object or field states they are looking for). So if that is not in keeping with the vision of django auditlog, I can definitely extract it from this PR.

Please let me know if you all have any other concerns that could be mitigated.

@hramezani
Copy link
Member

Thanks @sum-rock for the update. I will take a look whenever I have time.

auditlog/history.py Outdated Show resolved Hide resolved
auditlog/history.py Outdated Show resolved Hide resolved
auditlog/history.py Outdated Show resolved Hide resolved
@hramezani
Copy link
Member

To me, the change has two parts

  1. Adding a JSON field to store the JSON repr of the object. (sounds good)
  2. The HistoricalStateLookupManagerMixin class, is about how we use the stored data. there are two methods there for finding the log record and finding a field value from the log at a specific time.

I think we can remove the HistoricalStateLookupManagerMixin from this path because:

  • I think the idea of this library is to store the changes, not how to use them.
  • Finding a log entry at a given timestamp is not a big thing and users of the library can easily do it
  • The currently added functions are not complete. we only have a filter for one timestamp with __lte and maybe we need another filter for __gte, __lt, __gt, ... . we may want to get multiple fields value not just one field value. we may want to apply filters based on user and action as well. In the end I would say the functions are not generic enough to cover all the needs

I would like to know what do you think @sum-rock and @rposborne?

@sum-rock
Copy link
Contributor Author

To me, the change has two parts

  1. Adding a JSON field to store the JSON repr of the object. (sounds good)
  2. The HistoricalStateLookupManagerMixin class, is about how we use the stored data. there are two methods there for finding the log record and finding a field value from the log at a specific time.

Absolutely. I wanted to decouple these things as much as possible while keeping them in the same PR. (In hind sight, maybe that wasn't the correct approach).

I think we can remove the HistoricalStateLookupManagerMixin from this path because:

  • I think the idea of this library is to store the changes, not how to use them.
  • Finding a log entry at a given timestamp is not a big thing and users of the library can easily do it

I generally agree with this point. I wasn't going to add anything for that PR initially but decided to attempt to incorporate something to both show the value of the serialized_data field and to hopefully reduce an end user's boiler plate code. The manager methods also handle the cases of missing data which is less trivial but, yes, it's not a huge lift for end users to implement themselves either.

  • The currently added functions are not complete. we only have a filter for one timestamp with __lte and maybe we need another filter for __gte, __lt, __gt, ... . we may want to get multiple fields value not just one field value. we may want to apply filters based on user and action as well. In the end I would say the functions are not generic enough to cover all the needs

I'm not sure about needing a separate method for the different query filters because we are looking for the state of an object or field at a given time and that's accomplished with __lte. But I do hear you that the historical lookup methods aren't as full featured as some users may want or need them to be. If you commit to including these retrieval methods, you may find yourself on a slippery slope with end users who want more.

Overall, I think I'm in alignment with your points here @hramezani. You'd like auditlog to focus on recording actions and changes rather than on having full featured queries for the retrieval of those records. I think that is completely valid. I'm going to remove the HistoricalStateLookupManagerMixin from this PR and that should reduce both the surface area of change and burden of maintenance greatly.

@hramezani
Copy link
Member

Thanks for your understanding @sum-rock

Test cases were added to assert the serialized data field is functioning
as intended and that the registration kwargs are implemented as written.
@hramezani
Copy link
Member

Why did you add the changes from the other PR here?
It makes it hard to review. We can first review this one and it can be merged to master because it is backward compatible. Then you could refresh the other PR to fix the migration dependency

@sum-rock
Copy link
Contributor Author

@hramezani I really apologize for that. I had some git madness on my end and unintentionally made that happen. It's fixed now. I was hoping it would be fixed before you'd notice 🙈 .

@hramezani
Copy link
Member

Please add documentation into usage after Many-to-many fields section.

auditlog/registry.py Outdated Show resolved Hide resolved
auditlog/registry.py Outdated Show resolved Hide resolved
auditlog_tests/tests.py Outdated Show resolved Hide resolved
auditlog/registry.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
auditlog/models.py Show resolved Hide resolved
auditlog/models.py Outdated Show resolved Hide resolved
auditlog/models.py Outdated Show resolved Hide resolved
auditlog/models.py Outdated Show resolved Hide resolved
auditlog/models.py Outdated Show resolved Hide resolved
auditlog/models.py Outdated Show resolved Hide resolved
A few redundant lines were able to be elimited from the set logic.
Statment now takes the set of included fields (or all fields if no
included list was found) and removes the fields present in the excluded
list.
@hramezani
Copy link
Member

@alieh-rymasheuski as you left some comments on this PR, It would be great if you check the updates here.

auditlog/models.py Outdated Show resolved Hide resolved
auditlog/models.py Show resolved Hide resolved
auditlog/models.py Outdated Show resolved Hide resolved
auditlog/models.py Outdated Show resolved Hide resolved
auditlog/models.py Show resolved Hide resolved
The `_mask_serialized_fields` method had been receiving a all model
field options but only required the `mask_fields` value. The relative
import for `mask_str` function was also changed to be an absolute
import.
@aleh-rymasheuski
Copy link
Contributor

@hramezani, passing this pull request back to you. I don't have objections to merging this pull request.

@hramezani
Copy link
Member

@hramezani, passing this pull request back to you. I don't have objections to merging this pull request.

Thanks @alieh-rymasheuski for the review. I will take a final look later and will merge it.

@sum-rock
Copy link
Contributor Author

sum-rock commented Aug 16, 2022

@hramezani Before you guys merge. I'm looking into something that's coming up with implementing this in our system. The issue occurs if datetimes are written as strings on in-memory model fields. Without serialize_data=True, the DB will type cast successfully. However when updating, auditlog is sending a pre-save signal and the serializer will attempt to serialize a string as if it is a datetime and thus fail. Please standby as I am currently trouble shooting.

It may be advisable to add a cls=LazyEncoder to the serializer by default? This could be something that that requires solutions all on our side and auditlog with this PR is fine. I'll know more within a few hours.

The Django core serializer assumes that the values on object fields are
correctly typed to their respective fields. Updates made to an object's
in-memory state may not meet this assumption. To prevent this violation, values
are typed by calling `to_python` from the field object, the result is set on a
copy of the instance and the copy is sent to the serializer.

For example, a datetime parsed as a string and set on a `models.DateTimeField()`
field is normally written to the database as a string and set to a python
datetime object when read on the model. Therefore, no error is raised on save
when a datetime field contains a string formatted datetime. The serializer,
however, expects a datetime object and will attempt to call `isoformat()` to
convert the datetime object to a string which then raises an attribute error.
@sum-rock
Copy link
Contributor Author

sum-rock commented Aug 17, 2022

@hramezani @alieh-rymasheuski I was running our testing suite with these changes this morning and noticed a few unexpected errors. After fully untangling the stacktrace this afternoon I believe I've found the issue and put together a correction. I do think it is best to put the correction in this PR rather than relying on developers to correct in their own code. Avoiding any disruption, even if users need to opt into this new feature, is in everyone's interest.

Take a look at what I've written and let me know your thoughts. I've tried to include an explanation in the doc string and commit message so I'll avoid repeating it here. One thing I might add is that the try/except around the deepcopy was done to avoid raising TypeErrors for missing arguments that are required when attempting to copy some objects. The instance_copy is preferred because I'd like to avoid mutating something before saving it possible. However, I don't think it's critical because the model save and model instantiation process cycle will do a similar type transformation in the same way anyway.

Thanks again for your time and dedication, our entire team is very grateful.

auditlog/models.py Outdated Show resolved Hide resolved
auditlog/models.py Outdated Show resolved Hide resolved
auditlog/models.py Outdated Show resolved Hide resolved
@hramezani hramezani added this to the 2.2 milestone Aug 17, 2022
@aleh-rymasheuski
Copy link
Contributor

aleh-rymasheuski commented Aug 17, 2022

@hramezani why milestone 2.2? This seems to be a backwards-compatible change so it can be include in a 1.X feature release.

UPD: please ignore this message, I forgot that auditlog version was already 2.X.

@hramezani hramezani removed this from the 2.2 milestone Aug 17, 2022
@hramezani
Copy link
Member

@hramezani why milestone 2.2? This seems to be a backwards-compatible change so it can be include in a 1.X feature release.

@alieh-rymasheuski why 1.x? Latest version for django-auditlog is 2.1.1 https://pypi.org/project/django-auditlog/
I would like to include this change in the next release 2.2

Removed an unnecessary call to copy the data sent to the mask serialize
data method. An in method import was also changed to be absolute rather
than relative.
With this strategy, a value is not require to determine if the field
should be skipped when performing the ``to_python`` call. This is
preferred because there are a few edge cases around creating log
entries as a result of a cascade delete. In those cases, there are
situations when the system attempts to get a value that has already been
deleted.
@sum-rock
Copy link
Contributor Author

sum-rock commented Aug 18, 2022

Added a two line change due to an edge case. This wasn't happening in normal cascade delete circumstances but it showed up in "real world" use cases where there are lots of interdependencies. We're rolling this out into production here within the next few days so feel confident that if there are any other bugs, I'll be on to squash them.

@hramezani hramezani closed this Aug 21, 2022
@hramezani hramezani reopened this Aug 21, 2022
@hramezani
Copy link
Member

Thanks @sum-rock for your effort!

@hramezani hramezani merged commit 777bd53 into jazzband:master Aug 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants