-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
⬆️ [server] Upgrade to the latest everything #856
Comments
Looks like the last full upgrade was The big part of the work took ~ 2 days but the final commit was closer to a week later. |
While upgrading PyMongo from 3.11.0 to 4.3.3, I have been getting these errors: The solution for it can be found here: https://pymongo.readthedocs.io/en/stable/examples/uuid.html#handling-uuid-data-example We are using 'python_legacy' instead of 'standard' UUID representation here because standard UUID representation should be used by new applications or applications that are encoding and/or decoding UUIDs in MongoDB for the first time. Python_legacy should be used by already existing applications using python drivers to encode and/or decode. Ours is a python application that has already been storing data in databases for many months. It is NOT "encoding or decoding for the first time". If we used standard, although the unit tests might run, it would break in production. |
After handling UUID representation using 'python_legacy', numerous test cases have started failing. Some of the failing test case errors are : As we can see the last line of the error, one user_id is binary and another one is not. Therefore, I believe I am not handling UUID representation somewhere |
@swastis10 have you tried deleting and recreating your database? If there are any pending binary entries from your prior experiment with setting it to STANDARD, they may be messing up the lookups. If you do that and the tests still fail, I agree that you need to look through all locations and convert them |
The solution can be found at: PyMongo 4.x does not have collections.remove anymore therefore I will change it to delete_one or delete_many |
Approach: |
Using |
This seems like it is very close and is basically related to arithmetic. |
@shankari Yes! In Python 2.x, If you divide one integer by another you get an integer result. So 5/2 = 2 instead of 2.5. You get floor division, not true division (Python – Changing the Division Operator). In Python 3, true division is the default but in Python 2.x you need to make one of the numbers a float to get a float returned. So 5.0/2 = 2.5 |
@swastis10 yes, I had to deal with that during the python2 -> python3 migration (which was more complex than this BTW). But my question was whether there was any change in the recent versions of python aka the ones affected by this upgrade? |
On line 1141 of The output of my branch is: |
@swastis10 I'm squinting really hard, but it clearly looks like some kind of arithmetic issue - e.g.
BTW, thanks for formatting to make that easier to see. Note also that the new values are always lower by around 0.0001 I would suggest two approaches going forward:
|
I can see that I am upgrading from python 3.7.12 to 3.9.16
I put log statements in the code and noticed that I traced back in the code and realised that Upon lower the version of GeoJson to 2.4.1, the test case starts passing. This default precision is an intentional change in GeoJson v2.5 - jazzband/geojson#135 Proposed solution:
|
Caused due to upgrade in Arrow dependency. |
Caused due to Pandas update.
According to the new version of Pandas, we cannot use |
Caused due to |
|
Fix -
|
|
|
wrt #856 (comment) (the geojson precision issue), I vote for whatever the long-term plans for the geojson community are. I looked at the related issue jazzband/geojson#135, and it looks like the long-term plans have changed. The original plan was to have the truncated precision be the long term. So we should work around this by using the suggested global monkey patch We should add link to this comment in the commit message |
happens when we try to serialize a document read from MongoDB directly (it will have a binary objectID and a binary UUID) without using the
|
Fix:
|
Caused due to upgrade in Arrow dependency. |
`FAIL: testResetToPastWithCrash (pipelineTests.TestPipelineReset.TestPipelineReset)
|
This seems similar to the |
The |
this error is because of this line
Before @swastis10's changes, the The fix for this is to use the The answer is to use the
|
This error is originating from
I understand this error is occuring due to the pandas upgrade.
The test case is fixed because it is returning whatever we are saving in the document. |
This is happening due to the change in Pandas |
We are using 'python_legacy' instead of 'standard' UUID representation here because standard UUID representation should be used by new applications or applications that are encoding and/or decoding UUIDs in MongoDB for the first time. Python_legacy should be used by already existing applications using python drivers to encode and/or decode. Ours is a python application that has already been storing data in databases for many months. It is NOT "encoding or decoding for the first time". If we used standard, although the unit tests might run, it would break in production. Contains fix for the issue: e-mission/e-mission-docs#856 (comment)
For more details- jazzband/geojson#135 It should be removed in the next server update, by which time hopefully the new geojson version will incorporate the long-term fix See - jazzband/geojson#177 Contains fix for the issue: e-mission/e-mission-docs#856 (comment)
Yes, @shankari it does. Upon debugging and comparing the issue with master, I noticed that there were many coordinates which were being truncated to 7 decimal places whereas in master we had in upto 12 significant places so I bumped it upto 15 places which fixed the issue. |
We can only pass argument "on" OR "left_index" and "right_index", not a combination of both in pd.merge() Fix for the issue: e-mission/e-mission-docs#856 (comment)
@shankari upon going through the logs I found a discrepancy: 2023-03-06 14:49:45,321:DEBUG:curr_query = {'user_id': UUID('aa9fdec9-2944-446c-8ee2-50d79b3044d3'), '$or': [{'metadata.key': 'analysis/confirmed_trip'}]}, sort_key = metadata.write_ts The. number of output with and without user_id is different |
@swastis10 that's weird. Again, can I see the code that generated those log statements? |
@shankari Logs: |
@swastis10 the reason is right there in the logs |
@shankari Potential reason - bson.json_util serializes the $uuid as subtype 4 (Binary) and not subtype 3. Solution - I can try to deserialize '$uuid' in the document using UUID representation.
Code Snippet- Logs- |
BSON binary subtype 3 is a legacy UUID format. By default, JSON is deserialized as subtype 4. With this fix, it will deserialize $uuid as binary subtype 3 Fix for - e-mission/e-mission-docs#856 (comment)
@swastis10 this sounds like a good reason and solution. Have you tried it yet?
Also, if this is the problem, then why aren't we running into similar issues in It loads files from
|
@shankari I took at look at
where Potential Fix: To avoid this Binary subtype 3 and 4 confusion in the future, In
with
|
@swastis10 yes, please make the change as well |
BSON binary subtype 3 is a legacy UUID format. By default, JSON is deserialized as subtype 4. With this fix, it will deserialize $uuid as binary subtype 3 See Issue - e-mission/e-mission-docs#856 (comment)
While deploying in the NREL environment, DB connection fails with
|
fixed by changing to Next error is
@swastis10 can you fix this? You should test by connecting the phone and server and running it end to end |
here's a related issue on the bottle repo: you should also be able to just test using |
We cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED therefore defining uuid_representation as PYTHON_LEGACY fixes the issue Contains fix for - e-mission/e-mission-docs#856 (comment)
We cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED therefore defining uuid_representation as PYTHON_LEGACY fixes the issue. Contains fix for - e-mission/e-mission-docs#856 (comment)
The server upgrade is done, closing issue |
Untested fallout of e-mission/e-mission-docs#856
…sentation to LEGACY - Created a simple class (`json_wrappers.py`) which encapsulates the correct settings that we need for serializing and deserializing the JSON representations with the correct UUID representation - Changed all the references in the `emission` directory to use the new wrapper functions - Including cleaning up prior hardcoded attempts at cleanup, such as e-mission@3b456e7#diff-cfddece09bbf132974a13acdfb77be4e585a4bff39999f559dc8200c1ffbe78d Additional Context: - e-mission/e-mission-docs#856 (comment) - https://pymongo.readthedocs.io/en/stable/examples/uuid.html - e-mission@9c683cf - e-mission@6ac02a2 - e-mission@edd8b77 As of 4.3.3, the LEGACY_JSON_OPTIONS also has an UNSPECIFIED UUID representation > bson.json_util.LEGACY_JSON_OPTIONS: bson.json_util.JSONOptions = > JSONOptions(strict_number_long=False, datetime_representation=0, > strict_uuid=False, json_mode=0, document_class=dict, tz_aware=False, > uuid_representation=UuidRepresentation.UNSPECIFIED, > unicode_decode_error_handler='strict', tzinfo=None, > type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), > datetime_conversion=DatetimeConversion.DATETIME)¶ Testing done: - Ensured that there were no imports of json_utils ``` $ find emission -name \*.py | xargs grep json_utils $ ``` - Ensured that all `bju` prefixes were replaced, other than in the wrapper itself ``` $ find emission -name \*.py | xargs grep bju emission/storage/json_wrappers.py:import bson.json_util as bju emission/storage/json_wrappers.py:wrapped_object_hook = lambda s: bju.object_hook(s, emission/storage/json_wrappers.py: json_options = bju.RELAXED_JSON_OPTIONS.with_options( emission/storage/json_wrappers.py:wrapped_default = lambda o: bju.default(o, json_options = bju.LEGACY_JSON_OPTIONS) emission/storage/json_wrappers.py:wrapped_dumps = lambda o: bju.dumps(o, json_options = bju.LEGACY_JSON_OPTIONS.with_options( emission/storage/json_wrappers.py:wrapped_loads = lambda s: bju.loads(s) ``` - Ensured that all `esj` imports used the `wrapped` version of the name ``` $ find emission -name \*.py | xargs grep esj | egrep -v "import|esj.wrapped" | wc -l 0 ```
…sentation to LEGACY Changed all the references in the `bin` scripts to also use the new wrappers to serialize/deserialize UUID objects. This is a follow-on to e-mission@51a6881 which also has the additional context Additional Context: - e-mission/e-mission-docs#856 (comment) - https://pymongo.readthedocs.io/en/stable/examples/uuid.html - 9c683cf - 6ac02a2 - edd8b77 As of 4.3.3, the LEGACY_JSON_OPTIONS also has an UNSPECIFIED UUID representation > bson.json_util.LEGACY_JSON_OPTIONS: bson.json_util.JSONOptions = > JSONOptions(strict_number_long=False, datetime_representation=0, > strict_uuid=False, json_mode=0, document_class=dict, tz_aware=False, > uuid_representation=UuidRepresentation.UNSPECIFIED, > unicode_decode_error_handler='strict', tzinfo=None, > type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), > datetime_conversion=DatetimeConversion.DATETIME)¶ Testing done: - Ensured that there were no imports of json_utils ``` $ find bin -name \*.py | xargs grep json_utils $ ``` - Ensured that all `bju` prefixes were replaced ``` $ find bin -name \*.py | xargs grep bju $ ``` - Ensured that all `esj` imports used the `wrapped` version of the name ``` $ find bin -name \*.py | xargs grep esj | egrep -v "import|esj.wrapped" $ ```
This is tracking the server upgrade to the most recent version of everything.
This will involve upgrade miniconda, and all the dependencies
The versions are all defined in files in
setup
After upgrade, make sure that all the unit tests pass, and some basic end-to-end tests from the phone to the server still work
If you wanted to clean up the
setup
directory to remove obsolete files (that have not been edited for ~ 3 years) that would be great as well.The text was updated successfully, but these errors were encountered: