Filter work summary to remove non-xml safe characters (PP-1969) #2198

Merged
jonathangreen merged 2 commits into main from bugfix/non-xml-safe-characters-in-summary
Dec 3, 2024

@jonathangreen jonathangreen commented Nov 27, 2024

Description

Filter XML-unsafe characters out of summaries in our set_summary function, and add a DB migration that removes these characters from the summaries already stored in our database.
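A minimal sketch of the kind of filtering described above, assuming the goal is to keep only characters allowed by the XML 1.0 `Char` production. The helper name `strip_xml_unsafe` and the regex are illustrative assumptions; they are not taken from this PR's diff, and the real set_summary change may differ.

```python
import re
from typing import Optional

# Matches anything outside the XML 1.0 "Char" production: only tab,
# newline, carriage return, and the listed Unicode ranges are allowed.
_XML_UNSAFE_RE = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_unsafe(text: Optional[str]) -> Optional[str]:
    """Return text with XML-unsafe characters removed.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    if text is None:
        return None
    return _XML_UNSAFE_RE.sub("", text)
```

Applying a helper like this at the point where a summary is stored, rather than in each importer, would mean every integration passes through the same filter.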

Motivation and Context

We are seeing this error in our logs: Exception in web app: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters.

Traceback (most recent call last):
  File "src/lxml/builder.py", line 161, in lxml.builder.ElementMaker.__init__.add_text
  File "src/lxml/etree.pyx", line 1202, in lxml.etree._Element.__getitem__
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/var/www/circulation/env/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/var/www/circulation/env/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/var/www/circulation/src/palace/manager/api/routes.py", line 120, in decorated
    return f(*args, **kwargs)
  File "/var/www/circulation/src/palace/manager/api/routes.py", line 93, in wrapped_function
    resp = make_response(f(*args, **kwargs))
  File "/var/www/circulation/src/palace/manager/core/app_server.py", line 87, in decorated
    v = f(*args, **kwargs)
  File "/var/www/circulation/src/palace/manager/core/app_server.py", line 163, in compressor
    return f(*args, **kwargs)
  File "/var/www/circulation/src/palace/manager/api/routes.py", line 239, in acquisition_groups
    return app.manager.opds_feeds.groups(lane_identifier)
  File "/var/www/circulation/src/palace/manager/api/controller/opds_feed.py", line 100, in groups
    return feed_class.groups(
  File "/var/www/circulation/src/palace/manager/feed/opds.py", line 64, in as_response
    serializer.serialize_feed(
  File "/var/www/circulation/src/palace/manager/feed/serializer/opds.py", line 102, in serialize_feed
    element = self.serialize_work_entry(entry.computed)
  File "/var/www/circulation/src/palace/manager/feed/serializer/opds.py", line 180, in serialize_work_entry
    entry.append(OPDSFeed.E("summary", feed_entry.summary.text))
  File "src/lxml/builder.py", line 221, in lxml.builder.ElementMaker.__call__
  File "src/lxml/builder.py", line 163, in lxml.builder.ElementMaker.__init__.add_text
  File "src/lxml/etree.pyx", line 1065, in lxml.etree._Element.text.__set__
  File "src/lxml/apihelpers.pxi", line 749, in lxml.etree._setNodeText
  File "src/lxml/apihelpers.pxi", line 737, in lxml.etree._createTextNode
  File "src/lxml/apihelpers.pxi", line 1530, in lxml.etree._utf8
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
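The traceback shows that the string was rejected but not which characters caused it. As a rough diagnostic sketch, the function below lists the positions and code points in a string that fall outside the XML 1.0 `Char` ranges. The names `is_xml_safe_char` and `find_xml_unsafe` are hypothetical, and lxml's own check is assumed, not confirmed, to agree with these ranges.

```python
from typing import List, Tuple


def is_xml_safe_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_xml_unsafe(text: str) -> List[Tuple[int, str]]:
    """Return (index, 'U+XXXX') for each character XML 1.0 disallows."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml_safe_char(ch)
    ]


# Made-up sample text containing a NULL byte and a form feed.
print(find_xml_unsafe("A summary\x00 with\x0c debris"))
# → [(9, 'U+0000'), (15, 'U+000C')]
```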

Looking at the data in our database, these characters are only coming in via the Enki integration, but it seemed like a good idea to make sure they can't make it in through any integration.
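For the rows that already contain these characters, a cleanup pass could look roughly like the sketch below. It uses an in-memory SQLite database so that it runs standalone. The table name `works`, the column `summary_text`, and the function `clean_summaries` are assumptions for illustration and do not reflect the project's actual schema or migration.

```python
import re
import sqlite3

# Same XML 1.0 "Char" ranges as a filter applied at write time would use.
_XML_UNSAFE_RE = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_summaries(conn: sqlite3.Connection) -> int:
    """Strip XML-unsafe characters from stored summaries.

    Returns the number of rows that were changed. The table and column
    names are hypothetical.
    """
    changed = 0
    rows = conn.execute(
        "SELECT id, summary_text FROM works WHERE summary_text IS NOT NULL"
    ).fetchall()
    for work_id, summary in rows:
        cleaned = _XML_UNSAFE_RE.sub("", summary)
        if cleaned != summary:
            conn.execute(
                "UPDATE works SET summary_text = ? WHERE id = ?",
                (cleaned, work_id),
            )
            changed += 1
    conn.commit()
    return changed


# Demonstration against an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (id INTEGER PRIMARY KEY, summary_text TEXT)")
conn.executemany(
    "INSERT INTO works (summary_text) VALUES (?)",
    [("clean summary",), ("bad\x01 summary\x0c",), (None,)],
)
print(clean_summaries(conn))  # → 1
```

Only rows whose text actually changes are updated, so running the pass a second time touches nothing.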

How Has This Been Tested?

  • Tested locally
  • Running unit tests

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@jonathangreen jonathangreen added bug Something isn't working DB migration This PR contains a DB migration labels Nov 27, 2024
@jonathangreen jonathangreen requested a review from a team November 27, 2024 20:51
codecov bot commented Nov 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.09%. Comparing base (be1b50e) to head (871af2f).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2198   +/-   ##
=======================================
  Coverage   91.09%   91.09%           
=======================================
  Files         363      363           
  Lines       41215    41221    +6     
  Branches     8830     8833    +3     
=======================================
+ Hits        37544    37550    +6     
  Misses       2406     2406           
  Partials     1265     1265           


@tdilauro tdilauro left a comment

🔥💣 \u0000\u0001\u000C

@jonathangreen jonathangreen force-pushed the bugfix/non-xml-safe-characters-in-summary branch from 5f2130c to 871af2f Compare December 3, 2024 10:49
@jonathangreen jonathangreen merged commit e7ae364 into main Dec 3, 2024
21 checks passed
@jonathangreen jonathangreen deleted the bugfix/non-xml-safe-characters-in-summary branch December 3, 2024 11:00