Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle special characters in display of form submissions, so e.g. curly apostrophe does not show up as ' #245

Closed
rl-utility-man opened this issue Jun 10, 2024 · 3 comments

Comments

@rl-utility-man
Copy link

No description provided.

@signebedi
Copy link
Owner

signebedi commented Jun 10, 2024

This is a reasonable ask. We use markupsafe to escape all inputs across the board. This is a security feature but comes with plenty of drawbacks. This is a down-stream consequence of the escaping we implemented in #125, which makes it the default behavior to escape form data on read operations.

An alternative is to add more robust HTML escaping solutions to sanitize data on writes, instead of escaping output on reads.

  1. HTML sanitizer https://github.com/matthiask/html-sanitizer/
  2. Ammonia https://github.com/messense/nh3
  3. Bleach (deprecated, unfortunately) bleach is deprecated; statement on project going forward (2023-01-23) mozilla/bleach#698
  4. Unidecode https://github.com/avian2/unidecode

I also have a very basic sanitation function that we use for the documentation:

def escape_unsafe_html(html_content):
    """Escapes unsafe HTML patterns in the provided content.

    Args:
        html_content (str): The HTML content to sanitize.

    Returns:
        str: The sanitized HTML content.
    """
    # Dictionary of unsafe patterns and their escaped equivalents.
    replacements = {
        "<script": "&lt;script",
        "</script>": "&lt;/script&gt;",
        "<iframe": "&lt;iframe",
        "</iframe>": "&lt;/iframe&gt;",
        "javascript:": "javascript&#58;",
        "onerror=": "onerror&#61;",
        "onload=": "onload&#61;"
    }

    # Escape each unsafe pattern.
    for unsafe, safe in replacements.items():
        html_content = html_content.replace(unsafe, safe)

    return html_content

This is very basic but can be expanded if we don't want to add an entire new dependency...

@signebedi
Copy link
Owner

Using html-sanitize still escapes ampersands
This is stemming from an issue in the html-sanitize library, see matthiask/html-sanitizer#46. We can fix-in-place by unescaping escaped ampersands.

@signebedi
Copy link
Owner

[bug] html-sanitize config allows links in form submissions
This is probably entirely because of the config we use. Specifically, we use the same html-sanitize config for form data as we use for parsing docs, privacy message, and the homepage message. These should by all accounts employ different configurations, if for no other reason because the people we expect to be submitting forms (general end users with varying degrees of scoped system access) are meaningfully different from those with access to change the docs, privacy policy, and homepage message (system admins, all of whom must, as a core assumption, have higher levels of system access). As such, we can probably just create two sanitizer objects in libreforms_fastapi.utils.docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants