Escape control characters in JSON strings #427

Merged

jdisanti merged 4 commits into smithy-lang:main from jdisanti:json-escape-fix

May 27, 2021

Collaborator

jdisanti commented May 27, 2021

This escapes the control characters in the range 0x00-0x1F (inclusive) and reworks the escaping so that it should be more performant. It now iterates over the input string exactly once, rather than once to check whether escaping is necessary and then again to escape, and there is no longer any conversion to char.
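A minimal sketch of the single-pass idea described above, not the code from this PR. The names `escape_json`, `needs_escape`, and `escape_from` are hypothetical, and the choice of short escapes (`\n`, `\t`, and so on) versus `\u00XX` is an assumption made for illustration. The scan borrows the input until the first byte that needs escaping, then copies the prefix and escapes the remainder, so each byte is visited once.

```rust
use std::borrow::Cow;

/// Hypothetical single-pass escaper: returns the input unchanged (borrowed)
/// when nothing needs escaping, and allocates only otherwise.
pub fn escape_json(value: &str) -> Cow<'_, str> {
    let bytes = value.as_bytes();
    for (index, byte) in bytes.iter().enumerate() {
        if needs_escape(*byte) {
            // Everything before `index` is already known to be clean.
            return Cow::Owned(escape_from(&bytes[..index], &bytes[index..]));
        }
    }
    Cow::Borrowed(value)
}

/// Control characters 0x00..=0x1F, the double quote, and the backslash.
fn needs_escape(byte: u8) -> bool {
    matches!(byte, 0x00..=0x1F | b'"' | b'\\')
}

/// Copies the clean prefix, then escapes the rest byte by byte.
fn escape_from(start: &[u8], rest: &[u8]) -> String {
    let mut escaped = Vec::with_capacity(start.len() + rest.len() + 1);
    escaped.extend_from_slice(start);
    for &byte in rest {
        match byte {
            b'"' => escaped.extend_from_slice(b"\\\""),
            b'\\' => escaped.extend_from_slice(b"\\\\"),
            0x08 => escaped.extend_from_slice(b"\\b"),
            0x0C => escaped.extend_from_slice(b"\\f"),
            b'\n' => escaped.extend_from_slice(b"\\n"),
            b'\r' => escaped.extend_from_slice(b"\\r"),
            b'\t' => escaped.extend_from_slice(b"\\t"),
            // Remaining control characters use the generic \u00XX form.
            0x00..=0x1F => escaped.extend_from_slice(format!("\\u{:04x}", byte).as_bytes()),
            _ => escaped.push(byte),
        }
    }
    // Only ASCII bytes were replaced, and only with ASCII escape sequences,
    // so the checked conversion is expected to succeed.
    String::from_utf8(escaped).expect("escaping preserves UTF-8 validity")
}
```

This sketch uses the checked `String::from_utf8` conversion to stay free of `unsafe`; whether the extra validation pass is acceptable depends on how hot the path is.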

To verify it correctly escapes everything, I ran the following additional test (not included in the PR since it takes approximately 100 seconds in a debug test run):

    #[test]
    fn all_of_them() {
        for value in 0..u32::MAX {
            if let Some(chr) = char::from_u32(value) {
                let string = String::from(chr);
                let escaped = escape_string(&string);
                let serde_escaped = serde_json::to_string(&string).unwrap();
                let serde_escaped = &serde_escaped[1..(serde_escaped.len() - 1)];
                assert_eq!(&escaped, serde_escaped);
            }
        }
    }

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


          Escape control characters in JSON strings

rcoh reviewed

Collaborator

rcoh left a comment

Can you bring this test back?

Just mark it #[ignore]
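For reference, a small sketch of what marking a slow test `#[ignore]` looks like. The function `count_scalar_values` is a hypothetical stand-in for an expensive exhaustive check and is not part of this PR. Ignored tests are skipped by a plain `cargo test` and can be run on demand with `cargo test -- --ignored`.

```rust
/// Hypothetical stand-in for an exhaustive check: counts every `u32`
/// that is a valid Unicode scalar value.
fn count_scalar_values() -> usize {
    (0..=char::MAX as u32)
        .filter(|value| char::from_u32(*value).is_some())
        .count()
}

#[test]
#[ignore] // skipped by default; run with `cargo test -- --ignored`
fn exhaustive_scan() {
    // All code points up to 0x10FFFF, minus the 0x800 surrogates.
    assert_eq!(count_scalar_values(), 0x11_0000 - 0x800);
}
```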

rust-runtime/smithy-json/src/escape.rs Outdated

-                      });
-                      last = index + 1;
+              fn escape_string_inner(start: &[u8], rest: &[u8]) -> String {
+                  let mut escaped = start.to_vec();

Collaborator

rcoh May 27, 2021

I would probably try to pre-allocate enough capacity here:

let mut escaped = Vec::with_capacity(start.len() + rest.len() + 1);
escaped.extend_from_slice(start);
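A short sketch of the pre-allocation idea, using the hypothetical helper name `prefilled_buffer`. It fills the buffer with `extend_from_slice`, because `copy_from_slice` requires the source and destination to have equal lengths, and a `Vec` created with `with_capacity` has length 0 regardless of its capacity.

```rust
/// Hypothetical helper: allocates room for the whole input plus one
/// extra byte for a first escape, then copies the clean prefix in.
fn prefilled_buffer(start: &[u8], rest: &[u8]) -> Vec<u8> {
    let mut escaped = Vec::with_capacity(start.len() + rest.len() + 1);
    // `escaped.copy_from_slice(start)` would panic for a non-empty
    // `start`, since `escaped.len()` is still 0 at this point.
    escaped.extend_from_slice(start);
    escaped
}
```

The `+ 1` only covers a single extra byte, so inputs with several escapes may still reallocate; it is a lower bound rather than an exact size.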

rust-runtime/smithy-json/src/escape.rs Outdated

                   }
-                  escaped.push_str(&value[last..end]);
-                  Cow::Owned(escaped)
+                  // Our input was originally valid UTF-8, and we didn't do anything to invalidate it

Collaborator

rcoh May 27, 2021

the safety argument is actually a little more subtle here:

- We only escaped bytes that were already single character code points
- We only replaced them with valid UTF-8
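The two points above can be illustrated with a small sketch. Both function names are hypothetical. The first checks the UTF-8 property the argument relies on: every byte of a multi-byte code point is 0x80 or above, so a byte below 0x80 is always a complete code point. The second replaces one such byte with an ASCII escape sequence, and the result stays valid UTF-8.

```rust
/// Returns true if every byte of every multi-byte character in `input`
/// is outside the ASCII range (>= 0x80).
fn multibyte_bytes_are_non_ascii(input: &str) -> bool {
    input.chars().filter(|c| c.len_utf8() > 1).all(|c| {
        let mut buf = [0u8; 4];
        c.encode_utf8(&mut buf).bytes().all(|b| b >= 0x80)
    })
}

/// Escapes double quotes by working on raw bytes. Only a single-byte
/// code point is replaced, and only with ASCII, so validity is preserved.
fn escape_quotes_bytewise(input: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() + 1);
    for &byte in input.as_bytes() {
        if byte == b'"' {
            out.extend_from_slice(b"\\\"");
        } else {
            out.push(byte);
        }
    }
    out
}
```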

rust-runtime/smithy-json/src/escape.rs Outdated

+                  let mut escaped = start.to_vec();
+                  for byte in rest {
+                      match byte {
+                          b'"' => escaped.extend("\\\"".bytes()),

Collaborator

rcoh May 27, 2021

did you use bytes() here instead of a b"..." byte-string literal to make it clear that these were valid strings before being turned into bytes?

Collaborator Author

jdisanti May 27, 2021

No, I just didn't think to use b.
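For comparison, a sketch of the two spellings discussed in this thread, with hypothetical function names. Both append the same two bytes; the byte-string literal avoids going through a `&str` iterator.

```rust
/// Appends an escaped quote by iterating over the bytes of a `&str`.
fn push_quote_with_bytes(out: &mut Vec<u8>) {
    out.extend("\\\"".bytes());
}

/// Appends an escaped quote from a byte-string literal.
fn push_quote_with_literal(out: &mut Vec<u8>) {
    out.extend_from_slice(b"\\\"");
}
```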

jdisanti added 2 commits

May 27, 2021 10:21


          Merge remote-tracking branch 'upstream/main' into json-escape-fix

dfa87b7


          CR feedback

rcoh approved these changes

rust-runtime/smithy-json/src/escape.rs Outdated

+                  // - The original input was valid UTF-8 since it came in as a `&str`
+                  // - Only single-byte code points were escaped
+                  // - The escape sequences are valid UTF-8
+                  debug_assert!(String::from_utf8(escaped.clone()).is_ok());

Collaborator

rcoh May 27, 2021

nit:

Suggested change

-                debug_assert!(String::from_utf8(escaped.clone()).is_ok());
+                debug_assert!(std::str::from_utf8(&escaped).is_ok());
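A sketch of why the borrowed check is preferable: `std::str::from_utf8` validates a `&[u8]` in place, while `String::from_utf8` takes ownership of a `Vec<u8>` and therefore forces a clone when the buffer is still needed. The helper names are hypothetical, and the use of `String::from_utf8_unchecked` in `finish` is an assumption about how the final conversion is done, based on the safety comment in the diff.

```rust
/// Validates without allocating: borrows the bytes as a slice.
fn is_valid_utf8_borrowed(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// Validates by cloning into an owned buffer first (extra allocation).
fn is_valid_utf8_cloned(bytes: &[u8]) -> bool {
    String::from_utf8(bytes.to_vec()).is_ok()
}

/// Hypothetical final step: checks validity in debug builds only, then
/// converts without re-validating.
fn finish(escaped: Vec<u8>) -> String {
    debug_assert!(std::str::from_utf8(&escaped).is_ok());
    // SAFETY: the caller must pass bytes that are valid UTF-8; the
    // debug assertion above checks this in debug builds.
    unsafe { String::from_utf8_unchecked(escaped) }
}
```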

rust-runtime/smithy-json/src/escape.rs Outdated

                   }
                   use proptest::proptest;
                   proptest! {
                       #[test]
-                      fn matches_serde_json(s: String) {
+                      fn matches_serde_json(s in ".*") {
                           assert_eq!(
                               serde_json::to_string(&s).unwrap(),
                               format!(r#""{}""#, escape_string(&s))

Collaborator

rcoh May 27, 2021

Suggested change

-                           format!(r#""{}""#, escape_string(&s))
+                           escape_string(&s)

Cow and String are comparable
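A small sketch of the comparison the comment refers to, with a hypothetical helper name. The standard library implements `PartialEq` between `Cow<str>` and `String` in both directions, so no conversion is needed for `assert_eq!`. This only covers type compatibility: `serde_json::to_string` wraps its output in double quotes, so the quotes still have to be added to one side or stripped from the other before the values can match.

```rust
use std::borrow::Cow;

/// Compares a `Cow<str>` with a `String` directly, in both directions.
fn cow_equals_string(cow: Cow<'_, str>, expected: String) -> bool {
    cow == expected && expected == cow
}
```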


          CR feedback

774c55e

rcoh reviewed

rust-runtime/smithy-json/src/escape.rs

@@ -70,10 +70,9 @@ mod test {
                   proptest! {
                       #[test]
                       fn matches_serde_json(s in ".*") {
-                          assert_eq!(
-                              serde_json::to_string(&s).unwrap(),
-                              format!(r#""{}""#, escape_string(&s))

Collaborator

rcoh May 27, 2021

you probably want to enable rustfmt on save

Collaborator Author

jdisanti May 27, 2021

Looks like the proptest macro hid the assert_eq from the formatter.

jdisanti merged commit 1b5453f into smithy-lang:main

jdisanti deleted the json-escape-fix branch

June 1, 2021 16:39
