[lang] Add escape syntax for field names
Related to rcoh#99

Currently field names containing a space or period, e.g. `date received`
or `grpc.method`, cannot be parsed. This could be worked around using
`jq` or similar tools to rewrite the field name, but that's a pain.

This commit adds an escaped field name syntax of `["<FIELD>"]` which is
based on the Object Identifier-Index syntax[0] used by `jq`, so it
should be somewhat familiar to many people who parse JSON on the
command line.

The more obvious option of delimiting with just quotes, e.g.
"date received", creates an ambiguity between string literals and
escaped field names. For example, does `where foo == "date received"`
mean field `foo` matches field `date received`, or field `foo` matches
the string "date received"?

Example query:

```
* | json | where ["grpc.method"] == "Foo" | count by ["date received"]
```

[0] https://stedolan.github.io/jq/manual/#ObjectIdentifier-Index:.foo,.foo.bar
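
The escaped-name rule described above can be sketched in plain Rust without `nom`. This is a minimal illustration, not the commit's implementation: the helper `parse_escaped_ident` and its `Option<(&str, &str)>` return shape are hypothetical, and the character classes are assumed to be ASCII-only, with "space" meaning a space or a tab.

```rust
/// First character of a field name: ASCII letter or underscore.
fn starts_ident(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

/// Characters allowed in a bare field name.
fn is_ident(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Characters allowed inside `["..."]`: everything a bare name allows,
/// plus space, tab, and period (assumption: ASCII input only).
fn is_escaped_ident(c: char) -> bool {
    c == ' ' || c == '\t' || c == '.' || is_ident(c)
}

/// Hypothetical helper: parse a leading `["<FIELD>"]` and return
/// `(field_name, remaining_input)`, or `None` if the input does not
/// start with a well-formed escaped field name.
fn parse_escaped_ident(input: &str) -> Option<(&str, &str)> {
    // The opening delimiter is the two characters `["`.
    let body = input.strip_prefix("[\"")?;
    // The name must begin like an ordinary identifier.
    if !body.chars().next().map_or(false, starts_ident) {
        return None;
    }
    // Consume the longest run of permitted characters.
    let end = body
        .find(|c: char| !is_escaped_ident(c))
        .unwrap_or(body.len());
    // The run must be followed immediately by the closing `"]`.
    let rest = body[end..].strip_prefix("\"]")?;
    Some((&body[..end], rest))
}
```

Under these assumptions, `["grpc.method"] == "Foo"` yields the field name `grpc.method` with ` == "Foo"` left over, while a bare `"date received"` is rejected and so remains available to be read as a string literal, which is the ambiguity the bracketed form avoids.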
Will Chandler committed Jul 23, 2021
1 parent e00d6ab commit c4f4678
Showing 2 changed files with 37 additions and 1 deletion.
25 changes: 24 additions & 1 deletion src/lang.rs
@@ -366,6 +366,14 @@ fn is_ident(c: char) -> bool {
     is_alphanumeric(c as u8) || c == '_'
 }

+fn is_escaped_ident(c: char) -> bool {
+    match c {
+        space if is_space(space as u8) => true,
+        '.' => true,
+        _ => is_ident(c),
+    }
+}
+
 fn starts_ident(c: char) -> bool {
     is_alphabetic(c as u8) || c == '_'
 }
@@ -431,12 +439,20 @@ named!(column_ref<Span, Expr>, do_parse!(
     (Expr::Column { head: DataAccessAtom::Key(head), rest: rest })
 ));

-named!(ident<Span, String>, do_parse!(
+named!(ident<Span, String>, alt!(bare_ident | escaped_ident));
+
+named!(bare_ident<Span, String>, do_parse!(
     start: take_while1!(starts_ident) >>
     rest: take_while!(is_ident) >>
     (start.fragment.0.to_owned() + rest.fragment.0)
 ));

+named!(escaped_ident<Span, String>, do_parse!(
+    start: preceded!(tag!("[\""), take_while1!(starts_ident)) >>
+    rest: terminated!(take_while!(is_escaped_ident), tag!("\"]")) >>
+    (start.fragment.0.to_owned() + rest.fragment.0)
+));
+
 named!(arguments<Span, Vec<Expr>>, add_return_error!(SyntaxErrors::StartOfError.into(), delimited!(
     tag!("("),
     separated_list!(tag!(","), expr),
@@ -1166,6 +1182,13 @@ mod tests {
         expect_fail!(ident, "5x");
     }

+    #[test]
+    fn parse_escaped_ident() {
+        expect!(ident, "[\"hello world\"]", "hello world".to_string());
+        expect!(ident, "[\"hello.world\"]", "hello.world".to_string());
+        expect_fail!(ident, "\"\"");
+    }
+
     #[test]
     fn parse_var_list() {
         expect!(
13 changes: 13 additions & 0 deletions tests/structured_tests/escaped_ident.toml
@@ -0,0 +1,13 @@
+query = """
+* | json | count by ["grpc.method"], ["start time"]
+"""
+input = """
+{"start time": "today", "grpc.method": "Foo"}
+{"start time": "today", "grpc.method": "Bar"}
+"""
+output = """
+["grpc.method"] ["start time"] _count
+-----------------------------------------------------------
+Bar today 1
+Foo today 1
+"""
