Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config File Syntax: Extend Embedded Ruby Code support for Hashes and Arrays #4580

Merged
merged 8 commits into from
Sep 24, 2024

Conversation

Athishpranav2003
Copy link
Contributor

@Athishpranav2003 Athishpranav2003 commented Aug 6, 2024

Which issue(s) this PR fixes:
Fixes #4385

What this PR does / why we need it:
Support Embedded Ruby Code for Hashes and Arrays.

key ["foo","#{1 + 1}"] => key ["foo","2"]
key {"foo":"#{1 + 1}"} => key {"foo":"2"}

This is not backward compatible.
We can disable this feature by surrounding the entire value with single quotes.

key `["foo","#{1 + 1}"]` => key ["foo","#{1 + 1}"]
key `{"foo":"#{1 + 1}"}` => key {"foo":"#{1 + 1}"}

Note: this feature is for JSON literals

This feature is for literals that using [] or {}.

  • We need to sort out the literal and value of the Fluentd Config Syntax.
    • Fluentd first interprets literal by parsing the config file.
    • literal is String or JSON.
    • Then, Fluentd interprets value by parsing the literal.
    • value has various types.
  • This feature has nothing to do with parsing the value.

For example, we can specify Array/Hash value by using String literal
We don't necessarily need to use JSON literal to specify Array/Hash value.
The following formats works from previous versions, and there is no specification change at all.

key foo,bar
key foo:bar
key "foo,#{1 + 1}"
key "foo:#{1 + 1}"

Note: support only String of JSON

For example, this does not support Number of JSON.

key ["foo",2] => key ["foo",2]
key ["foo",#{1 + 1}] => ERROR

It is because JSON literal supports multiple lines and comments.
# not in String is considered the start of a comment line.

key ["foo", # comment
"bar", # comment
"boo" # comment
]

Docs Changes:
TODO

Release Note:

  • Config File Syntax: Extend Embedded Ruby Code support for Hashes and Arrays
    • Example: key {"foo":"#{1 + 1}"} => key {"foo":"2"}
    • Please note that this is not backward compatible, although we assume that this will never affect to actual existing configs.
    • In case the behavior changes unintentionally, you can disable this feature by surrounding the entire value with single quotes.
      • key '{"foo":"#{1 + 1}"}' => key {"foo":"#{1 + 1}"}

@Athishpranav2003
Copy link
Contributor Author

@ashie please check this whenever you are free

Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR!

As I commented on the original issue, we need to consider the specifications in LiteralParser carefully.
I'd like to think about it some more, but at the moment it doesn't seem like a good design decision to give types.rb the same functionality as LiteralParser.

I would like to hear from other maintainers.

#4385 (comment)

We need to consider this as an issue of Embedded Ruby Code. Using environment variables is just one way of it.

Embedded Ruby Code feature is implemented in LiteralParser:

def eval_embedded_code(code)
if @eval_context.nil?
parse_error! "embedded code is not allowed in this file"
end
# Add hostname and worker_id to code for preventing unused warnings
code = <<EOM + code
hostname = Socket.gethostname
worker_id = ENV['SERVERENGINE_WORKER_ID'] || ''
EOM
begin
@eval_context.instance_eval(code)
rescue SetNil
nil
rescue SetDefault
:default
end
end

elsif skip(/\#\{/)
string << eval_embedded_code(scan_embedded_code)
skip(/\}/)

Since it is for double_quoted_string, we need to use double quotes for the entire value to use this feature.

It may be possible to address this as a types problem, as in #4580, but the specifications in LiteralParser need to be carefully considered.

@ashie
Copy link
Member

ashie commented Aug 6, 2024

but at the moment it doesn't seem like a good design decision to give types.rb the same functionality as LiteralParser.

I don't yet see the detail but I feel same smell.

@Athishpranav2003
Copy link
Contributor Author

@daipom do you suggest that we should handle this scenario in literal parser.rb?

@daipom
Copy link
Contributor

daipom commented Aug 6, 2024

@daipom do you suggest that we should handle this scenario in literal parser.rb?

Yes. We should start by considering whether there are points in LiteralParser that can be improved.

@Athishpranav2003
Copy link
Contributor Author

Oh ok
let me look at it today and check if i can move this processing to it

@daipom
Copy link
Contributor

daipom commented Aug 6, 2024

@Athishpranav2003 Thanks so much!

@Athishpranav2003 Athishpranav2003 force-pushed the string-interpolation-fix branch 2 times, most recently from 8e43b54 to 4ed590c Compare August 6, 2024 11:10
@Athishpranav2003 Athishpranav2003 requested a review from daipom August 6, 2024 11:54
@Athishpranav2003
Copy link
Contributor Author

@daipom ignore the prev comments
I have corrected it and pushed
Also tested locally in my side. Please check it and let me know if there is any issue

@Athishpranav2003
Copy link
Contributor Author

@daipom not sure why 2 workflows alone failed, could you check that part alone?

@daipom
Copy link
Contributor

daipom commented Aug 7, 2024

@daipom not sure why 2 workflows alone failed, could you check that part alone?

No problem! They have nothing to do with this PR!
Sorry, some tests are unstable and sometimes fail.

@ashie
Copy link
Member

ashie commented Aug 7, 2024

Could you add unit test for this?

@Athishpranav2003
Copy link
Contributor Author

@ashie added UTs for the same

@daipom
Copy link
Contributor

daipom commented Aug 7, 2024

@Athishpranav2003

@daipom ignore the prev comments I have corrected it and pushed Also tested locally in my side. Please check it and let me know if there is any issue

Thanks!
Could you please clarify the specification change?
The focus is on what to do with the specifications.

The current specification is:

  • String
    • non-quote: Simple one-line string without spaces.
    • single-quote: Allow spaces or multiple lines.
    • double-quote: Allow escaped characters or embedded codes.
  • JSON
    • The specification seems unclear to me, but looks like it is based on JSON.parse() and does not support escaped characters or embedded codes.

As far as seeing changes in the code, embedded codes will always be supported, right?
If so, I have 2 concerns:

  • It will be impossible to write #{...} as a raw string in JSON.
  • Not backward compatible.

We need to consider whether the new specifications are acceptable while taking into account these concerns.

@Athishpranav2003
Copy link
Contributor Author

Strings stay the same - only double quoted strings support string interpolation

For Hashes and Arrays - We do string interpolation for anything that has #{...} in it. What i can do is maybe restrict this to double quoted strings inside hashes but not sure if its needed. Please let me know on this alone

Tho we have one benefit that the docs present for now we didn't restrict the interpolation to strings so it wont go against the specifications from the past

@daipom
Copy link
Contributor

daipom commented Aug 7, 2024

What i can do is maybe restrict this to double quoted strings inside hashes but not sure if its needed. Please let me know on this alone

This is the difficult part.
JSON uses double quotes for strings.
So, we can't switch the enabled and disabled of embedded codes with the type of quates...

@daipom
Copy link
Contributor

daipom commented Aug 7, 2024

Strings stay the same - only double quoted strings support string interpolation

For Hashes and Arrays - We do string interpolation for anything that has #{...} in it.

OK. I think this direction is possible for us!
There are concerns(#4580 (comment)):

* It will be impossible to write `#{...}` as a raw string in JSON.
* Not backward compatible.

but I think these points are less important for this feature.
To write #{...} as is, we can use single-quote to the entire string, such as param '{"k","#{foo}"}'.
So, there is a workaround.

We need to hear from other maintainers on this point.

@Athishpranav2003
Copy link
Contributor Author

Fine
We can wait for other's to put their opinions

@daipom
Copy link
Contributor

daipom commented Aug 7, 2024

The focus is on what extent we care about the possibility that existing users are using settings like:

<match test>
  @type http
  endpoint foo
  headers {"foo": "foo#{bar"}
</match>

When #{ exists, this change can break the existing setting.

@ashie
Copy link
Member

ashie commented Aug 8, 2024

* It will be impossible to write `#{...}` as a raw string in JSON.
* Not backward compatible.

but I think these points are less important for this feature. To write #{...} as is, we can use single-quote to the entire string, such as param '{"k","#{foo}"}'. So, there is a workaround.

I'm not sure but it seems that such workaround isn't available for Array & Hash types.
For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in config_param.

diff --git a/test/config/test_config_parser.rb b/test/config/test_config_parser.rb
index fe29028e..b0c18fc9 100644
--- a/test/config/test_config_parser.rb
+++ b/test/config/test_config_parser.rb
@@ -483,8 +483,8 @@ module Fluent::Config
           param_size 4k
           param_bool true
           param_time 10m
-          param_hash { "key1": "value1", "key2": 2 }
-          param_array ["value1", "value2", 100]
+          param_hash '{ "key1": "value1", "key2": 2 }'
+          param_array '["value1", "value2", 100]'
           param_regexp /pattern/
         ])
         target = AllTypes.new.configure(conf)

@ashie
Copy link
Member

ashie commented Aug 8, 2024

I'm not sure but it seems that such workaround isn't available for Array & Hash types. For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in config_param.

Sorry, please ignore my comment.
Even it's parsed as String, it will be parsed again as Array or Hash at config/types.rb.

@ashie
Copy link
Member

ashie commented Aug 8, 2024

We need to hear from other maintainers on this point.

In general, we intend to keep compatibility so that users can continue to use same major version without changing settings. When we have to change behaviour in any reason, it would be better to show announcements or warnings before making the change, and wait a few minor versions to make the change in actual.

On the other hand, using #{...} in fluentd's setting is expected to evaluate Ruby code in most cases and I guess using such syntax in other purpose seems very rare. If we investigate major plugins and it seems unlikely that such syntax will be used, we can made this change in next minor version (without adding any warning) I guess.

@ashie
Copy link
Member

ashie commented Aug 8, 2024

I'm not sure but it seems that such workaround isn't available for Array & Hash types. For example following change breaks the test, since it's tried to parse as String type which isn't matched with declaration in config_param.

Sorry, please ignore my comment. Even it's parsed as String, it will be parsed again as Array or Hash at config/types.rb.

It would be better adding a test for this.

@ashie
Copy link
Member

ashie commented Aug 8, 2024

On the other hand, using #{...} in fluentd's setting is expected to evaluate Ruby code in most cases and I guess using such syntax in other purpose seems very rare. If we investigate major plugins and it seems unlikely that such syntax will be used, we can made this change in next minor version (without adding any warning) I guess.

BTW my current impression is very positive for including this change in the next minor release (with addressing some corner cases as @daipom mentioned).
It seems it's much natural than before for users.

@Athishpranav2003
Copy link
Contributor Author

@ashie i tried your tests
Its passing
I guess this seems fine to me now

@Athishpranav2003 Athishpranav2003 force-pushed the string-interpolation-fix branch from ea69709 to af2b11d Compare August 9, 2024 03:55
@Athishpranav2003
Copy link
Contributor Author

But isn't the code using skip to identify the starting { and [ ?
Since it's using the skip the " at the start will anyways match?

I'm talking about this behavior, is this as expected?

$ irb -r strscan
irb(main):001> ss = StringScanner.new('{"foo":"bar"}')
=> #<StringScanner 0/13 @ "{\"foo...">
irb(main):002> ss.rest
=> "{\"foo\":\"bar\"}"
irb(main):003> ss.skip(/\{/)
=> 1
irb(main):004> ss = StringScanner.new('\'{"foo":"bar"}\'')
=> #<StringScanner 0/15 @ "'{\"fo...">
irb(main):005> ss.rest
=> "'{\"foo\":\"bar\"}'"
irb(main):006> ss.skip(/\{/)
=> nil
irb(main):007> ss.skip(/\'/)
=> 1

Not sure
I can check this but the problem seems to be solved now tho :0

@daipom
Copy link
Contributor

daipom commented Aug 9, 2024

Thanks! I'm checking the actual behavior.

@daipom
Copy link
Contributor

daipom commented Aug 9, 2024

Array behavior

Applied to only explicit arrays, except for single quoted arrays.

Config:

<source>
  @type sample
  tag test1
  sample {"k1":"v1","k2":"v2","k3":"v3"}
</source>

<source>
  @type sample
  tag test2
  sample {"k1":"v1","k2":"v2","k3":"v3"}
</source>

<source>
  @type sample
  tag test3
  sample {"k1":"v1","k2":"v2","k3":"v3"}
</source>

# Not applied to shorthand array
<filter test1>
  @type record_transformer
  remove_keys k1, "#{ENV['FOO']}"
</filter>

# Applied to array
<filter test2>
  @type record_transformer
  remove_keys ["k1", "#{ENV['FOO']}"]
</filter>

# Not applied to single quoted array
<filter test3>
  @type record_transformer
  remove_keys '["k1", "#{ENV[\'FOO\']}"]'
</filter>

<match test*>
  @type stdout
</match>

Parsed result (FOO=k2):

<ROOT>
  <source>
    @type sample
    tag "test1"
    sample {"k1":"v1","k2":"v2","k3":"v3"}
  </source>
  <source>
    @type sample
    tag "test2"
    sample {"k1":"v1","k2":"v2","k3":"v3"}
  </source>
  <source>
    @type sample
    tag "test3"
    sample {"k1":"v1","k2":"v2","k3":"v3"}
  </source>
  <filter test1>
    @type record_transformer
    remove_keys k1, "#{ENV['FOO']}"
  </filter>
  <filter test2>
    @type record_transformer
    remove_keys ["k1","k2"]
  </filter>
  <filter test3>
    @type record_transformer
    remove_keys ["k1", "#{ENV['FOO']}"]
  </filter>
  <match test*>
    @type stdout
  </match>
</ROOT>

Output:

$ FOO=k2 bundle exec fluentd -c test.conf
... test1: {"k2":"v2","k3":"v3"}
... test2: {"k3":"v3"}
... test3: {"k2":"v2","k3":"v3"}

@daipom
Copy link
Contributor

daipom commented Aug 9, 2024

Hash behavior

Applied to only explicit hashes, except for single quoted hashes.

It should be noted that it can also be applied to non-string values of JSON.
It may feel strange because non-string values are not surrounded by double quotes.
The same thing is true for Array.
However, this behavior looks acceptable to me because non-string values of JSON would rarely be used.

Config:

# Not Applied to shorthand hash
<match test>
  @type http
  endpoint foo
  headers key:#{ENV['FOO']}
</match>

# Applied to String value of hash
<match test>
  @type http
  endpoint foo
  headers {"key":"#{ENV['FOO']}"}
</match>

# Applied to non-String value of hash
<match test>
  @type http
  endpoint foo
  headers {"key":#{ENV['FOO']}}
</match>

# Not Applied to single quoted hash
<match test>
  @type http
  endpoint foo
  headers '{"key":"#{ENV[\'FOO\']}"}'
</match>

Parsed result (FOO=0):

<ROOT>
  <match test>
    @type http
    endpoint "foo"
    headers key:#{ENV['FOO']}
  </match>
  <match test>
    @type http
    endpoint "foo"
    headers {"key":"0"}
  </match>
  <match test>
    @type http
    endpoint "foo"
    headers {"key":0}
  </match>
  <match test>
    @type http
    endpoint "foo"
    headers {"key":"#{ENV['FOO']}"}
  </match>
</ROOT>

Note

  • in_sample has the original logic to parse the sample param, so it is unsuitable for testing this behavior.
  • worker_id is not set when checking config on launching, so it can cause a parse error if used for a non-string value of JSON.
    • We will need some fixes in the future for this problem.
<match test>
  @type http
  endpoint foo
  headers {"key":#{worker_id}}
</match>
.../home/daipom/work/fluentd/fluentd/lib/fluent/config/basic_parser.rb:92:in `parse_error!': got incomplete JSON hash configuration at 4.conf line 7,1 (Fluent::ConfigParseError)

@daipom
Copy link
Contributor

daipom commented Aug 9, 2024

#4580 (comment)

It should be noted that it can also be applied to non-string values of JSON.
It may feel strange because non-string values are not surrounded by double quotes.
The same thing is true for Array.
However, this behavior looks acceptable to me because non-string values of JSON would rarely be used.

Considering this point, it is inaccurate to describe this PR's specification as string interpolation fix.

#4580 (comment)

For Hashes and Arrays - We do string interpolation for anything that has #{...} in it.

We should describe this specification as

  • Support Embedded Ruby Code for Hashes and Arrays.
    • Now, we can evaluate the Ruby code with #{} in " quoted string or in explicit Hash/Array.
    • This can be disabled by surrounding the entire value with single quotes.

@Athishpranav2003
Copy link
Contributor Author

@daipom i will change the subject of PR as you mentioned.
For the worker_id should we also add that check or should we give that responsibility to users?

@Athishpranav2003 Athishpranav2003 changed the title String interpolation fix Extend Embedded Ruby Code support for Hashes and Arrays Aug 9, 2024
@daipom
Copy link
Contributor

daipom commented Aug 9, 2024

i will change the subject of PR as you mentioned.

Thanks!!

For the worker_id should we also add that check or should we give that responsibility to users?

I think this PR does not need to care about the worker_id problem!
It does not affect the existing users and it should be a rare case.

@daipom
Copy link
Contributor

daipom commented Aug 9, 2024

@Athishpranav2003 Thanks! This looks good to me!
Let me discuss the compatibility concerns a bit more with the other maintainers.

@Athishpranav2003
Copy link
Contributor Author

@Athishpranav2003 Thanks! This looks good to me! Let me discuss the compatibility concerns a bit more with the other maintainers.

Peace :).
Let me know if anything needs to be changed

lib/fluent/config/literal_parser.rb Outdated Show resolved Hide resolved
Signed-off-by: Athish Pranav D <[email protected]>
@Athishpranav2003 Athishpranav2003 force-pushed the string-interpolation-fix branch from be5708b to ee84bef Compare August 13, 2024 06:20
@Athishpranav2003 Athishpranav2003 requested a review from ashie August 13, 2024 06:21
@daipom daipom added this to the v1.17.1 milestone Aug 15, 2024
@Athishpranav2003
Copy link
Contributor Author

@ashie i have added the changes you suggested.

Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@kenhys kenhys mentioned this pull request Aug 15, 2024
@ashie ashie modified the milestones: v1.17.1, v1.18.0 Aug 16, 2024
Copy link
Member

@ashie ashie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Waiting to open merge window for v1.18.0.

@daipom
Copy link
Contributor

daipom commented Sep 24, 2024

Waiting to open merge window for v1.18.0.

Now, it is opened.
I'm summarizing spec change...

@daipom daipom changed the title Extend Embedded Ruby Code support for Hashes and Arrays Config File Syntax: Extend Embedded Ruby Code support for Hashes and Arrays Sep 24, 2024
@daipom
Copy link
Contributor

daipom commented Sep 24, 2024

I have updated the description.

@daipom daipom merged commit 8841d5a into fluent:master Sep 24, 2024
16 checks passed
@daipom
Copy link
Contributor

daipom commented Sep 24, 2024

TODO document.

@cosmo0920
Copy link
Contributor

cosmo0920 commented Sep 24, 2024

Yay, it's a really good enhancement! 👍 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable string interpolation for hash-type parameters
4 participants