enhancement(csv codec): added additional csv encoding options #18149
scMarkus wants to merge 8 commits into vectordotdev:master
| /// In some variants of CSV, quotes are escaped using a special escape character | ||
| /// like \ (instead of escaping quotes by doubling them). | ||
| /// | ||
| /// To use this `double_uotes` needs to be disabled as well |
| /// To use this `double_uotes` needs to be disabled as well | |
| /// To use this `double_quote` needs to be disabled as well |
| /// like \ (instead of escaping quotes by doubling them). | ||
| /// | ||
| /// To use this `double_uotes` needs to be disabled as well | ||
| pub escape: u8, |
Also, worth documenting what happens when double_quote is enabled.
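To make the interaction between `double_quote` and `escape` concrete, here is a minimal standard-library sketch of the two quote-escaping styles described in the doc comment. The `quote_field` helper is hypothetical and is not part of Vector or the `csv` crate. It always quotes the field and does not handle an escape character that itself appears in the data.

```rust
/// Quote one CSV field, escaping embedded quotes in one of two ways.
/// Hypothetical helper for illustration only.
fn quote_field(field: &str, double_quote: bool, escape: char) -> String {
    let mut out = String::with_capacity(field.len() + 2);
    out.push('"');
    for ch in field.chars() {
        if ch == '"' {
            if double_quote {
                // Doubling style: an embedded quote is written twice.
                out.push('"');
            } else {
                // Escape style: an embedded quote is prefixed with `escape`.
                out.push(escape);
            }
        }
        out.push(ch);
    }
    out.push('"');
    out
}

fn main() {
    let field = "say \"hi\"";
    // With doubling enabled the escape character is ignored.
    println!("{}", quote_field(field, true, '\\')); // prints "say ""hi"""
    // With doubling disabled the escape character takes effect.
    println!("{}", quote_field(field, false, '\\')); // prints "say \"hi\""
}
```

In this sketch the `escape` argument has no effect while `double_quote` is true, which is the kind of behavior the review comment asks to have documented.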
| #[derive(Debug, Clone)] | ||
| pub struct CsvSerializer { | ||
| delimiter: u8, | ||
| double_quote: bool, | ||
| escape: u8, | ||
| fields: Vec<ConfigTargetPath>, |
This can be simplified:
| #[derive(Debug, Clone)] | |
| pub struct CsvSerializer { | |
| delimiter: u8, | |
| double_quote: bool, | |
| escape: u8, | |
| fields: Vec<ConfigTargetPath>, | |
| #[derive(Debug, Clone, Default)] | |
| pub struct CsvSerializer { | |
| config: CsvSerializerConfig, |
| pub fn new(conf: CsvSerializerConfig) -> Self { | ||
| Self { | ||
| delimiter: conf.csv.delimiter, | ||
| double_quote: conf.csv.double_quote, | ||
| escape: conf.csv.escape, | ||
| fields: conf.csv.fields, | ||
| } |
If you accept the suggestion above, then this can be removed.
| pub fn new(conf: CsvSerializerConfig) -> Self { | |
| Self { | |
| delimiter: conf.csv.delimiter, | |
| double_quote: conf.csv.double_quote, | |
| escape: conf.csv.escape, | |
| fields: conf.csv.fields, | |
| } |
| } | ||
|
|
||
| #[test] | ||
| fn custom_delimiter() { |
Thank you for providing these tests!
If custom_delimiter fails due to #17261 we can (1) make sure this test passes and (2) make a note on current behavior vs desired behavior.
From my point of view it is the other way around. While writing custom_escape_char I could not get it to work. Digging into that, I wrote custom_delimiter as well and found out about this issue, which I think is a bug in the current version of Vector?
EDIT: I misread the comment. In fact, I already messed up in the initial description. The delimiter test is not failing, but correct_quoting is (I will edit the description). custom_delimiter runs successfully.
| opts.fields = fields; | ||
| opts.delimiter = b' '; | ||
| opts.double_quote = true; | ||
| //opts.escape = b'\''; |
| .delimiter(self.delimiter) | ||
| .double_quote(self.double_quote) | ||
| .escape(self.escape) | ||
| .terminator(csv::Terminator::Any(b'\0')) // TODO: this needs proper 'nothing' value |
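The TODO above uses a `\0` byte as a stand-in for "no terminator". One possible alternative, shown here purely as an assumption and not as what this PR does, is to let the writer emit its usual record terminator and trim it from the output buffer afterwards. The `strip_terminator` helper below is hypothetical and uses only the standard library.

```rust
/// Remove one trailing record terminator (`\r\n`, `\n`, or `\r`) from an
/// encoded record, if present. Hypothetical helper for illustration.
fn strip_terminator(buf: &mut Vec<u8>) {
    if buf.ends_with(b"\r\n") {
        buf.truncate(buf.len() - 2);
    } else if buf.ends_with(b"\n") || buf.ends_with(b"\r") {
        buf.truncate(buf.len() - 1);
    }
}

fn main() {
    // A record as a CSV writer might emit it, terminator included.
    let mut record = b"foo,\"bar baz\"\n".to_vec();
    strip_terminator(&mut record);
    println!("{}", String::from_utf8_lossy(&record));
}
```

A newline inside a quoted final field is left alone, because such a record ends with the closing quote rather than with the newline.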
de307c8 to 109f1c1
|
Hi @scMarkus, whenever you want this reviewed again, please mark it as "ready for review". |
|
Thanks for the offer @pront. As things stand at the moment, I would like to request your guidance on how to properly proceed with the bug at hand (which I really want fixed since I intend to use csv quoting myself). To prove that omitting the line terminator in Would it be reasonable to maintain this patch in the vector repository for the time being? Or ignore the bug at the moment and implement only the new configuration feature? Or any better strategy you may come up with? |
Let's focus on completing this config feature. Also, document expected caveats. As for the |
5da7233 to d2d4bea
|
@pront I tried to document the situation as much as possible in the code. If there is any special syntax for referencing related issues or pull requests, please let me know. Additionally, I would like to ask you to have another detailed look at the implementation. I can see some more tests failing, but I am not quite sure what those are related to. |
d2d4bea to eb45586
|
|
|
Hi @scMarkus, thank you for your efforts on this PR. I think this is looking pretty good; there are some details left to address, specifically about the Also, I pinged the maintainer of |
* Initial
* Fixes
* Fixes
* Tests
* Add docs
* Add semantic
* Move url
* Fix url
* Add request docs
* Add batch docs
* Bump
* Clippy
* Apply feedback
* Apply feedback
* Add use
* Bump

Signed-off-by: ktf <krunotf@gmail.com>
found potential bug on writing lines with quoted fields
697e558 to 71dcd47
Closes #17261
This will be a draft for now since I may have found a bug in the existing implementation. So far this commit includes the discussed config changes, which I happily accept critique on since I am quite new to Rust. Furthermore, it contains additional tests like correct_quoting, which might surface a bug in the current implementation.
To fix this behavior I asked here for an enhancement in the respective csv lib. Opinions on that are appreciated as well.