escape special chars in csv and json output. #802
- escape \b,\f,\n,\r,\t,\\," from strings before dumping them to json or csv.
- also faithfully reproduce the sign of nan in json.
this fixes github issue google#745.
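To make the description concrete, here is a minimal sketch of what escaping those characters for JSON could look like. The function name `EscapeJsonString` and its exact behavior are assumptions for illustration only, not the code that was merged in this PR.

```cpp
#include <string>

// Hypothetical helper (name and behavior are assumptions): escape the
// characters listed in the PR description so the result can be placed
// between double quotes in JSON output.
std::string EscapeJsonString(const std::string& s) {
  std::string out;
  out.reserve(s.size());  // the result is never shorter than the input
  for (char c : s) {
    switch (c) {
      case '\b': out += "\\b"; break;
      case '\f': out += "\\f"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      case '\\': out += "\\\\"; break;
      case '"':  out += "\\\""; break;
      default:   out += c; break;
    }
  }
  return out;
}
```

Other control characters (below 0x20) would also need escaping for strictly valid JSON; this sketch only covers the ones named in the description.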
LGTM pending build bots.
src/string_util.h
Outdated
@@ -37,8 +37,9 @@ inline std::string StrCat(Args&&... args) {
  return ss.str();
}

void ReplaceAll(std::string* str, const std::string& from,
                const std::string& to);
struct StrEscape : std::string {
Uhm.
/me doesn't like
ooh i didn't spot that. yeah, that should likely just be a method that takes a string and returns a string.
/what /does /you /prefer /?
how about this...
src/string_util.cc
Outdated
    start += to.length();
  }
std::string StrEscape(const std::string & s) {
  std::string tmp;
tmp.reserve(s.size());
feel free to take over the PR
I mean, it's your PR.
my use case isn't going to notice the 2 ns (maybe?) speed improvement, if you need it feel free to add it though.
src/csv_reporter.cc
Outdated
    std::string name = run.benchmark_name();
    ReplaceAll(&name, "\"", "\"\"");
    Out << '"' << name << "\",";
    Out << '"' << StrEscape(run.benchmark_name()) << "\",";
Can you please link where it's documented what should be escaped in CSV?
I'm probably looking in the wrong places; https://tools.ietf.org/html/rfc4180 only says to replace " with ""?
is there even a standard for csv? at least \r and \n need to be escaped, \b and \f probably should be, \t is a matter of taste; the tests don't check for it in any case. and anyway, isn't csv support being deprecated (#500)?
at least \r and \n need to be escaped
I'm not seeing that in a quick test with loading
"test","my
test"
And replacing \n with "\n":
"test","my\ntest"
breaks it.
Thus yes, i'm curious as to motivation/documentation.
and anyway, isn't csv support being deprecated #500 ?
Yep
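For reference, RFC 4180 handles fields containing line breaks, commas, or double quotes by enclosing the field in double quotes and doubling any embedded double quote; it defines no backslash escapes. A minimal sketch of that rule follows. The function name `QuoteCsvField` is hypothetical and is not part of this PR.

```cpp
#include <string>

// Hypothetical helper: quote one CSV field the way RFC 4180 describes.
std::string QuoteCsvField(const std::string& field) {
  std::string out;
  out.reserve(field.size() + 2);  // room for the two enclosing quotes
  out += '"';
  for (char c : field) {
    if (c == '"') out += '"';  // an embedded quote is written twice
    out += c;                  // everything else, including \n, is kept as-is
  }
  out += '"';
  return out;
}
```

With this approach a literal newline inside the field stays a literal newline, which matches the first of the two test inputs above.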
you're right, \t and \\ should probably not be escaped for csv. for the other ones I think the "right thing" is to escape them: \b because if you're looking at csv, you typically want to see it on a terminal, and any \b would end up hidden, and they're more often than not in there by mistake.
I agree that JSON changes are good.
The issue does not mention anything about CSV; are you affected by CSV side of things?
How about a middle ground solution then?
- Click apply on that reserve() suggestion.
- Drop CSV changes, thus avoiding all the questions here as to what should and should not be escaped.
src/string_util.cc
Outdated
    start += to.length();
  }
std::string StrEscape(const std::string & s) {
  std::string tmp;
It's a review comment, it is good to address them, so that there is forward progress on proposed changes.
Not sure how these github suggestions work:
-  std::string tmp;
+  std::string tmp;
+  tmp.reserve(s.size());
you do realize that std::string is already designed for this kind of concatenation? this is trading 0-1-2 reallocs (depending on the length of the key) with a guaranteed two calls (strlen,realloc), not sure if there's a real win here, unless it's an issue of coding style?
We know the escaped string will be at least as long as the input string.
clang/gcc will not magically insert that pre-allocation.
Therefore, if we don't do it explicitly, we are relying on the sane behavior of the std::string implementation in the C++ std library that is used.
- We will likely always have some allocs if the string after escaping does not fit within small-size-optimization.
- Without pre-allocation, if the source string does not fit within small-size-optimization, and there won't be extra escape symbols, we will have reallocations as the string grows.
- WITH pre-allocation, if the source string does not fit within small-size-optimization, and there won't be extra escape symbols, we will NOT have reallocations, only the single up-front allocation. WIN.
It is well-known that reserve() is good, and it is quite obvious it won't hurt here.
I do not understand your point.
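The disagreement above can be checked empirically. The sketch below appends a string character by character, with and without an up-front `reserve()`, and counts how often the capacity changes. `CountCapacityChanges` is a hypothetical name used only for this illustration, and the exact counts without `reserve()` depend on the standard library in use.

```cpp
#include <cstddef>
#include <string>

// Append every character of `src` to a fresh string and count how often the
// capacity changes along the way (each change implies a reallocation).
std::size_t CountCapacityChanges(const std::string& src, bool reserve_first) {
  std::string tmp;
  if (reserve_first) tmp.reserve(src.size());
  std::size_t changes = 0;
  std::size_t last = tmp.capacity();
  for (char c : src) {
    tmp += c;
    if (tmp.capacity() != last) {
      ++changes;
      last = tmp.capacity();
    }
  }
  return changes;
}
```

For an input longer than the small-string buffer and with nothing to escape, the reserved variant should report zero changes, while the unreserved one reports at least one.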
LG modulo suggestions.
@dominichamon i think they can even be applied via github interface
Co-Authored-By: tesch1 <[email protected]>
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the ℹ️ Googlers: Go here for more info.
Blargh, of course github tries to be smart, and use the author of suggestions for commits,
:D googlebot removed the cla check when I applied the suggested changes. nice.
I can still merge. I will once builds go green.
Looks green to me
Thanks!
* escape special chars in csv and json output.
  - escape \b,\f,\n,\r,\t,\\," from strings before dumping them to json or csv.
  - also faithfully reproduce the sign of nan in json.
  this fixes github issue google#745.
* functionalize.
* split string escape functions between csv and json
* Update src/csv_reporter.cc
  Co-Authored-By: tesch1 <[email protected]>
* Update src/json_reporter.cc
  Co-Authored-By: tesch1 <[email protected]>
This fixes GitHub issue #745 ("Strings in JSON output are not escaped.").
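The commit message also mentions reproducing the sign of NaN. As a rough sketch of one way to do that: a comparison such as `value < 0` is always false for NaN, so the sign bit has to be read with `std::signbit`. The function name `FormatNaN` and the output spellings `NaN` / `-NaN` are assumptions for illustration, not taken from the merged code; note also that strict JSON has no NaN literal.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper: text for a NaN value that preserves its sign bit.
// Intended to be called only when std::isnan(value) is true.
std::string FormatNaN(double value) {
  // value < 0 would be false for every NaN, so inspect the sign bit directly.
  return std::signbit(value) ? "-NaN" : "NaN";
}
```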