Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

escape special chars in csv and json output. #802

Merged
merged 5 commits into from
Apr 19, 2019

Conversation

tesch1
Copy link
Contributor

@tesch1 tesch1 commented Apr 16, 2019

- escape \b,\f,\n,\r,\t,\," from strings before dumping
  them to json or csv.
- also faithfully reproduce the sign of nan in json.
this fixes github issue google#745.
Copy link
Member

@dmah42 dmah42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending build bots.

@coveralls
Copy link

coveralls commented Apr 16, 2019

Coverage Status

Coverage increased (+1.8%) to 91.714% when pulling f3ceed0 on tesch1:fix-strescape-745 into 415835e on google:master.

@@ -37,8 +37,9 @@ inline std::string StrCat(Args&&... args) {
return ss.str();
}

void ReplaceAll(std::string* str, const std::string& from,
const std::string& to);
struct StrEscape : std::string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm.
/me doesn't like

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh i didn't spot that. yeah, that should likely just be a method that takes a string and returns a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/what /does /you /prefer /?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about this...

start += to.length();
}
std::string StrEscape(const std::string & s) {
std::string tmp;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp.reserve(s.size());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to take over the PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, it's your PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my use case isn't going to notice the 2 ns (maybe?) speed improvement, if you need it feel free to add it though.

std::string name = run.benchmark_name();
ReplaceAll(&name, "\"", "\"\"");
Out << '"' << name << "\",";
Out << '"' << StrEscape(run.benchmark_name()) << "\",";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please link where it's documented what should be escaped in CSV?
I'm probably looking in wrong places, https://tools.ietf.org/html/rfc4180 only says to replace " with ""
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there even a standard for csv? at least \r and \n need to be escaped, \b and \f probably should be, \t is a matter of taste- the tests dont check for it in any case. and anyway, isn't csv support being deprecated #500 ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least \r and \n need to be escaped

I'm not seeing that in a quick test with loading

"test","my
test"

into libreoffice:
image

And replacing \n with "\n":

"test","my\ntest"

breaks it:
image
Thus yes, i'm curious as to motivation/documentation.

and anyway, isn't csv support being deprecated #500 ?

Yep

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, \t and \ should proabaly not be escaped for csv. the other ones I think the "right thing" is to escape them. \b because if you're lookign at csv, you typically want to see it on a terminal, and any \b would end up hidden, and they're more often than not in there by mistake.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that JSON changes are good.
The issue does not mention anything about CSV; are you affected by CSV side of things?

How about a middle ground solution then?

  1. Click apply on that reserve() suggestion.
  2. Drop CSV changes, thus avoiding all the questions here as to what should and should not be escaped.

start += to.length();
}
std::string StrEscape(const std::string & s) {
std::string tmp;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a review comment, it is good to address them, so that there is forward progress on proposed changes.
Not sure how these github suggestions work:

Suggested change
std::string tmp;
std::string tmp;
tmp.reserve(s.size());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you do realize that std::string is already designed for this kind of concatenation? this is trading 0-1-2 reallocs (depending on the length of the key) with a guaranteed two calls (strlen,realloc), not sure if there's a real win here, unless it's an issue of coding style?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know the escaped string will be at least not shorter than the input string.
clang/gcc will not magically insert that pre-allocation.
Therefore, if we don't do it explicitly, we are relying on the sane behavior
of the std::string implementation in the C++ std library that is used.

  • We will likely always have some allocs if the string after escaping does not fit within small-size-optimization.
  • Without pre-allocation, if the source string does not fit within small-size-optimization,
    and there won't be extra escape symbols, we will have allocations.
  • WITH pre-allocation, if the source string does not fit within small-size-optimization,
    and there won't be extra escape symbols, we will NOT have allocations. WIN.

It is well-known that reserve() is good, and it is quite obvious it won't hurt here.
I do not understand your point.

Copy link
Collaborator

@LebedevRI LebedevRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG modulo suggestions.
@dominichamon i think they can even be applied via github interface

src/csv_reporter.cc Show resolved Hide resolved
src/json_reporter.cc Show resolved Hide resolved
@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.

@LebedevRI
Copy link
Collaborator

Blargh, of course github tries to be smart, and use the author of suggestions for commits,
and multi-committer PR's are too smart for CLI bot :/ I should have guessed.

@tesch1
Copy link
Contributor Author

tesch1 commented Apr 18, 2019

:D googlebot removed the cla check when I applied the suggested changes. nice.

@dmah42
Copy link
Member

dmah42 commented Apr 18, 2019

I can still merge. I will once builds go green.

@LebedevRI
Copy link
Collaborator

I can still merge. I will once builds go green.

Looks green to me
(that was supposed to be a word play on "LGTM" that can be read both as "G"ood/"G"reen)

@dmah42 dmah42 merged commit 588be04 into google:master Apr 19, 2019
@dmah42
Copy link
Member

dmah42 commented Apr 19, 2019

Thanks!

JBakamovic pushed a commit to JBakamovic/benchmark that referenced this pull request Sep 11, 2020
* escape special chars in csv and json output.

- escape \b,\f,\n,\r,\t,\," from strings before dumping
  them to json or csv.
- also faithfully reproduce the sign of nan in json.
this fixes github issue google#745.

* functionalize.

* split string escape functions between csv and json

* Update src/csv_reporter.cc

Co-Authored-By: tesch1 <[email protected]>

* Update src/json_reporter.cc

Co-Authored-By: tesch1 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants