Skip to content

Conversation

@nealrichardson
Copy link
Member

@bkietz and I identified the performance regression I observed today, and by more-or-less reverting 55defbf#diff-090c5cff4eadd62a121e85babd186c9838055d2757670971204817eb2d96211aR297-R313 (Ben suggested renaming the function and calling GetView instead of GetString), the performance regression went away.

I haven't investigated why the cpp11::r_string way is so significantly slower but it's worth exploring. We should also make sure we aren't inefficiently calling that elsewhere.

cc @kszucs

@github-actions
Copy link

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpp11::r_string(const std::string&) does not take advantage of knowing the length of the string (so the terminating null must be searched for), adds the resulting CHARSXP to the preserved list, and guards against stack unwinding originating in Rf_mkCharCE.

Rf_mkCharLenCE should probably be used in r_string constructors when appropriate.

Adding each charsxp to the preserved list is unnecessary since we're immediately passing ownership to the constructed vector, but we probably should guard against stack unwinding. We can do so outside the loop with much less overhead

@bkietz
Copy link
Member

bkietz commented Oct 27, 2020

The only other usage of cpp11::r_string which I see that might be problematic is in array_from_vector where it's used to translate each string element to utf8. It appears that code path is only taken for string (and binary) arrays which are nested inside other arrays as a dictionary, list item column, ... Repairing this can probably wait for a follow up in which we apply the framework introduced in #8088 to clean up conversion for R. @kszucs

@nealrichardson
Copy link
Member Author

cpp11::r_string(const std::string&) does not take advantage of knowing the length of the string (so the terminating null must be searched for),

Side note: it's not always safe to assume nuls are terminating, and R and Rcpp don't assume that. cf. #8365

@nealrichardson
Copy link
Member Author

Confirmed that Ben's suggestions don't hurt the performance that was recovered.

@nealrichardson
Copy link
Member Author

Side note: it's not always safe to assume nuls are terminating, and R and Rcpp don't assume that. cf. #8365

Indeed, with this change, the test on that PR now errors again rather than silently truncating strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants