Description
The following linters currently use `//expr`:
- any_duplicated_linter.R
- any_is_na_linter.R
- brace_linter.R
- class_equals_linter.R
- condition_message_linter.R
- conjunct_test_linter.R
- consecutive_stopifnot_linter.R
- duplicate_argument_linter.R
- equals_na_linter.R
- expect_comparison_linter.R
- expect_identical_linter.R
- expect_length_linter.R
- expect_named_linter.R
- expect_not_linter.R
- expect_null_linter.R
- expect_s3_class_linter.R
- expect_true_false_linter.R
- expect_type_linter.R
- fixed_regex_linter.R
- function_left_parentheses_linter.R
- ifelse_censor_linter.R
- inner_combine_linter.R
- literal_coercion_linter.R
- missing_package_linter.R
- nested_ifelse_linter.R
- object_usage_linter.R
- outer_negation_linter.R
- package_hooks_linter.R
- paste_linter.R
- pipe_call_linter.R
- redundant_ifelse_linter.R
- regex_subset_linter.R
- seq_linter.R
- sprintf_linter.R
- string_boundary_linter.R
- strings_as_factors_linter.R
- system_file_linter.R
- unneeded_concatenation_linter.R
- unused_import_linter.R
- yoda_test_linter.R
Original issue raised
As seen in #1353, #1340, #1310, there are subtle performance implications to the way we write our XPaths.
One thing that became clear is that writing `//NODE1[NODE2]` is slower than `//NODE2[parent::NODE1]` if `NODE2` is far less frequent than `NODE1`.
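To see why the relative frequency matters, here is a crude base-R model. It is an assumption for illustration and not how libxml2 actually evaluates XPath. It uses `getParseData()` on a tiny snippet and counts how many candidate nodes each form has to test. Form A mimics `//expr[SYMBOL_FUNCTION_CALL]` and checks the children of every `expr` node. Form B starts from the rarer `SYMBOL_FUNCTION_CALL` nodes and checks only their parents. Form B steps back up to the parent, like `//SYMBOL_FUNCTION_CALL/parent::expr`, so it returns the same `expr` nodes as form A. The helper names are hypothetical.

```r
# Parse a small snippet and keep its parse data (base R only)
pd <- getParseData(parse(text = "x <- f(a, b + 1)\ny <- g(x)\n", keep.source = TRUE))

expr_ids <- pd$id[pd$token == "expr"]
call_ids <- pd$id[pd$token == "SYMBOL_FUNCTION_CALL"]

# Form A, like //expr[SYMBOL_FUNCTION_CALL]:
# every expr node is a candidate and its children are scanned
has_call_child <- function(id) any(pd$token[pd$parent == id] == "SYMBOL_FUNCTION_CALL")
form_a <- expr_ids[vapply(expr_ids, has_call_child, logical(1))]

# Form B, like //SYMBOL_FUNCTION_CALL/parent::expr:
# only the (rarer) SYMBOL_FUNCTION_CALL nodes are candidates; look up each one's parent
parent_of <- function(id) pd$parent[pd$id == id]
parents <- unique(vapply(call_ids, parent_of, numeric(1)))
form_b <- parents[parents %in% expr_ids]

# Candidates tested by each form, and whether both select the same expr nodes
c(form_a_candidates = length(expr_ids), form_b_candidates = length(call_ids))
setequal(form_a, form_b)
```

Both forms select the same two `expr` nodes (the ones wrapping `f` and `g`). Form A, however, tests every `expr` node in the snippet, while form B tests only the two `SYMBOL_FUNCTION_CALL` nodes.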
`<expr>` is by far the most common node; here's a guess at the general frequency, from tabulating tokens across r-devel and my local R packages/scripts:
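```r
library(data.table)
library(knitr)

# All R scripts under `dir`, searched recursively
get_r_files = function(dir) list.files(dir, pattern = "\\.R$", recursive = TRUE, full.names = TRUE)

# Token counts for one file; files that fail to parse (or warn) are skipped
get_freq = function(f) {
  # keep.source = TRUE so getParseData() also works in non-interactive sessions
  pc = tryCatch(parse(f, keep.source = TRUE), error = identity, warning = identity)
  if (inherits(pc, "condition")) return(NULL)
  setDT(getParseData(pc))[, .N, by = token]
}

r_files = c(get_r_files("~/github"), get_r_files("~/svn"))
token_freq = rbindlist(lapply(r_files, get_freq))
# Total count per token across all files, most frequent first
token_freq = token_freq[, .(N = sum(N)), by = token][order(-N)]
token_freq[, pct := round(100 * N/sum(N), 1)]
knitr::kable(token_freq[pct > 1])
```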
| token | N | pct |
|---|---:|---:|
| expr | 5713829 | 36.4 |
| ',' | 1514487 | 9.6 |
| SYMBOL | 1357662 | 8.6 |
| '(' | 1119392 | 7.1 |
| ')' | 1119392 | 7.1 |
| NUM_CONST | 1086271 | 6.9 |
| SYMBOL_FUNCTION_CALL | 917895 | 5.8 |
| STR_CONST | 375598 | 2.4 |
| LEFT_ASSIGN | 334988 | 2.1 |
| COMMENT | 307772 | 2.0 |
| EQ_SUB | 263926 | 1.7 |
| SYMBOL_SUB | 254903 | 1.6 |
i.e., anchoring an XPath at `//expr` eliminates at most about 2/3 of the nodes, while anchoring at any other token typically eliminates >90% of the tree.
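The arithmetic behind that claim, as a back-of-the-envelope sketch that ignores the cost of any predicates: the first step `//TOKEN` rules out roughly `100 - pct` percent of the nodes. The values below are copied from the `pct` column of the table.

```r
# Share (%) of all parse-data nodes held by a few tokens, from the table
pct <- c(expr = 36.4, SYMBOL = 8.6, SYMBOL_FUNCTION_CALL = 5.8, STR_CONST = 2.4)

# Percentage of nodes ruled out by an XPath whose first step is //TOKEN
eliminated <- 100 - pct
eliminated
```

This gives 63.6 for `expr`, against 91.4, 94.2 and 97.6 for `SYMBOL`, `SYMBOL_FUNCTION_CALL` and `STR_CONST`.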
The trade-off here is readability. XPaths with many `parent::`/`preceding-sibling::`/`following-sibling::` axes tend to be less readable, and our current XPaths are fairly readable IMO. Moreover, most of our linters produce expression-level lints, where a comparatively small tree is the norm. I suspect the overhead of iterating over expressions usually outweighs the savings from fine-tuning XPaths. With caching, performance gains will likely be unnoticeable in all but edge cases.
So we should proceed gently on this issue. Some ideas:
- Write some helpers that have high readability but do translation to more performant XPaths "under the hood"
- Prioritize fixing it on file-level linters
- Wait for #778 (Add proper benchmarking functionality to `.dev/compare_branches.R`) to help quantify which linters perform worst, and prioritize those
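A minimal sketch of the first idea follows, assuming relative token frequencies like the ones tabulated above are available. `xp_with_child()` is a hypothetical helper, not part of lintr. The caller describes "`parent` nodes that have a `child` child", and the helper chooses the XPath form. When the child is rarer, it starts from the child and steps up with `/parent::`. This selects the same `parent` nodes as `//parent[child]`, whereas `//child[parent::parent]` would select the child nodes instead.

```r
# Hypothetical helper (not part of lintr). `freq` is assumed to be a named
# vector of relative token frequencies, e.g. the pct column of the table above.
xp_with_child <- function(parent, child, freq) {
  if (freq[[child]] < freq[[parent]]) {
    # Rare child: find it first, then step up to its parent; same node set as //parent[child]
    sprintf("//%s/parent::%s", child, parent)
  } else {
    # Otherwise keep the more readable predicate form
    sprintf("//%s[%s]", parent, child)
  }
}

freq <- c(expr = 36.4, SYMBOL_FUNCTION_CALL = 5.8)
xp_with_child("expr", "SYMBOL_FUNCTION_CALL", freq)
#> [1] "//SYMBOL_FUNCTION_CALL/parent::expr"
```

Linter code would keep reading like "expr with a function-call child", and the translation rule lives in one place that could later be tuned against benchmark results.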