Avoid //expr XPaths #1358

The following linters currently use `//expr`:

- [x] any_duplicated_linter.R
- [x] any_is_na_linter.R
- [x] brace_linter.R
- [x] class_equals_linter.R
- [x] condition_message_linter.R
- [x] conjunct_test_linter.R
- [x] consecutive_stopifnot_linter.R
- [x] duplicate_argument_linter.R
- [x] equals_na_linter.R
- [x] expect_comparison_linter.R
- [x] expect_identical_linter.R
- [x] expect_length_linter.R
- [x] expect_named_linter.R
- [x] expect_not_linter.R
- [x] expect_null_linter.R
- [x] expect_s3_class_linter.R
- [x] expect_true_false_linter.R
- [x] expect_type_linter.R
- [x] fixed_regex_linter.R
- [x] function_left_parentheses_linter.R
- [x] ifelse_censor_linter.R
- [x] inner_combine_linter.R
- [x] literal_coercion_linter.R
- [x] missing_package_linter.R
- [x] nested_ifelse_linter.R
- [x] object_usage_linter.R
- [x] outer_negation_linter.R
- [x] package_hooks_linter.R
- [x] paste_linter.R
- [x] pipe_call_linter.R
- [x] redundant_ifelse_linter.R
- [x] regex_subset_linter.R
- [x] seq_linter.R
- [x] sprintf_linter.R
- [x] string_boundary_linter.R
- [x] strings_as_factors_linter.R
- [x] system_file_linter.R
- [x] unneeded_concatenation_linter.R
- [x] unused_import_linter.R
- [x] yoda_test_linter.R

---

_Original issue raised_

As seen in #1353, #1340, and #1310, the way we write our XPaths has subtle performance implications.

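One thing that became clear is that `//NODE1[NODE2]` is slower than `//NODE2[parent::NODE1]` when `NODE2` is far less frequent than `NODE1`: the former evaluates its predicate on every `NODE1` in the tree, while the latter starts from the much smaller set of `NODE2` nodes. (Strictly, the two select different nodes; `//NODE2/parent::NODE1` is the form that returns the same `NODE1` nodes as `//NODE1[NODE2]`.)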
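A minimal sketch of the two styles on a real parse tree, assuming the xml2 and xmlparsedata packages (the ones lintr builds its XML trees with). The code snippet and the choice of `any()` as the target call are illustrative only. Both XPaths should return the same outer `any(...)` call expressions; only the starting node set differs.

```r
library(xml2)
library(xmlparsedata)

# Parse a small snippet into the same kind of XML tree lintr lints against
code = "any(is.na(x))\nfoo(y)\nany(z > 1)"
xml = read_xml(xml_parse_data(parse(text = code, keep.source = TRUE)))

# Common-node-first: the predicate is evaluated on every <expr> in the tree
slow = xml_find_all(xml, "//expr[expr/SYMBOL_FUNCTION_CALL[text() = 'any']]")

# Rare-node-first: start from the few matching function names, then walk up to the call
fast = xml_find_all(xml, "//SYMBOL_FUNCTION_CALL[text() = 'any']/parent::expr/parent::expr")

# Both should select the two outer `any(...)` call expressions
xml_text(slow)
xml_text(fast)
```

On a snippet this small the timing difference is negligible; it only matters on large trees where `<expr>` dominates.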

`<expr>` is by far the most common node. Here's a rough estimate of overall token frequency, tabulated across r-devel and my local R packages/scripts:

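```r
library(data.table)
library(knitr)

# All .R files under `dir`, searched recursively
get_r_files = function(dir) list.files(dir, pattern = "\\.R$", recursive = TRUE, full.names = TRUE)

# Token counts for one file; NULL if the file fails to parse (or parses with a warning)
get_freq = function(f) {
  # keep.source = TRUE is required for getParseData(); it defaults to FALSE under Rscript
  pc = tryCatch(parse(f, keep.source = TRUE), error = identity, warning = identity)
  if (inherits(pc, "condition")) return(NULL)
  pd = getParseData(pc)
  if (is.null(pd)) return(NULL)  # guard against files without parse data
  setDT(pd)[, .N, by = token]
}

r_files = c(get_r_files("~/github"), get_r_files("~/svn"))
token_freq = rbindlist(lapply(r_files, get_freq))  # rbindlist() drops the NULL entries
# Aggregate per-file counts, then sort by overall frequency
token_freq = token_freq[, .(N = sum(N)), by = token][order(-N)]
token_freq[, pct := round(100 * N / sum(N), 1)]
knitr::kable(token_freq[pct > 1])
```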

|token                |       N|  pct|
|:--------------------|-------:|----:|
|expr                 | 5713829| 36.4|
|','                  | 1514487|  9.6|
|SYMBOL               | 1357662|  8.6|
|'('                  | 1119392|  7.1|
|')'                  | 1119392|  7.1|
|NUM_CONST            | 1086271|  6.9|
|SYMBOL_FUNCTION_CALL |  917895|  5.8|
|STR_CONST            |  375598|  2.4|
|LEFT_ASSIGN          |  334988|  2.1|
|COMMENT              |  307772|  2.0|
|EQ_SUB               |  263926|  1.7|
|SYMBOL_SUB           |  254903|  1.6|

That is, starting from `//expr` eliminates only about 2/3 of the nodes (`expr` makes up 36.4% of all tokens), whereas starting from any other token eliminates more than 90% of the tree.
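To make that arithmetic concrete, the snippet below subtracts a few `pct` values from 100. The numbers are copied by hand from the table above, not recomputed from source files.

```r
# Percentage of all tokens matched by a leading `//TOKEN` step (from the table above)
pct = c(expr = 36.4, comma = 9.6, SYMBOL = 8.6, NUM_CONST = 6.9,
        SYMBOL_FUNCTION_CALL = 5.8, STR_CONST = 2.4)

# Percentage of the tree discarded by that first step
eliminated = 100 - pct
round(eliminated, 1)
```

Only `expr` falls short of discarding 90% of the tree.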

The trade-off is readability. XPaths with many `parent::`/`preceding-sibling::`/`following-sibling::` axes tend to be harder to read, and our current XPaths are fairly readable IMO. Moreover, most of our linters are built around expression-level lints, where each tree is _comparatively small_. In that case I suspect the overhead of iterating over expressions usually outweighs the savings from fine-tuning XPaths, and with caching, the performance gains will be unnoticeable in all but edge cases.

So we should proceed gently on this issue. Some ideas:

- Write helpers that read clearly but translate to more performant XPaths "under the hood"
- Prioritize fixing file-level linters, whose trees are much larger
- Wait for #778 to help quantify which linters perform worst, and prioritize those
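As a sketch of the first idea, a helper could take a readable description (here, a set of function names) and emit the rare-node-first form. `xp_rare_first_call()` is a hypothetical name used only for illustration; it is not an existing lintr function, and it assumes the call-expression shape `expr/expr/SYMBOL_FUNCTION_CALL` produced by R's parser.

```r
# Hypothetical helper (not part of lintr): build an XPath matching calls to any of
# `fun_names`, starting from the rare SYMBOL_FUNCTION_CALL token instead of //expr
xp_rare_first_call = function(fun_names) {
  conds = paste0("text() = '", fun_names, "'", collapse = " or ")
  # Walk up two levels: SYMBOL_FUNCTION_CALL -> its <expr> wrapper -> the call <expr>
  sprintf("//SYMBOL_FUNCTION_CALL[%s]/parent::expr/parent::expr", conds)
}

xp_rare_first_call(c("any", "all"))
```

The call site stays as readable as `//expr[...]`-style XPaths, while the generated string starts from the infrequent token.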
