Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve $-completion of method-chaining syntax and more #1469

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

sorhawell
Copy link

@sorhawell sorhawell commented Jan 2, 2024

What problem did you solve?

with setting.json

    "r.lsp.debug": true,
    "r.session.useWebServer": true,

vscode-R webserver has a simple regex based isolation of code to eval when using the $/@ to trigger in-session completion. This regex will fail for many but the simplest cases.

Instead I suggest to rely on the R parser and the AST to decide what the left-hand-side of e.g. $ operator is.

This will allow multi-line completion, chained method completion, and potentially anything the R parser accepts.

I'm interested in improving vscode-R similar as this patch for Rstudio completions to support completion for method chaining r-polars (... and also R6 and arrow),

How can I check this pull request?

check out branch with vscode and emulate the extension
"r.session.useWebServer": true

example 1

#$-complete after arbitrary parenthesis
(
    iris |> # |> is supported if the R interpreter supports it
    (\(iris) {
        iris$is_alien = FALSE
        head(iris, 3)
    })()
)$
image

example 2

#$ complete chained R6 methods 
FruitBasket <- R6::R6Class(
  "FruitBasket",
  private = list(content = character()),
  public = list(
    initialize = \() {},
    mutate_content = \(fruit) {
        private$content = c(private$content, fruit)
        self
    },
    add_banana = \() self$clone()$mutate_content("banana"),
    add_apple = \() self$clone()$mutate_content("apple"),
    to_count_list = \() as.list(table(private$content))
  )
)

#$-complete any multiline chain. If it is valid R syntax is works
{
  print("this print statement is not evaluated, because not a child of `$` call in AST");
  FruitBasket$new()$
    add_apple()$
    add_banana(
    
    )$
    add_apple()$
    to_count_list()$
}
image

example 3

support r-polars completion ;) (disclaimer I have co-authored the polars bindings for R)

# load polars and disable completion () and <- in suggestions
library(polars) #polars as of https://github.com/pola-rs/r-polars/pull/597
cs = polars:::completion_symbols
cs$method = ""
cs$setter = ""


pl$
  DataFrame(
    a = 1:5,
    b = letters[1:5]
  )$
  lazy()$
  select(
    pl$all()$head(5)
  )
image image

example 4 completion for the arrow package (R6)

library(arrow)


arrow_table(iris[,1:2])$
  Slice(2,10)$
  print()$
  Take(5)$
  
image

@sorhawell sorhawell changed the title Improve complete Improve $-completion of method-chaining syntax and more Jan 2, 2024
@sorhawell
Copy link
Author

4b3044d commit adds finish_expression which is related to the rstudio approach of filling in missing parenthesis. I forgot to add that to PR sry.

@renkun-ken
Copy link
Member

renkun-ken commented Jan 6, 2024

Thanks for working on this! It is definitely a super useful improvement of the current completion.

I notice that the code in scope has to be evaluated (silently) to provide the completion, which makes the completion result accurate. I wonder if it is safe to do so in many use cases?

For example:

df <- data.frame(id = 1:10)
obj <- local({
  file.remove("/path/to/something/important")
  df$x <- 1
  df
}$)
image

If a user writes some critical code that changes some important state, it seems only desirable that the code should only be evaluated when manually sent to the terminal by user. However, maybe a typo $ could trigger a completion that evaluates the code in the scope, i.e. the code is silently evaluated in the background.

A less damaging example is that the user code to be evaluated might be too time-consuming before the user realizes it. If the code contains C/C++ impl without caring user interruption, then the session might be blocked.

Evaluting any user code other than simple name1$name2$name3 cases seems to suffer such problem. What do you think?

@sorhawell
Copy link
Author

sorhawell commented Jan 7, 2024

@renkun-ken

I principally agree with your concern. I will think in practice, it is likely not as problematic. I suggest to add a user option something like "evaluation_level" to control how eager or cautious completion should be.

What do you think of that?

In any case, parsing is a fast first step before evaluation and could likely replace regex at any desired "evaluation_level".

evaluation_level = "super_secure"

If opting for this level. The auto-completion must only extract variables and not run user-defined or package defined generic functions. A malicious developer could sneak in bad side effects disguised as variables via e.g. activeBindings or the generic methods .$, .[, .[[ or .dollarNames. That would also be an issue in the current vscode-R. I think it should be possible to block most of that.

# sneaky evaluation #1
makeActiveBinding("xyz", f = \() {print("hello there");42L}, env = .GlobalEnv)
xyz
#> [1] "hello there"
#> [1] 42

# sneaky evaluation #2
l = list()
class(l) = "TomatoCan"
"$.TomatoCan" = \(self, name) {print("hi again");self[[name]]}
l$foo = "bar"
l$foo
#> [1] "hi again"
#> [1] "bar"

Created on 2024-01-07 with reprex v2.0.2

evaluation_level = "only_generic_getters"

allow (chained) evaluation generic functions like .[ .[[ and .$, but no function calls ( in the AST which are not on the positive-list. This is pretty much what rstudio is running with today I think. BTW .[ in many ways behaves just as function call ( and can be used to circumvent rstudio completion limits. But by convention .[ is expected to only index so that is ok.

example of what could work:

image image

example of what would not work (unless c and list is added to the positive list)
image

evaluation_level = "only_single_line_scope"

There could be a choice which will only parse the same code line to make a meaningful evaluation.

evaluation_level = "full_evaluation"

Parsing like rstudio, but unlimited evaluation. I will argue if using most package APIs, then there are very few mutable methods out there. The polars and arrow packages is built around immutable structures. Most R objects SEXPs have imuttable behavior, where environment is the central exception. Most R package have APIs where the central objects are imuttable. On top of that polars has lazy evaluation (just as dplyrDB) where evaluation does not trigger computation before explicitly calling collect(). A good R package running low level code, should check for R user interrupts every 100-1000ms and abort if requested.

There are exceptions e.g. data.table set-functions, (file)-connections ...

I think some users would happily opt-in for "full_evaluation".

@renkun-ken
Copy link
Member

Thanks for the detailed explanation. I think the evaluation_level should be good enough to handle such problem.

@sorhawell
Copy link
Author

Update. I have been away for a while. I failed to make good safe list of functions which could be evaluated by IDE auto completion to avoid side effects. In the end, I think this feature is something a user has to willingly opt in to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants