Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snowflake: runtime driver config checks #857

Merged
merged 11 commits into from
Dec 7, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 27 additions & 2 deletions R/driver-databricks.R
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,10 @@ NULL
#' Workbench or the Databricks CLI on desktop.
#' All of these credentials are detected automatically if present using
#' [standard environment variables](https://docs.databricks.com/en/dev-tools/auth.html#environment-variables-and-fields-for-client-unified-authentication).
#' In addition, on macOS platforms, the `dbConnect` method will check
#' for irregularities with how the driver is configured,
#' and attempt to fix in-situ, unless the `odbc.no_config_override`
#' environment variable is set.
#'
#' @inheritParams DBI::dbConnect
#' @param httpPath,HTTPPath To query a cluster, use the HTTP Path value found
Expand All @@ -33,7 +37,7 @@ NULL
#' Specifying these options will disable automated credential discovery.
#' @param ... Further arguments passed on to [`dbConnect()`].
#'
#' @returns An `OdbcConnection` object with an active connection to a Databricks
#' @return An `OdbcConnection` object with an active connection to a Databricks
#' cluster or SQL warehouse.
#'
#' @examples
Expand All @@ -45,7 +49,6 @@ NULL
#' }
#' @export
databricks <- function() {
configure_spark()
new("DatabricksOdbcDriver")
}

Expand Down Expand Up @@ -84,6 +87,8 @@ setMethod("dbConnect", "DatabricksOdbcDriver",
pwd = pwd,
...
)
configure_simba( (function() { spark_simba_config(args$driver) }),
action = "modify")
inject(dbConnect(odbc(), !!!args))
}
)
Expand Down Expand Up @@ -315,3 +320,23 @@ workbench_databricks_token <- function(host, cfg_file) {
}
token
}

# p. 44 https://downloads.datastax.com/odbc/2.6.5.1005/Simba%20Spark%20ODBC%20Install%20and%20Configuration%20Guide.pdf
spark_simba_config <- function(driver) {
spark_env <- Sys.getenv("SIMBASPARKINI")
if (!identical(spark_env, "")) {
return(spark_env)
}
common_dirs <- c(
driver_dir(driver),
"/Library/simba/spark/lib",
"/etc",
getwd(),
Sys.getenv("HOME")
)
list.files(
common_dirs,
pattern = "simba\\.sparkodbc\\.ini$",
full.names = TRUE
)
}
35 changes: 33 additions & 2 deletions R/driver-snowflake.R
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,10 @@ setMethod("odbcDataType", "Snowflake",
#'
#' In particular, the custom `dbConnect()` method for the Snowflake ODBC driver
#' detects ambient OAuth credentials on platforms like Snowpark Container
#' Services or Posit Workbench.
#' Services or Posit Workbench. In addition, on macOS platforms, the
#' `dbConnect` method will check and warn if it detects irregularities with
#' how the driver is configured, unless the `odbc.no_config_override`
#' environment variable is set.
#'
#' @inheritParams DBI::dbConnect
#' @param account A Snowflake [account
Expand Down Expand Up @@ -174,6 +177,8 @@ setMethod(
pwd = pwd,
...
)
configure_simba( (function() { snowflake_simba_config(args$driver) }),
action = "warn")
inject(dbConnect(odbc(), !!!args))
}
)
Expand Down Expand Up @@ -246,13 +251,39 @@ snowflake_default_driver_paths <- function() {
"/usr/lib/snowflake/odbc/lib/libSnowflake.so"
)
} else if (Sys.info()["sysname"] == "Darwin") {
paths <- "/opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib"
paths <- c(
"/opt/snowflake/snowflakeodbc/lib/universal/libSnowflake.dylib",
"/opt/snowflake-osx-x64/bin/lib/libsnowflakeodbc_sb64-universal.dylib"
)
} else {
paths <- character()
}
paths[file.exists(paths)]
}

snowflake_simba_config <- function(driver) {
snowflake_env <- Sys.getenv("SIMBASNOWFLAKEINI")
if (!identical(snowflake_env, "")) {
return(snowflake_env)
}
# Posit configuration is likely at:
# /opt/snowflake-osx-x64/bin/lib/rstudio.snowflakeodbc.ini
# OEM configuration is likely at:
# /opt/snowflake/snowflakeodbc/lib/universal/simba.snowflake.ini
common_dirs <- c(
driver_dir(driver),
"/opt/snowflake-osx-x64/bin/lib/",
"/opt/snowflake/snowflakeodbc/lib/universal/",
getwd(),
Sys.getenv("HOME")
)
list.files(
simba_config_dirs(driver),
pattern = "snowflake(odbc)?\\.ini$",
full.names = TRUE
)
}

snowflake_server <- function(account) {
if (nchar(account) == 0) {
abort(
Expand Down
141 changes: 93 additions & 48 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -288,28 +288,47 @@ check_attributes <- function(attributes, call = caller_env()) {
}

# apple + spark drive config (#651) --------------------------------------------
configure_spark <- function(call = caller_env()) {
# Method will attempt to:
# 1. Locate the simba driver config using the user supplied callback. Callback
# method defaults to locate_config_spark.
# 2. Inspect the config for some settings that can impact how our package
# performs.
# 3. If action == "modify" then we attempt to modify the config in-situ.
# 4. Otherwise we throw a warning asking the user to revise.
configure_simba <- function(locate_config_callback = (function() character()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
configure_simba <- function(locate_config_callback = (function() character()),
configure_simba <- function(locate_config_callback = character(),

Poking at this for a bit, it's unclear to me why this is a function rather than a character string? In this function, we just call the inputted function which is inlined as returning a character everywhere it's currently used, and then use that character result as usual. Maybe this function just takes the config file as a string and we pass e.g. snowflake_simba_config(args$driver) rather than function() snowflake_simba_config(args$driver)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Simon:

Wrote it as a callback because in may cases ( not macos, or no_config_override is set ) it may not get executed at all. Can move those checks outside of the method, but it felt like they belong within ( and they would need to be duplicated for each call ).

Let me know if you think it's worth trying to chase this down further / refactor more.

Copy link
Collaborator

@simonpcouch simonpcouch Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! Since those callbacks take some time (and could even raise some unanticipated error), you opt to wait to evaluate the code until you're sure you need its result.

We're actually lucky here in that R will evaluate function arguments lazily. So, if you pass a function call as an argument to f(), do some checks, and then reference the result of the function call after the checks, the function call will never actually evaluate if the check returns beforehand. e.g.:

fn_a <- function() {
  cli::cli_abort("Stoppp!")
}

fn_b <- function(x) {
  if (TRUE) {
    cli::cli_abort("{.fun fn_a} hasn't evaluated yet!")
  }
  
  x
}

fn_b(x = fn_a())
#> Error in `fn_b()`:
#> ! `fn_a()` hasn't evaluated yet!

Created on 2024-11-12 with reprex v2.1.1

...the implication here being that we can pass snowflake_simba_config(args$driver) directly as the function argument and it won't actually run until those Not macOS or No no_config_override checks have evaluated.

action = "modify", call = caller_env()) {
if (!is_macos()) {
return(invisible())
}

if (is.null(getOption("odbc.no_config_override"))) {
return(invisible())
}
unixodbc_install <- locate_install_unixodbc()
if (length(unixodbc_install) == 0) {
error_install_unixodbc(call)
}

spark_config <- locate_config_spark()
if (length(spark_config) == 0) {
abort(
simba_config <- locate_config_callback()
if (length(simba_config) == 0) {
func <- cli::warn
if ( action == "modify" ){
fun <- cli::abort
}
func(
c(
"Unable to locate the needed spark ODBC driver.",
i = "Please install the needed driver from https://www.databricks.com/spark/odbc-drivers-download."
"Unable to locate the config for the ODBC driver.",
i = paste("Please make sure you've installed the appropriate driver",
"for your platform and back end."),
i = paste("For example, for Databricks, you can download the OEM",
"driver from https://www.databricks.com/spark/odbc-drivers-download."),
i = paste("Similarly, for Snowflake, you can find information on",
"installing the driver at https://docs.snowflake.com/en/developer-guide/odbc/odbc-download.")
),
call = call
)
}

configure_unixodbc_spark(unixodbc_install[1], spark_config[1], call)
configure_unixodbc_simba(unixodbc_install[1], simba_config[1], action, call)
}

locate_install_unixodbc <- function() {
Expand Down Expand Up @@ -369,51 +388,48 @@ error_install_unixodbc <- function(call) {
)
}

# p. 44 https://downloads.datastax.com/odbc/2.6.5.1005/Simba%20Spark%20ODBC%20Install%20and%20Configuration%20Guide.pdf
locate_config_spark <- function() {
spark_env <- Sys.getenv("SIMBASPARKINI")
if (!identical(spark_env, "")) {
return(spark_env)
}

common_dirs <- c(
"/Library/simba/spark/lib",
"/etc",
getwd(),
Sys.getenv("HOME")
)

list.files(
common_dirs,
pattern = "simba\\.sparkodbc\\.ini$",
full.names = TRUE
)
}
configure_unixodbc_simba <- function(unixodbc_install, simba_config, action, call) {

configure_unixodbc_spark <- function(unixodbc_install, spark_config, call) {
# As shipped, the simba spark ini has an incomplete final line
suppressWarnings(
spark_lines <- readLines(spark_config)
simba_lines <- readLines(simba_config)
)

spark_lines_new <- replace_or_append(
lines = spark_lines,
pattern = "^ODBCInstLib=",
res <- replace_or_append(
lines = simba_lines,
key_pattern = "^ODBCInstLib=",
accepted_value = unixodbc_install,
replacement = paste0("ODBCInstLib=", unixodbc_install)
)

spark_lines_new <- replace_or_append(
lines = spark_lines_new,
pattern = "^DriverManagerEncoding=",
if (action != "modify" && res$modified) {
cli::cli_warn(
paste0("Detected potentially unsafe driver settings. ",
"Please consider revising the `ODBCInstLib` setting in ", simba_config)
)
}
simba_lines_new <- res$new_lines
res <- replace_or_append(
lines = simba_lines_new,
key_pattern = "^DriverManagerEncoding=",
accepted_value = "UTF-16|utf-16",
replacement = "DriverManagerEncoding=UTF-16"
)
if (action != "modify" && res$modified) {
cli::cli_warn(
paste0("Detected potentially unsafe driver settings. ",
"Please consider revising the `DriverManagerEncoding` setting in ",
simba_config)
)
}
simba_lines_new <- res$new_lines

write_spark_lines(spark_lines, spark_lines_new, spark_config, call)
if (action == "modify") {
write_simba_lines(simba_lines, simba_lines_new, simba_config, call)
}

invisible()
}

write_spark_lines <- function(spark_lines, spark_lines_new, spark_config, call) {
write_simba_lines <- function(spark_lines, spark_lines_new, spark_config, call) {
if (identical(spark_lines, spark_lines_new)) {
return(invisible())
}
Expand All @@ -436,21 +452,50 @@ write_spark_lines <- function(spark_lines, spark_lines_new, spark_config, call)
writeLines(spark_lines_new, spark_config)
}

# Interpret the argument as an `ODBC` driver
# and attempt to infer the directory housing it.
# It will return an empty character vector if unable to.
driver_dir <- function(driver) {
# driver argument could be an outright path, or a name
# of a driver specified in odbcinst.ini Try to discern
driver_spec <- subset(odbcListDrivers(), name == driver)
if (nrow(driver_spec) ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (nrow(driver_spec) ) {
if (nrow(driver_spec)) {

driver_path <- subset(driver_spec, attribute == "Driver")$value
} else {
driver_path <- driver
}

ret <- dirname(driver_path)
if (ret == ".") {
ret <- character()
}
return(ret)
}

is_writeable <- function(path) {
tryCatch(file.access(path, mode = 2)[[1]] == 0, error = function(e) FALSE)
}

# given a vector of lines in an ini file, look for a given key pattern.
# the `replacement` is the whole intended line, giving the "key=value" pair.
# if the key is found, replace that line with `replacement`.
# if the key isn't found, append a new line with `replacement`.
replace_or_append <- function(lines, pattern, replacement) {
matching_lines_loc <- grepl(pattern, lines)
# Given a vector of lines in an ini file, look for a given key pattern.
# If found:
# - No action if the `accepted_value` argument is found on line.
# - Replace otherwise.
# If not found: append.
# The `replacement` is the whole intended line, giving the desired
# "key=value" pair.
# @return a list with two elements:
# - new_lines: Potentially modified lines
# - modified: Whether method modified lines, where modified means
# both changed or appended.
replace_or_append <- function(lines, key_pattern, accepted_value, replacement) {
matching_lines_loc <- grepl(key_pattern, lines)
matching_lines <- lines[matching_lines_loc]
found_ok = length(matching_lines) != 0 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
found_ok = length(matching_lines) != 0 &&
found_ok <- length(matching_lines) != 0 &&

any(grepl(accepted_value, lines[matching_lines_loc]))
if (length(matching_lines) == 0) {
lines <- c(lines, replacement)
} else {
} else if (!found_ok) {
lines[matching_lines_loc] <- replacement
}
lines
return(list("new_lines" = lines, "modified" = !found_ok))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return(list("new_lines" = lines, "modified" = !found_ok))
return(list(new_lines = lines, modified = !found_ok))

}
4 changes: 4 additions & 0 deletions man/databricks.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 4 additions & 1 deletion man/snowflake.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading