Discover dependency tree for Darwin packages #61

Open
ablack3 opened this issue Aug 13, 2024 · 7 comments
Labels
Needs discussion (Issue needs discussion and alignment)

Comments

@ablack3

ablack3 commented Aug 13, 2024

Hi @mvankessel-EMC,

I'm interested to see the entire dependency graph for all Darwin packages. Basically I would like to take a look at a graph where each node is an exported function and each edge represents a dependency between functions.

The only functions I am interested in seeing are those exported by Darwin packages - https://darwin-eu-dev.github.io/Packages/docs/packageStatuses.html

I'd like to keep an eye on this graph as a way of understanding the relationships between the packages we are developing and maintaining in Darwin. Is this something that would be in scope for PaRe? If not, can you give me some ideas about how to do this, based on your experience working on PaRe?

@mvankessel-EMC
Collaborator

I.e. something like this:

image

baz() from C imports foo1() from A and bar1() from B.
bar1() from B imports foo2() from A.
bar2() from B imports foo2() from A.
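
As a rough sketch, the toy example above can be written down as an edge list in which each row means "caller uses callee". The `pkg::fun` node labels and the column names are assumptions made for illustration, not something PaRe produces in this form. The same edge list can be collapsed to a package-level view:

```r
# Toy edge list for the example above: each row is "caller uses callee".
# Nodes are labelled pkg::fun so that equally named functions stay distinct.
edges <- data.frame(
  from = c("C::baz", "C::baz", "B::bar1", "B::bar2"),
  to = c("A::foo1", "B::bar1", "A::foo2", "A::foo2"),
  stringsAsFactors = FALSE
)

# Derive the package of each endpoint from its pkg::fun label
edges$from_pkg <- sub("::.*$", "", edges$from)
edges$to_pkg <- sub("::.*$", "", edges$to)

# Collapse the function-level edges into unique package-level edges
pkg_edges <- unique(edges[, c("from_pkg", "to_pkg")])
print(pkg_edges)

# Plot the function-level graph, but only if igraph is installed
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges[, c("from", "to")], directed = TRUE)
  plot(g)
}
```

Keeping both levels in one table makes it possible to switch between the function graph and the package graph without re-scanning the code.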

@ablack3
Author

ablack3 commented Aug 20, 2024

yes exactly what I was thinking

@mvankessel-EMC
Collaborator

I messed around a bit to get the package-level dependency hierarchy. It queries the GitHub API directly, so the result updates whenever the repos change.

getDescriptions <- function(org, name) {
  # Fetch the raw DESCRIPTION file of a repository through the GitHub contents API
  res <- httr::GET(
    url = sprintf("https://api.github.com/repos/%s/%s/contents/DESCRIPTION", org, name),
    httr::add_headers(
      "Accept" = "application/vnd.github.raw+json",
      "Authorization" = sprintf("Bearer %s", Sys.getenv("GITHUB_PAT")),
      "X-GitHub-Api-Version" = "2022-11-28"
    )
  )

  content <- httr::content(res)

  if (is.raw(content)) {
    # Raw bytes: the file exists, so split it into lines and parse it
    text <- content |>
      rawToChar() |>
      strsplit(split = "\n") |>
      unlist()

    # Return NULL when the file is not a valid DESCRIPTION
    tryCatch(
      desc::desc(text = text),
      error = function(e) NULL
    )
  } else if (is.list(content)) {
    # Parsed JSON: GitHub returned an error object.
    # A 404 only means the repo has no DESCRIPTION; report any other status.
    if (!identical(content$status, "404")) {
      message(content$status)
    }
    NULL
  } else {
    warning(sprintf("Unexpected response type for %s/%s", org, name))
    NULL
  }
}

res <- httr::GET(
  url = "https://api.github.com/orgs/darwin-eu-dev/repos?per_page=100",
  httr::accept_json(),
  httr::add_headers(
    "Accept" = "application/vnd.github+json",
    "Authorization" = sprintf("Bearer %s", Sys.getenv("GITHUB_PAT")),
    "X-GitHub-Api-Version" = "2022-11-28"
  )
)
repos <- httr::content(res)

# Fetch the DESCRIPTION of every repository; repos without one yield NULL
descs <- lapply(repos, function(repo) {
  getDescriptions(org = "darwin-eu-dev", name = repo$name)
})

# Drop repositories without a parseable DESCRIPTION
descs <- Filter(Negate(is.null), descs)

# Names of all packages found in the organisation
darwinPkgs <- vapply(descs, function(d) d$get_field("Package"), character(1))

# One row per (package, dependency) pair, restricted to DARWIN packages
df <- lapply(descs, function(d) {
  deps <- d$get_deps()
  # Ignore Depends and Suggests; keep the remaining dependency types
  deps <- deps$package[!deps$type %in% c("Depends", "Suggests")]
  # Only dependencies on other DARWIN packages are of interest
  deps <- deps[deps %in% darwinPkgs]
  data.frame(
    pkg = rep(d$get_field("Package"), length(deps)),
    dependencies = deps
  )
}) |>
  do.call(what = "rbind")

g <- igraph::graph_from_data_frame(df)

par(mar = c(0,0,0,0))
plot(g)

Created on 2024-08-20 with reprex v2.1.1

@mvankessel-EMC
Collaborator

I filtered out the analytical packages from the DARWIN suite, so it's a little clearer:

image
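
The filtering step itself is not shown in the thread. A minimal sketch of one way to do it, assuming an edge list shaped like `df` above: keep an edge only when both endpoints belong to a chosen set of packages. The package names and the `keep` vector are hypothetical placeholders, not the actual selection used for the image.

```r
# Hypothetical package-level edge list, shaped like `df` in the script above
df <- data.frame(
  pkg = c("PkgA", "PkgA", "PkgB", "PkgC"),
  dependencies = c("PkgB", "PkgC", "PkgC", "PkgD"),
  stringsAsFactors = FALSE
)

# Hypothetical set of packages that should remain in the picture
keep <- c("PkgA", "PkgB", "PkgC")

# Keep an edge only when both of its endpoints are in the selected set
sub_df <- df[df$pkg %in% keep & df$dependencies %in% keep, ]
print(sub_df)

# Plot the reduced graph, but only if igraph is installed
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(sub_df)
  par(mar = c(0, 0, 0, 0))
  plot(g)
}
```

Negating the condition (`!(... %in% keep)`) would instead drop the listed packages and keep everything else.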

@ablack3
Author

ablack3 commented Aug 22, 2024

Thank you!

@mvankessel-EMC
Collaborator

mvankessel-EMC commented Aug 22, 2024

library(dplyr)

pkgs <- c(
  "TreatmentPatterns",
  "IncidencePrevalence",
  "DrugUtilisation",
  "CohortSurvival",
  "DrugExposureDiagnostics",
  "PatientProfiles",
  "CohortCharacteristics",
  "CDMConnector",
  "omopgenerics"
)

repos <- lapply(pkgs, function(pkg) {
  tempDir <- file.path(tempdir(), pkg)
  
  if (!dir.exists(tempDir)) {
    git2r::clone(
      url = sprintf("https://github.com/darwin-eu-dev/%s.git", pkg),
      local_path = tempDir,
      credentials = git2r::cred_token()
    )
  } else {
    message(sprintf("%s already exists, not cloning", tempDir))
  }
  PaRe::Repository$new(tempDir)
})
#> cloning into 'C:\Users\MVANKE~1\AppData\Local\Temp\RtmpaEQjDg/TreatmentPatterns'...
#> Receiving objects:   1% (74/7342),   63 kb
#> Receiving objects:  11% (808/7342),  456 kb
#> Receiving objects:  21% (1542/7342), 1024 kb
#> Receiving objects:  31% (2277/7342), 2977 kb
#> Receiving objects:  41% (3011/7342), 4930 kb
#> Receiving objects:  51% (3745/7342), 5211 kb
#> Receiving objects:  61% (4479/7342), 7452 kb
#> Receiving objects:  71% (5213/7342), 8460 kb
#> Receiving objects:  81% (5948/7342), 8629 kb
#> Receiving objects:  91% (6682/7342), 8741 kb
#> Receiving objects: 100% (7342/7342), 16255 kb, done.
#> ...

funcUse <- lapply(repos, function(repo) {
  repo$getFunctionUse() %>%
    dplyr::mutate(source_pkg = repo$getName())
}) %>%
  dplyr::bind_rows()
#> Started on file: CDMInterface.R
#> Started on file: CharacterizationPlots.R
#> Started on file: computePathways.R
#> ...

df <- funcUse %>%
  dplyr::filter(.data$pkg %in% pkgs) %>%
  dplyr::mutate(call = sprintf("%s::%s", .data$pkg, .data$fun)) %>%
  dplyr::select("source_pkg", "call")

g <- igraph::graph_from_data_frame(df)

par(mar = c(0,0,0,0))
plot(g)

for (pkg in pkgs) {
  print(pkg)
  d <- df %>%
    dplyr::filter(
      .data$source_pkg == pkg,
      !startsWith(.data$call, pkg)
    )
  g <- igraph::graph_from_data_frame(d)
  par(mar = c(0,0,0,0))
  plot(g)
}
#> [1] "TreatmentPatterns"

#> [1] "IncidencePrevalence"

#> [1] "DrugUtilisation"

#> [1] "CohortSurvival"

#> [1] "DrugExposureDiagnostics"

#> [1] "PatientProfiles"

#> [1] "CohortCharacteristics"

#> [1] "CDMConnector"

#> [1] "omopgenerics"

Created on 2024-08-22 with reprex v2.1.1
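
The per-package plots above can get crowded when a package calls many external functions. One possible complement, sketched here on made-up data, is to count the cross-package calls and use the counts as edge weights. The toy `funcUse` table below only mimics the `source_pkg`, `pkg` and `fun` columns used in the script; its rows and package names are invented.

```r
# Toy stand-in for `funcUse`: the column names mirror the script above,
# the rows are made up for illustration.
funcUse <- data.frame(
  source_pkg = c("PkgC", "PkgC", "PkgC", "PkgB", "PkgB"),
  pkg = c("PkgA", "PkgA", "PkgB", "PkgA", "PkgB"),
  fun = c("foo1", "foo1", "bar1", "foo2", "bar2"),
  stringsAsFactors = FALSE
)

# Drop calls that a package makes to its own functions
cross <- funcUse[funcUse$source_pkg != funcUse$pkg, ]

# Count how often each package calls each external function
cross$n <- 1L
calls <- aggregate(n ~ source_pkg + pkg + fun, data = cross, FUN = sum)
print(calls)

# Sum the counts up to package level to get weighted package-to-package edges
weights <- aggregate(n ~ source_pkg + pkg, data = calls, FUN = sum)
print(weights)
```

The `weights` table could then be passed to `igraph::graph_from_data_frame()`, where the extra `n` column becomes an edge attribute.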

mvankessel-EMC added the "Needs discussion" label on Nov 1, 2024
@PRijnbeek
Contributor

@mvankessel-EMC I'd like to revisit this topic with you and see how we can get some insight into package dependencies, including those of the packages we include that now live in the OHDSI organisation.
