Decouple geom and attr spec logic away from R, towards generic JSON #6

Open
4 tasks
kzollove opened this issue Nov 17, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@kzollove
Collaborator

  • Determine generic JSON specification definition
  • Create R functionality to translate
  • Create crosswalk between JSON specs and language specific (R, Py, Bash, SQL)
  • Update gaiaR functionality
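As a starting point for the first and third tasks, here is a minimal sketch of what a language-neutral step and a crosswalk could look like. The spec layout (`op`, `column`, `value`), the `CROSSWALK` table, and `render` are all hypothetical and not an agreed format or existing gaiaR functionality. The templating is deliberately naive: it does no escaping of values.

```python
import json

# Hypothetical language-neutral step: keep rows where a column equals a value.
SPEC = json.loads("""
{"op": "filter_equals", "column": "Defining Parameter", "value": "Ozone"}
""")

# Hypothetical crosswalk: one template per (operation, target language).
CROSSWALK = {
    ("filter_equals", "r"): "dplyr::filter(staged, `{column}` == '{value}')",
    ("filter_equals", "sql"): "SELECT * FROM staged WHERE \"{column}\" = '{value}'",
}


def render(step, target):
    """Render one generic step into source text for the target language."""
    template = CROSSWALK[(step["op"], target)]
    params = {key: val for key, val in step.items() if key != "op"}
    return template.format(**params)


if __name__ == "__main__":
    print(render(SPEC, "r"))
    print(render(SPEC, "sql"))
```

Keeping the templates in a table rather than in code means a new target language is added by adding rows, which is roughly what a crosswalk between JSON specs and language-specific code would need.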
@kzollove added the enhancement (New feature or request) label Nov 17, 2023
@rtmill
Collaborator

rtmill commented Apr 5, 2024

TODO:

Elaboration of SQL approach

Robert Miller (Guest), can you update this? Use this issue as necessary. I would suggest a high-level approach, but use that ticket however you'd like to organize around this effort.

Example common complexities to be addressed:

Data Source:

geom_local_epsg=sf::st_crs(staged, parameters=TRUE)$epsg
geom_name=dplyr::select(sf::st_drop_geometry(staged), n = if('NAME' %in% colnames(staged)) 'NAME' else 'NAMELSAD')$n
geom_local_value=sf::st_as_binary(sf::st_as_sf(staged, coords=c('Latitude', 'Longitude'))$geometry)
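One way the three expressions above might be expressed declaratively and translated to PostGIS SQL is sketched below. The spec keys (`first_existing_column`, `point_from`, `column`), the helper names, and SRID 4326 for the point source are assumptions made for illustration. The PostGIS functions used (`ST_SRID`, `ST_AsBinary`, `ST_MakePoint`, `ST_SetSRID`) are real, but the generated SQL has not been run against a database here.

```python
def quote_ident(name):
    """Double-quote a SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def geom_expr(value_spec):
    """Return a SQL expression for the geometry described by value_spec."""
    if "point_from" in value_spec:
        p = value_spec["point_from"]
        # PostGIS points are (x, y), i.e. longitude first, then latitude.
        return (
            f"ST_SetSRID(ST_MakePoint({quote_ident(p['x'])}, "
            f"{quote_ident(p['y'])}), {int(p['srid'])})"
        )
    return quote_ident(value_spec["column"])


def geom_select(spec, staged_columns, table="staged"):
    """Build a SELECT for geom_name, geom_local_epsg and geom_local_value."""
    candidates = spec["geom_name"]["first_existing_column"]
    # Resolve the name column up front: SQL cannot reference a column
    # that might not exist, unlike the R `if ... else` expression.
    name_col = next((c for c in candidates if c in staged_columns), None)
    if name_col is None:
        raise ValueError(f"none of {candidates} found in staged columns")
    geom = geom_expr(spec["geom_local_value"])
    return (
        f"SELECT {quote_ident(name_col)} AS geom_name, "
        f"ST_SRID({geom}) AS geom_local_epsg, "
        f"ST_AsBinary({geom}) AS geom_local_value "
        f"FROM {quote_ident(table)}"
    )


# Hypothetical specs: one polygon source, one point source (SRID 4326 assumed).
POLYGON_SPEC = {
    "geom_name": {"first_existing_column": ["NAME", "NAMELSAD"]},
    "geom_local_value": {"column": "geometry"},
}
POINT_SPEC = {
    "geom_name": {"first_existing_column": ["NAME", "NAMELSAD"]},
    "geom_local_value": {"point_from": {"x": "Longitude", "y": "Latitude", "srid": 4326}},
}

if __name__ == "__main__":
    print(geom_select(POLYGON_SPEC, ["GEOID", "NAMELSAD", "geometry"]))
    print(geom_select(POINT_SPEC, ["NAME", "Latitude", "Longitude"]))
```

The `NAME` versus `NAMELSAD` case shows one complexity a SQL-only approach has to handle: the choice depends on the staged table's schema, so it has to be made when the query is generated rather than inside the query.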

@rtmill
Collaborator

rtmill commented Apr 5, 2024

Similar to the above, example complexities for attributes:

Variable source:
(same example with two pieces)

  1. ["dplyr::filter(staged,`Defining Parameter`=='Ozone')",
  2. "dplyr::mutate(staged,geom_join_column=paste0(stringr::str_pad(`State Code`,width=2,pad=0),stringr::str_pad(`County Code`,width=3,pad=0)),

Another (handling hard coding in general):
... mutate(staged,geom_join_column=FIPS, attr_concept_id=2000000001, attr_start_date=as.Date('2018-01-01'),attr_end_date=as.Date('2018-12-31'),
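To make the attribute cases concrete, here is a rough sketch of a generic spec covering the filter, the zero-padded join column, and hard-coded values, with a small Python interpreter for it. It combines the two examples above into one spec for brevity. The key names (`filter`, `concat_padded`, `constants`) and `apply_attr_spec` are hypothetical, and dates are kept as ISO strings rather than parsed.

```python
# Hypothetical generic form of the Ozone example plus hard-coded values.
ATTR_SPEC = {
    "filter": {"column": "Defining Parameter", "equals": "Ozone"},
    "geom_join_column": {
        "concat_padded": [
            {"column": "State Code", "width": 2, "pad": "0"},
            {"column": "County Code", "width": 3, "pad": "0"},
        ]
    },
    "constants": {
        "attr_concept_id": 2000000001,
        "attr_start_date": "2018-01-01",
        "attr_end_date": "2018-12-31",
    },
}


def apply_attr_spec(rows, spec):
    """Apply filter, padded-concatenation and constant steps to dict rows."""
    flt = spec["filter"]
    parts = spec["geom_join_column"]["concat_padded"]
    out = []
    for row in rows:
        if row[flt["column"]] != flt["equals"]:
            continue
        # Left-pad each piece, matching stringr::str_pad's default side.
        join_key = "".join(
            str(row[p["column"]]).rjust(p["width"], p["pad"]) for p in parts
        )
        out.append({**row, "geom_join_column": join_key, **spec["constants"]})
    return out


if __name__ == "__main__":
    staged = [
        {"Defining Parameter": "Ozone", "State Code": 1, "County Code": 3},
        {"Defining Parameter": "PM2.5", "State Code": 1, "County Code": 5},
    ]
    for row in apply_attr_spec(staged, ATTR_SPEC):
        print(row["geom_join_column"], row["attr_concept_id"])
```

In PostgreSQL the same join key could plausibly be written as `lpad("State Code"::text, 2, '0') || lpad("County Code"::text, 3, '0')`. One difference to keep in mind is that `lpad` truncates values longer than the target width, whereas `str_pad` leaves them unchanged.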

@rtmill
Collaborator

rtmill commented Apr 5, 2024

Third item to specify:

Lay out the implications of staging source data in a database (PostGIS) rather than the current approach of keeping it in memory:

  • general approach to staging tables; (?) add parameter to specify whether staged source data is persisted or wiped after
  • pros and cons of creating requirement that source data is already in a database table; dependency on this for translation processes
    • what parameters need to be routinely provided for ogr to ensure data is ingested using consistent data types? How complex a task is this?
    • significance when pulling data from a larger source incrementally, e.g. from APIs, and how to account for that in our staging design
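The persisted-or-wiped parameter and incremental loading could be sketched roughly as follows. This uses Python's built-in `sqlite3` purely as a stand-in so the example runs without a server; a real implementation would target PostGIS. The names `staging_table`, `load_incrementally`, `table_exists`, and the `persist` flag are hypothetical. Column types are declared explicitly as one way to keep ingested data types consistent.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def staging_table(conn, name, columns, persist=False):
    """Create a staging table and drop it afterwards unless persist is True."""
    cols = ", ".join(f'"{col}" {sqltype}' for col, sqltype in columns.items())
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" ({cols})')
    try:
        yield name
    finally:
        if not persist:
            conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        conn.commit()


def load_incrementally(conn, name, batches):
    """Append batches (e.g. API pages) to the staging table; return row count."""
    total = 0
    for batch in batches:
        if not batch:
            continue
        placeholders = ", ".join("?" for _ in batch[0])
        conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', batch)
        total += len(batch)
    return total


def table_exists(conn, name):
    """Check the SQLite catalog for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    pages = [[("01003", 0.04)], [("01005", 0.05), ("01007", 0.03)]]
    with staging_table(conn, "staged", {"fips": "TEXT", "value": "REAL"}) as t:
        print(load_incrementally(conn, t, pages))
    print(table_exists(conn, "staged"))
```

Because each batch is appended to the same table, a source pulled page by page from an API lands in one staging table that the translation step can query as a whole.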

Fourth item:
adding clarity on the "phases" that were mentioned and how the specific functionality falls under each

  1. Ingestion (CLI, ogr, others)
  2. Translation (can we do this comprehensively in SQL?)
  3. Extraction/population of exposure occurrence (arguably out of scope for this conversation)

@kzollove
Collaborator Author

kzollove commented Dec 6, 2024

@rtmill

What are your latest thoughts on this? SQL-only specs?

@kzollove kzollove transferred this issue from OHDSI/GIS Dec 6, 2024
Projects
Status: 🏷TODO

2 participants