sigident.preproc (!!! currently under development !!!)

This is the repository of the R package sigident.preproc. It provides data preprocessing functionality to prepare datasets for further use with the sigident R package: https://github.com/kapsner/sigident

Overview

The preprocessing includes the following steps:

GEO
- downloading the specified datasets from the GEO database
- optional: downloading the raw CEL files and subsequent GCRMA normalization
- batch effect detection and batch effect correction
- visualization of batch effects
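The idea behind the batch effect correction step above can be illustrated with a deliberately simplified sketch. This README does not state which correction method sigident.preproc uses, so the following is an assumption-labeled illustration only. The hypothetical helper center_batches() performs a location-only adjustment: for every gene, it removes each batch's mean and adds back the gene's overall mean. Established methods such as ComBat (package sva) additionally adjust the per-batch variance.

```r
# Simplified, location-only batch adjustment for illustration.
# NOT ComBat and not necessarily what sigident.preproc does.
# expr:  genes x samples matrix (e.g. log2 expression values)
# batch: one batch label per sample (column)
center_batches <- function(expr, batch) {
  stopifnot(is.matrix(expr), ncol(expr) == length(batch))
  batch <- as.factor(batch)
  grand_mean <- rowMeans(expr)  # per-gene mean over all samples
  adjusted <- expr
  for (b in levels(batch)) {
    idx <- which(batch == b)
    batch_mean <- rowMeans(expr[, idx, drop = FALSE])  # per-gene mean in batch b
    # vectors of length nrow(expr) recycle down each column, i.e. per gene
    adjusted[, idx] <- expr[, idx, drop = FALSE] - batch_mean + grand_mean
  }
  adjusted
}

# toy data: 5 genes x 8 samples, batch "B" shifted upwards by 3 units
set.seed(1)
expr <- matrix(rnorm(40, mean = 8), nrow = 5)
batch <- rep(c("A", "B"), each = 4)
expr[, batch == "B"] <- expr[, batch == "B"] + 3
adjusted <- center_batches(expr, batch)

# after adjustment, both batches share the same per-gene means
print(round(rowMeans(adjusted[, batch == "A"]) - rowMeans(adjusted[, batch == "B"]), 10))
```

Such a naive adjustment also removes biological signal when the batches differ in their tumor/control composition, which is one reason model-based methods that take the study design into account are usually preferred.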

Currently supported input data are:

GEO data
- Platform GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array

Installation

You can install sigident.preproc with the following commands in R:

install.packages("remotes")
remotes::install_github("kapsner/sigident.preproc")

Example: download datasets from GEO

First, define the required variables and create the output directories.

library(sigident.preproc)

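# define datadir
datadir <- "./geodata/data/"
# recursive = TRUE also creates the parent folder "./geodata"
dir.create(datadir, recursive = TRUE, showWarnings = FALSE)

# define plotdir
plotdir <- "./plots/"
dir.create(plotdir, showWarnings = FALSE)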

# define idtype
idtype <- "affy"

Next, define a list that represents the studies' metadata. To obtain the information required to fill this list, you will probably have to download the respective datasets first and extract the mapping information manually.

studiesinfo <- list(
  "GSE18842" = list(
    setid = 1,
    targetcolname = "source_name_ch1",
    targetlevelname = "Human Lung Tumor",
    controllevelname = "Human Lung Control"
  ),
  
  "GSE19804" = list(
    setid = 1,
    targetcolname = "source_name_ch1",
    targetlevelname = "frozen tissue of primary tumor",
    controllevelname = "frozen tissue of adjacent normal"
  ),
  
  "GSE19188" = list(
    setid = 1,
    targetcolname = "characteristics_ch1",
    controllevelname = "tissue type: healthy",
    targetlevelname = "tissue type: tumor",
    use_rawdata = TRUE
  )
)
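Because the entries of studiesinfo must match the phenotype annotation of each series exactly, it can help to check them before running the preprocessing. The following is a minimal sketch using a hypothetical helper, check_study_mapping(), which is not part of sigident.preproc. It verifies that an entry contains the fields used above and counts how many samples carry the target and control labels. In practice, the phenotype data.frame could be obtained with Biobase::pData() from an ExpressionSet returned by GEOquery::getGEO(); here a small toy data.frame stands in for it. The optional use_rawdata field is not checked.

```r
# Hypothetical helper (not part of sigident.preproc): checks one
# studiesinfo entry against the phenotype data of its GEO series.
check_study_mapping <- function(pheno, entry) {
  required <- c("setid", "targetcolname", "targetlevelname", "controllevelname")
  missing_fields <- setdiff(required, names(entry))
  if (length(missing_fields) > 0) {
    stop("missing fields: ", paste(missing_fields, collapse = ", "))
  }
  if (!entry$targetcolname %in% colnames(pheno)) {
    stop("column not found in phenotype data: ", entry$targetcolname)
  }
  values <- as.character(pheno[[entry$targetcolname]])
  # number of samples labeled as target and as control
  counts <- c(
    target = sum(values == entry$targetlevelname, na.rm = TRUE),
    control = sum(values == entry$controllevelname, na.rm = TRUE)
  )
  if (any(counts == 0)) {
    warning("target or control label not found in column ", entry$targetcolname)
  }
  counts
}

# toy phenotype data standing in for Biobase::pData(<ExpressionSet>)
toy_pheno <- data.frame(
  source_name_ch1 = c("Human Lung Tumor", "Human Lung Control", "Human Lung Tumor"),
  stringsAsFactors = FALSE
)
toy_entry <- list(
  setid = 1,
  targetcolname = "source_name_ch1",
  targetlevelname = "Human Lung Tumor",
  controllevelname = "Human Lung Control"
)
print(check_study_mapping(toy_pheno, toy_entry))
```

A zero count usually points to a typo in a level name or to the wrong column; a stop() points to a missing field or a column name that does not exist in the series.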

After defining this metadata list, call the function load_geo_data to load and preprocess the specified studies.

load_geo_data(studiesinfo = studiesinfo,
              datadir = datadir,
              plotdir = plotdir,
              idtype = idtype)

All downloaded datasets and resulting objects are assigned to the global environment and can be used directly in the subsequent analyses implemented in the R package sigident.
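Because load_geo_data assigns its results to the global environment, it can be useful to record which objects a call actually created. A minimal sketch in base R, using a hypothetical helper new_global_objects() that is not part of the package, compares the contents of the global environment before and after evaluating an expression:

```r
# Hypothetical helper (not part of sigident.preproc): returns the names of
# objects that did not exist in the global environment before `expr` ran.
new_global_objects <- function(expr) {
  before <- ls(envir = globalenv(), all.names = TRUE)
  force(expr)  # evaluate the lazily passed expression now
  after <- ls(envir = globalenv(), all.names = TRUE)
  setdiff(after, before)
}

# toy example: an expression that creates one global object
created <- new_global_objects(assign("demo_obj", 1, envir = globalenv()))
print(created)
```

For example, wrapping the load_geo_data(...) call from above in new_global_objects() would return the names of the objects it created. Objects that already existed and were merely overwritten are not detected by this comparison.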

Please see the package's vignette for a more detailed description of how to prepare datasets for use with the sigident package.

Since building the package vignette takes rather long (~20 min), we provide the pre-built vignettes in this repository.

Notice

The sigident.preproc package is under active development and not yet on CRAN. This means that the API may break from time to time as its functionality is extended and modified. Previously included functions and/or function arguments may also no longer be supported. However, a detailed package vignette will be provided with every major change to describe the currently supported workflow.

More Information

about CLEARLY: https://www.transcanfp7.eu/index.php/abstract/clearly.html
about MIRACUM: https://www.miracum.org/
about the Medical Informatics Initiative: https://www.medizininformatik-initiative.de/index.php/de
