docs cleanup
nsheff committed Feb 13, 2024
1 parent affef26 commit ce2971f
Showing 5 changed files with 45 additions and 43 deletions.
31 changes: 12 additions & 19 deletions docs/pephub/README.md
@@ -5,46 +5,39 @@
<a href="https://github.com/pepkit/eido" alt="GitHub source code"><img src="https://img.shields.io/badge/source-github-354a75?logo=github"/></a>
</p>

PEPhub is a database, web interface, and API for sharing, retrieving, and validating sample metadata. PEPhub takes advantage of the Portable Encapsulated Projects (PEP) biological metadata standard to store, edit, and access your PEPs in one place. You can view the deployed public instance at <https://pephub.databio.org>.
PEPhub is a database, web interface, and API for sharing, retrieving, and validating sample metadata. PEPhub takes advantage of the Portable Encapsulated Projects (PEP) biological metadata standard to store, edit, and access your PEPs in one place. PEPhub consists of two components:

---

**Deployed public instance**: <a href="https://pephub.databio.org/" target="_blank">https://pephub.databio.org/</a>

**API**: <a href="https://pephub-api.databio.org/api/v1/docs" target="_blank">https://pephub-api.databio.org/api/v1/docs</a>

**Documentation**: <a href="https://pep.databio.org/pephub" target="_blank">https://pep.databio.org/pephub</a>

**Source Code**: <a href="https://github.com/pepkit/pephub" target="_blank">https://github.com/pepkit/pephub</a>
- **Deployed public user interface**: <a href="https://pephub.databio.org/" target="_blank">https://pephub.databio.org/</a>
- **API**: <a href="https://pephub-api.databio.org/api/v1/docs" target="_blank">https://pephub-api.databio.org/api/v1/docs</a>

----
## Overview

### What is PEP?
Portable Encapsulated Projects, commonly known as PEP, represents a collaborative initiative aimed at enhancing the reusability of sample metadata. It is simply **yaml** + **csv** files (or just csv) that contains samples and project metadata. The CSV file, known as the sample table, contains details about individual samples, while the YAML configuration file provides metadata related to projects, amendments, and other sample modifiers.
PEPs serve as a standardized format for running workflows using tools such as [Snakemake](https://snakemake.readthedocs.io/en/stable/), [Common Workflow Language](https://www.commonwl.org/), [Looper](http://pep.databio.org/looper), and other workflow systems.
For further details about PEP and its usage, please visit: [http://pep.databio.org/](http://pep.databio.org/)

Portable Encapsulated Projects (PEPs) are a standard format for biological sample metadata. A PEP is simply a **yaml** + **csv** file (or just csv) -- the CSV file is a sample table, while the YAML file provides project-level metadata and sample modifiers. PEPs are a common input for running workflows using tools such as [Snakemake](https://snakemake.readthedocs.io/en/stable/), [Common Workflow Language](https://www.commonwl.org/), [Looper](http://pep.databio.org/looper), and other workflow systems. For more details, read the [PEP specification](http://pep.databio.org/spec/simple-example).
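
To make the yaml + csv pairing concrete, here is a minimal sketch of what the two parts of a PEP might look like and how the sample table can be read with the Python standard library. The file contents, sample names, and the `read_sample_table` helper are illustrative assumptions, not part of PEPhub or the PEP tooling.

```python
import csv
import io

# Hypothetical project config (the YAML half of a PEP).
# The keys mirror the sections a PEP reports: pep_version, sample_table, name, description.
CONFIG_YAML = """\
pep_version: 2.1.0
name: example
description: A tiny illustrative project
sample_table: sample_table.csv
"""

# Hypothetical sample table (the CSV half): one row per sample, one column per attribute.
SAMPLE_TABLE_CSV = """\
sample_name,organism,file
frog_1,frog,data/frog_1.fastq
frog_2,frog,data/frog_2.fastq
"""


def read_sample_table(csv_text):
    """Parse the sample table text into a list of dicts, one per sample."""
    return list(csv.DictReader(io.StringIO(csv_text)))


samples = read_sample_table(SAMPLE_TABLE_CSV)
print([sample["sample_name"] for sample in samples])
```

In practice the config would be parsed by a PEP-aware library rather than by hand; the sketch only shows how the two files divide project-level and sample-level metadata.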

### Validation

PEPhub offers automated project validation through [EIDO](http://pep.databio.org/eido). Users can specify a schema to which the PEP should adhere. All schemas are available on the official website: [https://schema.databio.org/](https://schema.databio.org/). Schemas are particularly useful before running pipelines, as validation provides essential information about PEP compatibility with specific pipelines and highlights any errors in the PEP structure.
PEPhub validates sample metadata with [eido](/eido). Users can specify a schema to which the PEP should adhere. All schemas are available on the official website: [https://schema.databio.org/](https://schema.databio.org/). Schemas are particularly useful before running pipelines, as validation provides essential information about PEP compatibility with specific pipelines and highlights any errors in the PEP structure.
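
As a rough illustration of what validation buys you before a pipeline run, the sketch below checks that every sample carries a set of required attributes. This is a simplified stand-in written with plain Python, not eido's API or schema format; the function name, attribute names, and sample values are assumptions made for the example.

```python
def find_missing_attributes(samples, required):
    """Return {sample_name: [missing attributes]} for samples that fail the check."""
    problems = {}
    for index, sample in enumerate(samples):
        # An attribute counts as missing if it is absent or empty.
        missing = [attr for attr in required if not sample.get(attr)]
        if missing:
            name = sample.get("sample_name") or f"<row {index}>"
            problems[name] = missing
    return problems


# Hypothetical samples; the second one has an empty "genome" value.
samples = [
    {"sample_name": "s1", "genome": "hg38", "read1": "s1.fastq"},
    {"sample_name": "s2", "genome": "", "read1": "s2.fastq"},
]

report = find_missing_attributes(samples, required=["sample_name", "genome", "read1"])
print(report)
```

A real schema can express much more than required attributes, but the outcome is the same kind of report: which samples are incompatible with a pipeline, and why.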

### Search
PEPhub has semantic search functionality. The vector database is populated with important information extracted from the PEP config file. Through the search capability, users can efficiently locate projects, POPs, and Namespaces related to the string they provide.
More information can be found in the [semantic-search](https://pep.databio.org/pephub/semantic-search/).
More information can be found in the [PEPhub semantic search docs](semantic-search.md).


### Authorization
One of the key features of PEPhub is the empowerment of users to submit and edit their own PEPs. To facilitate this, we have implemented a robust user authorization system. Users are required to authenticate via GitHub to access PEPhub.
Once authorized, users gain access to a range of features, including the ability to upload, modify, and delete PEPs, and star projects. Authorized users can also designate projects as private, ensuring that only they have visibility, with restricted access for others.

A key feature of PEPhub is that users can submit and edit their own PEPs. To facilitate this, we have implemented a robust user authorization system. Users authenticate via GitHub, which provides access to a range of features, including the ability to upload, modify, and delete PEPs, and star projects. Authorized users can also designate projects as private, ensuring that only they have visibility, with restricted access for others.
Moreover, users have the capability to modify projects within organizational namespaces, which correspond to the GitHub organizations they belong to. This feature facilitates collaborative efforts within organizational groups.

### PEP of PEPs (POP)
The PEP of PEPs, often referred to as a Project of Projects or simply POP, is a specialized project that encompasses multiple projects as samples. Essentially, a POP can be thought of as a grouping of PEPs, allowing users to organize projects for various purposes. This approach offers several advantages: all PEPs related to a specific topic can be consolidated in one central location, streamlining organization and accessibility.

The PEP of PEPs, often referred to as a Project of Projects or simply POP, is a specific type of PEP in which each sample is itself a PEP. Essentially, a POP can be thought of as a grouping of PEPs, allowing users to organize projects for various purposes. This approach offers several advantages: all PEPs related to a specific topic can be consolidated in one central location, streamlining organization and accessibility.
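
One way to picture a POP is as a sample table whose rows point at other PEPs. The sketch below models each row as a reference made of a namespace, a name, and a tag. The `PepReference` class, its field names, and the `default` tag fallback are assumptions for illustration; they are not taken from the PEPhub source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PepReference:
    """Hypothetical row of a POP sample table: a pointer to another PEP."""

    namespace: str
    name: str
    tag: str = "default"  # assumed fallback tag

    @property
    def registry_path(self):
        """Format the reference as 'namespace/name:tag'."""
        return f"{self.namespace}/{self.name}:{self.tag}"


# A POP groups several PEPs; each "sample" is itself a project.
pop_samples = [
    PepReference("databio", "example"),
    PepReference("nleroy917", "yeast-analysis", "latest"),
]

print([reference.registry_path for reference in pop_samples])
```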

### GEO data
PEPhub has a namespace called GEO. This namespace contains projects from [Gene Expression Omnibus](https://www.ncbi.nlm.nih.gov/geo/) that are downloaded and processed using [GEOfetch](https://geofetch.databio.org/en/latest/). GEOfetch produces a standardized PEP sample table with GSM (GEO sample) and a YAML config that has metadata from GSE (GEO project). Nearly 99% of projects are downloaded to PEPhub. Users can utilize the namespace search to find projects by accession ID and description.

PEPhub has a namespace called GEO. This namespace contains projects from [Gene Expression Omnibus](https://www.ncbi.nlm.nih.gov/geo/) that are downloaded and processed using [GEOfetch](/geofetch). GEOfetch produces a standardized PEP sample table with GSM (GEO sample) and a YAML config that has metadata from GSE (GEO project). Nearly 99% of projects are downloaded to PEPhub. Users can utilize the namespace search to find projects by accession ID and description.
GEO namespace link: [https://pephub.databio.org/geo](https://pephub.databio.org/geo).
Moreover, users can download all projects as a `tar` file from the GEO namespace using the link available on the GEO namespace page.
PEPhub doesn't store actual files in the database. Because of this, if you want to download files, there are two options:
7 changes: 4 additions & 3 deletions docs/pephub/authentication.md
@@ -1,7 +1,8 @@
# PEPhub Authentication
*Nathan LeRoy, January 13th, 2023*

## Introduction
*pephub* supports authentication. We use [GitHub OAuth](https://docs.github.com/en/developers/apps/building-oauth-apps/authorizing-oauth-apps) as an authorization provider and user/namespace management system. There are two kinds of namespaces in *pephub*: **User namespaces** and **organization namespaces**. User namespaces are just that: namespaces that contain PEPs submitted by a user who has authenticated from GitHub. For example, my GitHub username/namespace is **nleroy917**. So, my *pephub* namespace (once I authenticate) is also **nleroy917**. Organization namespaces contain PEPs submitted by users who belong to that organization on GitHub. For example, I (**nleroy917**) belong to the **databio** organization on GitHub. As such, once authenticated I can read and write PEPs to this namespace on *pephub*.

*PEPhub* supports authentication. We use [GitHub OAuth](https://docs.github.com/en/developers/apps/building-oauth-apps/authorizing-oauth-apps) as an authorization provider and user/namespace management system. There are two kinds of namespaces in *pephub*: **user namespaces** and **organization namespaces**. User namespaces are just that: namespaces that contain PEPs submitted by a user who has authenticated from GitHub. For example, my GitHub username/namespace is **nleroy917**. So, my *pephub* namespace (once I authenticate) is also **nleroy917**. Organization namespaces contain PEPs submitted by users who belong to that organization on GitHub. For example, I (**nleroy917**) belong to the **databio** organization on GitHub. As such, once authenticated I can read and write PEPs to this namespace on *pephub*.

*pephub* supports both **reading** and **writing** PEPs. Just like GitHub, all PEPs are by default available to view by all users. Users may choose to mark a PEP as **private**, and any attempt to **read** or **write** to this PEP will then require prior authorization. For example, if I submit a new PEP at `nleroy917/yeast-analysis:latest` and mark it as **private**, I must first authenticate before I can **read** and **write** to this PEP.
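
The access rules described here can be sketched as a pair of small predicates. This is an illustration of the rule only, assuming a user may act on their own namespace and on the namespaces of organizations they belong to; the function names and signatures are hypothetical and do not reflect the pephub implementation.

```python
def namespace_of(registry_path):
    """Return the namespace part of a 'namespace/name:tag' registry path."""
    namespace, separator, _ = registry_path.partition("/")
    if not namespace or not separator:
        raise ValueError(f"not a registry path: {registry_path!r}")
    return namespace


def is_member(namespace, user, organizations):
    """True if an authenticated user owns the namespace or belongs to it."""
    return user is not None and (namespace == user or namespace in organizations)


def can_read(registry_path, is_private, user=None, organizations=()):
    """Public PEPs are readable by anyone; private ones need membership."""
    if not is_private:
        return True
    return is_member(namespace_of(registry_path), user, organizations)


def can_write(registry_path, user=None, organizations=()):
    """Writing always requires an authenticated member of the namespace."""
    return is_member(namespace_of(registry_path), user, organizations)


path = "nleroy917/yeast-analysis:latest"
print(can_read(path, is_private=True))                    # anonymous user
print(can_read(path, is_private=True, user="nleroy917"))  # owner
```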

@@ -193,7 +194,7 @@ This flow should be identical to the flow that GitHub uses to protect repositori

## Writing PEPs

### Submiting a new PEP
### Submitting a new PEP

There are two scenarios for PEP submission: 1) A user submits to their namespace, and 2) A user submits to an organization. Both cases require authentication. A user may freely submit PEPs to their own namespace. However, only **members** of an organization may submit PEPs to that organization. See the chart below:

40 changes: 25 additions & 15 deletions docs/pephub/pephubclient/README.md
@@ -8,41 +8,51 @@

# PEPHubClient

`PEPHubClient` is a tool to provide Python API and CLI for [PEPhub](https://pephub.databio.org).
`PEPHubClient` is a CLI and Python API for PEPhub. Key features are:

Key features of PEPHubClient include:
- **Authorization**: Users can log in to PEPhub using PEPHubClient, enabling the loading of private projects and the ability to upload projects to PEPhub.
- **Load and Download**: PEPHubClient facilitates the loading and downloading of PEPs directly from PEPhub.
- **Push**: Users can upload PEPs from their local environment to PEPhub.
- **Download**: Users can download public PEPs via CLI or Python API.
- **Authorization**: Users can log in to PEPhub using PEPHubClient via CLI, providing download access to private projects.
- **Upload**: Authenticated users can also upload PEPs to PEPhub.

PEPHubClient supports pephub authorization.
The authorization process is based on pephub device authorization protocol.
To upload projects or to download private projects, user must be authorized through pephub.
PEPHubClient uses PEPhub's device authorization protocol. To upload projects or to download private projects, a user must be authorized through PEPhub.

---
### Installation

PEPHubClient is available on PyPI, and the source code can be accessed on GitHub: [https://github.com/pepkit/pephubclient](https://github.com/pepkit/pephubclient)
To install PEPHubClient from PyPI, use the following command:

To install PEPHubClient from PyPi, use the following command:
```bash
pip install pephubclient
```

To install `pephubclient` from the GitHub repository, use the following command:
To install `pephubclient` from the [GitHub repository](https://github.com/pepkit/pephubclient), use the following command:

```bash
pip install git+https://github.com/pepkit/pephubclient.git
```

---
### How to specify url for pephub instance
If you want to use your own pephub instance, you can specify it by setting `PEPHUB_BASE_URL` environment variable.
e.g.
### How to specify URL for PEPhub instance

If you want to use your own PEPhub instance, you can specify it by setting the `PEPHUB_BASE_URL` environment variable. For example:

```bash
export PEPHUB_BASE_URL=http://localhost:8000/
```
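
For readers who want to see how such an override typically behaves, the following sketch resolves a base URL from the `PEPHUB_BASE_URL` environment variable and falls back to the public API host when it is unset. The fallback value, the helper names, and the assumption that a self-hosted instance serves its docs under `api/v1/docs` are illustrative; they are not copied from the PEPHubClient source.

```python
import os
from urllib.parse import urljoin

# Assumed fallback: the public API host listed in the PEPhub docs.
DEFAULT_BASE_URL = "https://pephub-api.databio.org/"


def get_base_url(environ=None):
    """Return the configured base URL, always ending with a slash."""
    environ = os.environ if environ is None else environ
    base = environ.get("PEPHUB_BASE_URL", DEFAULT_BASE_URL)
    return base if base.endswith("/") else base + "/"


def docs_url(environ=None):
    """Build the API docs URL relative to the configured base URL."""
    return urljoin(get_base_url(environ), "api/v1/docs")


# Passing a dict instead of os.environ keeps the example side-effect free.
print(docs_url({"PEPHUB_BASE_URL": "http://localhost:8000/"}))
print(docs_url({}))
```

Normalizing the trailing slash matters because `urljoin` would otherwise replace the last path segment of the base URL instead of appending to it.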

To login, use the `login` argument; to logout, use `logout`.
## Authentication

To log in, use the `login` command:

```
phc login
```

To log out, use `logout`:

```
phc logout
```


### Example
4 changes: 2 additions & 2 deletions docs/pephub/pephubclient/cli.md
@@ -1,6 +1,6 @@
# PEPHubClient (phc)
# PEPHubClient CLI (phc)

PEPHubClient CLI is avaliable as `phc` command. It provides a set of commands to interact with PEPhub.
Installing PEPHubClient provides a CLI through the `phc` command. It provides a set of commands to interact with PEPhub.

```text
$ phc --help
6 changes: 2 additions & 4 deletions docs/pephub/pephubclient/python-api.md
@@ -1,6 +1,4 @@
### PEPHubClient as Python API

# Example of usage of the pephubclient modules:
# Using the PEPHubClient Python API

```python
from pephubclient import PEPHubClient
@@ -19,7 +17,7 @@ print(example_pep)
## 6 samples: 4-1_11102016, 3-1_11102016, 2-2_11102016, 2-1_11102016, 8-3_11152016, 8-1_11152016
## Sections: pep_version, sample_table, name, description

# To upload project use next command:
# To upload a project:
phc.upload(example_pep, namespace="databio", name="example", force=True)

```