# bio-data-to-db: make UniProt PostgreSQL database

[![image](https://img.shields.io/pypi/v/bio-data-to-db.svg)](https://pypi.python.org/pypi/bio-data-to-db)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/bio-data-to-db)](https://pypi.python.org/pypi/bio-data-to-db)
[![image](https://img.shields.io/pypi/l/bio-data-to-db.svg)](https://pypi.python.org/pypi/bio-data-to-db)

Written in Rust, thus equipped with extremely fast parsers. Packaged for Python.

So far, only one function is implemented: **convert UniProt data to PostgreSQL**. This package focuses more on parsing the data and inserting it into the database than on curating the data.

[📚 Documentation](https://deargen.github.io/bio-data-to-db/)
## 🛠️ Installation

```bash
pip install bio-data-to-db
```

You can use the command line interface or the Python API.

### UniProt

```bash
# It will create a db 'uniprot' and a table named 'public.uniprot_info' in the database.
# If you want another name, you can optionally pass it as the last argument.
```

```python
create_accession_to_pk_id_table("postgresql://user:password@localhost:5432/uniprot")
keywords_tsv_to_postgresql("~/Downloads/keywords_all_2024_06_26.tsv", "postgresql://user:password@localhost:5432/uniprot")
```
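
As a rough illustration of the kind of input `keywords_tsv_to_postgresql` consumes, the sketch below parses a tab-separated keywords file with the standard library. The column names and values here are hypothetical stand-ins and may differ from the actual UniProt export.

```python
import csv
import io

# Hypothetical miniature of a UniProt keywords TSV export;
# the real file's header names may differ.
tsv_text = (
    "Keyword ID\tName\tCategory\n"
    "KW-0001\t2Fe-2S\tLigand\n"
    "KW-0002\t3D-structure\tTechnical term\n"
)

# Each row becomes a dict keyed by the header, ready to insert into a table.
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
```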

### BindingDB

```bash
# Decode HTML entities and strip whitespace in the `assay` table (columns: `description` and `assay_name`).
# Currently, only the assay table is supported.
bio-data-to-db bindingdb fix-table assay 'mysql://username:password@localhost/bind'
```

```python
from bio_data_to_db.bindingdb.fix_tables import fix_assay_table

fix_assay_table("mysql://username:password@localhost/bind")
```
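
The cleanup described above (decoding HTML entities and stripping the strings) can be sketched for a single value with the standard library. This only illustrates the transformation; it is not the package's actual implementation.

```python
import html

def clean_text(value: str) -> str:
    """Decode HTML entities and strip surrounding whitespace,
    mirroring the kind of cleanup applied per string in the assay table."""
    return html.unescape(value).strip()

cleaned = clean_text("  Inhibition of COX-1 &amp; COX-2  ")
```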

### PostgreSQL Helpers, SMILES, Polars utils and more

Some useful functions to work with PostgreSQL, SMILES and Polars.

```python
from bio_data_to_db.utils.postgresql import (
    create_db_if_not_exists,
    create_schema_if_not_exists,
    set_column_as_primary_key,
    make_columns_unique,
    make_large_columns_unique,
    split_column_str_to_list,
    polars_write_database,
)

from bio_data_to_db.utils.smiles import (
    canonical_smiles_wo_salt,
    polars_canonical_smiles_wo_salt,
)

from bio_data_to_db.utils.polars import (
    w_pbar,
)
```
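
The name `canonical_smiles_wo_salt` implies a desalting step. A common simplification, shown here only as a sketch and not as the package's actual algorithm (which presumably relies on a cheminformatics toolkit), is to split a dot-separated SMILES and keep the largest fragment:

```python
def largest_fragment(smiles: str) -> str:
    """Keep the largest dot-separated fragment of a SMILES string,
    discarding small fragments such as salt counter-ions.
    Sketch only: real desalting would also canonicalize the result."""
    return max(smiles.split("."), key=len)

# Toy example: drop the 'Cl' counter-ion fragment.
desalted = largest_fragment("CC(=O)Oc1ccccc1C(=O)O.Cl")
```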

You can find the usage in the [📚 documentation](https://deargen.github.io/bio-data-to-db/).

## 👨‍💻️ Maintenance Notes
### Install from source

```bash
bash scripts/install.sh
uv pip install -r deps/requirements_dev.in
```

### Generate lockfiles

Use GitHub Actions: `apply-pip-compile.yml`. Manually launch the workflow and it will make a commit with the updated lockfiles.

### Publish a new version to PyPI

Use GitHub Actions: `deploy.yml`. Manually launch the workflow and it will compile on all architectures and publish the new version to PyPI.
### About sqlx
Sqlx offline mode should be configured so you can compile the code without a database present.
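
If the offline query metadata ever needs to be regenerated, `sqlx-cli` can do it against a running database. A sketch of the standard sqlx workflow; the connection URL is a placeholder:

```shell
# Install the sqlx CLI once.
cargo install sqlx-cli
# Regenerate the offline query metadata so that `cargo build`
# can type-check SQL queries without a live database.
DATABASE_URL='postgresql://user:password@localhost:5432/uniprot' cargo sqlx prepare
```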