Commit 6d9f4ad

feat: fix Binding DB assay html encoding, polars_canonical_smiles_wo_salt, ci and mkdocs (#2)
1 parent 87117c6 commit 6d9f4ad

40 files changed, +1209 -26 lines changed

@@ -0,0 +1,14 @@
+name: Deploy MkDocs on latest commit
+
+on:
+  push:
+    branches:
+      - main
+      - master
+
+jobs:
+  deploy-mkdocs:
+    uses: deargen/workflows/.github/workflows/deploy-mkdocs.yml@master
+    with:
+      deploy-type: latest
+      requirements-file: deps/lock/x86_64-manylinux_2_28/requirements_docs.txt

README.md

+53 -1
@@ -1,5 +1,6 @@
 # bio-data-to-db: make Uniprot PostgreSQL database
 
+
 [![image](https://img.shields.io/pypi/v/bio-data-to-db.svg)](https://pypi.python.org/pypi/bio-data-to-db)
 [![PyPI - Downloads](https://img.shields.io/pypi/dm/bio-data-to-db)](https://pypi.python.org/pypi/bio-data-to-db)
 [![image](https://img.shields.io/pypi/l/bio-data-to-db.svg)](https://pypi.python.org/pypi/bio-data-to-db)
@@ -19,6 +20,8 @@ Written in Rust, thus equipped with extremely fast parsers. Packaged for python,
 
 So far, there is only one function implemented: **convert uniprot data to postgresql**. This package focuses more on parsing the data and inserting it into the database, rather than curating the data.
 
+[📚 Documentation](https://deargen.github.io/bio-data-to-db/)
+
 ## 🛠️ Installation
 
 ```bash
@@ -29,6 +32,8 @@ pip install bio-data-to-db
 
 You can use the command line interface or the python API.
 
+### Uniprot
+
 ```bash
 # It will create a db 'uniprot' and a table named 'public.uniprot_info' in the database.
 # If you want another name, you can optionally pass it as the last argument.
@@ -61,6 +66,49 @@ create_accession_to_pk_id_table("postgresql://user:password@localhost:5432/unipr
 keywords_tsv_to_postgresql("~/Downloads/keywords_all_2024_06_26.tsv", "postgresql://user:password@localhost:5432/uniprot")
 ```
 
+### BindingDB
+
+```bash
+# Decode HTML entities and strip the strings in the `assay` table (column: description and assay_name).
+# Currently, only assay table is supported.
+bio-data-to-db bindingdb fix-table assay 'mysql://username:password@localhost/bind'
+```
+
+```python
+from bio_data_to_db.bindingdb.fix_tables import fix_assay_table
+
+fix_assay_table("mysql://username:password@localhost/bind")
+```
+
+### PostgreSQL Helpers, SMILES, Polars utils and more
+
+```python
+Some useful functions to work with PostgreSQL.
+
+```python
+from bio_data_to_db.utils.postgresql import (
+    create_db_if_not_exists,
+    create_schema_if_not_exists,
+    set_column_as_primary_key,
+    make_columns_unique,
+    make_large_columns_unique,
+    split_column_str_to_list,
+    polars_write_database,
+)
+
+from bio_data_to_db.utils.smiles import (
+    canonical_smiles_wo_salt,
+    polars_canonical_smiles_wo_salt,
+)
+
+from bio_data_to_db.utils.polars import (
+    w_pbar,
+)
+```
+
+You can find the usage in the [📚 documentation](https://deargen.github.io/bio-data-to-db/).
+
+
 ## 👨‍💻️ Maintenance Notes
 
 ### Install from source
@@ -72,10 +120,14 @@ bash scripts/install.sh
 uv pip install -r deps/requirements_dev.in
 ```
 
-### Compile requirements (generate lockfiles)
+### Generate lockfiles
 
 Use GitHub Actions: `apply-pip-compile.yml`. Manually launch the workflow and it will make a commit with the updated lockfiles.
 
+### Publish a new version to PyPI
+
+Use GitHub Actions: `deploy.yml`. Manually launch the workflow and it will compile on all architectures and publish the new version to PyPI.
+
 ### About sqlx
 
 Sqlx offline mode should be configured so you can compile the code without a database present.
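
The BindingDB section added to the README above says that `fix-table assay` decodes HTML entities and strips the strings in the `description` and `assay_name` columns. The per-value transformation that description implies can be sketched with Python's standard library alone. `clean_text` is a hypothetical helper written for illustration; the package's actual implementation is not shown in this diff and may differ.

```python
import html
from typing import Optional


def clean_text(value: Optional[str]) -> Optional[str]:
    """Decode HTML entities and strip surrounding whitespace.

    Hypothetical helper for illustration only; NULL values (None) pass through.
    """
    if value is None:
        return None
    # html.unescape turns entities such as &lt; and &amp; back into characters.
    return html.unescape(value).strip()


if __name__ == "__main__":
    print(clean_text("  Ki &lt; 5 nM &amp; pH 7.4 "))
```

Applied to every row of the two columns, a function like this would leave already-clean values unchanged, so rerunning it on entity-free text is harmless.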
@@ -1 +1 @@
-816025c3ff73af3261b082ee7e0c71954aa6b20922e17344cfb2f29636733488 requirements.in
+2f65dd8deb2842edfead23a6aafb4f4f0b9e9e98982e39216069787d16327901 requirements.in

@@ -0,0 +1 @@
+f0f530946f38443ec95d76ac402dc3e3045fe8f7c26220e46b575aa56649503d requirements_docs.in
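
Each changed line above pairs a 64-character hex digest with an input file name, which is the shape of a SHA-256 checksum record. Assuming these records exist so the lockfile workflow can tell when a `.in` file has changed (the diff itself does not say so), a record of that form could be produced as in the sketch below. `hash_line` is a hypothetical name, and the single-space separator mirrors the lines as they appear here.

```python
import hashlib


def hash_line(content: bytes, filename: str) -> str:
    """Return '<sha256 hex digest> <filename>' for the given file content.

    Hypothetical helper; the project's real hashing step is not shown in the diff.
    """
    digest = hashlib.sha256(content).hexdigest()
    return f"{digest} {filename}"


if __name__ == "__main__":
    # Hash some example bytes standing in for the contents of requirements.in.
    print(hash_line(b"polars\nrdkit\n", "requirements.in"))
```

Comparing a freshly computed record against the stored one is enough to decide whether the lockfiles need to be regenerated.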

deps/lock/aarch64-apple-darwin/requirements.txt

+12 -1
@@ -2,16 +2,23 @@
 #    uv pip compile requirements.in -o /home/runner/work/bio-data-to-db/bio-data-to-db/deps/lock/aarch64-apple-darwin/requirements.txt --python-platform aarch64-apple-darwin --python-version 3.10
 click==8.1.7
     # via typer
+connectorx==0.3.3
+    # via -r requirements.in
 markdown-it-py==3.0.0
     # via rich
 mdurl==0.1.2
     # via markdown-it-py
-numpy==2.0.0
+mysqlclient==2.2.4
+    # via -r requirements.in
+numpy==1.26.4
     # via
     #   pandas
     #   pyarrow
+    #   rdkit
 pandas==2.2.2
     # via -r requirements.in
+pillow==10.4.0
+    # via rdkit
 polars==1.2.0
     # via -r requirements.in
 psycopg==3.2.1
@@ -26,6 +33,8 @@ python-dateutil==2.9.0.post0
     # via pandas
 pytz==2024.1
     # via pandas
+rdkit==2024.3.3
+    # via -r requirements.in
 rich==13.7.1
     # via typer
 shellingham==1.5.4
@@ -34,6 +43,8 @@ six==1.16.0
     # via python-dateutil
 sqlalchemy==2.0.31
     # via -r requirements.in
+tqdm==4.66.4
+    # via -r requirements.in
 typer==0.12.3
     # via -r requirements.in
 typing-extensions==4.12.2

deps/lock/aarch64-apple-darwin/requirements_dev.txt

+13 -2
@@ -6,6 +6,8 @@ charset-normalizer==3.3.2
     # via requests
 click==8.1.7
     # via typer
+connectorx==0.3.3
+    # via -r requirements.in
 exceptiongroup==1.2.2
     # via pytest
 filelock==3.15.4
@@ -24,13 +26,16 @@ maturin==1.7.0
     # via -r requirements_dev.in
 mdurl==0.1.2
     # via markdown-it-py
+mysqlclient==2.2.4
+    # via -r requirements.in
 networkx==3.3
     # via -r requirements_dev.in
-numpy==2.0.0
+numpy==1.26.4
     # via
     #   -r requirements_dev.in
     #   pandas
     #   pyarrow
+    #   rdkit
     #   scipy
     #   trimesh
 packaging==24.1
@@ -39,6 +44,8 @@ packaging==24.1
     #   pytest
 pandas==2.2.2
     # via -r requirements.in
+pillow==10.4.0
+    # via rdkit
 pluggy==1.5.0
     # via pytest
 polars==1.2.0
@@ -59,6 +66,8 @@ pytz==2024.1
     # via pandas
 pyyaml==6.0.1
     # via huggingface-hub
+rdkit==2024.3.3
+    # via -r requirements.in
 requests==2.32.3
     # via huggingface-hub
 rich==13.7.1
@@ -82,7 +91,9 @@ tomli==2.0.1
     #   maturin
     #   pytest
 tqdm==4.66.4
-    # via huggingface-hub
+    # via
+    #   -r requirements.in
+    #   huggingface-hub
 trimesh==4.4.3
     # via -r requirements_dev.in
 typer==0.12.3
@@ -0,0 +1,156 @@
+# This file was autogenerated by uv via the following command:
+#    uv pip compile requirements_docs.in -o /home/runner/work/bio-data-to-db/bio-data-to-db/deps/lock/aarch64-apple-darwin/requirements_docs.txt --python-platform aarch64-apple-darwin --python-version 3.10
+babel==2.15.0
+    # via mkdocs-material
+backports-strenum==1.3.1
+    # via griffe
+cairocffi==1.7.1
+    # via cairosvg
+cairosvg==2.7.1
+    # via mkdocs-material
+certifi==2024.7.4
+    # via requests
+cffi==1.16.0
+    # via cairocffi
+charset-normalizer==3.3.2
+    # via requests
+click==8.1.7
+    # via
+    #   mkdocs
+    #   mkdocstrings
+colorama==0.4.6
+    # via
+    #   griffe
+    #   mkdocs-material
+cssselect2==0.7.0
+    # via cairosvg
+defusedxml==0.7.1
+    # via cairosvg
+ghp-import==2.1.0
+    # via mkdocs
+griffe==0.48.0
+    # via mkdocstrings-python
+idna==3.7
+    # via requests
+importlib-metadata==8.2.0
+    # via mike
+importlib-resources==6.4.0
+    # via mike
+jinja2==3.1.4
+    # via
+    #   mike
+    #   mkdocs
+    #   mkdocs-material
+    #   mkdocstrings
+markdown==3.6
+    # via
+    #   mkdocs
+    #   mkdocs-autorefs
+    #   mkdocs-material
+    #   mkdocstrings
+    #   pymdown-extensions
+markupsafe==2.1.5
+    # via
+    #   jinja2
+    #   mkdocs
+    #   mkdocs-autorefs
+    #   mkdocstrings
+mergedeep==1.3.4
+    # via
+    #   mkdocs
+    #   mkdocs-get-deps
+mike==2.1.2
+    # via -r requirements_docs.in
+mkdocs==1.6.0
+    # via
+    #   -r requirements_docs.in
+    #   mike
+    #   mkdocs-autorefs
+    #   mkdocs-coverage
+    #   mkdocs-gen-files
+    #   mkdocs-literate-nav
+    #   mkdocs-material
+    #   mkdocstrings
+mkdocs-autorefs==1.0.1
+    # via
+    #   -r requirements_docs.in
+    #   mkdocstrings
+mkdocs-coverage==1.1.0
+    # via -r requirements_docs.in
+mkdocs-gen-files==0.5.0
+    # via -r requirements_docs.in
+mkdocs-get-deps==0.2.0
+    # via mkdocs
+mkdocs-literate-nav==0.6.1
+    # via -r requirements_docs.in
+mkdocs-material==9.5.30
+    # via -r requirements_docs.in
+mkdocs-material-extensions==1.3.1
+    # via
+    #   -r requirements_docs.in
+    #   mkdocs-material
+mkdocstrings==0.25.2
+    # via
+    #   -r requirements_docs.in
+    #   mkdocstrings-python
+mkdocstrings-python==1.10.7
+    # via -r requirements_docs.in
+packaging==24.1
+    # via mkdocs
+paginate==0.5.6
+    # via mkdocs-material
+pathspec==0.12.1
+    # via mkdocs
+pillow==10.4.0
+    # via
+    #   cairosvg
+    #   mkdocs-material
+platformdirs==4.2.2
+    # via
+    #   mkdocs-get-deps
+    #   mkdocstrings
+pycparser==2.22
+    # via cffi
+pygments==2.18.0
+    # via mkdocs-material
+pymdown-extensions==10.9
+    # via
+    #   mkdocs-material
+    #   mkdocstrings
+pyparsing==3.1.2
+    # via mike
+python-dateutil==2.9.0.post0
+    # via ghp-import
+pyyaml==6.0.1
+    # via
+    #   mike
+    #   mkdocs
+    #   mkdocs-get-deps
+    #   pymdown-extensions
+    #   pyyaml-env-tag
+pyyaml-env-tag==0.1
+    # via
+    #   mike
+    #   mkdocs
+regex==2024.7.24
+    # via mkdocs-material
+requests==2.32.3
+    # via mkdocs-material
+six==1.16.0
+    # via python-dateutil
+tinycss2==1.3.0
+    # via
+    #   cairosvg
+    #   cssselect2
+urllib3==2.2.2
+    # via requests
+verspec==0.1.0
+    # via mike
+watchdog==4.0.1
+    # via mkdocs
+webencodings==0.5.1
+    # via
+    #   cssselect2
+    #   tinycss2
+zipp==3.19.2
+    # via importlib-metadata
@@ -1 +1 @@
-816025c3ff73af3261b082ee7e0c71954aa6b20922e17344cfb2f29636733488 requirements.in
+2f65dd8deb2842edfead23a6aafb4f4f0b9e9e98982e39216069787d16327901 requirements.in

@@ -0,0 +1 @@
+f0f530946f38443ec95d76ac402dc3e3045fe8f7c26220e46b575aa56649503d requirements_docs.in
