opentargets/parquet2json

Parquet to NDJSON (newline delimited JSON) converter

A simple CLI app for converting parquet files to newline delimited JSON (NDJSON). Built with Polars and Typer.

Instructions with docker or uv

docker

  • Install docker
  • docker build -t p2j .
  • docker run -v /path/to/files:/mnt -it p2j /mnt/<PARQUET> /mnt/<JSON>
  • Or, if reading parquet from a Google Cloud Storage bucket, also mount the credentials file and point GOOGLE_APPLICATION_CREDENTIALS at it:
    docker run -v /path/to/files:/mnt -v /path/to/gcp/credentials.json:/app/credentials.json -e GOOGLE_APPLICATION_CREDENTIALS=/app/credentials.json -it p2j gs://<PATH_TO_PARQUET> /mnt/<JSON>

uv

  • Install uv
  • Install into a virtual environment: uv venv; source .venv/bin/activate; uv pip install "git+https://github.com/opentargets/parquet2json"
  • parquet2json <PARQUET_IN> <JSON_OUT>

Examples

Convert a remote parquet file to NDJSON on stdout and pipe it to jq

uv run parquet2json gs://<parquet> | jq "."

or

docker run -it p2j gs://<parquet> | jq "."
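
Because the output is one JSON document per line, it can also be consumed by a script instead of jq. The following is a minimal sketch using only the Python standard library; the function name iter_ndjson is hypothetical and not part of parquet2json.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines rather than failing on them
            yield json.loads(line)
```

Passing sys.stdin to iter_ndjson in a small script would let that script sit at the end of the same pipe, processing records one at a time without loading the whole output into memory.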

Help

 Usage: parquet2json [OPTIONS] PARQUET JSON

 Convert parquet file to newline delimited JSON.                           
                                                                           
╭─ Arguments ──────────────────────────────────────────────────────────╮
│ *    parquet      TEXT    Input path/URI to parquet. [default: None] │
│                           [required]                                 │
│      json         [JSON]  Output NDJSON path, or leave empty for     │
│                           STDOUT                                     │
│                           [default: None]                            │
╰──────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────╮
│ --log-level          [critical|fatal|error|  Log level               │
│                      warn|warning|info|debu  [default: INFO]         │
│                      g|notset]                                       │
│ --help       -h                              Show this message and   │
│                                              exit.                   │
╰──────────────────────────────────────────────────────────────────────╯

Copyright

Copyright 2014-2024 EMBL - European Bioinformatics Institute, Genentech, GSK, MSD, Pfizer, Sanofi and Wellcome Sanger Institute

This software was developed as part of the Open Targets project. For more information please see: http://www.opentargets.org

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
