Add pdf2parquet transform #416
transforms/universal/pdf2md/python/src/pdf2md_transform_python.py
shortname = "pdf2md"
cli_prefix = f"{shortname}_"
pdf2md_modelsdir_key = f"{shortname}_modelsdir"
why?
| Parameter | Default | Description |
|------------|----------|--------------|
| `--pdf2md_modelsdir` | `./artifacts` | The location where the models are located or downloaded to |
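A minimal sketch of how the `--pdf2md_modelsdir` parameter in the table above could be declared with Python's standard `argparse`, assuming a plain standalone parser. The transform's actual CLI wiring in the repository goes through the framework's own machinery and may differ.

```python
import argparse

# Hypothetical standalone parser, for illustration only; the real transform
# registers its arguments through the framework's CLI machinery.
parser = argparse.ArgumentParser(description="pdf2md CLI sketch")
parser.add_argument(
    "--pdf2md_modelsdir",
    type=str,
    default="./artifacts",
    help="The location where the models are located or downloaded to",
)

# With no arguments the documented default applies; an explicit value overrides it.
defaults = parser.parse_args([])
custom = parser.parse_args(["--pdf2md_modelsdir", "/tmp/models"])
print(defaults.pdf2md_modelsdir)  # ./artifacts
print(custom.pdf2md_modelsdir)    # /tmp/models
```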
These are input parameters, so they are fine
Separately, why are you using Markdown and not raw text in the content column?
@blublinsky No, we have config parameters to the transform that are separate from the CLI parameters. This is a well-established pattern in other transforms. My comment stands.
db915aa to 6d1919b
No, we should add the non-CLI config parameters (those without the `pdf2md_` prefix) to the README; the CLI `--pdf2md*` parameters can then simply refer to that documentation. The point is that the transform has configuration that is independent of the CLI. The prefix is used in the CLI to distinguish these parameters within the very large set of parameters presented in the cloud/KFP GUI when running at scale.
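The split described in this comment can be sketched as follows: CLI arguments carry the `pdf2md_` prefix, while the transform receives a configuration dictionary with unprefixed keys. The helper `capture_prefixed` and the extra argument names are hypothetical, written for illustration only; they are not the framework's API.

```python
import argparse

cli_prefix = "pdf2md_"


def capture_prefixed(args: argparse.Namespace, prefix: str) -> dict:
    """Hypothetical helper: keep only arguments whose names start with
    `prefix` and return them with the prefix stripped, so the transform
    sees config keys such as "modelsdir" rather than "pdf2md_modelsdir"."""
    return {
        name[len(prefix):]: value
        for name, value in vars(args).items()
        if name.startswith(prefix)
    }


# Simulated parse result mixing this transform's argument with unrelated
# ones, as a large pipeline GUI would present them (names are made up).
args = argparse.Namespace(
    pdf2md_modelsdir="./artifacts",
    other_transform_threshold=0.5,
    run_locally=True,
)
config = capture_prefixed(args, cli_prefix)
print(config)  # {'modelsdir': './artifacts'}
```

One consequence of this design is that the unprefixed keys can be documented once and used directly when the transform is configured from Python, with the prefixed CLI names acting only as a namespacing layer.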
1c38ef9 to 8fd219c
transforms/universal/pdf2parquet/python/src/pdf2parquet_transform.py
transforms/universal/pdf2parquet/python/test/test_pdf2parquet.py
transforms/universal/pdf2parquet/python/test/test_pdf2parquet_python.py
transforms/universal/pdf2parquet/ray/src/pdf2parquet_transform_ray.py
7b0ca63 to 8c4d0bf
Signed-off-by: Michele Dolfi <[email protected]>
…s (done automatically), cleanup prints Signed-off-by: Michele Dolfi <[email protected]>
25b3aef to 8e06993
LGTM. Thanks!
Why are these changes needed?
This PR adds a transform for converting PDF files to Markdown.