Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding hap transform #638

Merged
merged 49 commits into from
Oct 1, 2024
Merged

adding hap transform #638

merged 49 commits into from
Oct 1, 2024

Conversation

ian-cho
Copy link
Collaborator

@ian-cho ian-cho commented Sep 28, 2024

Why are these changes needed?

adding python implementation of hap transform under universal directory for DPK outer. Passed all tests when "make test". please review. Thanks.

.github/workflows/test-universal-hap.yml Outdated Show resolved Hide resolved
transforms/universal/hap/python/src/hap_transform.py Outdated Show resolved Hide resolved
transforms/universal/hap/python/pyproject.toml Outdated Show resolved Hide resolved
transforms/universal/hap/python/README.md Show resolved Hide resolved

* --model_name_or_path - specifies HAP model which should be compatable with HuggingFace's `AutoModelForSequenceClassification`
* --batch_size - modify it based on the infrastructure capacity.
* --max_length - the maximum length for the tokenizer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are are missing the annotation_column and document_column configuraitions.

Also, can you change to the format for documenting configuration used here https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/pdf2parquet/python?


# distribution versions is the same as image version.
set-versions:
$(MAKE) TRANSFORM_PYTHON_VERSION=$(HAP_PYTHON_VERSION) TOML_VERSION=$(HAP_PYTHON_VERSION) .transforms.set-versions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These HAP_*VERSION require additions to .make.versions at the top of the repo.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make the update in .make.versions but going forward, we should change that: it could be enough to define it in Makefile itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, at one point, maybe with kfp, these were required to be available more globally. In addition, the ray/Makefile refers to the HAP_PYTHON_VERSION, so may not be able to change unless we coalesce runtimes into a single project.


venv:: .transforms.python-venv

install:: pip install -r requirements.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a non-standard rule. You should probably remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a luancher-based tests, test_hap_python.py.

Copy link
Collaborator

@touma-I touma-I left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ian-cho We can go over my proposed changes together if any of it does not make sense.

.github/workflows/test-universal-hap.yml Outdated Show resolved Hide resolved
transforms/universal/hap/python/Dockerfile Show resolved Hide resolved
transforms/universal/hap/python/Makefile Outdated Show resolved Hide resolved
transforms/universal/hap/python/pyproject.toml Outdated Show resolved Hide resolved
transforms/universal/hap/python/pyproject.toml Outdated Show resolved Hide resolved
transforms/universal/hap/python/pyproject.toml Outdated Show resolved Hide resolved
transforms/universal/hap/python/pyproject.toml Outdated Show resolved Hide resolved
@ian-cho
Copy link
Collaborator Author

ian-cho commented Oct 1, 2024

@blublinsky (cc @daw3rd )

Hi thanks for fixes! I merge your suggested commits. Several tests still failed and now I am investigating it. Please let me know if you have any ideas on the failed test. Thanks again.

All tests passed. Please check

@touma-I
Copy link
Collaborator

touma-I commented Oct 1, 2024

@ian-cho There was one typo in the pypropoject.toml that was causing the build to fix. This is fixed now as below

authors = [
 { name = "Ian Cho", email = "[email protected]" },
 ]
-[tool.setuptools.dynamic]
-dependencies = {file = ["requirements.txt"]}
+
+dynamic = ["dependencies"]

Copy link
Member

@daw3rd daw3rd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming issues to be created for remaining comments.

super().__init__(config)
self.model_name_or_path = config.get("model_name_or_path")
self.annotation_column = config.get("annotation_column")
self.doc_text_column = config.get("doc_text_column")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should all have defaults or issue an exception if there is no default defined and it is not provided by the user. In general, we should assume (when possible), that the transform class may be instantiated directly by the user. This would mean the CLI is not used and so these could be empty key values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daw3rd agree, it is possible that users directly instantiate the class and there would be empty key values. How about self.model_name_or_path = config.get("model_name_or_path", ibm-granite/granite-guardian-hap-38m)? the same for other two parameters. I can add a warning message if the user does not specify a model, notifying them that the default model is triggered?

@touma-I touma-I merged commit 00d1fea into IBM:dev Oct 1, 2024
6 checks passed
@touma-I touma-I mentioned this pull request Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants