-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: initial project structure and implementation #14
feat: initial project structure and implementation #14
Conversation
hey @Karanjot786, just some initlal thoughts; your CI build is failing. It seems you need to update your poetry.lock file. Just add the "poetry lock" command after installing poetry and you should be good. Although you also have conflicting versions of certain dependencies in your poetry config, you'll also need to resolve those or poetry will throw an error when you try to install or update dependencies. Also I think you should consider using the actions/setup-poetry action to install poetry as that provides a more stable experience, and its actually built specifically for such a situation, as opposed to script you're currently using. Finally, this might be a bit of a nitpick but this PR kinda does a lot for an initial setup, like writing both converters (in a less than ideal way too). And skips a vital initial task, which is setting up your WES and TES models, perhaps this was done intentionally please a short walk through your though process would be much appreciated. @uniqueg, maybe you could weigh in on this. |
Hi @SalihuDickson, I apologize for the PR doing more than intended for an initial setup. My initial thought was to get the basic structure and some key functionality in place to provide a clearer picture of the overall design. However, I understand the importance of breaking it down into more manageable pieces. Regarding the converters, I wanted to provide a working example to show the intended direction, but I agree it may have been premature without fully setting up the WES and TES models first. I'll refocus on setting up the models properly in the next steps. Thank you again for your guidance. |
thank you @Karanjot786, you should go ahead and push you CI fix so get rid of the error. Next you should make sure all your CI checks are passing (your tests are failing aswell). |
Hi @SalihuDickson, Thank you for your feedback. I have resolved the CI pipeline issue and pushed the fix to remove the error. I'll now focus on ensuring all CI checks are passing and address the failing tests. Thanks again for your guidance. |
Hi @SalihuDickson and @uniqueg, I'm sorry for the comment mess earlier. I've now resolved all the issues, and I'm happy to report that all the checks are successful. Thank you for your patience and guidance. |
Hey @Karanjot786, just a few things;
If you feel the issue with the tests is not something you can fix, feel free to ask for help on what to do, that's why I and @uniqueg are here. |
Hi @SalihuDickson , Thank you for your feedback. I apologize for commenting on the test job in the CI workflow. I focused on resolving the major issues causing the CI workflow to fail. I've re-enabled the test jobs, but I've commented out two tests that still need further attention and resolution in the future. I've added detailed instructions on how to run the package, including the required input formats. This should help streamline the review process. I will include some dummy data to facilitate testing without needing additional data. I'll also set the required properties on the input to false and set a default for the output file. I'll make these updates as soon as possible and let me know if I need any more help. |
DescriptionThis PR includes fixes for the integration test and provides detailed instructions on how to run the CrateGen package. How to Run the PackagePrerequisites
Installation
Running the PackageYou can use the CLI tool provided by CrateGen to convert TES or WES data to WRROC format. TES to WRROC Conversion
Running TestsTo ensure everything is working correctly, you can run the tests: poetry run pytest --cov=crategen tests |
hey @Karanjot786, thank you for getting to these so quickly, however there are still a few issues that need resolving;
For future reference these really aren't issues that should be caught by a reviewer, they are fairly basic and can be uncovered by some basic testing before making an MR. Some points I would like you to consider to improve the interaction btw the user and the CLI;
I apologise that this PR is taking time to be approved but the fact that it does so much means there is a lot of functionality to go through and test before moving forward, in the future we can mitigate this by making sure each PR tackles 1 specific issue directly for just 1 conversion (when a conversion is concerned), that way when the functionality has been pinned down you can easily apply it to other conversions. |
Hi @SalihuDickson , Thank you for the detailed feedback and your patience. Here are the steps I will take to address the issues: Update the Instructions:
Input Data Conversion:
ISO 8601 Conversion:
WES to WRROC Converter:
User Interaction Improvements:
I will make these changes and thoroughly test the application before updating the PR. Thank you for your guidance and support in ensuring the quality of this project. |
Hi @SalihuDickson, Updated CLI Implementation:
TES Converter:
WES Converter:
Utility Function:
How to Run the PackageInstall Dependencies:Ensure that all dependencies are installed using Poetry. poetry install Create Sample Input DataPrepare sample JSON files for both TES and WES to be used as input for the CLI. Sample TES Input (tes_example.json):{
"id": "task-id",
"name": "test-task",
"description": "test-description",
"executors": [{"image": "executor-image"}],
"inputs": [{"url": "input-url", "path": "input-path"}],
"outputs": [{"url": "output-url", "path": "output-path"}],
"creation_time": "2023-07-10T14:30:00Z",
"end_time": "2023-07-10T15:30:00Z"
} Sample WES Input (wes_example.json):{
"run_id": "run-id",
"run_log": {
"name": "test-run",
"start_time": "2023-07-10T14:30:00Z",
"end_time": "2023-07-10T15:30:00Z"
},
"state": "COMPLETED",
"outputs": [{"location": "output-location", "name": "output-name"}]
} Run the TES to WRROC ConversionUse the CLI to convert the TES input data to WRROC format. poetry run python -m crategen.cli --input tests/data/input/tes_example.json --output tests/data/output/tes_to_wrroc_output.json --conversion-type tes-to-wrroc Verify the output in Run the WES to WRROC ConversionUse the CLI to convert the WES input data to WRROC format. poetry run python -m crategen.cli --input tests/data/input/wes_example.json --output tests/data/output/wes_to_wrroc_output.json --conversion-type wes-to-wrroc Verify the output in Additional Notes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
besides this minor correction i think this PR is just about good to go. But I want you to move the tests to a new PR, so we can work on them separately. Great work so far.
Hi @SalihuDickson , Thank you for the feedback and guidance.
Please review the updated PR and the new PR for the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm +/- fine with merging this as a sandbox to see what goes and what doesn't. And for that purpose, I am also fine with not including tests for now.
However, I see numerous problems with the proposed solutions, and after merging this, I think it is critical to clearly define how the library (and the CLI) will work in more detail.
In particular, I think it is critically important to think about the following:
- There are two opposing but not quite symmetrical features the tool should provide: (1) packaging a WRROC from WES/TES information, possibly enriched by additional information from TRS, DRS, etc., if available; (2) extracting "WES/TES requests" from a WRROC; I think both merit their own separate function in the CLI (
crategen package ...
andcrategen extract ...
) and possibly two abstract classes, a WRROC packager and a GA4GH Cloud API request extractor. - For both, it needs to be clear what exactly the inputs and outputs are, and how they can be reasonably obtained and consumed, respectively. See more info in the points below on this.
- We need to consider that there are multiple hierarchical WRROC profiles and it needs to be clear how WRROC are produced based on these profiles (e.g., from just a TES run, we probably can't create a Provenance Run Crate). Will we just create the highest level WRROC we can from the info we have? If and how do we consume information that may or may not be available, e.g., from TRS/DRS?
- As mentioned already, input (and output) information should be validated. Once that is in place, when packaging a WRROC, the tool could (and probably should) auto-determine if it is passed the data from a WES or TES run and choose the appropriate packager class; no need to ask the user for this. However, this feature can easily be implemented later (but useful to consider).
- For extracting WES/TES API request information, things are more tricky. First of all, an RO-Crate can contain multiple WRROC profile entities. How does the user select the one that is supposed to be processed? Or do we only accept the WRROC entity itself - without the encompassing RO-Crate? That would make it hard for the user. Moreover, a Workflow Run Crate could be converted into a WES run - or one or more TES runs. Again, how does the user select what the tool processes, and how/with what extractor? It is important to design a sensible flow and user interface (both in Python and via the CLI) with reasonable defaults (e.g., process all available WRROC entities and extract a WES call instead of one or more TES calls whenever possible)
- Similarly, for creating WRROC, do we just create the WRROC entities or complete valid RO-Crate files? If the latter, what if an RO-Crate already exists and we only want to add the packaged WRROC entity?
- How is the API request data supposed to be consumed? Think about what would make sense here from a UX perspective.
That being said, all of these don't need to be addressed in this PR. Just keep these in mind when designing the next steps.
Please just fix the addressed code comments and let's get this merged. Then let's see where it goes. But for that to lead to some reasonable answers, I 100% agree with Salihu. We need (valid!) test/dummy data and good documentation/instructions to see how that code performs and where we run into problems. So that (along with tests) would be the work for your next two PRs.
Hi @uniqueg I will implement separate functions in the CLI for packaging ( I will ensure the tool can handle multiple hierarchical WRROC profiles and produce the highest level WRROC possible based on the available information. I'll also consider how to incorporate additional data from TRS/DRS. I will add validation for both input and output data. Additionally, I will work on implementing auto-detection of WES/TES data to select the appropriate packager class without user input. I will design a user interface and flow that allows users to select the specific WRROC entity to be processed, with sensible defaults for processing all available WRROC entities when applicable. I will design the tool to create complete valid RO-Crate files. If an RO-Crate already exists, the tool will add the packaged WRROC entity to the existing file. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @Karanjot786 - it looks good to me now, at least good enough as a checkpoint. We urgently need dummy data and see where we are. And once that works: Tests!
This PR includes the initial project structure and the implemented files for the CrateGen project. This is the initial version of the project and it is not yet complete. Please review the structure and provide any feedback or suggestions for improvement.
The project structure includes:
pyproject.toml
for proper dependency management.Thank you!