[FEAT] adding transform functionality #2498
Conversation
Not sure why ruff-format is failing.
Thanks @otacilio-psf! As I understand it, I'm curious why this would be preferable, since it feels like
Hi @jaychia. TL;DR: yes, it helps with code readability, and transform accepts a function plus args and kwargs.
The main reason is to improve code readability when we use functions to split work into units that can later be unit tested. Instead of having a lot of DataFrame variables (like df1, df_final, df_output, df_final_final), you can chain your transformations (as is already possible),
but it would be nice to split the transformations into units of work, business rules, or whatever makes sense for your project.
And instead of calling each function and assigning a "step variable", or awkwardly nesting the functions, you could use transform to call each function in a chain.
Let me know if you have any other questions; this is also what I wanted to avoid in my Daft test
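To make the readability argument concrete, here is a minimal sketch of the pattern being discussed. `MiniFrame` is a hypothetical stand-in for a DataFrame (so the snippet runs without Daft installed); the `transform(func, *args, **kwargs)` shape mirrors what this PR describes, and the business-rule functions (`filter_positive`, `scale`) are illustrative names, not from the PR.

```python
class MiniFrame:
    """Hypothetical stand-in for a DataFrame with a transform method."""

    def __init__(self, rows):
        self.rows = rows

    def transform(self, func, *args, **kwargs):
        # Call func on the frame itself, forwarding extra args/kwargs.
        return func(self, *args, **kwargs)


# Units of work / business rules as plain functions:
def filter_positive(df):
    return MiniFrame([r for r in df.rows if r > 0])


def scale(df, factor):
    return MiniFrame([r * factor for r in df.rows])


# Without transform: step variables (df1, df_final, ...) or nesting.
df1 = filter_positive(MiniFrame([-1, 2, 3]))
df_final = scale(df1, 10)

# With transform: a single readable chain.
result = (
    MiniFrame([-1, 2, 3])
    .transform(filter_positive)
    .transform(scale, factor=10)
)
assert result.rows == df_final.rows == [20, 30]
```

Both styles compute the same thing; the chained version just avoids naming every intermediate frame.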
Thanks for the elaboration! This seems fine to me. Would you be able to run the formatter? You should be able to do this by running this command from your Daft virtual environment:
Lastly, let's add a unit test in
@jaychia done 😊
Great work @otacilio-psf! Merging now :)
As in PySpark, the transform functionality allows you to split your transformations into units of work, create a function for each, and then call them on your DataFrame, enabling you to chain transformations.
Having your transformations as plain functions also makes them easy to unit test.
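A short sketch of what that unit testing looks like: because each unit of work is an ordinary function, it can be tested in isolation without constructing the full pipeline. The `add_vat` business rule and its test below are illustrative assumptions, not code from the PR.

```python
def add_vat(prices, rate=0.2):
    """Business rule: add value-added tax to each price."""
    return [round(p * (1 + rate), 2) for p in prices]


def test_add_vat():
    # The rule is exercised directly, with no DataFrame machinery needed.
    assert add_vat([100.0, 50.0]) == [120.0, 60.0]
    assert add_vat([100.0], rate=0.1) == [110.0]


test_add_vat()
```

The same function can then be passed to `transform` in the real pipeline, so the tested code and the production code are one and the same.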
PySpark reference