Allow data_product_type override in run_data_processor. #1124

Open
davner opened this issue Nov 22, 2024 · 1 comment
Labels: Data Services, enhancement, User Issue

Comments

Contributor

davner commented Nov 22, 2024

Is your feature request related to a problem? Please describe.
Yes. Currently, the run_data_processor function in the tom_dataproducts module does not support dynamic selection of a data processor based on a user-supplied data_product_type; the processor is always chosen from the type stored on the data product. This limits our project, which requires different processing strategies for different types of data products, especially when that choice has to be made dynamically from a user interface.

Describe the solution you'd like
I propose extending run_data_processor with an optional parameter that overrides the data product type, so that a specific processor can be selected at runtime. This would be a small PR: a couple of changes inside the function to make sure the correct data type is referenced, while users can still call processor.data_type_override(). That existing hook does not cover our case, because our project needs the override applied before the processor is selected.

def run_data_processor(dp, dp_type_override=None):
    """
    Reads the `data_product_type` from the `dp` parameter, or from the override if one
    is given, imports the corresponding processor from `DATA_PROCESSORS` specified in
    `settings.py`, then runs `process_data` and inserts the returned values into the
    database.

    :param dp: DataProduct which will be processed into a list
    :type dp: DataProduct

    :param dp_type_override: Optional. DataProduct type to use instead of the type stored
        on `dp`. If None, the type from the `dp` object is used.
    :type dp_type_override: str, optional

    :returns: QuerySet of `ReducedDatum` objects created by the `run_data_processor` call
    :rtype: `QuerySet` of `ReducedDatum`
    """

Additional context
This feature was developed and tested in our local project setup where it proved to be crucial for allowing front-end users to select different processing strategies for different datasets.

If the TOMToolkit team is open to this, I can have a PR submitted for review immediately.

davner added the enhancement label Nov 22, 2024
github-project-automation bot moved this to Triage in TOM Toolkit Nov 22, 2024
jchate6 added the User Issue label Nov 22, 2024
jchate6 self-assigned this Nov 22, 2024
Contributor

jchate6 commented Dec 20, 2024

Hi @davner,
Sorry for the delay.
I would very much appreciate a PR implementing this change.

We are planning a complete refactor of data services for the TOMToolkit over the next year that will affect how data processors interact with Reduced Datums. Hopefully these changes will help us move towards those goals, but I'd also appreciate any input or pain points you have while we are mapping out that process.

jchate6 moved this from Triage to Backlog in TOM Toolkit Jan 10, 2025
jchate6 added the Data Services label Jan 10, 2025
Projects
Status: Backlog
Development

No branches or pull requests

2 participants