Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Add more context when UDFs fail #2325

Merged
merged 6 commits into from
Jun 12, 2024
Merged

Conversation

jaychia
Copy link
Contributor

@jaychia jaychia commented May 29, 2024

This PR adds some context around failures of UDFs, and also has the side-effect of quite nicely isolating the actual exception to just the UDF code so users can be sure that the code failed in the UDF rather than in some part of Daft.

In [4]: daft.from_pydict({"foo": [1]}).with_column("identity", throws_error_if_len_1(daft.col("foo"))).collect()
Project [Stage:2]:   0%|                                                                                                    | 0/1 [00:00<?, ?it/s]---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/code/Daft/daft/udf.py:108, in PartialUDF.__call__(self, evaluated_expressions)
    107 try:
--> 108     result = func(*args, **kwargs)
    109 except Exception as user_function_exception:

Cell In[2], line 4, in throws_error_if_len_1(data)
      3 if len(data) == 1:
----> 4     raise ValueError("I hate len 1 data")
      5 else:

ValueError: I hate len 1 data

The above exception was the direct cause of the following exception:

... 

RuntimeError                              Traceback (most recent call last)
...
File ~/code/Daft/daft/udf.py:110, in PartialUDF.__call__(self, evaluated_expressions)
    108     result = func(*args, **kwargs)
    109 except Exception as user_function_exception:
--> 110     raise RuntimeError(f"User-defined function `{func.__name__}` failed when executing on inputs with lengths: {tuple(len(series) for series in evaluated_expressions)}") from user_function_exception
    112 # HACK: Series have names and the logic for naming fields/series in a UDF is to take the first
    113 # Expression's name. Note that this logic is tied to the `to_field` implementation of the Rust PythonUDF
    114 # and is quite error prone! If our Series naming logic here is wrong, things will break when the UDF is run on a table.
    115 name = evaluated_expressions[0].name()

RuntimeError: User-defined function `throws_error_if_len_1` failed when executing on inputs with lengths: (1,)

@github-actions github-actions bot added the enhancement New feature or request label May 29, 2024
@jaychia jaychia enabled auto-merge (squash) May 29, 2024 21:58
Copy link

codecov bot commented May 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@e836290). Learn more about missing BASE report.
Report is 23 commits behind head on main.

Current head 4ff64ae differs from pull request most recent head 18ee2be

Please upload reports for the commit 18ee2be to get more accurate results.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2325   +/-   ##
=======================================
  Coverage        ?   79.15%           
=======================================
  Files           ?      471           
  Lines           ?    54479           
  Branches        ?        0           
=======================================
  Hits            ?    43121           
  Misses          ?    11358           
  Partials        ?        0           
Files Coverage Δ
daft/udf.py 92.85% <100.00%> (ø)

@jaychia jaychia force-pushed the jay/add-context-for-failing-udfs branch 2 times, most recently from 4ff64ae to b1cc31e Compare May 30, 2024 22:03
@jaychia jaychia force-pushed the jay/add-context-for-failing-udfs branch from b1cc31e to 118058e Compare May 30, 2024 23:18
@jaychia jaychia merged commit cd775f5 into main Jun 12, 2024
42 checks passed
@jaychia jaychia deleted the jay/add-context-for-failing-udfs branch June 12, 2024 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant