Assertion failed on taxi and plasticc benchmarks #458
Comments
The Modin version used is git revision 8817093cb2f16be3709513b1517d8a4bc679900c.
Is this a regression, or does it always happen with a debug build?
I didn't try a Release build, but I would expect that, if it behaves the same way as Debug, the failed assertion means a NULL pointer, so the code on the next lines that uses this pointer should crash: https://github.com/intel-ai/hdk/blob/main/omniscidb/QueryEngine/Descriptors/QueryFragmentDescriptor.cpp#L297-L299
We run them in the internal performance benchmarking system.
Benchmarks with reduced-size datasets could be used for additional smoke testing in CI. They should pass very quickly. My dataset generator script can generate datasets of any required size for our benchmarks.
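A minimal sketch of what a reduced-size dataset generator for CI smoke tests might look like. This is not the generator script mentioned above; the function name `generate_csv` and the column names are hypothetical and chosen only for illustration.

```python
import csv
import random
from pathlib import Path


def generate_csv(path, n_rows, seed=0):
    """Write a CSV file with a header line and n_rows rows of synthetic data."""
    # Seeded generator so every CI run sees exactly the same data.
    rng = random.Random(seed)
    # Hypothetical columns; a real benchmark would use its own schema.
    columns = ["id", "passenger_count", "total_amount"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        for i in range(n_rows):
            writer.writerow(
                [i, rng.randint(1, 6), round(rng.uniform(2.5, 100.0), 2)]
            )
    return Path(path)
```

Calling `generate_csv("small.csv", 1)` produces a file with a header and a single data line, which keeps a smoke test fast.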
This is a known issue that is going to be fixed by #442. Note that this bug only appears when the first query you run in your process has multiple steps. We don't see it in our Modin tests and benchmarks (or in HDK CI) because the first test query is usually simple, and the benchmarks use a single-step warm-up query. Here is the part of the patch that actually fixes the problem:
But you will have to add a warm-up query anyway to avoid an unexpected execution time for Taxi Q1. Also, it's surprising that Taxi Q1 triggers the problem. As I said, the problem only appears for queries with more than one step, which means Taxi Q1 on Modin has more than one step. Looking at the Modin plans for these queries, I see that for Q1, Q3, and Q4 there is an additional projection after the aggregation. This projection replaces
UPD: the fix has been merged.
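The warm-up advice above can be sketched as a small timing harness. This is an illustration only: `run_benchmark` and the query callables are hypothetical names, and a real warm-up would be a simple single-step query run against the actual engine.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def run_benchmark(
    warmup: Callable[[], object],
    queries: Iterable[Tuple[str, Callable[[], object]]],
) -> Dict[str, float]:
    """Run a warm-up query once, untimed, then time each benchmark query."""
    # The warm-up absorbs one-time startup costs so they are not
    # attributed to the first timed query (Taxi Q1 in this thread).
    warmup()

    timings: Dict[str, float] = {}
    for name, query in queries:
        start = time.perf_counter()
        query()
        timings[name] = time.perf_counter() - start
    return timings
```

The warm-up is passed separately from the timed queries so that it can never end up in the reported timings by accident.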
Resolved by #442.
The projection is probably added by this fillna() call: https://github.com/modin-project/modin/blob/master/modin/pandas/groupby.py#L899. If I remove fillna() there, the additional projection disappears.
I've investigated why we initially added this. So I'll soon create a PR to remove it from Modin.
When running the taxi benchmark (modified to work with CSV input that has a header) and the plasticc benchmark on an HDK debug build, I encounter a failed assertion: QueryFragmentDescriptor.cpp:297 Check failed: table_info. This happens in the same place on both Linux and Windows, so it is not a Windows-specific problem.
Code to reproduce:
Datafile that contains a header and just one line: