@mistercrunch
Member

Druid returns NULL as 0, typed as int. This causes pandas to fail
when it tries to sort heterogeneous types.
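
To make the failure mode concrete, here is a minimal sketch, not the code from this PR. The frame contents and the column names `country` and `count` are made up for illustration, and casting to `str` is only one possible workaround. It shows a group-by column where the integer 0 stands in for NULL next to string values.

```python
import pandas as pd

# Hypothetical query result: the group-by column mixes strings with the
# integer 0 that stands in for NULL.
df = pd.DataFrame({"country": ["US", 0, "FR"], "count": [3, 1, 2]})

# Sorting the mixed-type column is expected to raise TypeError on Python 3,
# because str and int cannot be ordered against each other.
try:
    df.sort_values("country")
    sort_failed = False
except TypeError:
    sort_failed = True

# One possible workaround: cast the group-by column to str so that every
# value has the same type before sorting.
df["country"] = df["country"].astype(str)
sorted_countries = df.sort_values("country")["country"].tolist()
```

After the cast the placeholder sorts as the string "0" alongside the real values, which avoids the type error but does not recover the fact that the value was originally NULL.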

Member

@betodealmeida left a comment

Should we change pydruid to do this automatically, or maybe convert the zero to None?

cols += query_obj.get('groupby') or []
cols += query_obj.get('columns') or []
cols += query_obj.get('metrics') or []
cols += groupby
Contributor

@xrmx Jan 25, 2018

cols = [DTTM_ALIAS] + groupby + columns + metrics

may be a bit cheaper and cleaner
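
For comparison, a small sketch of the two styles side by side. It assumes `cols` starts as `[DTTM_ALIAS]`, that `groupby`, `columns` and `metrics` have already been normalized to lists, and it uses an illustrative value for `DTTM_ALIAS`; the `query_obj` contents are hypothetical.

```python
DTTM_ALIAS = "__timestamp"  # assumed value, for illustration only

# Hypothetical query object; a missing or None entry falls back to [].
query_obj = {"groupby": ["country"], "columns": None, "metrics": ["count"]}
groupby = query_obj.get("groupby") or []
columns = query_obj.get("columns") or []
metrics = query_obj.get("metrics") or []

# Incremental version: extend the list in place, one step at a time.
cols_incremental = [DTTM_ALIAS]
cols_incremental += groupby
cols_incremental += columns
cols_incremental += metrics

# Single-expression version suggested in the review.
cols = [DTTM_ALIAS] + groupby + columns + metrics
```

Both forms build the same list here; the single expression only works once the `or []` fallbacks have been applied, since `None` cannot be concatenated to a list.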

cols = [col for col in cols if col in df.columns]
df = df[cols]

for col in groupby + columns:
Contributor

would it be possible for some of these columns to not be in df.columns given the list comprehension at line 1249?
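
A rough sketch of the concern, under stated assumptions: the frame, the missing `city` column, and the loop body (a cast to `str`) are all hypothetical, since the actual loop body is not shown in this excerpt. If a requested column is dropped by the filter, looping over the unfiltered `groupby + columns` would index a column that no longer exists.

```python
import pandas as pd

# Hypothetical result that lacks the requested "city" column.
df = pd.DataFrame({"country": ["US", "FR"], "count": [3, 2]})
groupby = ["country"]
columns = ["city"]  # requested, but absent from the result

# Filtering step as in the diff: keep only columns present in df.
# The .copy() is added here so later assignments act on an independent frame.
cols = [col for col in groupby + columns + ["count"] if col in df.columns]
df = df[cols].copy()

# Looping over the unfiltered lists would reach df["city"] and raise
# KeyError, so this sketch skips names that are not in the frame.
touched = []
for col in groupby + columns:
    if col not in df.columns:
        continue
    df[col] = df[col].astype(str)  # assumed loop body, for illustration
    touched.append(col)
```

An alternative with the same effect would be to iterate over the already filtered `cols` and keep only the names that came from `groupby` or `columns`.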

@mistercrunch
Member Author

@betodealmeida I agree that pydruid should take care of that, but that change may not be backward compatible, so I'm not sure how to handle it.

I have to admit I'm confused by the related bugs. This feels like new behavior (a new version of Druid? PyDruid? an error in Druid ingestion?), as we should have hit this problem before...

related?
apache/druid#4349

@mistercrunch
Member Author

Dug deeper and got to the root cause in #4358
