Skip to content

Conversation

@betodealmeida
Copy link
Member

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

The fix introduced in #8226 is not working for some types. We have a query returning the following error:

<U10 cannot be converted to an IntegerDtype

This happens because before creating the Pandas dataframe we cast the data into a Numpy array, and Numpy is casting all columns to the same type. I fixed it by keeping the dtype as "object".

TEST PLAN

Query now runs successfully.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@khtruong

@betodealmeida betodealmeida merged commit 8e1fc2b into apache:master Sep 18, 2019
@codecov-io
Copy link

codecov-io commented Sep 18, 2019

Codecov Report

Merging #8253 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #8253   +/-   ##
=======================================
  Coverage   65.68%   65.68%           
=======================================
  Files         481      481           
  Lines       23348    23348           
  Branches     2572     2572           
=======================================
  Hits        15335    15335           
  Misses       7875     7875           
  Partials      138      138
Impacted Files Coverage Δ
superset/dataframe.py 94.48% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12fb8e7...02f51d4. Read the comment docs.

@DiggidyDave
Copy link
Contributor

what are the cases where they would not be the same type? is that suggestive of a bigger issue?

otherwise, LGTM

@betodealmeida
Copy link
Member Author

what are the cases where they would not be the same type? is that suggestive of a bigger issue?

otherwise, LGTM

@DiggidyDave, we're casting the results from the DB — a list of tuples — into a Numpy array so we can address each column efficiently:

>>> data = [("a", 1), ("b", 10)]
>>> np.array(data)
array([['a', '1'],
       ['b', '10']], dtype='<U2')
>>> np.array(data)[:,0]  # first column
array(['a', 'b'], dtype='<U2')
>>> np.array(data)[:,1]  # second column
array(['1', '10'], dtype='<U2')

Note that the numbers were cast to unicode, since that's the common type between int and unicode.

If we use "object", though:

>>> np.array(data, dtype='object')
array([['a', 1],
       ['b', 10]], dtype=object)

@betodealmeida betodealmeida mentioned this pull request Sep 20, 2019
12 tasks
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.35.0 First shipped in 0.35.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XS 🚢 0.35.0 First shipped in 0.35.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants