Add improved typed casting to BigQuery #8331

Merged
betodealmeida merged 1 commit into apache:master from lyft:VIZ-1033
Oct 21, 2019

Conversation

@betodealmeida
Member

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

Add improved type casting for BigQuery, similar to what #8226 did for Presto. This prevents Pandas from converting 64-bit integers to floats, which loses precision.
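
To illustrate the precision loss described above, here is a minimal sketch using only plain Python floats, which are 64-bit doubles like a float64 column. It is not code from this PR; the variable names are illustrative.

```python
# A float64 has a 53-bit significand, so not every 64-bit integer fits exactly.
big = 2**53 + 1  # 9007199254740993, well within the signed 64-bit range

as_float = float(big)       # what a float64 column would hold
round_trip = int(as_float)  # what comes back after the conversion

print(big)                # 9007199254740993
print(round_trip)         # 9007199254740992
print(big == round_trip)  # False
```

Keeping the column as an integer type avoids this rounding entirely.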

TEST PLAN

I verified that int64 values are not cast to float, and that precision is maintained. I also tested that all types are cast correctly:

SELECT
  'string' AS `string`,
  CAST('string' AS BYTES) AS `bytes`,
  100 AS `integer`,
  100.0 AS `float`,
  CAST(1 AS NUMERIC) AS `numeric`,
  CAST(1 AS BOOLEAN) AS `boolean`,
  STRUCT(1) AS `record`,
  TIMESTAMP(DATETIME "2008-12-25 15:30:00", "America/Los_Angeles") AS `timestamp`,
  DATE "2008-12-25" AS `date`,
  TIME "15:30:00" AS `time`,
  DATETIME "2008-12-25 15:30:00" AS `datetime`

Generated CSV:

string,bytes,integer,float,numeric,boolean,record,timestamp,date,time,datetime
string,b'string',100,100.0,1.0,True,"{""_field_1"": 1}",2008-12-25 23:30:00+00:00,2008-12-25,15:30:00,2008-12-25 15:30:00
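
The generated row can be checked programmatically. The sketch below assumes the file is comma-separated (the delimiters are not visible in the output above) and uses only the standard library; it is a reader-side check, not part of the PR.

```python
import csv
import io
import json

# Assumed comma-separated rendering of the header and row from the test plan.
data = '''\
string,bytes,integer,float,numeric,boolean,record,timestamp,date,time,datetime
string,b'string',100,100.0,1.0,True,"{""_field_1"": 1}",2008-12-25 23:30:00+00:00,2008-12-25,15:30:00,2008-12-25 15:30:00
'''

row = next(csv.DictReader(io.StringIO(data)))

# The integer column has no decimal point, so it was not routed through a float.
integer_is_exact = "." not in row["integer"]
integer_value = int(row["integer"])

# The STRUCT column arrives as JSON text inside a quoted CSV field.
record = json.loads(row["record"])

print(integer_is_exact, integer_value, record)
```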

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@betodealmeida added the enhancement:request and .database labels on Oct 1, 2019
Member

@mistercrunch mistercrunch left a comment

LGTM

@robdiciuccio
Member

Hey Beto, I've been working on this problem a bit for Postgres and MySQL, and I'm currently testing casting ints as object (strings) rather than the new Int64 type. The latter is incompatible with PyArrow, and PyArrow has significant serialization performance benefits. Casting to strings may also help resolve the JS max integer size issue. Happy to chat with you about this.
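
A rough sketch of the string-casting idea, using only the standard library. The helper `ints_as_strings` is hypothetical and is not code from Superset; it only shows why strings sidestep the JavaScript integer limit when results are serialized to JSON.

```python
import json

# Number.MAX_SAFE_INTEGER in JavaScript; larger integers lose precision there.
JS_MAX_SAFE_INTEGER = 2**53 - 1

def ints_as_strings(rows):
    """Hypothetical helper: render ints as strings, leave other values alone."""
    return [
        [str(v) if isinstance(v, int) and not isinstance(v, bool) else v for v in row]
        for row in rows
    ]

# A 64-bit maximum value next to a NULL, as a query result might return.
rows = [[9223372036854775807, "a"], [None, "b"]]

payload = json.dumps(ints_as_strings(rows))
print(payload)  # [["9223372036854775807", "a"], [null, "b"]]
```

The trade-off is that the client receives text rather than a number, so any numeric sorting or formatting has to be handled explicitly.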

@betodealmeida
Member Author

Will merge, but work on a better solution with @robdiciuccio.

@betodealmeida merged commit cca689b into apache:master on Oct 21, 2019
@mistercrunch added the 🏷️ bot (a label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels) and 🚢 0.35.0 (first shipped in 0.35.0) labels on Feb 28, 2024

Labels

  • 🏷️ bot: a label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels
  • enhancement:request: enhancement request submitted by anyone from the community
  • size/S
  • 🚢 0.35.0: first shipped in 0.35.0
