@aviatesk
Contributor

AFAIU, the only change required is an update for apache/spark#29983.
To stay consistent with the previous behavior and pass the
existing test suite, this PR is essentially equivalent to setting
`spark.sql.legacy.statisticalAggregate` to `true`.

Now the code is incompatible with Spark 2.x or Spark 3.0, so I'd
like to recommend supporting only Spark 3.1 and higher and Scala 2.12
from now on.
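
For context, here is a minimal plain-Python sketch of the behavior difference that `spark.sql.legacy.statisticalAggregate` controls, as far as the Spark 3.1 migration notes describe it: statistical aggregates such as `stddev_samp` return NULL instead of NaN when the computation divides by zero (for example, on a single-element input), and the legacy flag restores NaN. This is only a model, not Spark or Deequ code. The function name, the `legacy` parameter, and the use of `None` for SQL NULL are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def stddev_samp(values: Sequence[float], legacy: bool) -> Optional[float]:
    """Sample standard deviation, modelling the divide-by-zero case.

    `legacy` is a hypothetical stand-in for
    spark.sql.legacy.statisticalAggregate; None stands in for SQL NULL.
    """
    n = len(values)
    if n == 0:
        # Assumption: no input rows yields NULL in either mode.
        return None
    if n == 1:
        # n - 1 == 0, so the sample variance would divide by zero.
        # Legacy mode (Spark 3.0 and earlier): NaN. Spark 3.1 default: NULL.
        return math.nan if legacy else None
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance)


single = [1.0]
legacy_result = stddev_samp(single, legacy=True)
default_result = stddev_samp(single, legacy=False)
normal_result = stddev_samp([1.0, 2.0, 3.0, 4.0], legacy=False)
```

A test suite written against the old behavior would expect NaN in the single-element case, which is presumably why keeping the legacy semantics lets the existing tests pass unchanged.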

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@twollnik
Contributor

twollnik commented Jun 8, 2021

Hi @aviatesk,
Thanks for submitting this PR! Unfortunately, we can't make changes that are incompatible with Spark 2.4. To your knowledge, is it possible to keep backwards compatibility?

@chethanuk

@aviatesk When will support for Spark 3.1 be released?

@twollnik
Contributor

Sadly, we can't drop 2.x compatibility, and we have no immediate plan to support Spark 3.1. Thank you anyway for opening this PR!

@twollnik twollnik closed this Jul 20, 2021
@chethanuk

chethanuk commented Jul 21, 2021

> Now the code is incompatible with spark-2.x or spark-3.0, and so I'd
> like to recommend only supporting spark 3.1 and higher and scala 2.12
> from now on.

Let's do this instead: create a separate folder for Spark 3.1 and keep the required changes there.

Also, simply ignoring Spark 3.1 because we can't drop 2.x compatibility will not help Spark 3.x users.
I mean, if support for 3.1.x is not available, users like me will have to use GE or another data quality tool.

@apython1998

If anybody has suggestions for Scala Spark alternatives to Deequ that support 3.1.x out of the box, please update this thread.

@lange-labs lange-labs reopened this Aug 5, 2021
@lange-labs lange-labs merged commit 128f21d into awslabs:master Aug 5, 2021
@aviatesk aviatesk deleted the avi/spark3.1 branch August 5, 2021 13:49