Scala/Java Spark support for Great Expectations #11336

@praveen-kanamarlapudi

Is your feature request related to a problem? Please describe.
We use Scala Spark extensively for our ETL workloads. While Great Expectations supports PySpark, our pipelines are entirely Scala-based, making it challenging to integrate GE.

Describe the solution you'd like
Native Scala Spark support in Great Expectations, allowing expectations to be defined and executed directly within Scala/Spark jobs.

Describe alternatives you've considered

  • Running a separate PySpark job for validation after Scala Spark processing (adds latency, complexity, and extra infrastructure).
  • While a separate PySpark job is feasible, for critical metrics we prefer the data quality checks to run at ingestion time.

Additional context
Scala remains the dominant Spark language for ETL pipelines on many big data platforms. Adding Scala support to GE would enable broader adoption in JVM-based environments, reduce cross-language overhead, and let teams use GE’s data validation and documentation features without disrupting existing JVM-based workflows.
