
Conversation


@david-zlai david-zlai commented Mar 12, 2025

Summary

^^^

Tested on AWS. See below:

image

image

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features

    • Introduced a utility for parsing Hive partition strings, streamlining partition management.
    • Enhanced Apache Hudi integration in Spark SQL with updated configurations for catalog support and table write format.
    • Expanded AWS integration settings to support additional development environments, enabling broader network configuration options.
  • Refactor

    • Improved underlying serialization handling to bolster data processing reliability.


coderabbitai bot commented Mar 12, 2025

Walkthrough

The pull request introduces non-functional changes across various modules. It adds a comment noting a potential partition issue in the Python API, updates a JSON customer value from “canary” to “dev”, and expands AWS integration mappings for “dev”. The changes also reorganize imports and formatting for readability, add a utility method for Hive partition parsing, remove a redundant local method, update Hudi-Spark configurations, extend Kryo registrations, and introduce a new test class for Hudi table operations.
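
As a rough sketch, a Hive partition-string parser along the lines described above might look like this (illustrative only; the actual method added to Format.scala may differ in name, signature, and error handling):

  // Sketch: turn "ds=2025-03-12/hr=00" into Map("ds" -> "2025-03-12", "hr" -> "00").
  def parseHivePartition(partitionString: String): Map[String, String] =
    partitionString
      .split("/")
      .map { segment =>
        segment.split("=", 2) match {
          case Array(key, value) => key -> value
          case _                 => throw new IllegalArgumentException(s"Unable to parse partition segment: $segment")
        }
      }
      .toMap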

Changes

  • api/py/.../run.py, api/py/.../teams.json: Added a comment on the default "ds" assignment and updated "CUSTOMER_ID" value from "canary" to "dev"
  • cloud_aws/.../EmrSubmitter.scala: Added "dev" customer entries for subnet and security group mappings
  • cloud_aws/.../GlueCatalogTest.scala, cloud_aws/.../HudiTableUtilsTest.scala: Reorganized imports and formatting in GlueCatalogTest; introduced a test class to validate Hudi table operations
  • spark/.../format/Format.scala, spark/.../format/Hive.scala: Added new parseHivePartition method to Format trait; removed local partition parsing in Hive to delegate to superclass
  • cloud_aws/.../hudi_spark_confs.yaml: Introduced new Hudi-Spark configuration settings
  • spark/.../ChrononKryoRegistrator.scala: Registered additional classes for Kryo serialization (EmptyList, WriteStatus, OverwriteWithLatestAvroPayload)
  • spark/.../TableUtils.scala: Reorganized import statements and added commented options in repartitionAndWrite

Sequence Diagram(s)

sequenceDiagram
    participant T as HudiTableUtilsTest
    participant S as SparkSession
    participant TU as TableUtils
    participant H as Hudi Framework
    T->>S: Initialize Spark session with Hudi configs
    T->>TU: Create Hudi table with DataFrame
    TU->>H: Write data in Hudi format
    H-->>TU: Confirm table creation
    TU->>T: Verify table exists in Spark catalog

Suggested reviewers

  • piyush-zlai
  • varant-zlai
  • chewy-zlai
  • nikhil-zlai

Poem

In bytes and code the changes gleam,
A comment, a test—a shared dream.
New paths and tweaks light up the night,
Configs align in a subtle rite.
Cheers to our merge, a code delight!


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 4583093 and 234d53c.

📒 Files selected for processing (1)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/EmrSubmitter.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • cloud_aws/src/main/scala/ai/chronon/integrations/aws/EmrSubmitter.scala


Comment on lines 39 to 41
case Success(isHudi) =>
logger.info(s"Hudi check: Successfully read the format of table: $tableName as Hudi")
isHudi
Contributor Author

worked here

2025/03/12 19:17:14 INFO DefaultFormatProvider.scala:40 - Hudi check: Successfully read the format of table: data.plaid_raw as Hudi

@david-zlai david-zlai changed the title from "Davidhan/hudi emr" to "Add Hudi format" Mar 12, 2025
@david-zlai david-zlai marked this pull request as ready for review March 12, 2025 21:46
// Customer specific infra configurations
private val CustomerToSubnetIdMap = Map(
"canary" -> "subnet-085b2af531b50db44"
"canary" -> "subnet-085b2af531b50db44",
Collaborator

are we going to need one for plaid?

Contributor Author

mmmm yes we will. @chewy-zlai , we don't know these because it's on their account huh?

Collaborator

is this something we can thread through from teams.json? Or in general can the cluster configuration stuff be passed in from teams.json?

Contributor Author

it can be threaded over.

Collaborator

Oh, yeah. The subnet is going to be something we have to get from Daniel as they aren't using the VPC we wanted to setup.

main_class=main_class,
)
+ f" --additional-conf-path=additional-confs.yaml --files={s3_file_args}"
+ f" --additional-conf-path={EMR_MOUNT_FILE_PREFIX}additional-confs.yaml --files={s3_file_args}"
Contributor Author

doing this for now to fix. cc @chewy-zlai this is what you ran into earlier.

but in a follow up pr, i'm going to move this away to spark.files

"mode": "backfill",
"dataproc": False,
"ds": today,
"ds": today, # TODO: this breaks if the partition column is not the same as yyyy-MM-dd.
Collaborator

you mean if the format is not yyyy-MM-dd ?

Contributor Author

yeah. plaid's date format was yyyyMMdd.

when I set the backfill start date to like 20250216, then the ds here was set to today's date of 2025-02-12. inconsistent formats

Collaborator

oh is this something that's controlled with the spark config?

sparkSession.conf.get("spark.chronon.partition.format", "yyyy-MM-dd")

or does this happen even earlier during compilation?
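
For illustration only (the real default is computed in run.py, not in Scala), a helper that derives the default ds from the configured partition format could look like:

  import java.time.LocalDate
  import java.time.format.DateTimeFormatter
  import org.apache.spark.sql.SparkSession

  // Hypothetical helper: build the default "ds" from the configured partition format
  // so it stays consistent with tables partitioned as, e.g., yyyyMMdd.
  def defaultDs(spark: SparkSession): String = {
    val partitionFormat = spark.conf.get("spark.chronon.partition.format", "yyyy-MM-dd")
    LocalDate.now().format(DateTimeFormatter.ofPattern(partitionFormat))
  }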


import org.apache.spark.sql.SparkSession

case object Hudi extends Format {
Collaborator

I'm actually not sure if we need a new format for HUDI at all, I think we can just use Hive.

.getString(1)
assertEquals("hudi", provider)

tableUtils.insertPartitions(sourceDF, tableName)
Contributor Author

fails for me here

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
cloud_aws/src/main/resources/hudi_spark_confs.yaml (1)

1-3: Add newline at end of file.

Missing newline character at end of file per YAML linting rules.

spark.sql.catalog.spark_catalog: "org.apache.spark.sql.hudi.catalog.HoodieCatalog"
spark.sql.extensions: "org.apache.spark.sql.hudi.HoodieSparkSessionExtension"
spark.chronon.table_write.format: "hudi"
+
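
For reference, these settings roughly correspond to what a local SparkSession would be configured with in the Hudi test (a sketch under that assumption, not the exact test setup):

  import org.apache.spark.sql.SparkSession

  // Sketch: a local session wired with the Hudi catalog and session extension above.
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("hudi-sketch")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()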

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between f9f5394 and eb1baeb.

📒 Files selected for processing (4)
  • cloud_aws/src/main/resources/hudi_spark_confs.yaml (1 hunks)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/HudiTableUtilsTest.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala

🔇 Additional comments (2)
spark/src/main/scala/ai/chronon/spark/ChrononKryoRegistrator.scala (2)

93-93: Appropriate addition for Kryo serialization.

EmptyList registration helps shuffle operations.


215-216: Good Hudi integration enhancement.

Required for proper serialization of Hudi payload classes.
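
As a standalone sketch of the idea (the real ChrononKryoRegistrator registers many more classes and may register them by name), the Hudi classes above would be registered roughly like so:

  import com.esotericsoftware.kryo.Kryo
  import org.apache.spark.serializer.KryoRegistrator

  // Illustrative registrator, not the production one: make the Hudi write-path
  // classes Kryo-serializable for Spark shuffles.
  class HudiKryoRegistratorSketch extends KryoRegistrator {
    override def registerClasses(kryo: Kryo): Unit = {
      kryo.register(classOf[org.apache.hudi.client.WriteStatus])
      kryo.register(classOf[org.apache.hudi.common.model.OverwriteWithLatestAvroPayload])
    }
  }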

Comment on lines 51 to 53
val back = tableUtils.loadTable(tableName)
val backSet = back.select("id").as[Int].collect().toSet
// assertEquals(input, backSet)
Contributor

⚠️ Potential issue

Fix type casting issue in test.

String values ("a"-"d") can't be cast to Int.

  val back = tableUtils.loadTable(tableName)
- val backSet = back.select("id").as[Int].collect().toSet
+ val backSet = back.select("id").as[String].collect().toSet
//      assertEquals(input, backSet)

Comment on lines 52 to 53
val backSet = back.select("id").as[Int].collect().toSet
// assertEquals(input, backSet)
Contributor

🛠️ Refactor suggestion

Uncomment or remove assertion.

Incomplete test - assertion is commented out.

  val backSet = back.select("id").as[String].collect().toSet
- //      assertEquals(input, backSet)
+ val expectedSet = Set("a", "b", "c", "d")
+ assertEquals(expectedSet, backSet)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/HudiTableUtilsTest.scala (2)

33-37: Consider diversifying test partition data.

All records use the same date. Add multiple dates to properly test partitioning (see the sketch after the snippet below).

  val source = Seq(
    ("a", "2025-03-12"),
    ("b", "2025-03-12"),
    ("c", "2025-03-12"),
    ("d", "2025-03-12")
  )
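
For instance (dates chosen arbitrarily for illustration), the rows could be spread across several partitions:

  // Hypothetical variant spanning multiple date partitions
  val source = Seq(
    ("a", "2025-03-10"),
    ("b", "2025-03-11"),
    ("c", "2025-03-12"),
    ("d", "2025-03-12")
  )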

26-58: Consider adding update/delete tests.

Current test only covers basic create/read. Hudi's key features include update and delete operations.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 432b670 and 4583093.

📒 Files selected for processing (2)
  • cloud_aws/src/test/scala/ai/chronon/integrations/aws/HudiTableUtilsTest.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/src/main/scala/ai/chronon/spark/TableUtils.scala
🔇 Additional comments (6)
cloud_aws/src/test/scala/ai/chronon/integrations/aws/HudiTableUtilsTest.scala (6)

11-24: Solid test setup.

Properly configures Hudi catalog, extensions, and Kryo registrator.


40-40: Good partitioning setup.

Correctly creates Hudi table with PARQUET format and date-based partitioning.


41-48: Good provider verification.

Properly verifies table exists and uses Hudi provider.


50-50: Previously failing section now fixed.

Line previously had issues per david-zlai's comment, now properly implemented.


53-54: Fixed type issue and uncommented assertion.

Correctly collects both columns as (String, String) tuple and verifies against source data.


55-57: Good cleanup practice.

Properly drops test table in finally block.

local = true,
additionalConfig = Some(
Map(
"spark.sql.catalog.spark_catalog" -> "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
Collaborator

let's change this to a different catalog default_hudi, and set that as the default catalog.

        "spark.sql.defaultCatalog" -> "default_hudi",
        "spark.sql.catalog.default_hudi" -> "org.apache.spark.sql.hudi.catalog.HoodieCatalog"

Contributor Author

ok, but what's the reason

Contributor Author

going to merge and put up a new PR


@david-zlai david-zlai merged commit 85306d9 into main Mar 13, 2025
20 checks passed
@david-zlai david-zlai deleted the davidhan/hudi_emr branch March 13, 2025 16:12
@coderabbitai coderabbitai bot mentioned this pull request Mar 28, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
