@Neuw84
Contributor

@Neuw84 Neuw84 commented Apr 29, 2022

What is the purpose of the pull request

Adds support for initializing DeltaStreamer without a defined Spark master. This enables the use of DeltaStreamer in environments such as AWS Glue and other serverless environments, where the Spark master is inherited and we do not have access to it.

Brief change log

  • Modify the HoodieDeltaStreamer class to add an option to inherit the Spark master.
  • Modify the UtilHelpers class to add an option to start the Spark context without a defined master.
  • Modify HoodieMultiTableDeltaStreamer to use the same default Spark master (although it is not used right now).
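
The idea behind the change log above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code in this pull request: the class `SparkConfSketch` and the method `buildConf` are hypothetical names, and a `Map` stands in for Spark's configuration object. The only real names used are the Spark property keys `spark.app.name` and `spark.master`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a Spark configuration builder; not Hudi's UtilHelpers.
final class SparkConfSketch {

    // Builds the properties handed to Spark. "spark.master" is set only when
    // the caller supplies a non-blank value, so a managed runtime can inject
    // its own master instead of having one forced on it.
    static Map<String, String> buildConf(String appName, Optional<String> master) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.app.name", appName);
        master.filter(m -> !m.trim().isEmpty())
              .ifPresent(m -> conf.put("spark.master", m));
        return conf;
    }

    public static void main(String[] args) {
        // Explicit master: the property is present.
        System.out.println(buildConf("delta-streamer", Optional.of("local[2]")));
        // No master: the property is left out and must come from the environment.
        System.out.println(buildConf("delta-streamer", Optional.empty()));
    }
}
```

Leaving the key unset, rather than writing a placeholder value, is what would let the hosting environment decide the master in this sketch.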

Verify this pull request

This pull request is a trivial rework.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

Neuw84 added 2 commits April 29, 2022 11:14
@Neuw84
Contributor Author

Neuw84 commented Apr 29, 2022

@hudi-bot run azure

@yihua yihua self-assigned this Apr 29, 2022
@yihua yihua added the priority:medium (Moderate impact; usability gaps), area:ingest (Ingestion into Hudi), and engine:spark (Spark integration) labels Apr 29, 2022
@pratyakshsharma
Contributor

pratyakshsharma commented May 7, 2022

One high-level comment: would it be possible to add test cases for this change?

@Neuw84
Contributor Author

Neuw84 commented May 9, 2022

Hi @pratyakshsharma,

I could try to add a test for initialising the Spark context. However, the change is very simple, and looking through the existing tests I did not see any that cover the other constructors.

The only behavioural change is that now, if no Spark master is defined, it is inherited from the environment; if the environment does not provide one either, the application fails because no Spark context can be created. Previously it would run with local[2] as the Spark master.
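
To make the before/after behaviour described above concrete, here is a small model in plain Java. It is a sketch under assumptions and not the Hudi implementation: `MasterResolutionSketch`, `resolveBefore`, and `resolveAfter` are hypothetical names, and an `IllegalStateException` stands in for the failure Spark would raise when no master is available.

```java
import java.util.Optional;

// Hypothetical model of the default-master behaviour; not Hudi code.
final class MasterResolutionSketch {

    static final String OLD_DEFAULT = "local[2]";

    // Before the change: a missing master silently fell back to local[2].
    static String resolveBefore(String configured) {
        return isBlank(configured) ? OLD_DEFAULT : configured;
    }

    // After the change: a missing master is taken from the environment, and
    // the job fails if the environment does not provide one either.
    static String resolveAfter(String configured, Optional<String> inherited) {
        if (!isBlank(configured)) {
            return configured;
        }
        return inherited.orElseThrow(
            () -> new IllegalStateException("no Spark master configured or inherited"));
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(resolveBefore(""));
        System.out.println(resolveAfter("", Optional.of("yarn")));
    }
}
```

A test for the new path could then assert two things: that an explicitly configured master still wins, and that the absence of both a configured and an inherited master produces a failure rather than a silent local run.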

What do you have in mind? (I am looking for some guidance.)

@pratyakshsharma
Contributor

I do not have anything in mind as of now. Let me check and get back to you. In the meantime, please check the CI failures.

@nsivabalan nsivabalan assigned xushiyan and unassigned yihua May 11, 2022
@nsivabalan nsivabalan added the priority:high (Significant impact; potential bugs) label and removed the priority:medium (Moderate impact; usability gaps) label May 11, 2022
@Neuw84
Contributor Author

Neuw84 commented May 13, 2022

Hi,

Should I rebase it onto the latest stable version? Almost every time I merge, something breaks (and it is not because of my changes).

Thanks!

@pratyakshsharma
Contributor

To be able to merge, you need to rebase it on master itself.

@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build


Labels

area:ingest (Ingestion into Hudi), engine:spark (Spark integration), priority:high (Significant impact; potential bugs)

6 participants