-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-4697][YARN]System properties should override environment variables #3557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #24045 has started for PR 3557 at commit
|
|
Test build #24045 has finished for PR 3557 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maintain the structure here and just switch the blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that sounds more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this order was done on purpose. There were a few issues if they are the other way.
See commit 3f779d8 and associated jira https://issues.apache.org/jira/browse/SPARK-1631.
cc @vanzin
I know this was brought up once before but I don't know if anyone fully investigated to see if there is another solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There have been two PRs that tried to "fix" this. The problem here is:
- You cannot create a SparkContext if
spark.app.nameis not set in SparkConf - With this change, if
spark.app.nameis set,SPARK_YARN_APP_NAMEwill never be used.
The fix, in my view, is to stop supporting these env variables at some point.
|
Had one nit, otherwise this change looks right to me. |
|
Test build #24082 has started for PR 3557 at commit
|
|
Test build #24082 has finished for PR 3557 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @WangTaoTheTonic as mentioned by others this breaks the Spark app name ordering detailed here: #539. If you want to make this change then you need to special case the app name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After taking a look at #539 and #699, I am not sure I understand what you means here. The app names observed by RM is set by appContext.setApplicationName(args.appName) while args.appName is decided by SPARK_YARN_APP_NAME than spark.app.name.
I still can't see why change this two's order could affect other aspects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method you're modifying (getExtraClientArguments) is the one that defines the --name argument for ClientArguments. And you're inverting the priority here, so that spark.app.name > SPARK_YARN_APP_NAME. So basically, since spark.app.name is mandatory, SPARK_YARN_APP_NAME becomes useless.
But please test it; make sure both work, both in client and cluster mode. Something might have changed since those fixes went in, although I kinda doubt it.
|
Did you test it to see if they still work? |
|
@tgravescs Ok if you mean whether the app name could still be shown correctly, I will test it. |
|
@WangTaoTheTonic did you have a chance to test this? |
|
Test build #25224 has started for PR 3557 at commit
|
|
Test build #25224 has finished for PR 3557 at commit
|
|
Test PASSed. |
|
I've tested it and results are:
After this patch:
As @vanzin said, the So should we take Note: In test cases I didn't use |
|
Actually that's a little bit surprising. SparkPi.scala sets the app name to "Spark Pi", so I'd expect that to show up when the user does not explicitly override it. Also, there seems to be some disprepancies between client and cluster mode in handling the env variable. |
|
@vanzin
I didn't use the original SparkPi in sparkcode but the one I customized. |
|
I see. Still, the third line of the "before patch" table is unexpected. |
|
where were you looking to get the application name? On the RM or in spark or both? |
|
@vanzin Which one do you mean? Client mode or Cluster mode? |
|
I mean it's unexpected because they're different when they should be the same (in that case, the value of |
|
I am afraid it is not.
So, in the progress, the app name would never get value from the env |
|
I understand all that. I'm saying that's unexpected, in that I'd expect both modes to behave the same. So if there's anything to fix here, that's it. |
|
Oh, I see. But after this patch |
That's why we should not commit this patch. |
|
Or we could just make SPARK_YARN_APP_NAME disappear? As the env variable is not recommended and it will cause different behavior. User can still use Or we should make SPARK_YARN_APP_NAME and spark.app.name a special case in What do you two think? @tgravescs @andrewor14 |
|
Although in general we should honor Spark properties over environment variables, the app name has been a special case and should remain so for backward compatibility (i.e. we need to keep Additionally, it is not intuitive that if you set both |
|
Test build #25402 has started for PR 3557 at commit
|
|
@andrewor14 @vanzin
The names on Driver's UI and RM's UI is consistent. |
|
Test build #25402 has finished for PR 3557 at commit
|
|
Test PASSed. |
|
So it used to be (0.8 and 0.9) that spark on yarn was ran like this. We kept this work in the 1.0 release as well for backwards compatibility. spark-class org.apache.spark.deploy.yarn.Client does moving those to spark submit break the app name for this? I don't know of anyone still using that but if we break it we should consciously decide that is unsupported. |
|
I am afraid in that case user can only set app name via Eh...Sorry to ask the command you listed is using cluster mode? This PR would not impact the app name in cluster mode. In cluster mode |
|
Personally, since there have been 3 stable releases already that have the current semantics, and we effectively treat env variables as deprecated, I'd just not change anything, document that env variables are deprecated, and remove them in a future release (1.4 or later). But if you guys really think this is worth doing, the latest patch looks ok, I guess. |
|
@vanzin I'm not sure if it's on our roadmap to deprecate the environment variables altogether, since there are many others for other modes that are actively being used. That said I would also prefer to remove the env vars eventually, but until then we should keep the semantics in YARN consistent with other modes (i.e. system properties should override env variables). Unfortunately we can't do anything about the app name but to special case it. @sryza @tgravescs any thoughts? |
|
I'm strongly in favor of deprecating the env vars as well. Until we do that, keeping the semantics consistent (and special casing if need be) seems reasonable to me. |
|
I'm in favor of deprecating too but I don't think its good to do it in minor release. Probably 2.0 would be good time. Lets just leave the app name special case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be startsWith, also in the comments we generally capitalize YARN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, you mean capitalize YARN in the comment, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops didn't see your wrote comments, sorry for the noise
|
Hey @WangTaoTheTonic I think this LGTM. By the way, when I merge this I will file a new JIRA that makes the app name behavior consistent across cluster and client modes and add it to the commit title. I'll also fix the few minor comments I left. This is going into master. |
I found some arguments in yarn module take environment variables before system properties while the latter override the former in core module.