Spot conf parser #142
base: spot
"--conf spark.executor.cores=" + SPK_EXEC_CORES,
"--conf spark.executor.memory=" + SPK_EXEC_MEM,
"--conf spark.driver.maxResultSize=" + SPK_DRIVER_MAX_RESULTS,
"--conf spark.yarn.driver.memoryOverhead=" + SPK_DRIVER_MEM_OVERHEAD,
spark.yarn.driver.memoryOverhead should be spark.yarn.am.memoryOverhead, based on the current spot branch.
done
SPK_DRIVER_MAX_RESULTS=
SPK_EXEC_CORES=
SPK_DRIVER_MEM_OVERHEAD=
SPK_EXEC_MEM_OVERHEAD=
Thanks, you fixed it.
TOL = conf.get('DEFAULT','TOL')

#prepare options for spark-submit
spark_cmd = [
Some of the values in spark_cmd and spark_extras are either modified or gone in the spot branch.
I'm working on rebasing my branch right now to resolve this and the conflicts.
There was only the one change to the actual spark command that I could find. Let me know if there is anything else; pushing the changes now.
Now that we are reviewing this... Obviously these vars are not being used, is there any reason to keep them around?
PREPROCESS_STEP = "{0}_pre_lda".format(args.type)
HDFS_DOCRESULTS = "{0}/doc_results.csv".format(HPATH)
HDFS_WORDRESULTS = "{0}/word_results.csv".format(HPATH)
LDA_OUTPUT_DIR = "{1}/{1}".format(args.type, args.fdate)
I don't see any reason to keep those.
Moves hdfs_setup and ml_ops from bash to Python scripts to support the new spot.conf.
All variables are now stored in spot.conf, including the ingest configurations.
I have left the original bash scripts in place for comparison and testing this round.
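For readers following the review, here is a minimal sketch of the approach described above: reading values from an INI-style spot.conf and turning them into spark-submit options. It is not the code in this PR. The sample values, the SAMPLE_CONF string, and the build_spark_conf_args helper are hypothetical, and it uses the Python 3 configparser module name. The memory overhead key follows the review comment (spark.yarn.am.memoryOverhead).

```python
import configparser

# Hypothetical spot.conf contents; the real file and its values may differ.
SAMPLE_CONF = """
[DEFAULT]
SPK_EXEC_CORES=2
SPK_EXEC_MEM=4g
SPK_DRIVER_MAX_RESULTS=1g
SPK_DRIVER_MEM_OVERHEAD=512
"""


def build_spark_conf_args(conf_text):
    """Parse INI-style text and return a list of --conf options (illustrative helper)."""
    conf = configparser.ConfigParser()
    conf.read_string(conf_text)

    def get(key):
        # All values live in the DEFAULT section, as in conf.get('DEFAULT','TOL').
        return conf.get('DEFAULT', key)

    return [
        "--conf spark.executor.cores=" + get('SPK_EXEC_CORES'),
        "--conf spark.executor.memory=" + get('SPK_EXEC_MEM'),
        "--conf spark.driver.maxResultSize=" + get('SPK_DRIVER_MAX_RESULTS'),
        # Key name taken from the review comment, not from the original diff line.
        "--conf spark.yarn.am.memoryOverhead=" + get('SPK_DRIVER_MEM_OVERHEAD'),
    ]


if __name__ == "__main__":
    for option in build_spark_conf_args(SAMPLE_CONF):
        print(option)
```

In a real script the text would come from conf.read() on the spot.conf path rather than an inline string; the inline string just keeps the sketch self-contained.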