This repository has been archived by the owner on May 15, 2019. It is now read-only.

Spot conf parser #142

Open · wants to merge 23 commits into spot
Conversation

natedogs911 (Contributor)

Moving hdfs_setup and ml_ops to Python scripts instead of bash to support the new spot.conf. All variables are now stored in spot.conf, including the ingest configurations.

I have left the original bash scripts in place for comparison and testing this round.
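The single-file configuration described above can be sketched with Python's stdlib ConfigParser. This is a hypothetical illustration, not the PR's actual parser: the key names mirror the snippets quoted later in this thread, but the sample values and section layout are assumptions.

```python
# Hypothetical sketch of parsing the new spot.conf with ConfigParser.
# Key names mirror the PR snippets; the values here are made up.
from configparser import ConfigParser

SAMPLE_CONF = """
[DEFAULT]
SPK_EXEC_CORES = 4
SPK_EXEC_MEM = 8g
TOL = 1e-6
"""

conf = ConfigParser()
conf.read_string(SAMPLE_CONF)  # a real script would call conf.read(path)

# Values come back as strings, much like the old bash variables.
SPK_EXEC_CORES = conf.get('DEFAULT', 'SPK_EXEC_CORES')
SPK_EXEC_MEM = conf.get('DEFAULT', 'SPK_EXEC_MEM')
TOL = conf.get('DEFAULT', 'TOL')
```

Keeping everything under a single section (or `DEFAULT`) means the Python scripts can read the same file the ingest components use, which is the point of consolidating into spot.conf.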

"--conf spark.executor.cores=" + SPK_EXEC_CORES,
"--conf spark.executor.memory=" + SPK_EXEC_MEM,
"--conf spark.driver.maxResultSize=" + SPK_DRIVER_MAX_RESULTS,
"--conf spark.yarn.driver.memoryOverhead=" + SPK_DRIVER_MEM_OVERHEAD,


spark.yarn.driver.memoryOverhead should be spark.yarn.am.memoryOverhead based on current spot branch.

natedogs911 (Contributor, Author)

Done.

SPK_DRIVER_MAX_RESULTS=
SPK_EXEC_CORES=
SPK_DRIVER_MEM_OVERHEAD=
SPK_EXEC_MEM_OVERHEAD=


Thanks, you fixed it.

TOL = conf.get('DEFAULT','TOL')

#prepare options for spark-submit
spark_cmd = [


Some of the values in spark_cmd and spark_extras are either modified or gone in the spot branch.

natedogs911 (Contributor, Author)

I'm working on rebasing my branch right now to resolve this and the conflicts.

natedogs911 (Contributor, Author)

There was only the one change to the actual spark command that I could find. Let me know if there is anything else; pushing the changes now.
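For readers following along, the spark-submit assembly under discussion can be sketched as follows. The values are placeholders, and the split-argument style is one common way to build a subprocess argument list; the PR's actual script may differ.

```python
# Hypothetical sketch of building the spark-submit argument list from the
# parsed conf values. All values here are placeholders, not the PR's defaults.
SPK_EXEC_CORES = '4'
SPK_EXEC_MEM = '8g'
SPK_DRIVER_MAX_RESULTS = '4g'
SPK_DRIVER_MEM_OVERHEAD = '512'

spark_cmd = [
    'spark-submit',
    '--conf', 'spark.executor.cores=' + SPK_EXEC_CORES,
    '--conf', 'spark.executor.memory=' + SPK_EXEC_MEM,
    '--conf', 'spark.driver.maxResultSize=' + SPK_DRIVER_MAX_RESULTS,
    # Note the property fix discussed earlier in this thread: the spot branch
    # uses spark.yarn.am.memoryOverhead, not spark.yarn.driver.memoryOverhead.
    '--conf', 'spark.yarn.am.memoryOverhead=' + SPK_DRIVER_MEM_OVERHEAD,
]

# A real script would then run it, e.g. subprocess.call(spark_cmd)
```

Passing each `--conf` flag and its `key=value` pair as separate list elements keeps the command safe to run through subprocess without shell quoting.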

@natedogs911 (Contributor, Author)

Now that we are reviewing this...

Obviously these vars are not being used; is there any reason to keep them around?

PREPROCESS_STEP = "{0}_pre_lda".format(args.type)
POSTPROCESS_STEP = "{0}_post_lda".format(args.type)

HDFS_DOCRESULTS = "{0}/doc_results.csv".format(HPATH)
LOCAL_DOCRESULTS = "{0}/doc_results.csv".format(LPATH)

HDFS_WORDRESULTS = "{0}/word_results.csv".format(HPATH)
LOCAL_WORDRESULTS = "{0}/word_results.csv".format(LPATH)

LDA_OUTPUT_DIR = "{1}/{1}".format(args.type, args.fdate)

@rabarona

I don't see any reason to keep those.
