Conversation

@nilesh-c
Member

@jimkont @sangv please review these small changes and merge the PR into nildev. Until now most of Spark's logging was disabled; I'm making it configurable, which makes setting up clusters and troubleshooting easier.
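For illustration, here is a minimal sketch of how a configurable Log4j level could be wired up; the logging-level property name and the config: java.util.Properties object are assumptions for the example, not necessarily the PR's exact code:

import java.util.Properties
import org.apache.log4j.{Level, Logger}

// Assumed property name; falls back to WARN when nothing is configured.
def configureLogging(config: Properties): Unit = {
  val logLevel = Option(config.getProperty("logging-level"))
    .map(name => Level.toLevel(name)) // parses names like "INFO", "DEBUG"
    .getOrElse(Level.WARN)
  Logger.getRootLogger.setLevel(logLevel)
}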

@nilesh-c nilesh-c changed the title Make logging level configurable Make logging level and number of job submission threads configurable Jul 11, 2014
@nilesh-c
Member Author

Also, I made a commit that makes the number of threads in the thread pool (used for creating the job submission futures) configurable:

import java.util.concurrent.Executors
import scala.concurrent.{future, ExecutionContext}

implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(extractionJobThreads))
val futures = for (job <- jobs) yield future {
  job.run()
}
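For completeness, a hedged sketch of how these futures might then be awaited and the pool released; keeping an explicit handle on the executor is an assumption for the example (see also the ExecutorService shutdown fix in the commits below):

import java.util.concurrent.Executors
import scala.concurrent.{future, Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

val executor = Executors.newFixedThreadPool(extractionJobThreads)
implicit val ec = ExecutionContext.fromExecutor(executor)

val futures = for (job <- jobs) yield future { job.run() }

// Block until every submission future completes, then shut the pool down:
// its non-daemon threads would otherwise keep the JVM alive.
Await.result(Future.sequence(futures), Duration.Inf)
executor.shutdown()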

Here's the doc comment for DistConfig.extractionJobThreads, which should give you some insight:

/**
 * Number of threads to use in the ExecutionContext while calling DistExtractionJob.run() on multiple
 * extraction jobs in parallel.
 *
 * Note that these threads on the driver node do not perform any heavy work except for executing
 * DistExtractionJob.run(), which submits the respective Spark job to the Spark master and waits
 * for the job to finish.
 *
 * By default it is set to Integer.MAX_VALUE so that all extraction jobs are submitted to the Spark
 * master simultaneously; the master then uses the configured scheduling mechanism to execute the
 * jobs on the cluster.
 */
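To make that default concrete, here is a hypothetical sketch of how such a property might be parsed; the extraction-job-threads property name and the Properties object are assumptions for illustration, not DistConfig's actual code:

import java.util.Properties

// Hypothetical: read the thread count from a properties object, defaulting
// to Integer.MAX_VALUE so all jobs are submitted to Spark at once.
def parseExtractionJobThreads(config: Properties): Int =
  Option(config.getProperty("extraction-job-threads"))
    .map(_.toInt)
    .getOrElse(Integer.MAX_VALUE)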

@jimkont I remember that a while back you had some doubts as to what implications the number of threads has on parallel multi-language extraction. Hope the above comment clarifies it.

@jimkont
Member

jimkont commented Jul 11, 2014

So, if I understand correctly, on a multi-language extraction Spark will get a huge list of tasks and then decide how to distribute them automatically. I'd like to benchmark this and see the actual gain; otherwise it might be better to set this to 1 or 2 by default and get better logging and faster job completion.

@nilesh-c
Member Author

@jimkont Yes. To be a bit pedantic, Spark will get a list of "jobs", each consisting of "stages" (e.g. map, reduce, etc.), which in turn contain N tasks (distributed in parallel). For the sake of simplicity, let's assume a job consists of a lot of tasks.

So, depending on the scheduler used, Spark will schedule the jobs across the cluster according to the number of available worker slots.

By default the FIFO scheduler is used. In this case a job is taken from a FIFO queue and executed; if task slots are still free while it is being executed, the next job's tasks are fetched, and so on. Alternatively, we may use the FAIR scheduler, which tries to give all the jobs equal priority.
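For reference, the scheduler is selected with Spark's spark.scheduler.mode property; a minimal sketch (the app name is illustrative):

import org.apache.spark.SparkConf

// FIFO is the default; FAIR shares resources between jobs submitted
// concurrently to a single SparkContext.
val conf = new SparkConf()
  .setAppName("distributed-extraction") // illustrative name
  .set("spark.scheduler.mode", "FAIR")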

And as you said, yeah, it's best to benchmark this. It's probably a good idea to merge this if you're okay with the code, create an issue about the default, and then we can just change the default after benchmarking?

nilesh-c added 3 commits July 11, 2014 22:01
…ling threads

The program was not stopping even after the SparkContext was stopped. The ExecutorService needs to be shut down too.
… the default Level.WARN

The problem was that getValue() returned null directly instead of passing the value through the closure.
…the maps have no such value.

Example: often in a stage the first TaskEnd event may tell us that the task has failed; in that case, calling this.stageIdToTasksComplete(stageId) would cause a NoSuchElementException. Similarly for stageIdToTasksFailed.
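A minimal sketch of the defensive lookup described above (names mirror the commit message; the exact fix in the commit may differ):

// The first TaskEnd for a stage can be a failure, so stageId may not yet
// be present in the counters; default to 0 instead of throwing.
val tasksComplete = stageIdToTasksComplete.getOrElse(stageId, 0)
val tasksFailed = stageIdToTasksFailed.getOrElse(stageId, 0)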
jimkont added a commit that referenced this pull request Jul 12, 2014
Make logging level and number of job submission threads configurable
@jimkont jimkont merged commit d67f035 into nildev2 Jul 12, 2014
@nilesh-c nilesh-c deleted the feature-loglevel-config branch July 26, 2014 23:26