
Introduce scheduling configurations. #5232

Merged
9 commits merged into apache:master on May 31, 2022

Conversation

style95
Member

@style95 style95 commented May 10, 2022

Description

This is to make a couple of scheduling factors configurable.

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

Unmarshal(responseEntity).to[String].map(response => {
InvokerEnabled.parseJson(response) shouldEqual InvokerEnabled(true)
})
Unmarshal(responseEntity)
Member Author

I'm not sure why scalaFmt suddenly requires this kind of change.

@@ -116,7 +112,7 @@ class MemoryQueue(private val etcdClient: EtcdClient,
private val durationChecker: DurationChecker,
private val action: FullyQualifiedEntityName,
messagingProducer: MessageProducer,
config: WhiskConfig,
Member Author

It seems this is no longer used.

scheduling {
stale-threshold = "100 milliseconds"
check-interval = "100 milliseconds"
drop-interval = "10 seconds"
Contributor

This is another config I forgot to mention we'll need to tune, so it's good to see it configurable here. Does 10 seconds make sense as the default if the built-in max action duration is 60 seconds? If the namespace or action is throttled and can't create more containers, new activations may wait up to 60 seconds to be processed. On the flip side, I think this is also used so the scheduler can stop once all queues are drained, so I understand why you would want this on the lower side.


Contributor

I see, okay. Maybe the max retention ms default should be the same as the default completion ack timeout, which I think is three minutes based on the timeout calculation in the code.

Member Author

OK, so do you want me to update the default retention time to 3 minutes?

Contributor

Yeah, I think that would be best, unless you see a reason why it should be 1 minute, since you're more familiar with the architecture.

@@ -911,7 +902,7 @@ class MemoryQueue(private val etcdClient: EtcdClient,
private def getStaleActivationNum(count: Int, queue: Queue[TimeSeriesActivationEntry]): Int = {
if (queue.isEmpty || Duration
.between(queue.head.timestamp, Instant.now)
.compareTo(StaleDuration) < 0) count
.compareTo(schedulingConfig.staleThreshold.toJava) < 0) count
Contributor

why is toJava needed here?

Member Author

Scala's FiniteDuration does not extend Comparable.
Since the comparison is done against a java.time.Duration, the conversion is required.
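A minimal sketch of the comparison in question (assuming Scala 2.13's `scala.jdk.DurationConverters`; earlier builds pulled an equivalent `toJava` from scala-java8-compat):

```scala
import java.time.{Duration => JDuration, Instant}
import scala.concurrent.duration._
import scala.jdk.DurationConverters._

// java.time.Duration.compareTo accepts only another java.time.Duration,
// and Scala's FiniteDuration is not Comparable with it, so the configured
// threshold must be converted with .toJava before the comparison.
val staleThreshold: FiniteDuration = 100.milliseconds
val elapsed: JDuration = JDuration.between(Instant.now.minusMillis(250), Instant.now)
val isStale: Boolean = elapsed.compareTo(staleThreshold.toJava) >= 0
```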

@@ -64,6 +64,11 @@ whisk {
grpc {
tls = "false"
}
scheduling {
Contributor

Since this is already under scheduler, I don't think this adds any more descriptiveness. How about calling the config decision-maker?

Member Author

These configurations are related to both MemoryQueue and DecisionMaker.
The two intervals are not used by the decision-maker.

scheduling:
staleThreshold: "{{ scheduler_scheduling_staleThreshold | default('100 milliseconds') }}"
checkInterval: "{{ scheduler_scheduling_checkInterval | default('100 milliseconds') }}"
dropInterval: "{{ scheduler_scheduling_dropInterval | default('10 seconds') }}"
Contributor

How about dropStaleActivationTime for this config? Or is the config here the interval at which to check whether there are stale activations to drop?

Member Author

This is just the interval at which to check whether there are any stale activations.
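To make the distinction concrete, here is a hypothetical helper (plain JDK executor rather than the actor-based MemoryQueue): the drop interval only controls how often the queue is re-scanned, while the stale threshold decides what counts as stale.

```scala
import java.util.concurrent.{CountDownLatch, Executors, ScheduledExecutorService, TimeUnit}
import scala.concurrent.duration._

// Hypothetical helper, not the PR's code: the real MemoryQueue schedules
// this inside an actor. dropInterval only controls how often the queue is
// re-scanned for activations to drop; staleThreshold (checked per entry)
// decides which activations count as stale.
def schedulePeriodicDrop(interval: FiniteDuration)(scan: () => Unit): ScheduledExecutorService = {
  val exec = Executors.newSingleThreadScheduledExecutor()
  exec.scheduleAtFixedRate(() => scan(), 0L, interval.toMillis, TimeUnit.MILLISECONDS)
  exec
}
```

With the default above, `schedulePeriodicDrop(10.seconds)(scanQueue)` would re-scan every 10 seconds.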

@@ -64,6 +64,11 @@ whisk {
grpc {
tls = "false"
}
scheduling {
stale-threshold = "100 milliseconds"
Contributor

@ningyougang ningyougang May 11, 2022

Because the value is of type FiniteDuration, how about changing it to stale-duration-threshold?
(btw, from names like check-interval or drop-interval, we can easily tell the value is a FiniteDuration)

Member Author

I just thought its value includes the unit, so we can easily figure out its type.
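As an illustration of that point, here is a hypothetical mirror of the config block (field names assumed; in the deployment the values would be bound from application.conf rather than parsed by hand). The unit in each string already makes the type self-describing:

```scala
import scala.concurrent.duration._

// Hypothetical mirror of the new scheduling config; field names assumed.
final case class SchedulingConfig(staleThreshold: FiniteDuration,
                                  checkInterval: FiniteDuration,
                                  dropInterval: FiniteDuration)

// Parse a unit-bearing string such as "100 milliseconds" to a FiniteDuration.
def parseFinite(s: String): FiniteDuration = Duration(s) match {
  case fd: FiniteDuration => fd
  case d                  => throw new IllegalArgumentException(s"not a finite duration: $d")
}

val schedulingConfig = SchedulingConfig(
  staleThreshold = parseFinite("100 milliseconds"),
  checkInterval = parseFinite("100 milliseconds"),
  dropInterval = parseFinite("10 seconds"))
```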

@bdoyle0182
Contributor

LGTM

@jiangpengcheng
Contributor

looks like there is a compile error

@style95
Member Author

style95 commented May 19, 2022

It's surprising that scalaFmt could not detect the error.

> Task :core:scheduler:compileScala
Pruning sources from previous analysis, due to incompatible CompileSetup.
/Users/style95/git/openwhisk/core/scheduler/src/main/scala/org/apache/openwhisk/core/scheduler/queue/MemoryQueue.scala:58: Unused import
import scala.language.postfixOps
                      ^
one error found

> Task :core:scheduler:compileScala FAILED

FAILURE: Build failed with an exception.

I've removed the unused import.

@bdoyle0182
Contributor

Is it just the build blocking this right now?

@style95
Member Author

style95 commented May 29, 2022

Is it just the build blocking this right now?

I think so.

@style95
Member Author

style95 commented May 30, 2022

It's weird that Travis complains about code I have not changed, such as Logging.scala.

/home/travis/build/apache/openwhisk/common/scala/src/main/scala/org/apache/openwhisk/common/Logging.scala:311: Widening conversion from Long to Double is deprecated because it loses precision. Write `.toDouble` instead.
      token.gauge.update(value)
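The deprecation is about an implicit Long-to-Double widening; the conventional remedy (illustrative, not necessarily the exact change made for this warning) is an explicit .toDouble:

```scala
// Widening Long -> Double implicitly is deprecated because Double has only
// 53 bits of mantissa, so large Long values lose precision; .toDouble makes
// the conversion explicit and silences the warning without changing behavior.
def update(value: Double): Unit = () // stand-in for the gauge API, not the real signature
val metric: Long = 42L
update(metric.toDouble) // instead of update(metric)

// The precision loss is real at the extremes: adjacent Longs near
// Long.MaxValue round to the same Double.
val collides: Boolean = Long.MaxValue.toDouble == (Long.MaxValue - 1).toDouble
```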

@codecov-commenter

codecov-commenter commented May 30, 2022

Codecov Report

Attention: Patch coverage is 57.89474% with 8 lines in your changes missing coverage. Please review.

Project coverage is 75.18%. Comparing base (1e09049) to head (3197612).
Report is 139 commits behind head on master.

Files with missing lines Patch % Lines
.../openwhisk/core/loadBalancer/FPCPoolBalancer.scala 0.00% 5 Missing ⚠️
...rg/apache/openwhisk/core/scheduler/Scheduler.scala 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5232      +/-   ##
==========================================
- Coverage   80.16%   75.18%   -4.99%     
==========================================
  Files         238      238              
  Lines       14009    14032      +23     
  Branches      567      578      +11     
==========================================
- Hits        11231    10550     -681     
- Misses       2778     3482     +704     


@@ -63,8 +63,6 @@ dependencies {
compile "com.google.code.findbugs:jsr305:3.0.2"
compile "io.fabric8:kubernetes-client:${gradle.kube_client.version}"

compile "org.scala-lang.modules:scala-java8-compat_${gradle.scala.depVersion}:1.0.2"
Member Author

I reverted this change as it introduces a compatibility issue.

@bdoyle0182
Contributor

Looks like the build passed, I'm good with merging in

@style95 style95 merged commit 4ef94d7 into apache:master May 31, 2022
JesseStutler pushed a commit to JesseStutler/openwhisk that referenced this pull request Jul 13, 2022
* Introduce scheduling configurations.

* Apply SchedulingConfig to MemoryQueue.

* Apply SchedulingConfig to SchedulingDecisionMaker.

* Apply ScalaFmt

* Remove unused import

* Change configs.

* Fix test cases.

* Apply scalaFmt

* Remove Java8-compat dependency.
5 participants