Skip to content

Conversation

@ashishkumar50
Copy link
Contributor

What changes were proposed in this pull request?

While creating pipeline we are using time from java.util time package. During pipeline scrub we should use same java.util time package for correct comparison. So that pipeline which is in allocated state for long time can be removed.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7463

Please replace this section with the link to the Apache JIRA)

How was this patch tested?

Code has been built locally and manually tested.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sumitagrawl for reporting the problem, @ashishkumar50 for the patch.

  1. The same instance of MonotonicClock is used in several places, so the same problem may affect other services. We should fix all of them.
  2. Clock was introduced here to improve the tests (quicker and more deterministic than sleep or waitFor). Reverting to Instant.now() eliminates those advantages. It also breaks TestPipelineStateManagerImpl, which relies on TestClock.

I think we should try to fix MonotonicClock, or replace it with SystemClock for production use.

@adoroszlai adoroszlai requested a review from sodonnel December 15, 2022 15:44
@sodonnel
Copy link
Contributor

sodonnel commented Dec 15, 2022

I don't recall why I didn't use SystemClock, and instead created MonotonicClock. Perhaps I was trying to replace some calls to Time.monoticNow() that were originally there. Looks that way in the original PR:

#2425

Strange it has taken so long for this to be noticed, as we have been using it for a long time.

I wonder can we just replace all occurrences of MontonicClock with Java.time.SystemClock to fix this?

@sumitagrawl
Copy link
Contributor

I don't recall why I didn't use SystemClock, and instead created MonotonicClock. Perhaps I was trying to replace some calls to Time.monoticNow() that were originally there. Looks that way in the original PR:

#2425

Strange it has taken so long for this to be noticed, as we have been using it for a long time.

I wonder can we just replace all occurrences of MontonicClock with Java.time.SystemClock to fix this?

As I have checked, SystemClock will give system time wrt January 1, 1970 UTC. The use will fix this problem.

And MontonicClock will give time epoc wrt system start -- This will be useful to measure performance / time taken for a method

@sodonnel @adoroszlai Do we need change the clock instance for this to SystemClock ? or replace MonotonicClock all places.

@adoroszlai
Copy link
Contributor

I don't recall why I didn't use SystemClock, and instead created MonotonicClock. Perhaps I was trying to replace some calls to Time.monoticNow() that were originally there

I guess MonotonicClock is better for comparing instants within the same process. Any service that gets timestamps from other process via messages, or from its own serialized state, needs to avoid it.

@kerneltime
Copy link
Contributor

We should not use Instant.now() but revert to using only monotonic clocks in all places.

@sumitagrawl
Copy link
Contributor

sumitagrawl commented Dec 16, 2022

We should not use Instant.now() but revert to using only monotonic clocks in all places.

We need use SystemClock() as object inside method, because pipeline Time is serialized to DB also. But clock instance is also used for checking time taken, but way of usecase.

In Recon, its using Instance.now().

So,

  1. create a new clock as system clock and with existing Monotonic clock, use systemClock for this usages
  2. Same change for Recon also to remove Instance.now()

OR current changes in PR is ok as similar to Recon.

Please suggest the changes ....
@kerneltime @adoroszlai @sodonnel

@sodonnel
Copy link
Contributor

I think we should replace MonotonicClock with SystemClock anywhere it is created and passed into objects to be used. It will achieve the intended goal and be more safe to use.

@sodonnel
Copy link
Contributor

We should not use Instant.now() but revert to using only monotonic clocks in all places.

We need use SystemClock() as object inside method, because pipeline Time is serialized to DB also. But clock instance is also used for checking time taken, but way of usecase.

In Recon, its using Instance.now().

This is because we didn't change all usage of time to the new model. We have been fixing things on a case by case basis, and trying to ensure new code follows the new pattern.

1. create a new clock as system clock and with existing Monotonic clock, use systemClock for this usages

2. Same change for Recon also to remove Instance.now()

OR current changes in PR is ok as similar to Recon.

Existing change is not OK. It also breaks the tests. Lets just try to replace all occurrences on MonotonicClock in the project with java.util.SystemClock instead. It should be a drop in replacement.

Intellij reckons there are 24 occurrences of it through the code base, which isn't too many to fix.

@kerneltime
Copy link
Contributor

@sodonnel is java.util.SystemClock guaranteed to be monotonic; I do not quite understand the need to not use the monotonic clock. I would assume, we want to replace all instances with a clock that does not time travel.

@sodonnel
Copy link
Contributor

@kerneltime The monotonic clock as it stands cant be used to compare times across JVM restarts. Its starting point is some meaningless instance that can be used to compare duration within the same JVM instance. I feel this could result in more unexpected bugs like this one.

If not SystemClock, then what do you suggest?

@ashishkumar50
Copy link
Contributor Author

Hi @kerneltime @sodonnel please suggest the change to be done for this issue.

@sodonnel
Copy link
Contributor

For PipelineManager we need to switch from MonotonicClock to an instance of java.util.SystemClock instead.

Feels like going forward with Monotonic clock in other places is a risk too, but for now we can fix the immediate problem as above.

Copy link
Contributor

@sodonnel sodonnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - we can commit after a green CI run.

@sodonnel sodonnel merged commit c7785fa into apache:master Dec 20, 2022
errose28 added a commit to errose28/ozone that referenced this pull request Dec 21, 2022
* master: (88 commits)
  HDDS-7463. SCM Pipeline scrubber never able to cleanup allocated pipeline. (apache#4093)
  HDDS-7683. EC: ReplicationManager - UnderRep maintenance handler should not request nodes if none needed (apache#4109)
  HDDS-7635. Update failure metrics when allocate block fails in preExecute. (apache#4086)
  HDDS-7565. FSO purge directory for old bucket can update quota for new bucket (apache#4021)
  HDDS-7654. EC: ReplicationManager - merge mis-rep queue into under replicated queue (apache#4099)
  HDDS-7621. Update SCM term in datanode from heartbeat without any commands (apache#4101)
  HDDS-7649. S3 multipart upload EC release space quota wrong for old version (apache#4095)
  HDDS-7399. Enable specifying external root ca (apache#4053)
  HDDS-7398. Tool to remove old certs from the scm db (apache#3972)
  HDDS-6650. S3MultipartUpload support update bucket usedNamespace. (apache#4081)
  HDDS-7605. Improve logging in Container Balancer (apache#4067)
  HDDS-7616. EC: Refactor Unhealthy Replicated Processor (apache#4063)
  HDDS-7426. Add a new acceptance test for Streaming Pipeline. (apache#4019)
  HDDS-7478. [Ozone-Streaming] NPE in when creating a file with o3fs. (apache#3949)
  HDDS-7425. Add documentation for the new Streaming Pipeline feature. (apache#3913)
  HDDS-7438. [Ozone-Streaming] Add a createStreamKey method to OzoneBucket. (apache#3914)
  HDDS-7431. [Ozone-Streaming] Disable data steam by default. (apache#3900)
  HDDS-6955. [Ozone-streaming] Add explicit stream flag in ozone shell (apache#3559)
  HDDS-6867.  [Ozone-Streaming] PutKeyHandler should not use streaming to put EC key. (apache#3516)
  HDDS-6842. [Ozone-Streaming] Reduce the number of watch requests in StreamCommitWatcher. (apache#3492)
  ...
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Feb 6, 2023
…line. (apache#4093)

(cherry picked from commit c7785fa)
Change-Id: I7308a10f8ff61baf6cfd92031f622187a7c14cc7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants