@elek elek commented Apr 1, 2021

JIRA: https://issues.apache.org/jira/browse/HDDS-5056

What changes were proposed in this pull request?

BackgroundPipelineCreator in SCM tries to create pipelines at regular intervals. The check is very simple: if new pipelines can be created, it creates them.

However, if the pipelines cannot be created (which is very likely), an error message is printed:

scm_1       | 2021-03-31 15:34:57,004 [RatisPipelineUtilsThread - 0] ERROR pipeline.PipelineManagerV2Impl: Failed to create pipeline of type RATIS and factor THREE. Exception: Pipeline creation failed because nodes are engaged in other pipelines and every node can only be engaged in max 1 pipelines. Required 3. Found 0

Today it is printed at ERROR level, but the message doesn't provide any new information. I think PipelineManager.createPipeline shouldn't use ERROR level, especially as the exception is re-thrown:

   lock.lock();
   try {
     Pipeline pipeline = pipelineFactory.create(type, factor);
     stateManager.addPipeline(pipeline.getProtobufMessage(
         ClientVersions.CURRENT_VERSION));
     recordMetricsForPipeline(pipeline);
     return pipeline;
   } catch (IOException ex) {
     LOG.error("Failed to create pipeline of type {} and factor {}. " +
         "Exception: {}", type, factor, ex.getMessage());
     metrics.incNumPipelineCreationFailed();
     throw ex;
   } finally {
     lock.unlock();
   }

It should be the responsibility of the caller to log the exception or to ignore it (as in the case of BackgroundPipelineCreator).
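To illustrate the split of responsibilities, here is a minimal, self-contained sketch. It is not the actual Ozone code or the patch itself: `PipelineLoggingSketch`, `SketchPipelineManager`, `PipelineFactory` and `runBackgroundAttempt` are hypothetical names, pipelines are represented as plain strings, and the "log" is just a list of strings so the example runs without a logging library. The manager counts the failure and re-throws without logging; the background caller, which expects failures, records them at a low severity.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class PipelineLoggingSketch {

  /** Hypothetical stand-in for the pipeline factory; not the Ozone API. */
  interface PipelineFactory {
    String create(String type, String factor) throws IOException;
  }

  /** Simplified manager: counts the failure and re-throws, but does not log. */
  static class SketchPipelineManager {
    private final Lock lock = new ReentrantLock();
    private final PipelineFactory factory;
    private int numPipelineCreationFailed;

    SketchPipelineManager(PipelineFactory factory) {
      this.factory = factory;
    }

    String createPipeline(String type, String factor) throws IOException {
      lock.lock();
      try {
        return factory.create(type, factor);
      } catch (IOException ex) {
        // Keep the metric, drop the ERROR log: the caller decides what to report.
        numPipelineCreationFailed++;
        throw ex;
      } finally {
        lock.unlock();
      }
    }

    int getNumPipelineCreationFailed() {
      return numPipelineCreationFailed;
    }
  }

  /**
   * Background caller (assumption: failures are routine here), so a failed
   * attempt is recorded at DEBUG severity instead of ERROR.
   */
  static List<String> runBackgroundAttempt(SketchPipelineManager manager) {
    List<String> log = new ArrayList<>();
    try {
      String pipeline = manager.createPipeline("RATIS", "THREE");
      log.add("INFO created " + pipeline);
    } catch (IOException ex) {
      log.add("DEBUG skipped: " + ex.getMessage());
    }
    return log;
  }

  public static void main(String[] args) {
    // A factory that always fails, mimicking a cluster with no free nodes.
    SketchPipelineManager manager = new SketchPipelineManager((type, factor) -> {
      throw new IOException("not enough free nodes");
    });
    System.out.println(runBackgroundAttempt(manager));
  }
}
```

A caller for which the failure is unexpected (for example, an explicit admin request) could catch the same exception and log it at ERROR level instead.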

How was this patch tested?

Full CI tests.

@elek elek requested a review from bshashikant April 1, 2021 09:51
@elek elek changed the title Avoid false-positive error messages during pipeline creations HDDS-5056. Avoid false-positive error messages during pipeline creations Apr 1, 2021

@bharatviswa504 bharatviswa504 left a comment

+1 LGTM.
Thanks, @elek, for taking care of this.

elek commented Apr 7, 2021

Thanks for the review, @bharatviswa504 and @bshashikant. The build is green. I am merging it now.

@elek elek merged commit 534eefa into apache:master Apr 7, 2021
errose28 added a commit to errose28/ozone that referenced this pull request Apr 7, 2021
* HDDS-3698-nonrolling-upgrade:
  HDDS-5056. Avoid false positiver error messages during pipeline creations (apache#2105)
  HDDS-5027. [SCM HA Security] Handle leader changes during bootstrap. (apache#2113)
  HDDS-5032. Fix findbugs (apache#2120)
  HDDS-5062. Add a config to bypass clusterId validation for bootstrapping SCM. (apache#2114)
  HDDS-5011. Introduce Java based ReplicationConfig implementation (apache#2089)
  HDDS-4925. Introduce ContainerBalancer in SCM with start/stop capabilities. (apache#2097)
errose28 added a commit to errose28/ozone that referenced this pull request Apr 9, 2021
* HDDS-3698-nonrolling-upgrade: (150 commits)
  HDDS-5056. Avoid false positiver error messages during pipeline creations (apache#2105)
  HDDS-5027. [SCM HA Security] Handle leader changes during bootstrap. (apache#2113)
  HDDS-5032. Fix findbugs (apache#2120)
  HDDS-5062. Add a config to bypass clusterId validation for bootstrapping SCM. (apache#2114)
  HDDS-5011. Introduce Java based ReplicationConfig implementation (apache#2089)
  HDDS-4925. Introduce ContainerBalancer in SCM with start/stop capabilities. (apache#2097)
  fix project name in NOTICE.txt (apache#2112)
  HDDS-5066. Use fixed vesion from pnpm to build recon (apache#2115)
  HDDS-5014. Add non-rolling upgrade design docs.
  HDDS-5035. Use default config values to solve generated config file conflict (apache#2087)
  HDDS-5032. DN stopped to load containers on volume after a container load exception. (apache#2109)
  HDDS-4504. Datanode deletion config should be based on number of blocks (apache#1885)
  Fix ozone-ha acceptance test.
  HDDS-5058. Make getScmInfo retry for a duration.
  HDDS-4506. Support query parameter based v4 auth in S3g (apache#1628)
  HDDS-4553. ChunkInputStream should release buffer as soon as last byte in the buffer is read (apache#2062)
  HDDS-5022. SCM get roles command should provide Ratis Leader/Follower… (apache#2098)
  HDDS-5033. SCM may not be able to know full port list of Datanode after Datanode is started. (apache#2090)
  HDDS-3752. Fix o3fs list bucket contents issue when without tailing "/" (apache#2088)
  HDDS-4901. Remove OmOzoneAclMap from OmVolumeArgs to avoid OzoneAcl conversions (apache#1992)
  ...