[GOBBLIN-2173] Avoid Adhoc flow spec addition for non leasable entity #4076

Merged
merged 8 commits into from
Nov 19, 2024
@@ -60,6 +60,7 @@
import org.apache.gobblin.metrics.ServiceMetricNames;
import org.apache.gobblin.runtime.api.FlowSpec;
import org.apache.gobblin.runtime.api.FlowSpecSearchObject;
import org.apache.gobblin.runtime.api.LeaseUnavailableException;
import org.apache.gobblin.runtime.api.SpecNotFoundException;
import org.apache.gobblin.runtime.spec_catalog.AddSpecResponse;
import org.apache.gobblin.runtime.spec_catalog.FlowCatalog;
@@ -256,7 +257,10 @@ public CreateKVResponse<ComplexResourceKey<FlowId, FlowStatusId>, FlowConfig> cr
responseMap = this.flowCatalog.put(flowSpec, true);
} catch (QuotaExceededException e) {
throw new RestLiServiceException(HttpStatus.S_503_SERVICE_UNAVAILABLE, e.getMessage());
} catch (Throwable e) {
} catch(LeaseUnavailableException e){
throw new RestLiServiceException(HttpStatus.S_409_CONFLICT, e.getMessage());
Contributor

usually exception messages are designed for logging, more than for end-user consumption, so probably not appropriate to blindly return that. (it's sometimes done for a 5xx error, as above... but even that can be inadvisable.)

anyway, the 409 above might offer a better template:

return new CreateKVResponse<>(new RestLiServiceException(HttpStatus.S_409_CONFLICT,
    "FlowSpec with URI " + flowSpec.getUri() + " was launched less than N secs ago, no action will be taken"));

(to provide N we may wish to tunnel the value of epsilon... or at least how many secs remain before a subsequent launch would be possible)

also: when do we want to return (as that 409 above does), vs. throw?
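
One way to supply N in the message suggested above is to derive the seconds remaining in the consolidation window from the prior launch time and epsilon. This is a minimal sketch under assumptions: `ConflictMessages`, `secondsUntilLaunchable`, and `conflictMessage` are hypothetical names, not part of Gobblin, and the example URI is made up.

```java
import java.net.URI;

/** Hypothetical helper (not part of Gobblin): builds a caller-facing message for a 409 response. */
final class ConflictMessages {
  private ConflictMessages() {}

  /** Seconds left before a new launch would be accepted, rounded up; 0 once the window has passed. */
  static long secondsUntilLaunchable(long lastLaunchMillis, long epsilonMillis, long nowMillis) {
    long remainingMillis = lastLaunchMillis + epsilonMillis - nowMillis;
    return remainingMillis <= 0 ? 0 : (remainingMillis + 999) / 1000;
  }

  /** Message written for the API caller rather than copied from an exception meant for logs. */
  static String conflictMessage(URI flowSpecUri, long secondsRemaining) {
    return "FlowSpec with URI " + flowSpecUri + " was launched recently; retry in about "
        + secondsRemaining + " secs, no action will be taken";
  }

  public static void main(String[] args) {
    long secs = secondsUntilLaunchable(1_000L, 10_000L, 4_500L);
    System.out.println(conflictMessage(URI.create("gobblin-flow:/testGroup/testFlow"), secs));
  }
}
```

Keeping the message construction separate from the exception lets the resource decide independently whether to return or throw the resulting 409.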

Contributor Author

updated as discussed offline

}
catch (Throwable e) {
// TODO: Compilation errors should fall under throwable exceptions as well instead of checking for strings
log.warn(String.format("Failed to add flow configuration %s.%s to catalog due to", flowConfig.getId().getFlowGroup(), flowConfig.getId().getFlowName()), e);
throw new RestLiServiceException(HttpStatus.S_500_INTERNAL_SERVER_ERROR, e.getMessage());
@@ -0,0 +1,27 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.gobblin.runtime.api;

/**
* An {@link RuntimeException} thrown when lease cannot be acquired on provided entity.
*/
public class LeaseUnavailableException extends RuntimeException {
public LeaseUnavailableException(String message) {
Contributor

beyond clearly naming for callers, impl-wise, this definitely relates to a flow, so that should be a ctor param. consider whether to allow a catcher to reach in to access the details as instance member(s) or merely to use internally in the ctor, to contextualize the message passed along to super.

super(message);
}
}
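
The review suggestion to pass the flow as a constructor parameter could look roughly like the following. This is a sketch only, not the merged code: the class name `FlowLeaseUnavailableException`, its fields, and the message wording are assumptions chosen for illustration.

```java
/** Sketch only (not the merged code): an exception that carries the identity of the affected flow. */
class FlowLeaseUnavailableException extends RuntimeException {
  private final String flowGroup;
  private final String flowName;

  FlowLeaseUnavailableException(String flowGroup, String flowName) {
    // The flow identity contextualizes the message passed along to super.
    super(String.format("Lease unavailable for flow %s.%s", flowGroup, flowName));
    this.flowGroup = flowGroup;
    this.flowName = flowName;
  }

  /** Exposed so a catcher can reach in for details instead of parsing the message. */
  String getFlowGroup() {
    return flowGroup;
  }

  String getFlowName() {
    return flowName;
  }
}
```

Exposing the fields is optional; using them only inside the constructor already gives callers a contextualized message.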
@@ -103,6 +103,13 @@ public interface DagManagementStateStore {
*/
void updateDagNode(Dag.DagNode<JobExecutionPlan> dagNode) throws IOException;

/**
* Returns true if lease can be acquired on entity provided in leaseParams.
* @param leaseParams uniquely identifies the flow, the present action upon it, the time the action was triggered,
* and if the dag action event we're checking on is a reminder event
*/
boolean isLeaseAcquirable(DagActionStore.LeaseParams leaseParams) throws IOException;

/**
* Returns the requested {@link org.apache.gobblin.service.modules.flowgraph.Dag.DagNode} and its {@link JobStatus}.
* Both params are returned as optional and are empty if not present in the store.
@@ -90,6 +90,11 @@ public LeaseAttemptStatus tryAcquireLease(DagActionStore.LeaseParams leaseParams
throw new RuntimeException(String.format("Unexpected LeaseAttemptStatus (%s) for %s", leaseAttemptStatus.getClass().getName(), leaseParams));
}

@Override
public boolean isLeaseAcquirable(DagActionStore.LeaseParams leaseParams) throws IOException {
return decoratedMultiActiveLeaseArbiter.isLeaseAcquirable(leaseParams);
}

@Override
public boolean recordLeaseSuccess(LeaseAttemptStatus.LeaseObtainedStatus status)
throws IOException {
@@ -61,6 +61,17 @@ public interface MultiActiveLeaseArbiter {
LeaseAttemptStatus tryAcquireLease(DagActionStore.LeaseParams leaseParams, boolean adoptConsensusFlowExecutionId)
throws IOException;

/**
* This method checks if lease can be acquired on provided flow in lease params
* returns true if entry for the same flow does not exist within epsilon time
Contributor

very reasonable method-level javadoc... but it turns out epsilon is not mentioned anywhere in class-level javadoc, so this method description lacks context.

so, please add the class-level info. mentioning the name 'epsilon' is fine, but definitely also give it a more specific name, like "Lease Consolidation Period".

Contributor Author

updated

* in leaseArbiterStore, else returns false
* @param leaseParams uniquely identifies the flow, the present action upon it, the time the action
* was triggered, and if the dag action event we're checking on is a reminder event
* @return true if lease can be acquired on the flow passed in the lease params, false otherwise
*/
boolean isLeaseAcquirable(DagActionStore.LeaseParams leaseParams)
Contributor (@phet, Nov 18, 2024)

the method name itself suggests a pre-check capability (e.g. first check whether it's acquirable and if so, then tryAcquireLease... being assured of success).

of course, because check-then-act patterns are susceptible to race conditions, we'd never actually provide such an API - let's not confuse anyone!

how about boolean existsSimilarLeaseWithinConsolidationPeriod(LeaseParams)?

(or existsEquivalentLeaseWithinConsolidationPeriod)

Contributor

and apologies that I probably wasn't explaining clearly when earlier suggesting names like existsLeasableEntity (to mean that "another one already exists, historically")

throws IOException;
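
The semantics discussed in the thread above (a read-only query for a similar lease inside the consolidation period, rather than a pre-check that promises a later acquisition will succeed) can be illustrated with an in-memory stand-in. Everything here is hypothetical: `InMemoryLeaseLedger` is not Gobblin's arbiter, keying on flow group and name is a simplification of `LeaseParams`, and the strict `<` boundary is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the lease arbiter store; not Gobblin's implementation. */
final class InMemoryLeaseLedger {
  private final long epsilonMillis;
  private final Map<List<String>, Long> lastEventMillisByFlow = new HashMap<>();

  InMemoryLeaseLedger(long epsilonMillis) {
    this.epsilonMillis = epsilonMillis;
  }

  /** Records that a lease was taken for the flow at eventMillis. */
  void recordLease(String flowGroup, String flowName, long eventMillis) {
    lastEventMillisByFlow.put(Arrays.asList(flowGroup, flowName), eventMillis);
  }

  /**
   * Read-only query: true if a lease for the same flow was recorded less than epsilon before nowMillis.
   * The strict "<" boundary is an assumption of this sketch.
   */
  boolean existsSimilarLeaseWithinConsolidationPeriod(String flowGroup, String flowName, long nowMillis) {
    Long last = lastEventMillisByFlow.get(Arrays.asList(flowGroup, flowName));
    return last != null && nowMillis - last < epsilonMillis;
  }
}
```

A `false` result only reports history at the time of the query; another participant may still record a lease before the caller acts, which is why the name avoids implying a guarantee.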

/**
* This method is used to indicate the owner of the lease has successfully completed required actions while holding
* the lease of the dag action event. It marks the lease as "no longer leasing", if the eventTimeMillis and
@@ -65,6 +65,7 @@ public class MySqlDagManagementStateStore implements DagManagementStateStore {
// todo - these two stores should merge
private DagStateStoreWithDagNodes dagStateStore;
private DagStateStoreWithDagNodes failedDagStateStore;
private MultiActiveLeaseArbiter multiActiveLeaseArbiter;
private final JobStatusRetriever jobStatusRetriever;
private boolean dagStoresInitialized = false;
private final UserQuotaManager quotaManager;
@@ -79,13 +80,14 @@ public class MySqlDagManagementStateStore implements DagManagementStateStore {

@Inject
public MySqlDagManagementStateStore(Config config, FlowCatalog flowCatalog, UserQuotaManager userQuotaManager,
JobStatusRetriever jobStatusRetriever, DagActionStore dagActionStore) {
JobStatusRetriever jobStatusRetriever, DagActionStore dagActionStore, MultiActiveLeaseArbiter multiActiveLeaseArbiter) {
this.quotaManager = userQuotaManager;
this.config = config;
this.flowCatalog = flowCatalog;
this.jobStatusRetriever = jobStatusRetriever;
this.dagManagerMetrics.activate();
this.dagActionStore = dagActionStore;
this.multiActiveLeaseArbiter = multiActiveLeaseArbiter;
}

// It should be called after topology spec map is set
@@ -168,6 +170,11 @@ public synchronized void updateDagNode(Dag.DagNode<JobExecutionPlan> dagNode)
this.dagStateStore.updateDagNode(dagNode);
}

@Override
public boolean isLeaseAcquirable(DagActionStore.LeaseParams leaseParams) throws IOException {
return multiActiveLeaseArbiter.isLeaseAcquirable(leaseParams);
}

@Override
public Optional<Dag<JobExecutionPlan>> getDag(Dag.DagId dagId) throws IOException {
return Optional.ofNullable(this.dagStateStore.getDag(dagId));
@@ -362,6 +362,16 @@ else if (leaseValidityStatus == 2) {
}
}

/*
Determines if a lease can be acquired for the given flow. A lease is acquirable if
no existing lease record exists in arbiter table or the record is older than epsilon time
*/
Contributor

probably no need for this comment here in the impl, but if you want one, bring it into line w/ the orig from the interface

Contributor Author

makes sense, since javadoc already there for the interface, removed comment from here

@Override
public boolean isLeaseAcquirable(DagActionStore.LeaseParams leaseParams) throws IOException {
Optional<GetEventInfoResult> infoResult = getExistingEventInfo(leaseParams);
return infoResult.isPresent() ? !infoResult.get().isWithinEpsilon() : true;
Contributor

idiomatic:

return infoResult.map(result -> !result.isWithinEpsilon()).orElse(true);

}
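
The ternary in the diff and the map-based form from the review comment are equivalent. The sketch below assumes `java.util.Optional`, where the fallback method is `orElse`; with Guava's `Optional` the corresponding calls would be `transform` and `or`. `EventInfo` is a hypothetical stand-in for `GetEventInfoResult`.

```java
import java.util.Optional;

final class OptionalIdiomDemo {
  private OptionalIdiomDemo() {}

  /** Hypothetical stand-in for the result type; only the isWithinEpsilon flag matters here. */
  static final class EventInfo {
    private final boolean withinEpsilon;

    EventInfo(boolean withinEpsilon) {
      this.withinEpsilon = withinEpsilon;
    }

    boolean isWithinEpsilon() {
      return withinEpsilon;
    }
  }

  /** Form used in the diff: explicit presence check. */
  static boolean ternaryForm(Optional<EventInfo> info) {
    return info.isPresent() ? !info.get().isWithinEpsilon() : true;
  }

  /** Form suggested in review: transform the value if present, default to true otherwise. */
  static boolean mapForm(Optional<EventInfo> info) {
    return info.map(result -> !result.isWithinEpsilon()).orElse(true);
  }
}
```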

/**
* Checks leaseArbiterTable for an existing entry for this dag action and event time
*/
@@ -24,6 +24,7 @@
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.gobblin.runtime.api.LeaseUnavailableException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@@ -78,6 +79,7 @@ public class Orchestrator implements SpecCatalogListener, Instrumentable {
protected final SpecCompiler specCompiler;
protected final TopologyCatalog topologyCatalog;
private final JobStatusRetriever jobStatusRetriever;
private final DagManagementStateStore dagManagementStateStore;

protected final MetricContext metricContext;

@@ -100,6 +102,7 @@ public Orchestrator(Config config, TopologyCatalog topologyCatalog, Optional<Log
this.topologyCatalog = topologyCatalog;
this.flowLaunchHandler = flowLaunchHandler;
this.sharedFlowMetricsSingleton = sharedFlowMetricsSingleton;
this.dagManagementStateStore = dagManagementStateStore;
this.jobStatusRetriever = jobStatusRetriever;
this.specCompiler = flowCompilationValidationHelper.getSpecCompiler();
// todo remove the need to set topology factory outside of constructor GOBBLIN-2056
@@ -125,6 +128,7 @@ public AddSpecResponse onAddSpec(Spec addedSpec) {
_log.info("Orchestrator - onAdd[Topology]Spec: " + addedSpec);
this.specCompiler.onAddSpec(addedSpec);
} else if (addedSpec instanceof FlowSpec) {
validateAdhocFlowLeasability((FlowSpec) addedSpec);
Contributor

nit: "validate"/"verify" are good for methods returning a boolean. the entire purpose of this void method is to throw an exception. clearly indicate that with stronger naming, like "failIf..." or "enforce..."

_log.info("Orchestrator - onAdd[Flow]Spec: " + addedSpec);
return this.specCompiler.onAddSpec(addedSpec);
} else {
@@ -133,6 +137,31 @@ public AddSpecResponse onAddSpec(Spec addedSpec) {
return new AddSpecResponse<>(null);
}
Contributor

Just an FYI, this also gets called when a flow is updated. But since we check whether the flow is scheduled, and we don't expect users to update an adhoc flow, we should be fine.

Contributor

But since this can still be called for adhoc flows, it would be good to test what the behaviour is. No need to handle it specially; knowing the behaviour is enough.

Contributor Author

Thanks for the callout, will add it to the test suite


/*
validates if lease can be acquired on the provided flowSpec,
else throw LeaseUnavailableException
*/
private void validateAdhocFlowLeasability(FlowSpec flowSpec) {
Contributor

add javadoc

Contributor Author

updated

if (!flowSpec.isScheduled()) {
Config flowConfig = flowSpec.getConfig();
String flowGroup = flowConfig.getString(ConfigurationKeys.FLOW_GROUP_KEY);
String flowName = flowConfig.getString(ConfigurationKeys.FLOW_NAME_KEY);

DagActionStore.DagAction dagAction = DagActionStore.DagAction.forFlow(flowGroup, flowName,
FlowUtils.getOrCreateFlowExecutionId(flowSpec), DagActionStore.DagActionType.LAUNCH);
DagActionStore.LeaseParams leaseParams = new DagActionStore.LeaseParams(dagAction, System.currentTimeMillis());
_log.info("validation of lease acquirability of adhoc flow with lease params: " + leaseParams);
Contributor

keep it brief! (we just made improvements in that vein #4074 )

maybe:

 _log.info("checking adhoc lease acquirability {}", leaseParams);

try {
if (!dagManagementStateStore.isLeaseAcquirable(leaseParams)) {
throw new LeaseUnavailableException("Lease already occupied by another execution of this flow");
Contributor

add an info log here with flowGroup, flowName.. it would be useful in debugging

Contributor Author

Added lease params which contains details of flow name and flow group

}
} catch (IOException exception) {
_log.error(String.format("Failed to query leaseArbiterTable for existing flow details: %s", flowSpec), exception);
Contributor

we called dagManagementStateStore.isLeaseAcquirable(leaseParams)... who said anything about "leaseArbiterTable"? :)

(anyway, the table's name is dynamically set in config).

instead:

_log.error("unable to check whether lease acquirable " + leaseParams, exception);

(also on the line below)

throw new RuntimeException("Error querying leaseArbiterTable", exception);
}
}
}
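
Taken together, the review points on this method (a name that signals it exists to throw, a check that applies only to adhoc flows, and an error message that does not mention the backing table) suggest a shape like the following. This is a dependency-free sketch: `AdhocLeaseGuard`, `LeaseChecker`, and `enforceAdhocFlowLeasable` are hypothetical names, a plain string stands in for `LeaseParams`, and `IllegalStateException` stands in for `LeaseUnavailableException`.

```java
import java.io.IOException;

final class AdhocLeaseGuard {
  /** Hypothetical stand-in for the state store's lease check. */
  interface LeaseChecker {
    boolean isLeaseAcquirable(String flowKey) throws IOException;
  }

  private AdhocLeaseGuard() {}

  /** Throws if an adhoc (unscheduled) flow cannot currently be leased; scheduled flows are not checked. */
  static void enforceAdhocFlowLeasable(boolean scheduled, String flowKey, LeaseChecker checker) {
    if (scheduled) {
      return;
    }
    try {
      if (!checker.isLeaseAcquirable(flowKey)) {
        // Stand-in for LeaseUnavailableException in this sketch.
        throw new IllegalStateException("Lease unavailable for adhoc flow " + flowKey);
      }
    } catch (IOException e) {
      // Describe the failed operation, not the storage behind it.
      throw new RuntimeException("Unable to check whether lease acquirable for " + flowKey, e);
    }
  }
}
```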

public void onDeleteSpec(URI deletedSpecURI, String deletedSpecVersion) {
onDeleteSpec(deletedSpecURI, deletedSpecVersion, new Properties());
}
@@ -24,6 +24,7 @@
import java.util.Map;
import java.util.Set;

import org.mockito.Mockito;
import org.testng.Assert;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
@@ -59,6 +60,7 @@
public class MySqlDagManagementStateStoreTest {

private ITestMetastoreDatabase testDb;
private static MultiActiveLeaseArbiter leaseArbiter;
private MySqlDagManagementStateStore dagManagementStateStore;
private static final String TEST_USER = "testUser";
public static final String TEST_PASSWORD = "testPassword";
@@ -68,6 +70,7 @@ public class MySqlDagManagementStateStoreTest {
@BeforeClass
public void setUp() throws Exception {
// Setting up mock DB
this.leaseArbiter = mock(MultiActiveLeaseArbiter.class);
this.testDb = TestMetastoreDatabaseFactory.get();
this.dagManagementStateStore = getDummyDMSS(this.testDb);
}
@@ -92,6 +95,16 @@ public static <T> boolean compareLists(List<T> list1, List<T> list2) {
return true;
}

@Test
public void testcanAcquireLeaseOnEntity() throws Exception{
Contributor

camel case typo... (but anyway, canAcquireLeaseOnEntity is not the name of the method)

Mockito.when(leaseArbiter.isLeaseAcquirable(Mockito.any(DagActionStore.LeaseParams.class))).thenReturn(true);
String flowName = "testFlow";
String flowGroup = "testGroup";
DagActionStore.DagAction dagAction = new DagActionStore.DagAction(flowName, flowGroup, System.currentTimeMillis(), "testJob", DagActionStore.DagActionType.LAUNCH);
DagActionStore.LeaseParams leaseParams = new DagActionStore.LeaseParams(dagAction);
Assert.assertTrue(dagManagementStateStore.isLeaseAcquirable(leaseParams));
Contributor

where's the test to exercise false path?

}
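
Covering the false path the reviewer asks about means stubbing the arbiter to deny the lease and asserting that the store passes that answer through. In the real test this would presumably be a second Mockito stub using `thenReturn(false)`; the sketch below shows the same idea without dependencies, using hypothetical stand-ins (`LeaseCheck`, `DelegatingStore`) and a string key in place of `LeaseParams`.

```java
/** Hypothetical, dependency-free stand-in for the arbiter. */
interface LeaseCheck {
  boolean isLeaseAcquirable(String leaseKey);
}

/** Hypothetical stand-in for a store that delegates the check to its arbiter. */
final class DelegatingStore {
  private final LeaseCheck arbiter;

  DelegatingStore(LeaseCheck arbiter) {
    this.arbiter = arbiter;
  }

  boolean isLeaseAcquirable(String leaseKey) {
    return arbiter.isLeaseAcquirable(leaseKey);
  }
}

final class DelegationPathsDemo {
  public static void main(String[] args) {
    // One stub per path, so both the true and the false answers are exercised.
    DelegatingStore allows = new DelegatingStore(key -> true);
    DelegatingStore denies = new DelegatingStore(key -> false);
    System.out.println(allows.isLeaseAcquirable("testGroup.testFlow"));
    System.out.println(denies.isLeaseAcquirable("testGroup.testFlow"));
  }
}
```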

@Test
public void testAddDag() throws Exception {
Dag<JobExecutionPlan> dag = DagTestUtils.buildDag("test", 12345L);
@@ -150,9 +163,11 @@ public static MySqlDagManagementStateStore getDummyDMSS(ITestMetastoreDatabase t
TopologySpec topologySpec = LaunchDagProcTest.buildNaiveTopologySpec(TEST_SPEC_EXECUTOR_URI);
URI specExecURI = new URI(TEST_SPEC_EXECUTOR_URI);
topologySpecMap.put(specExecURI, topologySpec);
MultiActiveLeaseArbiter multiActiveLeaseArbiter = Mockito.mock(MultiActiveLeaseArbiter.class);
leaseArbiter = multiActiveLeaseArbiter;
MySqlDagManagementStateStore dagManagementStateStore =
new MySqlDagManagementStateStore(config, null, null, jobStatusRetriever,
MysqlDagActionStoreTest.getTestDagActionStore(testMetastoreDatabase));
MysqlDagActionStoreTest.getTestDagActionStore(testMetastoreDatabase), multiActiveLeaseArbiter);
dagManagementStateStore.setTopologySpecMap(topologySpecMap);
return dagManagementStateStore;
}
@@ -44,6 +44,7 @@
public class MysqlMultiActiveLeaseArbiterTest {
private static final long EPSILON = 10000L;
private static final long MORE_THAN_EPSILON = (long) (EPSILON * 1.1);
private static final long LESS_THAN_EPSILON = (long) (EPSILON * 0.90);
// NOTE: `sleep`ing this long SIGNIFICANTLY slows tests, but we need a large enough value that exec. variability won't cause spurious failure
private static final long LINGER = 20000L;
private static final long MORE_THAN_LINGER = (long) (LINGER * 1.1);
@@ -53,6 +54,8 @@ public class MysqlMultiActiveLeaseArbiterTest {
private static final String CONSTANTS_TABLE = "constants_store";
private static final String flowGroup = "testFlowGroup";
private static final String flowGroup2 = "testFlowGroup2";
private static final String flowGroup3 = "testFlowGroup3";
private static final String flowGroup4 = "testFlowGroup4";
private static final String flowName = "testFlowName";
private static final String jobName = "testJobName";
private static final long flowExecutionId = 12345677L;
@@ -70,6 +73,14 @@ public class MysqlMultiActiveLeaseArbiterTest {
new DagActionStore.DagAction(flowGroup2, flowName, flowExecutionId, jobName, DagActionStore.DagActionType.LAUNCH);
private static final DagActionStore.LeaseParams
launchLeaseParams2 = new DagActionStore.LeaseParams(launchDagAction2, false, eventTimeMillis);
private static final DagActionStore.DagAction launchDagAction3 =
new DagActionStore.DagAction(flowGroup3, flowName, flowExecutionId, jobName, DagActionStore.DagActionType.LAUNCH);
private static final DagActionStore.LeaseParams
launchLeaseParams3 = new DagActionStore.LeaseParams(launchDagAction3, false, eventTimeMillis);
private static final DagActionStore.DagAction launchDagAction4 =
new DagActionStore.DagAction(flowGroup4, flowName, flowExecutionId, jobName, DagActionStore.DagActionType.LAUNCH);
private static final DagActionStore.LeaseParams
launchLeaseParams4 = new DagActionStore.LeaseParams(launchDagAction4, false, eventTimeMillis);
private static final Timestamp dummyTimestamp = new Timestamp(99999);
private ITestMetastoreDatabase testDb;
private MysqlMultiActiveLeaseArbiter mysqlMultiActiveLeaseArbiter;
@@ -201,6 +212,33 @@ public void testAcquireLeaseSingleParticipant() throws Exception {
<= sixthObtainedStatus.getLeaseAcquisitionTimestamp());
}

/*
test to verify if leasable entity is unavailable before epsilon time
to account for clock drift
*/
@Test
public void testWhenLeasableEntityUnavailable() throws Exception{
LeaseAttemptStatus firstLaunchStatus =
mysqlMultiActiveLeaseArbiter.tryAcquireLease(launchLeaseParams3, true);
Assert.assertTrue(firstLaunchStatus instanceof LeaseAttemptStatus.LeaseObtainedStatus);
completeLeaseHelper(launchLeaseParams3);
Thread.sleep(LESS_THAN_EPSILON);
Assert.assertFalse(mysqlMultiActiveLeaseArbiter.isLeaseAcquirable(launchLeaseParams3));
Contributor (@phet, Nov 18, 2024)

the whole idea is that a "similar" (but NOT same) lease isn't itself already within epsilon. hence, be sure to test LeaseParams that were NOT just given to tryAcquireLease

Contributor Author

created new launch param launchLeaseParams3_similar

}

/*
test to verify if leasable entity exists post epsilon time
*/
@Test
public void testWhenLeasableEntityAvailable() throws Exception{
LeaseAttemptStatus firstLaunchStatus =
mysqlMultiActiveLeaseArbiter.tryAcquireLease(launchLeaseParams4, true);
Assert.assertTrue(firstLaunchStatus instanceof LeaseAttemptStatus.LeaseObtainedStatus);
completeLeaseHelper(launchLeaseParams4);
Thread.sleep(MORE_THAN_EPSILON);
Assert.assertTrue(mysqlMultiActiveLeaseArbiter.isLeaseAcquirable(launchLeaseParams4));
}

/*
Tests attemptLeaseIfNewRow() method to ensure a new row is inserted if no row matches the primary key in the table.
If such a row does exist, the method should disregard the resulting SQL error and return 0 rows updated, indicating