Skip to content

Conversation

@codope
Copy link
Member

@codope codope commented Sep 17, 2024

Change Logs

  • Support dropping index through SQL
  • Add a test to drop sec index and func index (RLI is already covered)
  • Some renames and refactoring

Impact

Users can now drop index using SQL

Risk level (write none, low medium or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Sep 17, 2024
@codope codope force-pushed the drop-index-hudi-7563 branch from 5fc8434 to 0ea02b9 Compare September 18, 2024 17:47
@nsivabalan nsivabalan self-assigned this Sep 20, 2024
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left few minor comments

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few minor nits

@codope codope force-pushed the drop-index-hudi-7563 branch from 0ea02b9 to 235756d Compare September 21, 2024 04:05
Comment on lines +117 to +129
if (!indexExists(metaClient, indexName)) {
if (ignoreIfNotExists) {
return;
} else {
throw new HoodieFunctionalIndexException("Index does not exist: " + indexName);
}
}

LOG.info("Dropping index {}", indexName);
HoodieIndexDefinition indexDefinition = metaClient.getIndexMetadata().get().getIndexDefinitions().get(indexName);
try (SparkRDDWriteClient writeClient = HoodieCLIUtils.createHoodieWriteClient(
sparkSession, metaClient.getBasePath().toString(), mapAsScalaImmutableMap(buildWriteConfig(metaClient, indexDefinition)), toScalaOption(Option.empty()))) {
writeClient.dropIndex(Collections.singletonList(indexName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, this logic does not have much specifics to engine itself, so it can be abstracted to the index client by plugging in the engine-specific write client.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need the engine-specific write client to call the base API BaseHoodieWriteClient.dropIndex?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is BaseHoodieWriteClient to abstract the write logic. Engine-specific implementation only needs to instantiate the engine-specific write client. The index client should only care about calling APIs in BaseHoodieWriteClient, which is the case here (need to generalize HoodieCLIUtils.createHoodieWriteClient too).

HoodieSparkIndexClient.getInstance(sparkSession).drop(metaClient, indexName, ignoreIfNotExists)
} catch {
case _: IllegalArgumentException =>
SecondaryIndexManager.getInstance().drop(metaClient, indexName, ignoreIfNotExists)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why drop here again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is legacy code due to incomplete RFC-52. SecondaryIndexManager was introduced in RFC-52 but it's just a wrapper code and does not really manage any index underneath. From RFC, it's supposed to support index built on third party libraries such as Lucene. But, we have not yet added any support so far. In my opinion, we should remove all that code. Just keeping it here to make some tests pass. If you agree, I can take the cleanup as a followup later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Let's have JIRA to track that.

@codope codope force-pushed the drop-index-hudi-7563 branch from 235756d to e8a0631 Compare September 21, 2024 06:18
@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Copy link
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

HoodieSparkIndexClient.getInstance(sparkSession).drop(metaClient, indexName, ignoreIfNotExists)
} catch {
case _: IllegalArgumentException =>
SecondaryIndexManager.getInstance().drop(metaClient, indexName, ignoreIfNotExists)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Let's have JIRA to track that.

@yihua yihua dismissed nsivabalan’s stale review September 24, 2024 00:35

Comment is addressed.

@yihua yihua merged commit d5a314e into apache:master Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M PR with lines of changes in (100, 300]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants