[#6238] improvement(storage): Improve get role performance when roles is bound to many metadata. #6455

Open
wants to merge 1 commit into
base: main
from

Conversation

FourFriends

What changes were proposed in this pull request?

Fixes issue #6238.
Improves get-role performance when a single role is bound to many metadata objects.

Why are the changes needed?

Use a batch query to fetch the full names of a role's securable objects, instead of issuing one query per securable object in a loop.
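To illustrate the difference between the two access patterns, here is a minimal sketch. It is not the code from this PR: `BatchLookupSketch`, `findNameById`, and `findNamesByIds` are hypothetical names, and an in-memory map stands in for the database so that the number of round trips can be counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the relational backend.
// queryCount tracks how many round trips each strategy makes.
class BatchLookupSketch {
  private final Map<Long, String> rows;
  int queryCount = 0;

  BatchLookupSketch(Map<Long, String> rows) {
    this.rows = new HashMap<>(rows);
  }

  // One round trip per id (the loop-query pattern).
  String findNameById(long id) {
    queryCount++;
    return rows.get(id);
  }

  // One round trip for the whole id list, e.g. a single "WHERE id IN (...)" query.
  Map<Long, String> findNamesByIds(List<Long> ids) {
    queryCount++;
    Map<Long, String> found = new HashMap<>();
    for (Long id : ids) {
      String name = rows.get(id);
      if (name != null) {
        found.put(id, name);
      }
    }
    return found;
  }

  // Resolves names with N queries for N ids.
  List<String> namesWithLoop(List<Long> ids) {
    List<String> names = new ArrayList<>();
    for (Long id : ids) {
      names.add(findNameById(id));
    }
    return names;
  }

  // Resolves names with a single query, then looks each id up in memory.
  List<String> namesWithBatch(List<Long> ids) {
    Map<Long, String> byId = findNamesByIds(ids);
    List<String> names = new ArrayList<>();
    for (Long id : ids) {
      names.add(byId.get(id));
    }
    return names;
  }
}
```

With N securable objects, the loop variant issues N queries while the batch variant issues one, which is where the speedup for roles bound to many objects would come from.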

Fix: #6238

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests and integration tests all pass. This feature has been running internally at Xiaomi for two weeks.

@FourFriends
Author

At Xiaomi, we found that when a role is bound to many securable objects, getting the role is very slow, so this PR tries to speed up the get-role function.

@FourFriends FourFriends changed the title [#6238] Improve performance when roles is bound to many metadata. [#6238] Improve get role performance when roles is bound to many metadata. Feb 14, 2025
}
});

// Since there are many comparisons of RoleEntity in the unit tests,
Contributor

Suggested change
// Since there are many comparisons of RoleEntity in the unit tests,
// Since there are many comparisions of RoleEntity in the unit tests,


public static Map<Long, String> getMetadataObjectFullNames(Long metalakeId, List<Long> ids) {
Map<Long, String> catalogIdAndNameMap = getCatalogIdAndNameMap(metalakeId);
Map<Long, Map<Long, String>> schemaIdAndNameMap = getSchemaIdAndNameMap(metalakeId);
Contributor

We don't need to fetch every schema in the metalake. We can derive the list of schema IDs from the fileset list.
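A rough sketch of what this suggestion could look like: collect the distinct schema IDs referenced by the filesets first, then query only those schemas. `FilesetRow` and `schemaIdsOf` are hypothetical names used for illustration, not the PO class or helpers in this PR.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SchemaIdCollectorSketch {
  // Hypothetical, trimmed-down stand-in for a fileset row; not the real PO class.
  static final class FilesetRow {
    final long filesetId;
    final long catalogId;
    final long schemaId;

    FilesetRow(long filesetId, long catalogId, long schemaId) {
      this.filesetId = filesetId;
      this.catalogId = catalogId;
      this.schemaId = schemaId;
    }
  }

  // Returns the distinct schema ids referenced by the given filesets.
  // Only these ids would then be passed to a batch schema query.
  static Set<Long> schemaIdsOf(List<FilesetRow> filesets) {
    return filesets.stream()
        .map(f -> f.schemaId)
        .collect(Collectors.toCollection(TreeSet::new));
  }
}
```

The resulting set is usually much smaller than the full schema list of a metalake, so the follow-up query touches fewer rows.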

.forEach(
(type, objects) -> {
// If the type is Fileset, use the batch retrieval interface;
// otherwise, use the single retrieval interface
Contributor

Maybe you can add a TODO to retrieve the other securable object types in batches as well?

@jerqi jerqi changed the title [#6238] Improve get role performance when roles is bound to many metadata. [#6238] improvement(storage): Improve get role performance when roles is bound to many metadata. Feb 14, 2025
.collect(Collectors.toList());

Map<Long, String> filesetIdAndNameMap =
getMetadataObjectFullNames(po.getMetalakeId(), filesetIds);
Contributor

So the name should be getFilesetObjectFullNames, right?

// Since there are many comparisons of RoleEntity in the unit tests,
// and the order after grouping by is different each time,
// the results are sorted by fullName here to ensure consistent query results.
securableObjects.sort(Comparator.comparing(MetadataObject::fullName));
Contributor

This is a temporary solution to make it work in UTs. It's the UTs that we should change. If there are no major performance issues, it's okay with me.
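One possible way to move the ordering concern into the unit tests, sketched under the assumption that the tests compare lists of full names: compare sorted copies so the production code does not need to sort. `sameElementsIgnoringOrder` is a hypothetical test helper, not something that exists in this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OrderInsensitiveCompareSketch {
  // Returns true when both lists hold the same elements with the same
  // multiplicities, regardless of order. The inputs are not modified.
  static boolean sameElementsIgnoringOrder(List<String> expected, List<String> actual) {
    List<String> left = new ArrayList<>(expected);
    List<String> right = new ArrayList<>(actual);
    Collections.sort(left);
    Collections.sort(right);
    return left.equals(right);
  }
}
```

Sorting copies rather than converting to sets keeps duplicates significant, so a list with a repeated entry does not compare equal to one without it.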

@@ -35,6 +36,24 @@ public String listSchemaPOsByCatalogId(@Param("catalogId") Long catalogId) {
+ " WHERE catalog_id = #{catalogId} AND deleted_at = 0";
}

public String listSchemaPOsByCatalogIds(@Param("catalogIds") List<Long> catalogIds) {
Contributor

Maybe this should be listSchemaPOsBySchemaIds.


filesetPOs.forEach(
filesetPO -> {
String catalogName = catalogIdAndNameMap.get(filesetPO.getCatalogId());
Contributor

Will catalogName or schemaName be null if the catalog or schema has been deleted?
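To make the concern concrete: `Map.get` returns null for a missing key, so a deleted catalog or schema would surface as a null name unless it is guarded. The sketch below shows one defensive option. It assumes a dot-separated full name and uses hypothetical names (`FullNameAssemblerSketch`, `filesetFullName`); how the PR should actually handle the case is for the author to decide.

```java
import java.util.Map;
import java.util.Optional;

class FullNameAssemblerSketch {
  // Builds "catalog.schema.fileset" (dot-separated format is an assumption here).
  // Returns Optional.empty() when the catalog or schema id has no entry,
  // e.g. because the parent was deleted, instead of emitting "null" segments.
  static Optional<String> filesetFullName(
      Map<Long, String> catalogNames,
      Map<Long, String> schemaNames,
      long catalogId,
      long schemaId,
      String filesetName) {
    String catalogName = catalogNames.get(catalogId);
    String schemaName = schemaNames.get(schemaId);
    if (catalogName == null || schemaName == null) {
      return Optional.empty();
    }
    return Optional.of(catalogName + "." + schemaName + "." + filesetName);
  }
}
```

Callers could then skip or log filesets whose parents are missing rather than build a full name containing the literal string "null".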

3 participants