Optimize purview search logic #564

Merged (5 commits) on Aug 15, 2022

enya-yx (Collaborator) commented Aug 9, 2022

Description

  • Improves the query logic of 'get_project' and similar methods to reduce query time.

  • The main cause of the slowness was that we called Purview APIs (such as 'AtlasClient.get_entity') many times, often recursively. Each call is a separate HTTP request, so most of the time was spent waiting for responses. I found that a single call to 'AtlasClient.get_entity_lineage' returns all the information we need. We then only have to build the entities and edges, including their relationships and attributes, from the returned result, which takes much less time than waiting for many responses from Purview.

  • Test data (measured from the backend):
    import time

    registry = PurviewRegistry(azure_purview_name="feathrazuretest3-purview1")
    entity_id = registry.get_entity_id("enya_test_registry")
    start = time.time()
    project_pre = registry.get_project_origin(entity_id)  # previous 'get_project' API
    print("duration1:", time.time() - start)
    start = time.time()
    project_curr = registry.get_project(entity_id)  # current 'get_project' API
    print("duration2:", time.time() - start)

    duration1: 33.04559826850891
    duration2: 2.3997888565063477
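
To illustrate why the single lineage call helps, here is a minimal sketch that compares the two access patterns against an in-memory stand-in. `FakeClient`, `GRAPH`, `edges_recursive`, and `edges_from_lineage` are hypothetical names made up for this example and are not the code in this PR. The `relations` / `fromEntityId` / `toEntityId` response shape is an assumption modelled on the Atlas lineage response, and the stand-in's `get_entity` returns a simplified payload rather than the real one.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical project graph: parent GUID -> child GUIDs.
GRAPH: Dict[str, List[str]] = {
    "project": ["anchor", "source"],
    "anchor": ["feature_a", "feature_b"],
}


class FakeClient:
    """Hypothetical stand-in for a Purview/Atlas client; it only counts calls."""

    def __init__(self, children: Dict[str, List[str]]) -> None:
        self.children = children
        self.calls = 0

    def get_entity(self, guid: str) -> dict:
        self.calls += 1  # one simulated HTTP round trip per entity
        return {"guid": guid, "children": self.children.get(guid, [])}

    def get_entity_lineage(self, guid: str) -> dict:
        self.calls += 1  # one simulated round trip for the whole graph
        seen: Set[str] = set()
        stack = [guid]
        relations = []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for child in self.children.get(node, []):
                # Assumed response shape, see the note above.
                relations.append({"fromEntityId": node, "toEntityId": child})
                stack.append(child)
        return {"guidEntityMap": {g: {"guid": g} for g in seen}, "relations": relations}


def edges_recursive(
    client: FakeClient, guid: str, seen: Optional[Set[str]] = None
) -> Set[Tuple[str, str]]:
    """Old pattern: fetch every entity separately while walking the graph."""
    seen = set() if seen is None else seen
    if guid in seen:
        return set()
    seen.add(guid)
    entity = client.get_entity(guid)
    edges: Set[Tuple[str, str]] = set()
    for child in entity["children"]:
        edges.add((guid, child))
        edges |= edges_recursive(client, child, seen)
    return edges


def edges_from_lineage(client: FakeClient, guid: str) -> Set[Tuple[str, str]]:
    """New pattern: one lineage call, then build the edges locally."""
    lineage = client.get_entity_lineage(guid)
    return {(r["fromEntityId"], r["toEntityId"]) for r in lineage["relations"]}
```

In this sketch both functions return the same edge set, but the recursive version makes one call per entity while the lineage version makes a single call. The real speedup depends on the size of the project graph and on network latency.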

How was this patch tested?

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to clarify your proposed changes.

Dependencies

  • No. You can skip the rest of this section.

  • Yes. Make sure to list all the dependencies and licenses.

xiaoyongzhu (Member) commented

Thanks @enya0405. Can you update the PR description to reflect:

What was the previous issue that caused it to be slow and what's the improved way?

Also, can you add some numbers for the updated logic (for example, how long it takes to load the projects)?

@Yuqing-cat Yuqing-cat merged commit 7e44522 into feathr-ai:main Aug 15, 2022
ahlag pushed a commit to ahlag/feathr that referenced this pull request Aug 26, 2022
* optimize purview search logic

Co-authored-by: Enya-Yx <[email protected]>