Add indexes that will help federated module ingest be faster #1500

Merged: 2 commits, Feb 26, 2021


eiffel777
Contributor

These indexes are being added to speed up the fed-ingest pipeline in the federation module.

Both indexes are used by the instance-jobhosts action.

Description

Adding indexes so ingestors in the federated module perform better

Tests performed

Tested in Docker and with the federated module.

Checklist:

  • The pull request description is suitable for a Changelog entry
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request

@eiffel777 added the enhancement and Category:General labels Feb 25, 2021
@eiffel777 added this to the 9.5.0 milestone Feb 25, 2021
@eiffel777 self-assigned this Feb 25, 2021
@jpwhite4
Member

Did you experiment with adding any compound indexes on origin_id, resource_id, host_id? What queries are using these indexes?

@eiffel777
Contributor Author

@jpwhite4

I don't think I tested a compound index. Looking back over it, I could add resource_id to each index, which might make the query a little faster. Let me do some testing and see if there is a difference.

Both indexes are used in the same query in the federated module. The etl_action_def.d/ file where the query is defined is here

Here is the join that uses host_origin_id_idx

{
        "schema": "${DESTINATION_SCHEMA}",
        "name": "hosts",
        "alias": "dh",
        "on": "dh.resource_id = drf.id AND sh.id = dh.host_origin_id"
}

The job_id_origin_id_idx is used for this join.

{
        "schema": "${DESTINATION_SCHEMA}",
        "name": "job_tasks",
        "alias": "djt",
        "on": "djt.job_id_origin_id = sjh.job_id AND djt.resource_id = drf.id"
}
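
One way to check which index a lookup like the joins above picks up is to inspect the query plan. The following is only a sketch and not XDMoD code: it uses SQLite through Python's `sqlite3` module as a stand-in for the real database, the `hosts` table definition is a simplified guess built from the column names in the join, the lookup is a plain `WHERE` query rather than the full ETL join, and the compound index name `host_origin_resource_idx` is hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")

# Simplified, assumed stand-in for the destination hosts table.
conn.execute(
    "CREATE TABLE hosts ("
    " id INTEGER PRIMARY KEY,"
    " resource_id INTEGER,"
    " host_origin_id INTEGER)"
)

# Same two predicates as the join condition, written as a direct lookup.
LOOKUP = "SELECT id FROM hosts WHERE host_origin_id = ? AND resource_id = ?"

# 1. No secondary index: the table has to be scanned.
plan_without_index = query_plan(conn, LOOKUP, (1, 1))

# 2. Single-column index, named as in this pull request.
conn.execute("CREATE INDEX host_origin_id_idx ON hosts (host_origin_id)")
plan_single = query_plan(conn, LOOKUP, (1, 1))

# 3. Hypothetical compound index that also covers resource_id.
conn.execute(
    "CREATE INDEX host_origin_resource_idx"
    " ON hosts (host_origin_id, resource_id)"
)
plan_compound = query_plan(conn, LOOKUP, (1, 1))

for label, plan in [
    ("no index", plan_without_index),
    ("single-column index", plan_single),
    ("compound index available", plan_compound),
]:
    print(label, "->", plan)
```

The plan only shows which index the planner chooses, not how much time it saves. On the real database the equivalent check would be an `EXPLAIN` of the generated ingest query, followed by timing the action with each index variant.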

@eiffel777
Contributor Author

I did some testing with resource_id added to the indexes and it doesn't help much. The difference is only a couple of seconds, so adding it doesn't seem worthwhile.

@eiffel777 merged commit 338c7fb into ubccr:xdmod9.5 Feb 26, 2021