Skip to content

[APM] Service groups - dynamic query refresh#140406

Merged
ogupte merged 9 commits intoelastic:mainfrom
gbamparop:service-groups-dynamic-refresh
Sep 20, 2022
Merged

[APM] Service groups - dynamic query refresh#140406
ogupte merged 9 commits intoelastic:mainfrom
gbamparop:service-groups-dynamic-refresh

Conversation

@gbamparop
Copy link
Copy Markdown
Contributor

@gbamparop gbamparop commented Sep 9, 2022

  • Create migration for service group saved objects, removing service names
  • Dynamic refresh of services in service groups
  • Compute the service count for a group at runtime (hardcoded to last 24 hours)
  • Users will be restricted to filtering by dimensions available in metric documents
  • Dimensions: agent.name, service.name, service.language.name, service.environment, labels.
  • Update x-pack/plugins/apm/server/routes/services/get_services/get_sorted_and_filtered_services.ts to work with dynamic refresh
  • Add tests for saved object migrations

This is a draft PR and all the changes should be revisited. observability:apmServiceGroupMaxNumberOfServices is sometimes used outside of the context of service groups while in other queries max service count is hardcoded. We could rename the setting and use it in all service queries.

Closes #133112

service-groups-dynamic

@gbamparop gbamparop added the Team:APM - DEPRECATED Use Team:obs-ux-infra_services. label Sep 9, 2022
@ogupte ogupte force-pushed the service-groups-dynamic-refresh branch from c55a431 to e3cbff6 Compare September 19, 2022 13:30
@ogupte ogupte self-assigned this Sep 19, 2022
@ogupte ogupte force-pushed the service-groups-dynamic-refresh branch from eef2aa5 to c95ec06 Compare September 19, 2022 20:40
@ogupte ogupte added v8.5.0 release_note:skip Skip the PR/issue when compiling release notes labels Sep 19, 2022
@ogupte ogupte marked this pull request as ready for review September 19, 2022 20:41
@ogupte ogupte requested a review from a team September 19, 2022 20:41
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/apm-ui (Team:APM)

Copy link
Copy Markdown
Contributor

@cauemarcondes cauemarcondes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the server side

@ogupte ogupte enabled auto-merge (squash) September 20, 2022 02:49
@kibana-ci
Copy link
Copy Markdown

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
apm 3.0MB 3.0MB +334.0B

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/saved-objects-service.html#_mappings

id before after diff
apm-service-group 6 5 -1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ogupte

Comment on lines +156 to +160
const {
field: {
spec: { name: fieldName },
},
} = querySuggestion;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is 100% personal preference but my brain always struggle a bit with nested destructurings. Feel free to leave as-is if you prefer that.

Suggested change
const {
field: {
spec: { name: fieldName },
},
} = querySuggestion;
const fieldName = querySuggestion.field.spec.name;

Comment on lines +42 to +49
const { start, end } = useMemo(
() =>
getDateRange({
rangeFrom: 'now-24h',
rangeTo: 'now',
}),
[]
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is on the service group overview page (aka the page listing all the service groups), right?

@gbamparop and I talked about that we didn't currently have a timepicker, and we didn't see any need to add any. It also means that the lookback time-period for calculating the service counts is hardcoded to 24 hours.

I think this is a fine compromise between simplicity and functionality. But we should remember to flag this to product (and possible docs to document it).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this is on the service group overview page.

async ({
id,
kuery,
}): Promise<{ id: string; servicesCount: number }> => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this type annotation necessary ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that it isn't, we can remove it

}: {
setup: Setup;
kuery: string;
maxNumberOfServices: number;
Copy link
Copy Markdown
Contributor

@sorenlouv sorenlouv Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is maxNumberOfServices unused here?

Copy link
Copy Markdown
Contributor Author

@gbamparop gbamparop Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unused, we can add a size (service inventory and service map already have max size specified).

const MODAL_HEADER_HEIGHT = 122;
const MODAL_FOOTER_HEIGHT = 80;

const suggestedFieldsWhitelist = [
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use allowlist here?

t.partial({
serviceGroupId: t.string,
}),
t.undefined,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you actually want t.union([ t.type(), t.partial() ]) here?

...(serviceGroup?.serviceNames ?? []),
];
const serviceNames = compact([serviceName]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this is an empty array? Does it return a service map for all services or no service map at all?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns a service map for all services. I think that was there before to handle service-specific service maps

end,
maxNumberOfServices,
serviceGroup,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we pass both serviceNames and serviceGroup?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Service name is used when we are in the context of a service. We could rename it to serviceName. Service group is passed to apply a kuery filter for the group

color: string;
}

const migrateApmServiceGroups850: SavedObjectMigrationFn<
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a convention to put the version in the end like that? Perhaps we can make it a little clearer:

Suggested change
const migrateApmServiceGroups850: SavedObjectMigrationFn<
const migrateApmServiceGroups_v8_5_0: SavedObjectMigrationFn<

Copy link
Copy Markdown
Contributor Author

@gbamparop gbamparop Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is a convention but It's aligned with https://docs.elastic.dev/kibana-dev-docs/tutorials/saved-objects#writing-migrations

}

const migrateApmServiceGroups850: SavedObjectMigrationFn<
ApmServiceGroupsPre850,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ApmServiceGroupsPre850,
ApmServiceGroupsPre_v8_5_0,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'll hit v85.0 anytime soon but by using separators we at least don't run the risk of confusion in the future :D

@ogupte ogupte merged commit 256f2e7 into elastic:main Sep 20, 2022
@kibanamachine kibanamachine added the backport:skip This PR does not require backporting label Sep 20, 2022
}),
serviceGroup
? getServiceNamesFromServiceGroup(serviceGroup)
? getServiceNamesFromServiceGroup({
Copy link
Copy Markdown
Contributor

@dgieselaar dgieselaar Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't useful anymore. The intent of this function is to quickly get a list of sorted and filtered services, but as this now aggregates over all the data, including raw transactions and spans, it is likely to be slower than the request that gets the actual statistics. We should remove it, I think.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, we wouldn't get the benefit of terms enum. Removing it though could result in preloaded service names being removed once we get the "final" response right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably shouldn't preload at all if a service group is selected. So, simply returning an empty array here might suffice. But I don't know how it specifically works. terms_enum doesn't make sense if a service group is selected, we shouldn't use it at all in that case (I think that's already happening today).

id,
kuery,
}): Promise<{ id: string; servicesCount: number }> => {
const servicesCount = await getServicesCounts({
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you compared performance of a single request w/ filter aggregations vs multiple requests?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I haven't checked filter aggregations, @ogupte have you? If not, we could have a look

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we should probably fix this since it can get out of hand with many service groups.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some findings on this comparison in a related issue (#141242): #141242 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:APM - DEPRECATED Use Team:obs-ux-infra_services. v8.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[APM] Service Groups Beta: Dynamic Refresh

8 participants