New stack count is incorrect #746

Open
PhyxionNL opened this issue Oct 16, 2020 · 28 comments
Labels
awaiting reply Requires additional information bug

Comments

@PhyxionNL
Contributor

"NEW STACKS 1" is shown, but the error/new page shows 2 stacks (2 regressed). How is this calculated?

@niemyjski niemyjski added the bug label Oct 16, 2020
@niemyjski
Member

It happens via a terms aggregation in the event controller: https://github.com/exceptionless/Exceptionless/blob/master/src/Exceptionless.Web/Controllers/EventController.cs#L289-L308. If you turn up logging to the Verbose level and make an API request, you can see all the queries being executed, and you can run them directly against Elasticsearch. We also have a migration for deduplicating stacks; does it perhaps need to be run with the migration job? Do you have an easy way for us to reproduce this?
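A minimal sketch of the kind of Elasticsearch request body that could produce the two dashboard header numbers, assuming a `cardinality` aggregation over stack ids and a `terms` aggregation over a boolean first-occurrence flag. The aggregation names `cardinality_stack` and `terms_first` match the response posted later in this thread; the field names `stack_id`, `is_first_occurrence` and `date`, and the helper `build_dashboard_aggs`, are assumptions for illustration, not the query Exceptionless actually sends (Verbose logging shows the real one).

```python
from typing import Any, Optional


def build_dashboard_aggs(start: Optional[str] = None,
                         end: Optional[str] = None) -> dict[str, Any]:
    """Build a hypothetical request body for total and new stack counts.

    start/end are Elasticsearch date-math strings (e.g. "now-7d");
    leaving both as None models the "All Time" filter.
    """
    filters: list[dict[str, Any]] = []
    if start is not None or end is not None:
        date_range: dict[str, str] = {}
        if start is not None:
            date_range["gte"] = start
        if end is not None:
            date_range["lte"] = end
        # Restrict events to the dashboard time filter (field name assumed).
        filters.append({"range": {"date": date_range}})

    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {"bool": {"filter": filters}},
        "aggs": {
            # Approximate number of distinct stacks with events in range.
            "cardinality_stack": {"cardinality": {"field": "stack_id"}},
            # Event counts bucketed by the (assumed) first-occurrence flag.
            "terms_first": {"terms": {"field": "is_first_occurrence"}},
        },
    }
```

Note that a `terms` bucket counts documents (events), not stacks, so the "new" number only equals a count of stacks if every new stack has exactly one first-occurrence event still present in the index for the selected range.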

@PhyxionNL
Contributor Author

I'll give it a try next week or so. Now that some more data has arrived, it says NEW STACKS 2, but the error/new page shows 4 stacks. It seems to show the same stacks as the frequent pages. Maybe there's a difference between how those two are determined? In Firefox I can see stack_frequent and stack_new being requested correctly, but they both seem to return the same data. The first_occurrence/last_occurrence in Elastic is correctly set for the stacks (I already verified this).

@niemyjski
Member

It shouldn't be; it should be doing an aggregation across events. I guess you could see this if a stack is missing or has been soft deleted (we account for soft deletes). Can you run the cleanup job, if you are running your jobs out of process?

@niemyjski
Member

Were you able to narrow this down any further?

@niemyjski niemyjski added the awaiting reply Requires additional information label Oct 18, 2020
@PhyxionNL
Contributor Author

I'll see if I can squeeze some testing in today, otherwise it'll be later this week. It seems to me like stack_frequent and stack_new are returning exactly the same data.

@PhyxionNL
Contributor Author

So I ran the cleanup and I now have 15 stacks and 10 new stacks (instead of 15 and 11), but new stacks page still returns 15 stacks. The data that it's returning is exactly the same as the frequent page; the only difference is the order in which the data is returned.

What is considered a "new" stack? I have some stacks there that are 4 months old, surely that's no longer new?

@niemyjski
Member

The stack was created during the time filter. If you are looking at All Time, I think they would all be new?

@PhyxionNL
Contributor Author

Regarding "The stack was created during the time filter": what do you mean by this? I always have "All Time" selected, but I thought maybe there's a difference between how new stacks are counted above the graph and what's actually retrieved. All the type/error/new pages do for me is reorder the stacks based on the First column.

@niemyjski
Member

For example, say you just created an instance and it has 4 months of data. If you are viewing all 4 months of data in the dashboard, then all stacks would be new. However, if you were looking at, say, 1 week with a filter, new would be only those stacks created in the last week.
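The rule described here can be sketched as a filter on each stack's first-occurrence time. This is a minimal illustration, assuming a hypothetical `Stack` record with a `first_occurrence` timestamp and modeling "All Time" as a range with no start bound; it is not Exceptionless's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Stack:
    # Hypothetical minimal stack record; only the fields needed here.
    id: str
    first_occurrence: datetime


def new_stacks(stacks: list[Stack], start: Optional[datetime],
               end: datetime) -> list[Stack]:
    """Return the stacks whose first occurrence falls inside [start, end].

    start=None models the "All Time" filter, where every stack that first
    occurred before `end` counts as new.
    """
    return [
        s for s in stacks
        if (start is None or s.first_occurrence >= start)
        and s.first_occurrence <= end
    ]
```

For example, with a stack first seen 120 days ago and another first seen 2 days ago, an "All Time" view returns both, while a "last 7 days" view returns only the second.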

@PhyxionNL
Contributor Author

Ah, alright, but how would those numbers ever be different from normal/frequent stacks then? If I ignore a stack, or fix it, they do not show up there either.

@niemyjski
Member

They probably won't. It's running the same queries for all dashboards. We may need to tweak each dashboard.

@PhyxionNL
Contributor Author

Then I'm still wondering why there's a discrepancy in the numbers.

{
  "aggregations": {
    "cardinality_stack": {
      "value": 20.0,
      "data": {
        "@type": "value"
      }
    },
    "terms_first": {
      "items": [
        {
          "key": 1,
          "key_as_string": "true",
          "total": 14,
          "data": {
            "@type": "object"
          }
        }
      ],
      (very long JSON here)

I assume these are the numbers used on the dashboard? They're 20 normal / 14 new for me. I can't really find where these numbers are calculated, but it seems to me that the new count isn't calculated properly (the normal one is).
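For anyone checking these numbers by hand, a minimal sketch that pulls the two header values out of a response shaped like the JSON above. It assumes `cardinality_stack.value` is the total stack count and that the `terms_first` bucket whose `key_as_string` is "true" holds the new count in `total`; the helper name `dashboard_counts` is hypothetical.

```python
from typing import Any


def dashboard_counts(response: dict[str, Any]) -> tuple[int, int]:
    """Extract (total stacks, new stacks) from an aggregation response.

    Assumes the response shape posted in this thread: "cardinality_stack"
    holds a numeric "value", and "terms_first" holds "items" buckets keyed
    by "key_as_string" with a "total".
    """
    aggs = response["aggregations"]
    total = int(aggs["cardinality_stack"]["value"])
    new = 0
    for item in aggs["terms_first"]["items"]:
        # Only the bucket for a true first-occurrence flag counts as "new";
        # if no such bucket exists, the new count stays 0.
        if item.get("key_as_string") == "true":
            new = int(item["total"])
    return total, new
```

Applied to the values posted above, this yields 20 total and 14 new, matching the header.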

@niemyjski
Member

Sorry for the late reply. Are you still seeing this in the latest releases? Yes, these come from an Elasticsearch terms aggregation that is passed to the controller via the aggs query string. They are calculated in real time against Elasticsearch. We've been doing tons of work, and we are currently working on a stack filtering issue as well.

@PhyxionNL
Contributor Author

Yes, I still have the same problem. For one project, I have 6 stacks and 2 new stacks in the header, but new stacks shows all six. It happens with all projects. Some also show STACKS: X, NEW STACKS: 0, but then NEW STACKS page shows all X.

@niemyjski
Member

Is there any chance you could attach or send us the 6 stacks in question and let us know how many events match those stacks, with some quick metadata about those events, as well as the exact query you see being sent to the server for them? You can remove any sensitive information and attach it here or send it to me on Discord. That would be a huge help (and if you could try creating some unit tests and submit a PR based on what you are seeing, that would be amazing).

@PhyxionNL
Contributor Author

How do I go about copying the useful information/stacks? I can copy the responses of /events?filter=project:asdf+type:error+(status:open+OR+status:regressed)&limit=15&mode=stack_frequent&offset=60m and also the /count responses although I'm not sure how useful that count info is. The /events requests both return the same data but in a different order.
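If it helps when gathering the data, a minimal sketch for confirming that the stack_frequent and stack_new responses contain the same stacks in a different order. It assumes each response is a JSON array whose items carry the stack id under "id"; that is an assumption about the response shape, and `compare_stack_lists` is a hypothetical helper.

```python
from typing import Any


def compare_stack_lists(frequent: list[dict[str, Any]],
                        new: list[dict[str, Any]]) -> dict[str, Any]:
    """Compare two stack-summary lists by stack id, ignoring order.

    Assumes each item stores its stack id under the "id" key.
    """
    freq_ids = [item["id"] for item in frequent]
    new_ids = [item["id"] for item in new]
    return {
        # Same stacks regardless of order?
        "same_set": set(freq_ids) == set(new_ids),
        # Same stacks in exactly the same order?
        "same_order": freq_ids == new_ids,
        "only_in_frequent": sorted(set(freq_ids) - set(new_ids)),
        "only_in_new": sorted(set(new_ids) - set(freq_ids)),
    }
```

If the stack_new endpoint were filtering correctly, `only_in_frequent` would be expected to list the stacks that are not new for the selected range; "same_set" being true with "same_order" false matches the behavior described in this comment.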

@niemyjski
Member

niemyjski commented Nov 24, 2020

The /api/v2/events and /api/v2/stacks endpoints will work for getting the data, as we don't have view models for those endpoints. Just make sure the mode query string parameter is not set.

@PhyxionNL
Contributor Author

?

@niemyjski
Member

I updated my previous comment for more clarity.

@niemyjski
Member

@PhyxionNL I'm really sorry it's taken so long to dig into this and get it resolved; we didn't forget about this. I'm in the process of creating some tests around the data you submitted, just to ensure there are no issues, as there could be a bug with how we are resolving new stacks on the dashboard.

We changed this when we moved to a status filter. We used to go off of stack.first_occurrence; now we do an aggregation on events and then load the stacks resolved by that aggregation query into the same stack summary model. I'm writing a test to ensure we are looking at event.is_first_occurrence and, if that event has been deleted, that we still look at stacks created within the time period.
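The intended behavior described in this comment can be sketched as follows. This is a hypothetical illustration, not the actual fix: the `Event` and `StackInfo` records, the `created_utc` field name, and the fallback rule (treat a stack as new if it was created inside the range and still has events there, for cases where its first-occurrence event has been deleted) are one reading of the comment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    # Hypothetical minimal event record.
    stack_id: str
    date: datetime
    is_first_occurrence: bool


@dataclass
class StackInfo:
    # Hypothetical minimal stack record; created_utc is an assumed name.
    id: str
    created_utc: datetime


def resolve_new_stack_ids(events: list[Event], stacks: list[StackInfo],
                          start: Optional[datetime], end: datetime) -> set[str]:
    """Sketch of resolving "new" stack ids for a time range.

    Primary signal: in-range events flagged as a stack's first occurrence.
    Fallback: stacks created inside the range that still have in-range
    events (covers stacks whose first-occurrence event was deleted).
    start=None models "All Time".
    """
    def in_range(ts: datetime) -> bool:
        return (start is None or ts >= start) and ts <= end

    in_range_events = [e for e in events if in_range(e.date)]
    new_ids = {e.stack_id for e in in_range_events if e.is_first_occurrence}
    stacks_with_events = {e.stack_id for e in in_range_events}
    new_ids |= {s.id for s in stacks
                if s.id in stacks_with_events and in_range(s.created_utc)}
    return new_ids
```

Under this rule an "All Time" view marks every stack that still has events as new, which is consistent with the earlier explanation in this thread.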

@PhyxionNL
Contributor Author

@niemyjski No problem, did these tests and my zip help track down the problem?

@niemyjski
Member

We are still digging into it, but yes, your data did help; feel free to remove the attachment above. It's returning more stacks than it should, and we're going to dig into it.

ejsmith added a commit that referenced this issue Apr 15, 2021
* #746 - Add more test coverage for resolving new stack counts

* Fixed some bugs with test data builder

* Update deps. Some k8s updates.

* Progress on new stack filter issue.

* More progress

* Don't apply retention filter to stack id filter when inverted

* Got the main test passing. Not sure if the other failures are correct or not.

* Fixed one failing unit test

* Some minor changes

* Disable AD Windows build warnings #493

* Added ability to generate many events using TotalOccurrences

* Refactored how additional events are created.

* Added new test for posting null session identity name

* WIP - Event Stack Filter Tests

* Update Deps

* Fixed some build messages

* Fix issue with message bus broker async fire and forget.

* Working on stack inverting issues

* Update ES docker to 7.12

* Progress in stack filter refactor

* Fix a couple tests

* Fixing more tests

* Fix remaining tests. Update repos.

* Remove repos and parser projects

* Update deps / respond to feedback

Co-authored-by: Blake Niemyjski <[email protected]>
@niemyjski
Member

@PhyxionNL Can you please confirm this is fixed in 7.1? If not, please let me know and we'll reopen this. Sorry for the delay.

@PhyxionNL
Contributor Author

It can be reopened; it's still not fixed. With 7.1.1, new stacks shows "40" here while there are "59" in total.

@PhyxionNL
Contributor Author

Also, on the frequent pages I no longer get pagination buttons with All Time, as it only loads a very small number of stacks. If I select "Last 30 Days", I get a lot more content (and pagination) than with All Time, which doesn't seem logical to me.

@niemyjski niemyjski reopened this Sep 16, 2021
@PhyxionNL
Contributor Author

@niemyjski Any feedback on the All Time bug above? This has not been working properly at all since 7.1. It's such a strange bug, but I've noticed that if I search for * -status:fixed -status:ignored, everything shows up as it should.

@niemyjski
Member

niemyjski commented Oct 13, 2021

I haven't had time to look into this and have not experienced this myself in any of our accounts. Is there any chance you could join our Discord so we can talk about this further?

@PhyxionNL
Contributor Author

Yes, will do :)
