Fix thinSnaps #1105

Merged: stephen-soltesz merged 4 commits into main from sandbox-soltesz-fix-thinsnaps on Oct 11, 2022
Contributor

@stephen-soltesz stephen-soltesz commented Oct 10, 2022

This change completes the fix for the bug reported by @NotSpecial (#1104) and updates the unit test to check that every FinalSnapshot matches the last of the raw, thinned snapshots.



@coveralls
Collaborator

coveralls commented Oct 10, 2022

Pull Request Test Coverage Report for Build 7420

  • 6 of 6 (100.0%) changed or added relevant lines in 1 file are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.05%) to 67.24%

Files with Coverage Reduction | New Missed Lines | %
active/active.go | 2 | 90.63%

Totals | Coverage Status
Change from base Build 7409 | 0.05%
Covered Lines | 3323
Relevant Lines | 4942

💛 - Coveralls

Contributor

@cristinaleonr cristinaleonr left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 1 approvals obtained

// Guarantee that FinalSnapshot matches the last raw snapshot.
if diff := deep.Equal(row.A.FinalSnapshot, row.Raw.Snapshots[len(row.Raw.Snapshots)-1]); diff != nil {
	t.Errorf("TestTCPParser.ParseAndInsert() FinalSnapshot and last snapshot differ: %s", strings.Join(diff, "\n"))
}

Ah, nice, I think this is a very good way to test this. I had one additional thought:

Maybe we should check that the number of thinned snapshots is correct (per row, not in summary as before)?

If it helps, the expected number of thinned snapshots is: 2 + math.floor((len(row.Raw.Snapshots) - 2) / 10), i.e. the first and last element are always included, and every tenth other element.

N.B.: I am not 100% sure how thinSnaps behaves when raw.Snapshots is only 1 element long. Will this element be duplicated? It should not happen, but I am not 100% sure.
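
For reference, the expected-count formula above might be written in Go roughly as follows. This is only a sketch: the helper name expectedThinned is hypothetical and not part of the parser, and it assumes the first and last snapshots are always kept along with every tenth one in between. Note that Go's integer division truncates toward zero rather than flooring, so inputs shorter than two elements are handled explicitly instead of relying on math.floor behavior.

```go
package main

import "fmt"

// expectedThinned returns how many snapshots should remain after thinning
// n raw snapshots. Hypothetical helper for illustration only.
func expectedThinned(n int) int {
	if n < 2 {
		// (n-2)/10 truncates toward zero in Go, so n == 1 would give 2
		// instead of 1. Short inputs are returned unchanged in length.
		return n
	}
	// First and last element, plus every tenth element in between.
	return 2 + (n-2)/10
}

func main() {
	for _, n := range []int{0, 1, 2, 11, 12, 21, 22} {
		fmt.Printf("n=%d expected=%d\n", n, expectedThinned(n))
	}
}
```

A per-row check could then compare this value against the length of the thinned list, provided the raw length is available to the test.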

func thinSnaps(orig []snapshot.Snapshot) []snapshot.Snapshot {
n := len(orig)
if n == 0 {
@NotSpecial NotSpecial Oct 10, 2022

Regarding this comment: I am not 100% sure how thinSnaps behaves when the list has only one element.

Theoretically, the for loop should not do anything, as for i=0 and n=1 the condition i < n-1 is false, but maybe it's worth checking?

To be super safe, one could change the initial check I am commenting here to if n<=2. I.e. the first and last element are always included (or the single element, if there is only one); and only if we have more than two elements, we apply the thinning logic.
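
To make the single-element case concrete, here is a minimal sketch of the thinning logic as described in this thread. It is not the parser's actual code: the name thinSketch is hypothetical, []int stands in for []snapshot.Snapshot, the loop bound i < n-1 comes from the comment above, and the stride of 10 is assumed from the "every tenth element" description.

```go
package main

import "fmt"

// thinSketch keeps every tenth element of orig plus the last element.
// Illustrative stand-in only; the real function operates on
// []snapshot.Snapshot.
func thinSketch(orig []int) []int {
	n := len(orig)
	if n == 0 {
		return orig
	}
	out := make([]int, 0, 2+n/10)
	// For n == 1 the condition i < n-1 is false immediately, so the loop
	// body never runs and nothing is duplicated.
	for i := 0; i < n-1; i += 10 {
		out = append(out, orig[i])
	}
	// The last element is appended unconditionally, for every length.
	return append(out, orig[n-1])
}

func main() {
	for _, n := range []int{1, 2, 11, 12} {
		in := make([]int, n)
		for i := range in {
			in[i] = i
		}
		fmt.Println(n, thinSketch(in))
	}
}
```

Under these assumptions a one-element input yields exactly one element, and a two-element input yields both, so a separate n<=2 guard would not change the result.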

Contributor Author

@stephen-soltesz stephen-soltesz left a comment

I've added an explicit test for the thinSnaps function with edge case conditions. I like your first implementation (handling the last element unconditionally for all lengths). The tests should give us much more confidence that this behavior is correct.

The same test cases (correctly) fail for the original (buggy) implementation of thinSnaps.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @NotSpecial)


parser/tcpinfo_test.go line 176 at r1 (raw file):

Previously, NotSpecial (Alexander Dietmüller) wrote…

Ah, nice, I think this is a very good way to test this. I had one additional thought:

Maybe we should check that the number of thinned snapshots is correct (per row, not in summary as before)?

If it helps, the expected number of thinned snapshots is: 2 + math.floor((len(row.Raw.Snapshots) - 2) / 10), i.e. the first and last element are always included, and every tenth other element.

N.B.: I am not 100% sure how thinSnaps behaves when raw.Snapshots is only 1 element long. Will this element be duplicated? It should not happen, but I am not 100% sure.

I like this idea. Unfortunately, this information (true snapshot length) is not available to the unit test currently. Though this feels like an oversight for data usage also. I can imagine the A record could include some additional summary information like TotalSnaps or SnapCount or SnapshotLength or similar to indicate that the len(raw.Snapshots) is not expected to be equal to the total snapshots due to the thinning... Today there is nothing to indicate this explicitly. But, this changes the scope of this change from a bug fix to a schema change...

Would you accept an open issue for more metadata to be added in the future?


@NotSpecial NotSpecial left a comment

Looking good to me, the explicit test is useful!

Contributor Author

@stephen-soltesz stephen-soltesz left a comment

Thank you!

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @NotSpecial)


parser/tcpinfo_test.go line 176 at r1 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

I like this idea. Unfortunately, this information (true snapshot length) is not available to the unit test currently. Though this feels like an oversight for data usage also. I can imagine the A record could include some additional summary information like TotalSnaps or SnapCount or SnapshotLength or similar to indicate that the len(raw.Snapshots) is not expected to be equal to the total snapshots due to the thinning... Today there is nothing to indicate this explicitly. But, this changes the scope of this change from a bug fix to a schema change...

Would you accept an open issue for more metadata to be added in the future?

Created #1106

@stephen-soltesz stephen-soltesz merged commit 1700f84 into main Oct 11, 2022
@stephen-soltesz stephen-soltesz deleted the sandbox-soltesz-fix-thinsnaps branch October 11, 2022 16:13
@stephen-soltesz
Contributor Author

The parser updated with this fix was deployed to staging around 2022-10-18 20:20:00. Rows parsed after that time have a FinalSnapshot and a last Snapshots entry with matching timestamps (notmatching == 0):

SELECT
  date,
  COUNT(*) AS total,
  COUNTIF(a.FinalSnapshot.Timestamp != raw.Snapshots[SAFE_ORDINAL(ARRAY_LENGTH(raw.Snapshots))].Timestamp) AS notmatching
FROM mlab-staging.ndt.tcpinfo
WHERE
  date BETWEEN "2021-04-02" AND "2021-05-10"
  AND parser.Time > TIMESTAMP("2022-10-18 00:00:00")
GROUP BY date
ORDER BY date

After deployment to production, the historical reprocessing will take about 16 days to cover all dates. Daily data will be updated daily.
