Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix global prefix not working with update marker #42

Merged
merged 2 commits into from
Jun 29, 2023

Conversation

ahouene
Copy link
Contributor

@ahouene ahouene commented Jun 20, 2023

When using both the update marker and the global prefix options, the marker would be written without the global prefix, and we'd try to read it with the global prefix.

This PR makes sure the global prefix is always used with the update marker.

Closes #40

@ahouene ahouene requested a review from wojas June 20, 2023 14:22
@ahouene ahouene self-assigned this Jun 20, 2023
@franklouwers
Copy link

franklouwers commented Jun 25, 2023

I've been LS running with this patch since Thursday. The good news: it works. The better news: it reduces cost dramatically, so it works as intended.

CleanShot 2023-06-25 at 10 16 41@2x

Now to the bad news:

there's a memory (and maybe CPU as well) issue.

I have a simple 2 node test setup, with 3 zones with very few updates, so just to test stability.

Tje setup had been running fairly stable for about 10 days without the marker functionality. I then built a custom LS with this patched-simpleblob included, and enabled the marker feature on June 21st. As you can see from the graph below, a pattern started emerging at that time: there's a mem leak which starts to happen at that same time, leaking to very frequent OOMs.

CleanShot 2023-06-25 at 10 18 33

Unfortunately I only have memory graphs from one of the nodes (it's a test setup), but do see the OOMs on the second node as well. The second node also exhibits increased CPU load.

As you can see on the graphs, this behaviour only started when I enabled the marker-functionality, with this patch and it didn't happen before.

CleanShot 2023-06-25 at 10 22 27

The 2 test servers are running:

  • Ubuntu 22.04 LTS
  • dnsdist latest stable release
  • pdns auth latest stable release (with only lmdb backend enabled)
  • Lightningstream master with this patched simpleblob in

@ahouene
Copy link
Contributor Author

ahouene commented Jun 28, 2023

Thank you for the detailed report!

I created #44 to track this memory issue. I think I have found a culprit, that is related to GetObject operations, which matches the metrics you provided.

@franklouwers
Copy link

franklouwers commented Jun 28, 2023

@ahouene Seems to work:

mem, cloud provider D:
CleanShot 2023-06-28 at 14 24 53@2x

cpu, cloud provider V:
CleanShot 2023-06-28 at 14 29 47@2x

Comment on lines 123 to 131
defer func() {
if !b.hasGlobalPrefix() {
return
}
for i, blob := range blobList {
s, _ := b.cutGlobalPrefix(blob.Name)
blobList[i].Name = s
}
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly, by putting all of this in a defer, you are changing the cached values and calling it over and over on existing cached values.

Don't do magic in defer, it will bite you.

@ahouene ahouene force-pushed the prefix-marker-fix branch from 72dcdd8 to 1b35697 Compare June 28, 2023 15:10
Copy link
Member

@wojas wojas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, a lot clearer now!

@ahouene ahouene merged commit a0dd21b into PowerDNS:main Jun 29, 2023
@wojas wojas added this to the v0.2.4 milestone Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Marker functionality not working correctly
3 participants