Toggle primary read-only when disk capacity hits threshold #59

davissp14 · 2023-02-02T02:26:55Z

This PR ensures the primary is made read-only when disk capacity exceeds 90%.

I've introduced a new PG health check called disk-capacity. This health check overlaps quite a bit with the existing VM check that communicates disk usage, but given its implications and the message it carries, I feel it should be a separate check.

How it works

On an interval defined by the health check, we calculate disk usage and evaluate whether capacity has exceeded our pre-defined threshold of 90% (this should be made configurable in the future).

When capacity exceeds the pre-defined threshold, we set a readonly.lock file on the filesystem, then establish a connection to each user-defined database (anything that's not Postgres or Repmgr) and make it read-only. When disk usage falls below the defined threshold, either through file cleanup or volume extension, we turn all databases back to read/write and then clear the readonly.lock file on success.

The readonly.lock file is used for two things:

If the readonly.lock file is present, we know the database(s) have already been made read-only, so there's no need to establish any more connections.
It allows us to coordinate intentions across the codebase.

Limitations

Read-only mode is currently enabled at runtime and is cleared on restart. As a result, there is a small window on boot where the primary will accept writes before it is reconfigured back to read-only. If this becomes a problem, we can look into addressing it.

#56

davissp14 · 2023-02-02T03:56:04Z

internal/flycheck/pg.go

+			return "", fmt.Errorf("failed to turn primary readonly: %s", err)
+		}
+
+		return "", fmt.Errorf("%0.1f%% - extend your volume to re-enable writes", usedPercentage)


Not exactly sure why, but I'm getting MISSING here.

[✗] disk-capacity: 93.2%!e(MISSING)xtend your volume to re-enable writes (2.92ms)

This has been fixed with:

DAlperin · 2023-02-02T16:21:46Z

internal/flycheck/pg.go

+// Primary will be made read-only when disk capacity reaches this percentage.
+const diskCapacityPercentageThreshold = 90.0
+


I've been meaning to make the health-check settings configurable, but that can be out of scope for this PR.

DAlperin · 2023-02-02T16:23:38Z

internal/flycheck/pg.go

 		return connectionCount(ctx, localConn)
 	})

+	if member.Role == flypg.PrimaryRoleName && member.Active {


Do we want to expose this healthcheck on replicas without setting them to readonly?

I feel like it might be useful info

There's a VM check that communicates general capacity that should cover that. It makes me think though that maybe we need a new name for the check.

davissp14 added 2 commits February 1, 2023 20:06

Initial take on toggling readwrite/read-only based on disk capacity

64b034e

Cleanup

6b2060a

davissp14 requested a review from DAlperin February 2, 2023 03:11

davissp14 added 2 commits February 1, 2023 21:35

Few fixes

00b96b2

Cleanup

9a4f04e

davissp14 commented Feb 2, 2023


Bug fix that was introducing (MISSING) in failed health check output

32a7daf

DAlperin reviewed Feb 2, 2023


Make it clear that readonly mode is enabled

ffe51d0

DAlperin reviewed Feb 2, 2023


davissp14 merged commit ce0f3c8 into master Feb 2, 2023

davissp14 deleted the disk-capacity-check branch February 25, 2023 01:57
