Add Test Coverage for Multiplexed Session Goroutine Leak#13529

Open
MASA-JAPAN wants to merge 1 commit into googleapis:main from MASA-JAPAN:test/spanner-multiplexed-session-leak

Conversation

@MASA-JAPAN
Contributor

Summary

Add regression tests for issue #13396 to verify that the multiplexedSessionReq channel is properly closed when the session pool closes, preventing the createMultiplexedSession goroutine from leaking.

Background

The Bug (Issue #13396)

When a Spanner client with multiplexed sessions enabled is closed, a background goroutine could leak if the channel it was waiting on was not properly closed.

Root Cause:

  • The createMultiplexedSession() goroutine runs in a loop waiting for requests on the multiplexedSessionReq channel
  • When sessionPool.close() was called, this channel was not being closed
  • The goroutine remained blocked forever, waiting for messages that would never come
  • This resulted in a goroutine leak

The Fix:
A single line was added to sessionPool.close():

close(p.multiplexedSessionReq)

This allows the for range loop to exit and the goroutine to terminate cleanly.

Why Tests Are Needed

While the fix was implemented in commit 1805e89, no tests were added at that time. Without test coverage, this bug could silently reappear if:

  • Code is refactored
  • The cleanup logic is modified
  • New features are added that change the session pool lifecycle

Changes

Files Modified

  • spanner/session_test.go - Added two new test functions

Tests Added

1. TestSessionPool_CreateMultiplexedSession_NoGoroutineLeak

Purpose: Verifies that the multiplexed session goroutine is properly cleaned up when the session pool closes.

How it works:

  1. Creates a client with multiplexed sessions enabled (starts the background goroutine)
  2. Sends a request to the multiplexedSessionReq channel to verify it's open and working
  3. Closes the client/pool
  4. Verifies that the channel is properly closed by attempting to receive from it
  5. If the channel is closed (ok == false), the test passes
  6. If the channel is still open or times out, the test fails (indicating a leak)

What it catches:

  • Goroutine leaks caused by the channel not being closed
  • The createMultiplexedSession goroutine blocking forever
  • Memory leaks from leaked goroutines in production applications

2. TestSessionPool_MultiplexedSessionReqChannelClosed

Purpose: A complementary test that focuses specifically on channel closure mechanics.

How it works:

  1. Creates a client with multiplexed sessions enabled
  2. Verifies the channel is open by sending a request
  3. Closes the pool
  4. Verifies the channel is closed using receive semantics

Why have both tests:

The first test exercises the full lifecycle (goroutine startup through pool shutdown), while the second isolates the channel-closure mechanics, so a future regression fails with a pinpointed cause.

Testing

Running the Tests

# Run just the new tests
go test -race -v -timeout 15m -short -run "TestSessionPool_CreateMultiplexedSession_NoGoroutineLeak|TestSessionPool_MultiplexedSessionReqChannelClosed"

# Run the full test suite
go test -race -v -timeout 15m -short ./...

Test Results

=== RUN   TestSessionPool_CreateMultiplexedSession_NoGoroutineLeak
--- PASS: TestSessionPool_CreateMultiplexedSession_NoGoroutineLeak (0.17s)
=== RUN   TestSessionPool_MultiplexedSessionReqChannelClosed
--- PASS: TestSessionPool_MultiplexedSessionReqChannelClosed (0.07s)
PASS
ok      cloud.google.com/go/spanner    1.934s

Design Decisions

Why Test Channel Closure Instead of Goroutine Count?

Initially, I attempted to verify the fix by counting goroutines before and after closing the pool:

initialGoroutines := runtime.NumGoroutine()
// ... create and close client ...
finalGoroutines := runtime.NumGoroutine()
// Check if goroutines returned to baseline

Problem: This approach was unreliable in test environments due to:

  • Background goroutines from the test framework
  • Garbage collector goroutines
  • Other tests running in parallel (t.Parallel())
  • Non-deterministic goroutine lifecycle timing

Solution: Test the direct cause (channel closure) rather than the symptom (goroutine count):

  • When a channel is closed, for range loops over it exit immediately
  • Testing channel closure is deterministic and reliable
  • The test directly validates the fix that was implemented

Why Use select with Timeout?

select {
case _, ok := <-pool.multiplexedSessionReq:
    if ok {
        t.Fatal("channel is still open")
    }
case <-time.After(100 * time.Millisecond):
    t.Fatal("channel is not closed")
}

This pattern provides:

  • Immediate feedback if the channel is closed (no blocking)
  • Timeout protection if the channel is still open but empty (detects the bug)
  • Clear failure messages for debugging

Impact

What This Prevents

Without these tests:

// Production code with the bug:
func handleRequest() {
    client := createSpannerClient()  // Starts goroutine
    defer client.Close()              // Goroutine leaks!
    // ... use client ...
}

// After 1,000 requests:
// - 1,000 leaked goroutines
// - Increasing memory usage
// - Eventually: Out of memory crash

With these tests:

  • Any code change that breaks channel closure will fail CI
  • Developers are immediately alerted to goroutine leaks
  • Production applications stay healthy

Regression Prevention

These tests serve as executable documentation that:

  1. Explain what the bug was
  2. Demonstrate how the fix works
  3. Prevent the bug from being reintroduced
  4. Help future contributors understand multiplexed session lifecycle

Related Issues

  • #13396 - createMultiplexedSession goroutine leak (fixed in commit 1805e89)

Checklist

  • Tests pass locally
  • Tests follow existing patterns in session_test.go
  • Code follows conventional commit message format
  • Added comprehensive documentation in test comments
  • Verified tests fail without the fix (by temporarily reverting the close() call)
  • Verified tests pass with the fix

Additional Notes

This is my first contribution to this repository. I chose to add test coverage for a recent bug fix as a way to:

  1. Learn the codebase
  2. Provide value through regression prevention
  3. Practice Go testing best practices

I'm open to any feedback or suggestions for improvement!


Testing Strategy Reference:

  • Uses t.Parallel() for concurrent test execution
  • Uses setupMockedTestServerWithConfig() helper for consistent test setup
  • Follows existing patterns in the codebase for session pool testing
  • Includes detailed comments explaining the test logic and rationale

@MASA-JAPAN MASA-JAPAN requested review from a team January 2, 2026 03:19
@MASA-JAPAN MASA-JAPAN requested a review from a team as a code owner January 2, 2026 03:19
@gemini-code-assist
Contributor

Summary of Changes

Hello @MASA-JAPAN, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces critical regression tests to address a previously fixed goroutine leak within the Spanner client's multiplexed session management. The tests validate that the multiplexedSessionReq channel is correctly closed when the session pool is terminated, thereby ensuring that the associated background goroutine exits gracefully and prevents resource leaks. This enhancement fortifies the codebase against future regressions of this specific issue.

Highlights

  • Regression Test for Goroutine Leak: Adds comprehensive regression tests for issue #13396 (fix(spanner): fix createMultiplexedSession goroutine leak), which involved a goroutine leak in the Spanner client's multiplexed session pool. These tests ensure the createMultiplexedSession goroutine terminates cleanly when the session pool is closed.
  • Channel Closure Verification: The new tests specifically verify that the multiplexedSessionReq channel is properly closed when the session pool is shut down. This directly addresses the root cause of the previous leak, where the goroutine remained blocked waiting on an unclosed channel.
  • Two New Test Functions: Introduces two distinct test functions: TestSessionPool_CreateMultiplexedSession_NoGoroutineLeak and TestSessionPool_MultiplexedSessionReqChannelClosed. The first provides a holistic check of goroutine cleanup, while the second offers a focused validation of channel closure mechanics.



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds valuable regression tests for a previously fixed goroutine leak. The tests correctly focus on verifying channel closure, which is a robust way to confirm the fix. The test descriptions and PR summary are exceptionally clear and well-documented. My review includes a couple of suggestions to improve the new tests: one to remove potentially unnecessary time.Sleep calls for a more robust and faster test, and another to refactor the second test to remove confusing and ineffective defer/recover logic. Overall, this is a great contribution that improves the project's test coverage.

Comment thread spanner/session_test.go
Comment on lines +2263 to +2269
time.Sleep(50 * time.Millisecond)

// Close the pool
client.Close()

// Give some time for cleanup
time.Sleep(100 * time.Millisecond)

Severity: medium

These time.Sleep calls can likely be removed to make the test faster and more robust.

  1. The sleep on line 2263 (time.Sleep(50 * time.Millisecond)) seems unnecessary. The preceding select block already confirms that a message can be sent to the channel. The test doesn't depend on the request being fully processed, only that the channel was open, so waiting here isn't required.
  2. The sleep on line 2269 (time.Sleep(100 * time.Millisecond)) is redundant. The final select block (lines 2273-2281) already waits for up to 100ms for the channel to be closed. By having a sleep and a timeout, the test waits longer than necessary. You can rely solely on the select with its timeout to handle any small delay in closing the channel after client.Close() returns.

Removing these sleeps would make the test more efficient without compromising its correctness.

Comment thread spanner/session_test.go
Comment on lines +2320 to +2338
func() {
	defer func() {
		if r := recover(); r != nil {
			t.Fatal("Sending to multiplexedSessionReq panicked, which means channel might not be closed properly or send was attempted incorrectly")
		}
	}()

	// Try to receive from the closed channel
	// A receive from a closed channel returns immediately with zero value
	select {
	case _, ok := <-pool.multiplexedSessionReq:
		if ok {
			t.Fatal("multiplexedSessionReq channel is still open after pool close")
		}
		// Channel is properly closed
	case <-time.After(100 * time.Millisecond):
		t.Fatal("multiplexedSessionReq channel is not closed after pool close")
	}
}()

Severity: medium

This anonymous function with defer/recover is unnecessary and its logic doesn't align with the implementation.

  • The recover is set up to handle a panic, which typically occurs when sending to a closed channel.
  • However, the code inside the select block receives from the channel. Receiving from a channel (whether open or closed) never panics, so the recover block is unreachable.
  • The select block on its own is a correct and sufficient way to verify if the channel is closed. The surrounding func() wrapper and defer/recover logic can be removed for clarity and correctness. This would also make the comments on lines 2318-2319, which mention panics, obsolete.

I suggest replacing this entire block with just the core select logic.

	// Verify the channel is closed by attempting to receive.
	// A receive from a closed channel returns immediately with a zero value and ok=false.
	select {
	case _, ok := <-pool.multiplexedSessionReq:
		if ok {
			t.Fatal("multiplexedSessionReq channel is still open after pool close")
		}
		// Channel is properly closed.
	case <-time.After(100 * time.Millisecond):
		t.Fatal("multiplexedSessionReq channel is not closed after pool close")
	}
