Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce tx idle timeout in postgres connections #4598

Merged
merged 3 commits into from
Apr 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions utils/misc/dbutils.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import (
"fmt"
"net/url"
"os"
"time"

"github.com/lib/pq"

Expand All @@ -19,15 +20,20 @@ func GetConnectionString(c *config.Config, componentName string) string {
port := c.GetInt("DB.port", 5432)
password := c.GetString("DB.password", "ubuntu") // Reading secrets from
sslmode := c.GetString("DB.sslMode", "disable")
idleTxTimeout := c.GetDuration("DB.IdleTxTimeout", 5, time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep this timeout limited to 120 Secs as from the graph in the desc max txn time seems to be 60-65 secs only? Keeping high value of timeout may still result in similar issues we are trying to solve(even-though for a smaller time)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timeout affects connections waiting for a client query within an open transaction for longer than the specified amount of time.

So if the client opens a transaction, runs a query, keeps the transaction open, it does something with the results and that "something" takes longer than 120 seconds, then the connection is going to be terminated. This would mean that the client will quite likely have to restart from the beginning.

I know the max in the graph shows as 60-65 but there might be edge cases where it could go above 120 I guess. Hard to be 100% sure without checking all metrics of all time and even then, code changes so you might experience that behaviour anyway at some point. This is the reason why 5 minutes seems safer to me and still reasonable enough not to cause incidents due to connections left alive forever.


// Application Name can be any string of less than NAMEDATALEN characters (64 characters in a standard PostgreSQL build).
// There is no need to truncate the string on our own though since PostgreSQL auto-truncates this identifier and issues a relevant notice if necessary.
appName := DefaultString("rudder-server").OnError(os.Hostname())
if len(componentName) > 0 {
appName = fmt.Sprintf("%s-%s", componentName, appName)
}
return fmt.Sprintf("host=%s port=%d user=%s "+
"password=%s dbname=%s sslmode=%s application_name=%s",
host, port, user, password, dbname, sslmode, appName)
"password=%s dbname=%s sslmode=%s application_name=%s "+
" options='-c idle_in_transaction_session_timeout=%d'",
host, port, user, password, dbname, sslmode, appName,
idleTxTimeout.Milliseconds(),
)
}

// SetAppNameInDBConnURL sets application name in db connection url
Expand Down
70 changes: 70 additions & 0 deletions utils/misc/dbutils_test.go
Original file line number Diff line number Diff line change
@@ -1,8 +1,16 @@
package misc_test

import (
"database/sql"
"fmt"
"testing"
"time"

"github.com/ory/dockertest/v3"
"github.com/stretchr/testify/require"

"github.com/rudderlabs/rudder-go-kit/config"
"github.com/rudderlabs/rudder-go-kit/testhelper/docker/resource/postgres"
"github.com/rudderlabs/rudder-server/utils/misc"
)

Expand Down Expand Up @@ -58,3 +66,65 @@ func TestSetApplicationNameInDBConnectionURL(t *testing.T) {
})
}
}

func TestIdleTxTimeout(t *testing.T) {
pool, err := dockertest.NewPool("")
require.NoError(t, err)
postgresContainer, err := postgres.Setup(pool, t)
require.NoError(t, err)

conf := config.New()
conf.Set("DB.host", postgresContainer.Host)
conf.Set("DB.user", postgresContainer.User)
conf.Set("DB.name", postgresContainer.Database)
conf.Set("DB.port", postgresContainer.Port)
conf.Set("DB.password", postgresContainer.Password)

txTimeout := 2 * time.Millisecond

conf.Set("DB.IdleTxTimeout", txTimeout)

dsn := misc.GetConnectionString(conf, "test")

db, err := sql.Open("postgres", dsn)
require.NoError(t, err)

var sessionTimeout string
err = db.QueryRow("SHOW idle_in_transaction_session_timeout;").Scan(&sessionTimeout)
require.NoError(t, err)
require.Equal(t, txTimeout.String(), sessionTimeout)

t.Run("timeout tx", func(t *testing.T) {
tx, err := db.Begin()
require.NoError(t, err)

var pid int
err = tx.QueryRow(`select pg_backend_pid();`).Scan(&pid)
require.NoError(t, err)

_, err = tx.Exec("select 1")
require.NoError(t, err)
t.Log("sleep double the timeout to close connection")
time.Sleep(2 * txTimeout)

err = tx.Commit()
require.EqualError(t, err, "driver: bad connection")

var count int
err = db.QueryRow(`SELECT count(*) FROM pg_stat_activity WHERE pid = $1`, pid).Scan(&count)
require.NoError(t, err)

require.Zero(t, count)
})

t.Run("successful tx", func(t *testing.T) {
tx, err := db.Begin()
require.NoError(t, err)
_, err = tx.Exec("select 1")
require.NoError(t, err)
_, err = tx.Exec(fmt.Sprintf("select pg_sleep(%f)", txTimeout.Seconds()))
require.NoError(t, err)

require.NoError(t, tx.Commit())
})
}
Loading