Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: prepared update query crashes cockroach node #14473

Closed
rjnn opened this issue Mar 30, 2017 · 8 comments · Fixed by #14481
Closed

sql: prepared update query crashes cockroach node #14473

rjnn opened this issue Mar 30, 2017 · 8 comments · Fixed by #14481
Assignees

Comments

@rjnn
Copy link
Contributor

rjnn commented Mar 30, 2017

I'm getting a crash running the OLTPBench TPCC workload. I'm still trying to narrow down which query is causing this, but here's a stack trace of the cockroach node crashing:

panic: makeInternalPlanner called with a transaction without timestamps [recovered]
        panic: makeInternalPlanner called with a transaction without timestamps [recovered]
        panic: makeInternalPlanner called with a transaction without timestamps

goroutine 3523884 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc4201ce0a0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0x94
panic(0x5409b40, 0xc42acd0930)
        /usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*v3Conn).serve.func1(0xc4236c0c00, 0x8085980, 0xc4244b4630)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/v3.go:348 +0x74
panic(0x5409b40, 0xc42acd0930)
        /usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/cockroachdb/cockroach/pkg/sql.makeInternalPlanner(0x563de5a, 0xd, 0xc42665c3c0, 0x5631c6f, 0x4, 0xc4200ac3d0, 0x643d240)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/planner.go:114 +0x70c
github.com/cockroachdb/cockroach/pkg/sql.LeaseStore.Acquire(0x5fc70e0, 0xc42014d810, 0xc420428ec0, 0x3ff0000000000000, 0xc420428ec0, 0xc4200ac000, 0x0, 0x0, 0x0, 0xc4200ac3d0, ...)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:157 +0x2d8
github.com/cockroachdb/cockroach/pkg/sql.(*tableState).acquireNodeLease(0xc422008aa0, 0x8085980, 0xc42406b320, 0xc42665c3c0, 0xc400000000, 0xc4203e6480, 0x0, 0xc400000000, 0x0, 0x0, ...)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:725 +0x1d1
github.com/cockroachdb/cockroach/pkg/sql.(*tableState).acquireFromStoreLocked(0xc422008aa0, 0x8085980, 0xc42406b320, 0xc42665c3c0, 0x0, 0xc4203e6480, 0xc42b99baf0, 0xc424e780a0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:619 +0xa8
github.com/cockroachdb/cockroach/pkg/sql.(*tableState).acquire(0xc422008aa0, 0x8085980, 0xc42406b320, 0xc42665c3c0, 0x0, 0xc4203e6480, 0x0, 0x0, 0x0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:577 +0x12e
github.com/cockroachdb/cockroach/pkg/sql.(*LeaseManager).Acquire(0xc4203e6480, 0x8085980, 0xc42406b320, 0xc42665c3c0, 0x6f, 0xc420948dc0, 0x400c8a3, 0xc42b99bcd0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:1146 +0x86
github.com/cockroachdb/cockroach/pkg/sql.(*planner).getTableLeaseByID(0xc425fea000, 0x8085980, 0xc42406b320, 0xc40000006f, 0xc420948dc0, 0x0, 0x0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/table.go:379 +0x101
github.com/cockroachdb/cockroach/pkg/sql.(*planner).fillFKTableMap(0xc425fea000, 0x8085980, 0xc42406b320, 0xc423a5ae40, 0x14b0791e6e1b242a, 0x0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/planner.go:253 +0xb8
github.com/cockroachdb/cockroach/pkg/sql.(*planner).Update(0xc425fea000, 0x8085980, 0xc42406b320, 0xc425a84e00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/update.go:177 +0x515
github.com/cockroachdb/cockroach/pkg/sql.(*planner).prepare(0xc425fea000, 0x8085980, 0xc42406b320, 0x5fdfbe0, 0xc425a84e00, 0x0, 0x0, 0x0, 0x0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/plan.go:418 +0x559
github.com/cockroachdb/cockroach/pkg/sql.(*Executor).Prepare(0xc4202aa680, 0xc423cf1aff, 0x93, 0xc42085bb00, 0xc423a5aa80, 0x0, 0x0, 0x0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/executor.go:466 +0x637
github.com/cockroachdb/cockroach/pkg/sql.PreparedStatements.New(0xc42085bb00, 0xc4244b46f0, 0xc4202aa680, 0xc423cf1afe, 0x0, 0xc423cf1aff, 0x93, 0xc423a5aa80, 0xa7, 0x0, ...)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/prepare.go:76 +0x63
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*v3Conn).handleParse(0xc4236c0c00, 0xc4236c0c28, 0xc426b62c00, 0x50)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/v3.go:546 +0x432
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*v3Conn).serve(0xc4236c0c00, 0x8085980, 0xc4244b4630, 0xc426463e70, 0x5400, 0xc420338310, 0x0, 0x0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/v3.go:434 +0x8f0
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*Server).ServeConn(0xc4203381c0, 0x8661028, 0xc4255bea80, 0x5fe84c0, 0xc42a936820, 0x0, 0x0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:415 +0x927
github.com/cockroachdb/cockroach/pkg/server.(*Server).Start.func9.1(0x5fe84c0, 0xc42a936820)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/server/server.go:587 +0x153
github.com/cockroachdb/cockroach/pkg/util/netutil.(*Server).ServeWith.func1(0xc4201ce0a0, 0xc420154058, 0x5fe84c0, 0xc42a936820, 0xc42029c7b0)
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/util/netutil/net.go:136 +0x95
created by github.com/cockroachdb/cockroach/pkg/util/netutil.(*Server).ServeWith
        /Users/arjun/goworkspace/src/github.com/cockroachdb/cockroach/pkg/util/netutil/net.go:138 +0x239

This is a JDBC created PreparedStatement execution, and it's not always reliably reproducible. I think it is dependent on some particular values being set.

@rjnn rjnn changed the title sql: prepared query crashes the cluster sql: prepared query crashes cockroach node Mar 30, 2017
@jordanlewis jordanlewis changed the title sql: prepared query crashes cockroach node sql: prepared update query crashes cockroach node Mar 30, 2017
@tamird
Copy link
Contributor

tamird commented Mar 30, 2017 via email

@rjnn
Copy link
Contributor Author

rjnn commented Mar 30, 2017

@jordanlewis and I spelunked a bit in the code. It's definitely caused by a Prepare Update that's part of a transaction involving many other statements. We can track it down to executor.go:449, which has the following TODO:

Prepare needs a transaction because it needs to retrieve db/table
descriptors for type checking.
TODO(andrei): is this OK? If we're preparing as part of a SQL txn, how do
we check that they're reading descriptors consistent with the txn in which
they'll be used?

It appears that we can now empirically answer that question in the negative.

@benesch
Copy link
Contributor

benesch commented Mar 30, 2017

The TODO is actually unrelated, I think; we can just set dummy timestamps on the transaction and punt on the issue in the TODO.

@jordanlewis
Copy link
Member

Okay, I've tracked down this issue. The reproduction is the following statements, where the final INSERT is prepared first. The panic (sometimes) occurs during preparation of the final statement.

CREATE TABLE t (i INT PRIMARY KEY);
CREATE TABLE u (i INT REFERENCES t(i));
INSERT INTO t (i) VALUES (1);
INSERT INTO u (i) VALUES (1);
UPDATE u SET i = $1;

It's difficult to reproduce because it only occurs when:

  1. The update is to a table that has foreign keys
  2. The update is prepared first
  3. No cached table lease exists for at least one of the foreign keys' tables at the time of the prepare

If those conditions occur, then when the server creates an internal planner to attempt to look up the table descriptor, it panics because the input transaction is missing timestamps.

@tlvenn
Copy link
Contributor

tlvenn commented Apr 3, 2017

This error is also happenning to me lately when I run my test suite depending in which order the tests are run. For a given order, the DB crashes 100% of the time.

@tlvenn
Copy link
Contributor

tlvenn commented Apr 3, 2017

If that helps, this happens to me for the Elixir Sandbox project I created.

You can checkout the no-override branch and simply run mix test --seed 600906 to crash the DB.

@jordanlewis
Copy link
Member

If you like, you can try patch #14481 to see if that helps you before it's merged.

@tlvenn
Copy link
Contributor

tlvenn commented Apr 3, 2017

Just tried, all good now 👍

cockroachdb_sandbox> mix test --seed 600906
Excluding tags: [:pending, :without_conflict_target]

.............

Finished in 1.2 seconds
14 tests, 0 failures, 1 skipped

andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 5, 2017
14473 was a panic when preparing a statement, under certain conditions.

- add a targeted test for cockroachdb#14473
  - The previous commit that added the test was unclear about the
  conditions that it meant to test. It also had too many statements
  without explanation of which one is expected to crash and why. It was
  also affecting other unrelated subtests by introducing an unnecessary
  testing knob. Part of the problem was that it needed to test a very
  particular set of conditions, and it was doing that from the wrong
  level (pgwire vs sql).

- revert a funky testing knob: the knob caused the LeaseStore to always
acquire a lease when queried by id, but not when queried by name. Also
there are other ways to remove leases without the need for this knob.

Here's some details about the exact conditions necessary for the panic in 14473:

- The panic occurs when the LeaseStore attempts to create a fake planner
to execute internal sql statements. If the client.Txn object doesn't
have the OrigTimestamp field set -> panic. Preparing statements didn't
use to set that field.

- In most cases, however, attempting to acquire a table lease is done by
using a table name. The process of resolving the name performs a KV Get,
which means that the OrigTimestamp is initialized by the TxnCoordSender.

- So, in order to crash we need to (also) acquire some leases by id.
This is done, e.g., by edits on tables with FKs (they reference the
parents by ID).

- In conclusion, we need to be in a particular situation where the
prepare doesn't need to resolve any names (all tables referenced by name
have a lease in the cache), but also try to reference by ID a table
that's not in the cache.
andreimatei added a commit to andreimatei/cockroach that referenced this issue Apr 6, 2017
14473 was a panic when preparing a statement, under certain conditions.

- add a targeted test for cockroachdb#14473
  - The previous commit that added the test was unclear about the
  conditions that it meant to test. It also had too many statements
  without explanation of which one is expected to crash and why. It was
  also affecting other unrelated subtests by introducing an unnecessary
  testing knob. Part of the problem was that it needed to test a very
  particular set of conditions, and it was doing that from the wrong
  level (pgwire vs sql).

- revert a funky testing knob: the knob caused the LeaseStore to always
acquire a lease when queried by id, but not when queried by name. Also
there are other ways to remove leases without the need for this knob.

Here's some details about the exact conditions necessary for the panic in 14473:

- The panic occurs when the LeaseStore attempts to create a fake planner
to execute internal sql statements. If the client.Txn object doesn't
have the OrigTimestamp field set -> panic. Preparing statements didn't
use to set that field.

- In most cases, however, attempting to acquire a table lease is done by
using a table name. The process of resolving the name performs a KV Get,
which means that the OrigTimestamp is initialized by the TxnCoordSender.

- So, in order to crash we need to (also) acquire some leases by id.
This is done, e.g., by edits on tables with FKs (they reference the
parents by ID).

- In conclusion, we need to be in a particular situation where the
prepare doesn't need to resolve any names (all tables referenced by name
have a lease in the cache), but also try to reference by ID a table
that's not in the cache.
andreimatei added a commit to andreimatei/cockroach that referenced this issue May 1, 2017
14473 was a panic when preparing a statement, under certain conditions.

- add a targeted test for cockroachdb#14473
  - The previous commit that added the test was unclear about the
  conditions that it meant to test. It also had too many statements
  without explanation of which one is expected to crash and why. It was
  also affecting other unrelated subtests by introducing an unnecessary
  testing knob. Part of the problem was that it needed to test a very
  particular set of conditions, and it was doing that from the wrong
  level (pgwire vs sql).

- revert a funky testing knob: the knob caused the LeaseStore to always
acquire a lease when queried by id, but not when queried by name. Also
there are other ways to remove leases without the need for this knob.

Here's some details about the exact conditions necessary for the panic in 14473:

- The panic occurs when the LeaseStore attempts to create a fake planner
to execute internal sql statements. If the client.Txn object doesn't
have the OrigTimestamp field set -> panic. Preparing statements didn't
use to set that field.

- In most cases, however, attempting to acquire a table lease is done by
using a table name. The process of resolving the name performs a KV Get,
which means that the OrigTimestamp is initialized by the TxnCoordSender.

- So, in order to crash we need to (also) acquire some leases by id.
This is done, e.g., by edits on tables with FKs (they reference the
parents by ID).

- In conclusion, we need to be in a particular situation where the
prepare doesn't need to resolve any names (all tables referenced by name
have a lease in the cache), but also try to reference by ID a table
that's not in the cache.
andreimatei added a commit that referenced this issue May 1, 2017
sql: replace the test for #14473 with a better one
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants