Algod: Modify `EvalTracer` design and improve testing for failures by jasonpaulos · Pull Request #5071 · algorand/go-algorand

jasonpaulos · 2023-01-28T01:09:55Z

Summary

Follow up to #4438, which introduced the new EvalTracer interface for tracing transaction and program evaluation.

This PR improves how evaluation failures are reported to EvalTracers. Prior to this, if a failure occurred in a deeply nested inner transaction, evaluation would immediately stop, and the AfterOpcode and AfterProgram tracer methods would get invoked with the error, starting with the deeply nested inner transaction, and unwinding up the layers to the top-level transaction which encompasses the error. The issue is that the AfterTxn and AfterTxnGroup methods would not get invoked at all during failures, so an EvalTracer would have no idea that the context ever changed from the inner transaction which spawned the failure.

This PR reconciles the situation by making it so that AfterTxn and AfterTxnGroup also get called when an error happens in an inner transaction context. Now EvalTracers will see the correct context changes when evaluation halts because of an error.

(You might be wondering why we want EvalTracers to get called after an error happens at all, and the answer to this is so that they can gather detailed information about the failure, such as partially-completed ApplyDeltas).
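To make the context-unwinding point concrete, here is a minimal sketch. The `hooks` interface, `depthTracer`, and `evalTxn` are hypothetical simplifications and not the real EvalTracer API from go-algorand. The sketch only shows why a tracer needs an After call at every level, even on failure, to keep its view of the current context balanced.

```go
package main

import (
	"errors"
	"fmt"
)

// hooks is a hypothetical, heavily simplified stand-in for a tracer
// interface. It is NOT the real EvalTracer from go-algorand.
type hooks interface {
	BeforeTxn(name string)
	AfterTxn(name string, err error)
}

// depthTracer tracks how deeply nested the current transaction context is.
type depthTracer struct {
	depth int
	log   []string
}

func (t *depthTracer) BeforeTxn(name string) {
	t.depth++
	t.log = append(t.log, fmt.Sprintf("enter %s (depth %d)", name, t.depth))
}

func (t *depthTracer) AfterTxn(name string, err error) {
	t.log = append(t.log, fmt.Sprintf("exit %s (depth %d, failed=%t)", name, t.depth, err != nil))
	t.depth--
}

// evalTxn enters each named transaction in turn, nesting one inside the
// other, and fails once the innermost one has been entered.
func evalTxn(t hooks, names []string) (err error) {
	if len(names) == 0 {
		return errors.New("inner transaction failed")
	}
	t.BeforeTxn(names[0])
	// Deferring the After hook means it also fires on the error path,
	// so the tracer sees every context change while the error unwinds.
	defer func() { t.AfterTxn(names[0], err) }()
	return evalTxn(t, names[1:])
}

// finalDepth reports the tracer's depth after a failed evaluation.
func finalDepth() int {
	t := &depthTracer{}
	_ = evalTxn(t, []string{"top", "inner-1", "inner-2"})
	return t.depth
}

func main() {
	t := &depthTracer{}
	err := evalTxn(t, []string{"top", "inner-1", "inner-2"})
	for _, line := range t.log {
		fmt.Println(line)
	}
	fmt.Println("error:", err, "final depth:", t.depth)
}
```

If the After hook were skipped on failure, the tracer in this sketch would be left believing it was still three levels deep.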

Test Plan

New tests are added which cover transaction and program failures. Specifically, the mocktracer package now contains test scenarios which cover many possible failures in an app call and its inner transactions.

jasonpaulos · 2023-01-28T01:26:25Z

+		ad.EvalDelta = transactions.EvalDelta{}
 		return errors.New("Approval program failed")
 	}
 	ad.EvalDelta = cx.txn.EvalDelta


The real ledger does not set EvalDeltas if an error happens; however, the test ledger implicitly does right now. This is because ad is a pointer to cx.txn.ApplyData, so this line is a no-op.

I added some lines above this to explicitly clear the AD's EvalDelta if a failure happens, and that's good enough to get my tests to pass.

maybe add this comment inline
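The aliasing described in this thread can be reproduced in isolation. The types and the `finish` helper below are hypothetical stand-ins, not the real test ledger code. They only assume what the comment states: that `ad` points at the transaction's own ApplyData.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real go-algorand types.
type EvalDelta struct{ Logs []string }
type ApplyData struct{ EvalDelta EvalDelta }
type txn struct{ ApplyData ApplyData }

// finish mimics the tail of an app-call code path in a test ledger.
// When ad == &t.ApplyData, the final assignment copies a field onto
// itself, so any partial EvalDelta is already visible through ad.
func finish(t *txn, ad *ApplyData, failed, clearOnFailure bool) {
	if failed {
		if clearOnFailure {
			// Explicitly drop the partial delta, as the real ledger would.
			ad.EvalDelta = EvalDelta{}
		}
		return
	}
	ad.EvalDelta = t.ApplyData.EvalDelta // no-op when ad aliases t.ApplyData
}

// logsAfterFailure returns how many log entries remain visible through ad
// after a failed evaluation.
func logsAfterFailure(clearOnFailure bool) int {
	t := &txn{}
	ad := &t.ApplyData
	// Evaluation wrote a partial result before failing.
	t.ApplyData.EvalDelta.Logs = []string{"partial"}
	finish(t, ad, true, clearOnFailure)
	return len(ad.EvalDelta.Logs)
}

func main() {
	fmt.Println("without explicit clear:", logsAfterFailure(false))
	fmt.Println("with explicit clear:", logsAfterFailure(true))
}
```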

jasonpaulos · 2023-01-28T01:29:38Z

 // along with go-algorand.  If not, see <https://www.gnu.org/licenses/>.

-package logic
+package logic_test


By changing to the logic_test package, we can now import the mocktracer package without circular import issues!

codecov · 2023-01-28T01:35:15Z

Codecov Report

Merging #5071 (0b50423) into master (a688c23) will increase coverage by 0.02%.
The diff coverage is 95.45%.

@@            Coverage Diff             @@
##           master    #5071      +/-   ##
==========================================
+ Coverage   53.44%   53.46%   +0.02%     
==========================================
  Files         431      431              
  Lines       54364    54373       +9     
==========================================
+ Hits        29056    29073      +17     
+ Misses      23053    23043      -10     
- Partials     2255     2257       +2

Impacted Files	Coverage Δ
data/transactions/logic/debugger.go	`68.75% <ø> (+1.56%)`	⬆️
data/transactions/logic/tracer.go	`57.14% <0.00%> (+19.64%)`	⬆️
data/transactions/logic/eval.go	`90.40% <100.00%> (-0.08%)`	⬇️
ledger/internal/eval.go	`49.67% <100.00%> (+0.19%)`	⬆️
ledger/blockqueue.go	`82.25% <0.00%> (-2.69%)`	⬇️
ledger/tracker.go	`74.26% <0.00%> (-0.85%)`	⬇️
ledger/testing/randomAccounts.go	`56.26% <0.00%> (-0.62%)`	⬇️
ledger/acctupdates.go	`68.99% <0.00%> (-0.25%)`	⬇️
network/wsNetwork.go	`64.98% <0.00%> (-0.19%)`	⬇️
... and 6 more

bbroder-algo · 2023-01-28T15:48:49Z

What about a second paradigm where you separate out errors into their own class, i.e., partial ApplyData on an error is stored in a special erroredApplyData, and instead of calling AfterTxn or AfterProgram you go straight to an AfterError method?

jasonpaulos · 2023-01-30T17:48:40Z

@bbroder-algo having a separate function for errors, like AfterError, was something I considered, but I decided against it for these reasons:

1. It's necessary to trigger an error at every txn depth, since the errors will contain the partial ApplyData for that level of inner txn. E.g. if we want to know the ApplyData of a child txn and its parent at failure, the tracer method must be invoked twice.
2. If a tracer is interested in listening to an event, it probably also wants to know if that event failed. So it seems simpler to report errors in the After* hooks rather than introducing a new set of hooks that most tracers would want to implement anyway.
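A rough illustration of the first reason, using made-up types (`applyData`, `failureCollector`) rather than the real tracer interface: each level of nesting owns its own partial result, so the hook has to fire once per level for a tracer to collect them all.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// applyData is a hypothetical stand-in for a per-transaction result.
type applyData struct{ Logs []string }

// failureCollector records the partial result of every failed level.
type failureCollector struct{ partials []string }

// AfterTxn receives the error directly, so no separate error hook is needed.
func (c *failureCollector) AfterTxn(name string, ad applyData, err error) {
	if err != nil {
		c.partials = append(c.partials, fmt.Sprintf("%s:%v", name, ad.Logs))
	}
}

// runChild fails after producing some partial output of its own.
func runChild(c *failureCollector) error {
	child := applyData{Logs: []string{"child-log"}}
	err := errors.New("child failed")
	c.AfterTxn("child", child, err)
	return err
}

// runParent propagates the child's failure and reports its own partial state.
func runParent(c *failureCollector) error {
	parent := applyData{Logs: []string{"parent-log"}}
	err := runChild(c)
	c.AfterTxn("parent", parent, err)
	return err
}

// collected returns everything the tracer saw during the failed run.
func collected() string {
	c := &failureCollector{}
	_ = runParent(c)
	return strings.Join(c.partials, " ")
}

func main() {
	fmt.Println(collected())
}
```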

jannotti

It seems fine - I tried to pay most of my attention to places I thought might change or slow down execution. I put relatively less effort into ensuring I believe the traces are correct.

One place you might want to test is that all the handling is correct if an opcode panics. That seems like a challenging case, especially an opcode in an inner txn.

TestPanic shows a nice trick to have such a test.

jasonpaulos · 2023-02-02T00:14:19Z

+		// Due to the LIFO behavior of defer statements, if we want the tracer to catch panics, we
+		// need to have a `recover` defer statement defined after the tracer-invoking defer
+		// statement. However, since it's possible for the tracer itself to panic, we still want to
+		// keep the outmost `recover` defer statement defined at the top of this function.
+		defer func() {
+			if x := recover(); x != nil {
+				err = makePanicError(x, cx.EvalParams, "Eval")
+				pass = false
+			}
+		}()


Definitely looking for feedback on this. As the comment mentions, Go defer statements are executed LIFO, so without this additional recovery, the above defer which invokes tracer methods will not see any PanicErrors, as those are only created by the defer at the top of this function.

By adding another defer layer that can recover from panics, we make them visible to the tracer.

An alternative here would be to just move the single recovery defer from the beginning of the function to after we define the tracer deferred logic. I opted against it since I didn't want to decrease coverage of potential panicking code, especially since it's in theory possible for tracer methods themselves to panic.

(There's a new test case, TestEvalWithTracerPanic, which confirms that a panicking tracer method gets caught during eval.)
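The ordering issue can be seen in a standalone sketch. The function names here are invented for illustration and the "tracer" is reduced to a closure that records the error it sees. This is not the code from eval.go.

```go
package main

import "fmt"

// evalSingleRecover registers the recover first and the tracer hook second.
// Deferred calls run last-in, first-out, so the hook runs BEFORE the recover
// has turned the panic into an error, and it therefore sees a nil err.
func evalSingleRecover() (seen string, err error) {
	defer func() {
		if x := recover(); x != nil {
			err = fmt.Errorf("panic: %v", x)
		}
	}()
	defer func() { seen = fmt.Sprint(err) }() // stand-in for the tracer hook
	panic("boom")
}

// evalNestedRecover adds a second recover registered AFTER the tracer hook.
// It runs first, converts the panic into err, and the hook then sees it.
// The outermost recover stays in place in case the hook itself panics.
func evalNestedRecover() (seen string, err error) {
	defer func() {
		if x := recover(); x != nil {
			err = fmt.Errorf("panic: %v", x)
		}
	}()
	defer func() { seen = fmt.Sprint(err) }() // stand-in for the tracer hook
	defer func() {
		if x := recover(); x != nil {
			err = fmt.Errorf("panic: %v", x)
		}
	}()
	panic("boom")
}

func main() {
	seen, err := evalSingleRecover()
	fmt.Printf("single recover: hook saw %q, caller got %v\n", seen, err)
	seen, err = evalNestedRecover()
	fmt.Printf("nested recover: hook saw %q, caller got %v\n", seen, err)
}
```

In both variants the caller ends up with an error. The only difference is whether the hook observes it.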

I think I follow what's going on, but let me ask in order to be sure. You are trying to catch the panic "early", and build the stack trace so that you have it available to give out to the tracer. Might you simply call the Tracer with a simpler error here, and then panic again with x?

That is, only one defer, something like:

defer func() {
	x := recover()
	if x != nil {
		// A panic error occurred during the eval loop. Report it now.
		cx.Tracer.AfterOpcode(cx, fmt.Errorf("some nonsense"))
	}
	// Ensure we update the tracer before exiting
	cx.Tracer.AfterProgram(cx, err)
	if x != nil {
		panic(x)
	}
}()

I recognize this gives AfterOpcode slightly less information to go on. But I'm trying to make it so that a panic is always handled through the single stack trace creating handler. Done this way, when there's a Tracer, we build the stack here, and we don't go through the existing panic handler, since we're no longer panicking.

My motivation is that I think I can see a future where we really want to preserve the difference of "normal" error returns from evaluation vs a panic, if/when we consider supporting an itxn_try_submit which I think should continue to fail for panics, rather than cleanly reporting 0 for a failed txn.

Good idea. I implemented it in 912d5de, let me know if you see any issues
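For readers following along, the recover, report, and re-panic pattern suggested above can be sketched on its own. This is an assumption-laden illustration, not the code from 912d5de: `panicError`, `eval`, and the `report` callback are hypothetical names, and the two tracer calls are collapsed into a single callback.

```go
package main

import (
	"errors"
	"fmt"
)

// panicError is a hypothetical stand-in for an error built from a panic.
type panicError struct{ value interface{} }

func (p panicError) Error() string { return fmt.Sprintf("panic in eval: %v", p.value) }

// eval runs body and reports the outcome through report before returning.
func eval(report func(error), body func() error) (err error) {
	// Outermost handler: the single place where a panic becomes an error.
	defer func() {
		if x := recover(); x != nil {
			err = panicError{value: x}
		}
	}()
	// Tracer handler: registered later, so it runs first. It reports a
	// simple error on panic and then panics again with the original value,
	// leaving the conversion to the outermost handler.
	defer func() {
		x := recover()
		if x != nil {
			report(errors.New("opcode panicked"))
			panic(x)
		}
		report(err)
	}()
	return body()
}

// demo returns what the reporter saw and whether the caller got a panicError.
func demo() (reported string, isPanic bool) {
	err := eval(
		func(e error) { reported = fmt.Sprint(e) },
		func() error { panic("boom") },
	)
	_, isPanic = err.(panicError)
	return reported, isPanic
}

func main() {
	reported, isPanic := demo()
	fmt.Println("reported:", reported, "caller saw panicError:", isPanic)
}
```

The trade-off matches the one described in the thread: the reporter gets less detail about the panic, but panics and ordinary error returns stay distinguishable to the caller.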

algochoi

Looks good, I mostly went through the tests using a debugger to double-check that they behave as expected. I'll defer the logic/eval nuances to JJ.

bbroder-algo · 2023-02-14T23:45:19Z

+// possible during each inner transaction, as well as before all inners, between the two inner
+// groups, and after all inners. For app call failures, there are scenarios for both rejection and
+// runtime errors, which should invoke tracer hooks slightly differently.
+func GetTestScenarios() map[string]TestScenarioGenerator {


bbroder-algo · 2023-02-14T23:50:28Z

Testing is excellent: it passes for me and covers the cases discussed in the PR. I spent additional time staring at the changes to eval and the new panic handling/code segment relocations.

bbroder-algo

as above

jannotti · 2023-02-15T21:42:54Z

+				if pe, ok := err.(PanicError); ok {
+					require.Equal(t, panicString, pe.PanicValue)
+					pes := pe.Error()
+					require.True(t, strings.Contains(pes, "panic"))


There's require.Contains() if you'd like.

Modify EvalTracer design and improve testing for transaction/program …

7877000

…failures

jasonpaulos added the Enhancement label Jan 28, 2023

jasonpaulos commented Jan 28, 2023

jasonpaulos requested a review from jannotti January 30, 2023 17:48

Remove old test code

020502e

jannotti reviewed Jan 31, 2023

Comment thread ledger/internal/eval.go Outdated

Comment thread data/transactions/logic/debugger_eval_test.go Outdated

jasonpaulos added the Team Scytale label Feb 1, 2023

jasonpaulos requested a review from algochoi February 1, 2023 21:18

jasonpaulos added 2 commits February 1, 2023 15:43

Restructure tests and test for panics

ac2883d

Rename returnErr to err

b51e198

jasonpaulos commented Feb 2, 2023

jasonpaulos requested a review from jannotti February 2, 2023 17:03

algochoi reviewed Feb 2, 2023

Comment thread ledger/internal/eval_test.go

jasonpaulos added 6 commits February 6, 2023 14:40

Add test comment

872e746

Merge branch 'master' into evaltracer-failure-handling

e3e634f

Merge branch 'master' into evaltracer-failure-handling

548d7e7

Merge branch 'master' into evaltracer-failure-handling

a0ead05

Update panic handling

912d5de

typo

0b50423

barnjamin reviewed Feb 14, 2023

Comment thread data/transactions/logic/eval.go

bbroder-algo reviewed Feb 14, 2023

bbroder-algo self-requested a review February 15, 2023 00:13

bbroder-algo approved these changes Feb 15, 2023

algochoi approved these changes Feb 15, 2023

jannotti reviewed Feb 15, 2023

jannotti approved these changes Feb 15, 2023

jannotti merged commit eba02a6 into algorand:master Feb 15, 2023

jasonpaulos deleted the evaltracer-failure-handling branch February 15, 2023 21:47

jasonpaulos mentioned this pull request Feb 15, 2023

Algod: Additional simulation result information #4439

Merged

onetechnical mentioned this pull request Mar 1, 2023

DevOps: alphanet master remerge #5173

Merged

This was referenced Mar 10, 2023

go-algorand 3.15.0-beta Release PR #5190

Closed

go-algorand 3.15.0-beta Release PR #5194

Merged

Algo-devops-service mentioned this pull request Mar 17, 2023

go-algorand 3.15.0-stable Release PR #5218

Merged
