Test failed: System.Text.Tests.TranscodingStreamTests.Read_Span(bufferLength: 1) #47444

VincentBu · 2021-01-26T01:30:18Z

Run: runtime 20210125.57

Failed tests:

net6.0-Linux-Release-arm-CoreCLR_checked-(Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440
        - System.Text.Tests.TranscodingStreamTests.Read_Span
                - System.Text.Tests.TranscodingStreamTests.Read_Span(bufferLength: 1)

Error message:

Assert.Equal() Failure
                                 ↓ (pos 3492055)
Expected: ···bcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghij···
Actual:   ···bcdefghijklmnopqrstuVwxyzabcdefghijklmnopqrstuvwxyzabcdefghij···
                                 ↑ (pos 3492055)

stack trace:
   at System.Text.Tests.TranscodingStreamTests.<RunReadTest>g__RunOneTestIteration|22_0(Int32 stringLength, <>c__DisplayClass22_0& ) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 372
   at System.Text.Tests.TranscodingStreamTests.RunReadTest(Func`3 callback) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 335
   at System.Text.Tests.TranscodingStreamTests.Read_Span(Int32 bufferLength) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 312

The text was updated successfully, but these errors were encountered:

ghost · 2021-01-26T01:30:22Z

Tagging subscribers to this area: @tarekgh, @krwq
See info in area-owners.md if you want to be subscribed.

Issue Details

Run: runtime 20210125.57

Failed tests:

net6.0-Linux-Release-arm-CoreCLR_checked-(Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440
        - System.Text.Tests.TranscodingStreamTests.Read_Span
                - System.Text.Tests.TranscodingStreamTests.Read_Span(bufferLength: 1)

Error message:

Assert.Equal() Failure
                                 ↓ (pos 3492055)
Expected: ···bcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghij···
Actual:   ···bcdefghijklmnopqrstuVwxyzabcdefghijklmnopqrstuvwxyzabcdefghij···
                                 ↑ (pos 3492055)

stack trace:
   at System.Text.Tests.TranscodingStreamTests.<RunReadTest>g__RunOneTestIteration|22_0(Int32 stringLength, <>c__DisplayClass22_0& ) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 372
   at System.Text.Tests.TranscodingStreamTests.RunReadTest(Func`3 callback) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 335
   at System.Text.Tests.TranscodingStreamTests.Read_Span(Int32 bufferLength) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 312

Author:	VincentBu
Assignees:	-
Labels:	`arch-arm32`, `area-System.Text.Encoding`, `os-linux`, `untriaged`
Milestone:	-

GrabYourPitchforks · 2021-01-26T01:50:33Z

No case conversion occurs as part of these tests. Is this failure reproducible? Are bits of memory randomly being flipped as we succumb to the slow, silent entropy of the universe?

stephentoub · 2021-01-26T03:42:14Z

Are bits of memory randomly being flipped as we succumb to the slow, silent entropy of the universe?

Having a morbid day, are we? 😄

danmoseley · 2021-01-26T19:15:40Z

This test runs in this configuration (this queue, at least) on many different machines:

dcount_MachineName

230

In that last 4 months it has failed twice -- both on the same box in much the same way

Bad memory on ddvsotx2l280 ? cc @MattGal

https://engsrvprod.kusto.windows.net/engineeringdata?query=H4sIAAAAAAAEAG2PQWvDMAyF74H8B9FLN3DLunsGHSOQwgZrAjsOzxaJm0QOlkIX2I%2bfk451h11s6b1PD6lCliPy2AlDmnzByTuC1pHNHBEGePOhLQR7Bk%2b%2fTWH%2fZQ%2f%2bY8HiX9gZODcYEHJHjhu08JABkD%2ffbHb3d%2fYWrkQ1DQhZBqty4hi%2frfBT4sPC2ypoYuOto7qUgLpf5NV19hml8XaZPqK27%2bWg6Y99uW2217l23Xp2huBPaARK0UHQKngagxbnSS2bqJ9MBXlwSLabXnQ%2fq9o0jvDSvI44LuVuxpl1HcUYaNq4sYn1PtRjjySs4DHeYJo0SZNvsaswnG4BAAA%3d&web=0

TestResults
| join kind=inner WorkItems on WorkItemId
| join kind=inner Jobs on JobId
| where Finished >= now(-120d)
| where Type == "System.Text.Tests.TranscodingStreamTests"
| where Method == "Read_Span"
| where Result == 'Fail'
| project Started, Duration, Type, Method, FriendlyName, MachineName, QueueName1, Message, StackTrace, Arguments, Branch

Started	Duration	Type	Method	FriendlyName	MachineName	QueueName1	Message	StackTrace	Arguments	Branch
2020-12-23 21:07:36.0500000	3.757	System.Text.Tests.TranscodingStreamTests	Read_Span	System.Text.Encoding.Tests	ddvsotx2l280	ubuntu.1804.armarch.open	Assert.Equal() Failure\n ↓ (pos 4401399)\nExpected: ···vwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcd···\nActual: ···vwxyzabcdefghijklmnoPqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcd···\n ↑ (pos 4401399)	at System.Text.Tests.TranscodingStreamTests.g__RunOneTestIteration\|22_0(Int32 stringLength, <>c__DisplayClass22_0& ) in //src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 372 at System.Text.Tests.TranscodingStreamTests.RunReadTest(Func`3 callback) in //src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 335 at System.Text.Tests.TranscodingStreamTests.Read_Span(Int32 bufferLength) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 312	bufferLength: 1	refs/pull/46351/merge
2021-01-25 21:31:04.3900000	2.032	System.Text.Tests.TranscodingStreamTests	Read_Span	System.Text.Encoding.Tests	ddvsotx2l280	ubuntu.1804.armarch.open	Assert.Equal() Failure\n ↓ (pos 3492055)\nExpected: ···bcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghij···\nActual: ···bcdefghijklmnopqrstuVwxyzabcdefghijklmnopqrstuvwxyzabcdefghij···\n ↑ (pos 3492055)	at System.Text.Tests.TranscodingStreamTests.g__RunOneTestIteration\|22_0(Int32 stringLength, <>c__DisplayClass22_0& ) in //src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 372 at System.Text.Tests.TranscodingStreamTests.RunReadTest(Func`3 callback) in //src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 335 at System.Text.Tests.TranscodingStreamTests.Read_Span(Int32 bufferLength) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 312	bufferLength: 1	refs/heads/master

danmoseley · 2021-01-26T19:16:49Z

It might be interesting to query failure % of tests in general on ddvsotx2l280 compared to other machines with the same configuration and OS.

danmoseley · 2021-01-27T01:39:21Z

For the record, the machine seems middle of the pack by rate of test failures or segfaults, using this query

WorkItems
| where Finished >= now(-120d)
| where QueueName == "ubuntu.1804.armarch.open"
| summarize Pass=countif(ExitCode == 0), Fail=countif(ExitCode == 1), Abort=countif(ExitCode != 0 and ExitCode != 1), SegV= countif(ExitCode==139) by MachineName
| extend FailRate = round(toreal(Fail)/(Pass+Fail+Abort), 4) * 100, AbortRate = round(toreal(Abort)/(Pass+Fail+Abort), 4) * 100, SegVRate = round(toreal(SegV)/(Pass+Fail+Abort), 6) * 100

So... who knows? Twice could be a coincidence.

What is the chance that this single bit error would be in roughly the same place in the buffer ? Is this the only test that writes a large buffer and reads it back? @GrabYourPitchforks any other idea?

danmoseley · 2021-01-27T01:39:53Z

Maybe @jkotas has an idea

jkotas · 2021-01-27T02:04:20Z

I think we should wait and see whether it fails on some other machine too. I can be a bad memory on the specific machine.

If we see it failing on more than one machine, it can be dangling pointer bug or GC hole bug. We would need a process dump taken as close to the failure point as possible to diagnose it.

danmoseley · 2021-01-27T03:26:55Z

OK, let's check back in a month...

MattGal · 2021-01-27T16:04:41Z

OK, let's check back in a month...

Something easy to do and I don't mind doing, if you pick a correlation id of a specific run for me, is to send exactly hundreds of clones of the just the work item you suspect is a problem using the Helix SDK. We'd get pretty useful data about the general flakiness of the test and know whether to blame that one machine from this.

danmoseley · 2021-01-27T17:01:20Z

@MattGal thanks for the offer. Can you extract the "correlation ID" from one of the results in the query above?

MattGal · 2021-01-27T17:21:58Z

Sure, can you confirm somewhere that this log is the exact one to clone a bunch of times?

As part of the recent surge (I think you even contributed) we log the correlation id at the top of all these logs now like:
Console log: 'System.Text.Encoding.Tests' from job 8d51db18-6cb8-42d7-a68e-dfa7c178bbf5 ...

danmoseley · 2021-01-27T17:29:54Z

Ah, I see, I didn't know what correlation ID is -- sounds like it's job ID.

Yes, that's one of the ones with the problem. It would be interesting to loop that a zillion times on this machine and grab the console output of any failures. thank you!

MattGal · 2021-01-27T17:33:32Z

I was thinking a zillion times on all the machines, to isolate "we think this machine has failing hardware" thing?

Job Id is a long in the DB (in the case of the GUID above, "JobId" == 13132067)

MattGal · 2021-01-27T18:52:41Z

I ran this test 800x in a row across 274/278 of these machines (see JobId: 13167068 Correlation: 7082a89-a973-405f-af71-9babae5ac23e)

It ended up passing 3x on that machine:

WorkItems
| where Queued > ago(1d)
| where JobName == "7082a892-a973-405f-af71-9babae5ac23e" 
| where Name != JobName
| where MachineName == "ddvsotx2l280"

I am set up to be able to do this over and over to any scale we like but I"m trying to keep it minimal as PRs still need to work. I may run a few hundred here and there today and add them to my checks.

MattGal · 2021-01-28T01:11:15Z

Machine's back in the pool. I ran stress-ng loads suggested by Dan both on the machine for 2 hours and inside the container the test failed on, no luck :(

VincentBu · 2021-07-06T02:30:15Z

Failed again in runtime-coreclr libraries-jitstress 20210705.1

Failed tests:

net6.0-Linux-Release-arm-CoreCLR_checked-tailcallstress-(Ubuntu.1804.Arm32.Open)[email protected]/dotnet-buildtools/prereqs:ubuntu-18.04-helix-arm32v7-bfcd90a-20200121150440

- System.Text.Tests.TranscodingStreamTests.Read_Span(bufferLength: 1)
- System.Text.Tests.TranscodingStreamTests.Read_ByteByByte

Error message:

Assert.Equal() Failure
↓ (pos 2488567)
Expected: ···jklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr···
Actual:   ···jklmnopqrstuvwxyzabcDefghijklmnopqrstuvwxyzabcdefghijklmnopqr···
↑ (pos 2488567)


Stack trace
   at System.Text.Tests.TranscodingStreamTests.<RunReadTest>g__RunOneTestIteration|22_0(Int32 stringLength, <>c__DisplayClass22_0& ) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 390
   at System.Text.Tests.TranscodingStreamTests.RunReadTest(Func`3 callback) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 337
   at System.Text.Tests.TranscodingStreamTests.Read_ByteByByte() in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 293

kunalspathak · 2021-07-22T15:46:16Z

This is the failure on same machine ddvsotx2l280 but on different position 2488567. In prior run it was failing on position 3492055. @MattGal , @danmoseley

I tried reproducing it on the local Linux/arm machine and it doesn't reproduce.

GrabYourPitchforks · 2021-07-22T21:38:12Z

Spitballing a few ideas.

One option is that we try to log the physical memory address where this failure occurs. From the managed side it'd be easy enough to record the process's virtual memory address, but I don't know how to reliably map that back to a physical memory address short of having some system-level service already running on the box and using IPC.

Another option is to query the machine name at the start of the test and to perform a different action if we're on the target box. That could include skipping the test, using absurdly-sized payloads to try to trigger the repro more rapidly, or whatever else we think appropriate. Risks here would be that: (a) there's an opportunity cost for doing this work if we already suspect a hardware problem and have already proposed a resolution of pulling the machine from rotation; and (b) we don't want to risk accidentally leaving this code checked in.

A big-hammer is to memtest86+ the physical box. 😈

kunalspathak · 2021-07-22T22:56:02Z

This is a low hit non-deterministic issue, I am inclined to moving this out of .NET6.0. Let me know if anyone thinks that it should be investigated in .NET6.

danmoseley · 2021-07-23T02:46:59Z

@MattGal can we ask for someone to run memtest? It's hard to understand otherwise why it's the same box.

MattGal · 2021-07-23T16:12:28Z

@MattGal can we ask for someone to run memtest? It's hard to understand otherwise why it's the same box.

I took the machine offline and will file a ticket to get DDFUN to try this out. Due to my only access being over KVM, I can't seem to get to the boot loader screen to try memtest. Someone with physical access may have to attempt this.

danmoseley · 2021-07-23T16:17:47Z

I ran the query above again, and although this test has run on 34 distinct machines in the last 120 days, the only two failures (new ones since last time) have both been on ddvsotx2l280. Both failures had the same pattern.

I'm guessing this test is one of the most stressful on memory, but I don't know off the top of my head how to create a query like "which tests disproportionately fail on ddvsotx2l280". If that threw up other memory-intensive tests, it would be more evidence.

TestResults
| join kind=inner WorkItems on WorkItemId
| join kind=inner Jobs on JobId
| where Finished >= now(-365d)
| where Type == "System.Text.Tests.TranscodingStreamTests"
| where Method == "Read_Span"
| where Result == 'Fail'
| project Started, Duration, Type, Method, FriendlyName, MachineName, QueueName1, Message, StackTrace, Arguments, Branch

Started	Duration	Type	Method	FriendlyName	MachineName	QueueName1	Message	StackTrace	Arguments	Branch
2021-07-05 09:11:45.7870000	2.156	System.Text.Tests.TranscodingStreamTests	Read_Span	System.Text.Encoding.Tests	ddvsotx2l280	ubuntu.1804.armarch.open	Assert.Equal() Failure ↓ (pos 6482135) Expected: ···defghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijkl··· Actual: ···defghijklmnopqrstuvwXyzabcdefghijklmnopqrstuvwxyzabcdefghijkl··· ↑ (pos 6482135)	at System.Text.Tests.TranscodingStreamTests.g__RunOneTestIteration	22_0(Int32 stringLength, <>c__DisplayClass22_0& ) in //src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 390 at System.Text.Tests.TranscodingStreamTests.RunReadTest(Func`3 callback) in //src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 337 at System.Text.Tests.TranscodingStreamTests.Read_Span(Int32 bufferLength) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 314	bufferLength: 1
2021-07-06 21:41:48.9150000	2.884	System.Text.Tests.TranscodingStreamTests	Read_Span	System.Text.Encoding.Tests	ddvsotx2l280	ubuntu.1804.armarch.open	Assert.Equal() Failure ↓ (pos 4839671) Expected: ···lmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrst··· Actual: ···lmnopqrstuvwxyzabcdeFghijklmnopqrstuvwxyzabcdefghijklmnopqrst··· ↑ (pos 4839671)	at System.Text.Tests.TranscodingStreamTests.g__RunOneTestIteration	22_0(Int32 stringLength, <>c__DisplayClass22_0& ) in //src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 390 at System.Text.Tests.TranscodingStreamTests.RunReadTest(Func`3 callback) in //src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 337 at System.Text.Tests.TranscodingStreamTests.Read_Span(Int32 bufferLength) in /_/src/libraries/System.Text.Encoding/tests/Encoding/TranscodingStreamTests.cs:line 314	bufferLength: 1

kunalspathak · 2021-07-26T18:00:05Z

Assigning to you @MattGal - Feel free to redirect to me once we get the result of memtest.

JulieLeeMSFT · 2021-08-10T20:14:51Z

@MattGal, any update with memtest?

danmoseley · 2021-08-10T20:22:14Z

@parose1 responded a few days ago on the issue linked.

This is failing ~1 time every 2 months, so it will be a long time before we have more data. I suggest to either close this (optimistically) or move out of 6.0

danmoseley · 2021-08-10T20:23:17Z

The query above shows no new failures since 7/6 so I suggest to close optimistically. I still think it's important to re-flash or whatever the device.

kunalspathak · 2021-08-10T20:24:26Z

I agree with @danmoseley .

JulieLeeMSFT · 2021-08-10T20:29:09Z

Thanks @danmoseley and @kunalspathak.

MattGal · 2021-08-10T20:33:00Z

@MattGal, any update with memtest?

It didn't seem to exist for this specific hardware setup, so it got reimaged instead.

danmoseley · 2021-08-10T22:27:39Z

Let's meet back here in 2 months and see whether this test failed again and if so on which machine.

danmoseley · 2021-08-10T22:28:04Z

Hopefully @MattGal the machine name didn't change? If not what is it now.

MattGal · 2021-08-10T22:58:04Z

Hopefully @MattGal the machine name didn't change? If not what is it now.

@parose1 can confirm but in my experience, they do not change the machine name. That's why all our Macs still have their Jenkins names :)

VincentBu added arch-arm32 os-linux Linux OS (any supported distro) labels Jan 26, 2021

dotnet-issue-labeler bot added area-System.Text.Encoding untriaged New issue has not been triaged by the area owner labels Jan 26, 2021

tarekgh removed the untriaged New issue has not been triaged by the area owner label Jan 26, 2021

tarekgh added this to the 6.0.0 milestone Jan 26, 2021

tarekgh added the test-run-core Test failures in .NET Core test runs label Jan 26, 2021

jkotas added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Text.Encoding labels Jul 6, 2021

SingleAccretion mentioned this issue Jul 7, 2021

Disable folding of implementation-defined casts #53782

Merged

JulieLeeMSFT assigned kunalspathak Jul 17, 2021

kunalspathak assigned MattGal and unassigned kunalspathak Jul 26, 2021

kunalspathak closed this as completed Aug 10, 2021

ghost locked as resolved and limited conversation to collaborators Sep 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test failed: System.Text.Tests.TranscodingStreamTests.Read_Span(bufferLength: 1) #47444

Test failed: System.Text.Tests.TranscodingStreamTests.Read_Span(bufferLength: 1) #47444

VincentBu commented Jan 26, 2021

ghost commented Jan 26, 2021

GrabYourPitchforks commented Jan 26, 2021

stephentoub commented Jan 26, 2021

danmoseley commented Jan 26, 2021

danmoseley commented Jan 26, 2021

danmoseley commented Jan 27, 2021

danmoseley commented Jan 27, 2021

jkotas commented Jan 27, 2021

danmoseley commented Jan 27, 2021

MattGal commented Jan 27, 2021

danmoseley commented Jan 27, 2021

MattGal commented Jan 27, 2021

danmoseley commented Jan 27, 2021

MattGal commented Jan 27, 2021 •

edited

Loading

MattGal commented Jan 27, 2021

MattGal commented Jan 28, 2021

VincentBu commented Jul 6, 2021

kunalspathak commented Jul 22, 2021 •

edited

Loading

GrabYourPitchforks commented Jul 22, 2021

kunalspathak commented Jul 22, 2021

danmoseley commented Jul 23, 2021

MattGal commented Jul 23, 2021

danmoseley commented Jul 23, 2021 •

edited

Loading

kunalspathak commented Jul 26, 2021

JulieLeeMSFT commented Aug 10, 2021

danmoseley commented Aug 10, 2021

danmoseley commented Aug 10, 2021

kunalspathak commented Aug 10, 2021

JulieLeeMSFT commented Aug 10, 2021

MattGal commented Aug 10, 2021

danmoseley commented Aug 10, 2021

danmoseley commented Aug 10, 2021

MattGal commented Aug 10, 2021

Test failed: System.Text.Tests.TranscodingStreamTests.Read_Span(bufferLength: 1) #47444

Test failed: System.Text.Tests.TranscodingStreamTests.Read_Span(bufferLength: 1) #47444

Comments

VincentBu commented Jan 26, 2021

ghost commented Jan 26, 2021

GrabYourPitchforks commented Jan 26, 2021

stephentoub commented Jan 26, 2021

danmoseley commented Jan 26, 2021

dcount_MachineName

danmoseley commented Jan 26, 2021

danmoseley commented Jan 27, 2021

danmoseley commented Jan 27, 2021

jkotas commented Jan 27, 2021

danmoseley commented Jan 27, 2021

MattGal commented Jan 27, 2021

danmoseley commented Jan 27, 2021

MattGal commented Jan 27, 2021

danmoseley commented Jan 27, 2021

MattGal commented Jan 27, 2021 • edited Loading

MattGal commented Jan 27, 2021

MattGal commented Jan 28, 2021

VincentBu commented Jul 6, 2021

kunalspathak commented Jul 22, 2021 • edited Loading

GrabYourPitchforks commented Jul 22, 2021

kunalspathak commented Jul 22, 2021

danmoseley commented Jul 23, 2021

MattGal commented Jul 23, 2021

danmoseley commented Jul 23, 2021 • edited Loading

kunalspathak commented Jul 26, 2021

JulieLeeMSFT commented Aug 10, 2021

danmoseley commented Aug 10, 2021

danmoseley commented Aug 10, 2021

kunalspathak commented Aug 10, 2021

JulieLeeMSFT commented Aug 10, 2021

MattGal commented Aug 10, 2021

danmoseley commented Aug 10, 2021

danmoseley commented Aug 10, 2021

MattGal commented Aug 10, 2021

MattGal commented Jan 27, 2021 •

edited

Loading

kunalspathak commented Jul 22, 2021 •

edited

Loading

danmoseley commented Jul 23, 2021 •

edited

Loading