UTF+8 encodings are broken #54543

nikhilro · 2024-08-24T16:43:25Z

Version

22.7.0

Platform

Linux api-deployment-694785c9f5-8dd8j 5.10.223-211.872.amzn2.x86_64 #1 SMP Mon Jul 29 19:52:29 UTC 2024 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

Hey everyone, I'm not sure how to reproduce this, but the latest Node.js can't handle UTF-8 correctly anymore. It works for the first minute or two (or a couple of hours if I remove Datadog APM instrumentation), but then returns garbage for the same request. I'm just using postgres.js to fetch data and NestJS for the HTTP server. No fancy buffer manipulation.

curl --location 'https://api.vapi.ai/assistant/205deb59-755c-489c-8879-7523b1318ed8' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer XXXXXX'
{"id":"205deb59-755c-489c-8879-7523b1318ed8","orgId":"7616920b-4696-458b-a2aa-3453fd13ace4","name":"éñüçßÆ","createdAt":"2024-08-24T08:58:16.110Z","updatedAt":"2024-08-24T08:58:16.110Z","isServerUrlSecretSet":false}%
 
## 2 minutes later      
curl --location 'https://api.vapi.ai/assistant/205deb59-755c-489c-8879-7523b1318ed8' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer XXXXXX'
{"id":"205deb59-755c-489c-8879-7523b1318ed8","orgId":"7616920b-4696-458b-a2aa-3453fd13ace4","name":"������","createdAt":"2024-08-24T08:58:16.110Z","updatedAt":"2024-08-24T08:58:16.110Z","isServerUrlSecretSet":false}%

Note how éñüçßÆ gets corrupted.

How often does it reproduce? Is there a required condition?

After a restart, the process works for some time and then starts corrupting its output.

What is the expected behavior? Why is that the expected behavior?

It should keep returning the correct text.

What do you see instead?

Garbage text

Additional information

No response


nikhilro · 2024-08-24T16:51:06Z

Update: UTF-8 in general is fine. It's specifically the extended ASCII range.

I tried other values:

"name": "aa" # works fine
"name": "дмитрий" # works fine
"name": "💩" # works fine
"name": "¿" # doesn't work
"name": "éñüçßÆ" # doesn't work

Our best guess is that the UTF-8-encoded and the plain single-byte encodings of extended ASCII characters are getting mixed up, which corrupts the output.
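The pattern in the list above can be checked mechanically. A hedged sketch, assuming (as a later comment in this thread suggests) that the failing values are the ones V8 can store as one-byte Latin-1 strings: every UTF-16 code unit is at most 0xFF and at least one is above 0x7F. The helper name `isLatin1NonAscii` is hypothetical, and V8's internal string representation is not directly observable from JavaScript, so this only approximates it.

```javascript
// Hypothetical helper: true when a string *could* be stored by V8 as a
// one-byte (Latin-1) string AND contains at least one non-ASCII character.
// It only inspects code-unit ranges; V8's real representation is not visible here.
function isLatin1NonAscii(str) {
  let hasNonAscii = false;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // UTF-16 code unit
    if (unit > 0xff) return false;  // would need a two-byte representation
    if (unit > 0x7f) hasNonAscii = true;
  }
  return hasNonAscii;
}

// The values reported in this thread:
for (const value of ["aa", "дмитрий", "💩", "¿", "éñüçßÆ"]) {
  console.log(JSON.stringify(value), isLatin1NonAscii(value));
}
```

Only the two values reported as broken return `true`: plain ASCII has nothing above 0x7F, while Cyrillic letters and the emoji's surrogate pair are all above 0xFF.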

nikhilro · 2024-08-24T16:59:29Z

cc @ronag if you have any ideas.

found 3 commits related to encoding:

avivkeller · 2024-08-24T17:17:16Z

Hi! v22.7.0 has a few known buffer issues; could you provide a minimal reproduction so the issue can be narrowed down?

Additionally, could you self-moderate your comment containing curse words, as it may be offensive to some viewers?

Edit: Thanks!

Possibly a duplicate of: #54521

ronag · 2024-08-24T17:47:01Z

Can you check if this fixes it? #54526

nikhilro · 2024-08-24T18:43:00Z

@redyetidev

Ah gotcha, it's likely too hard to narrow down to a minimal reproduction. If you want to get on a call, I can show you a reproduction that happens within ~8 minutes. Happy to just test when the patch comes out, though.
Done

@ronag

Similarly, happy to test when the patch comes out. I can set up the build if you really want.

Thanks for confirming both. I'll roll back to 22.6.0 for now.

sharpner · 2024-08-26T08:41:59Z

We also ran into this issue; it took us forever to find the culprit.

We reproduced it with a simple Express HTTP handler deployed on AWS App Runner:

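const app = require('express')();
// Route path and port are illustrative.
app.get('/', (req, res) => {
  res.status(200).json({
    umlaute: 'äöü',
  });
});
app.listen(3000);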

Further info:
Node 22.6 does not seem to be affected, but 22.7 is.

eugene1g · 2024-08-26T11:33:57Z

I'm seeing this show up as failed PostgreSQL queries that contain an umlaut as a parameter, with a cryptic-looking error:

Unable to execute query: "invalid byte sequence for encoding "UTF8"
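This error is consistent with the corruption mechanism described later in the thread: if ü reaches the wire as its single Latin-1 byte 0xFC instead of the two UTF-8 bytes 0xC3 0xBC, the server's UTF-8 validation rejects it. A minimal sketch of that check using Node's `buffer.isUtf8()` (available since Node 18.14 / 19.4); encoding with `'latin1'` here deliberately simulates the bad bytes rather than triggering the bug itself.

```javascript
const { isUtf8 } = require("node:buffer");

const param = "ü";
// What a correct UTF-8 encoder produces for U+00FC: two bytes.
const goodBytes = Buffer.from(param, "utf8");
// Simulates the bug: the raw Latin-1 byte copied verbatim.
const badBytes = Buffer.from(param, "latin1");

console.log(goodBytes, isUtf8(goodBytes)); // <Buffer c3 bc> true
console.log(badBytes, isUtf8(badBytes));   // <Buffer fc> false
```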

~~Further, I couldn't reproduce it on Apple silicon, but it fails reliably on Linux.~~

edit: the test case provided by @blexrob fails reliably in 22.7.0 on both Apple Silicon and Linux (22.6.0 works as expected)

// Encode the same non-ASCII string many times. Once V8 optimizes the
// Buffer.from() call onto the Fast API path (per this thread, presumably
// FastWriteString), the resulting bytes change on 22.7.0.
let i = 0;
const testStr = "jürge";
const expected = Buffer.from(testStr).toString("hex");
for (; i < 1_000_000; i++) {
  const buf = Buffer.from(testStr);
  const ashex = buf.toString("hex");
  if (ashex !== expected) {
    console.log(`Decoding changed in iteration ${i} when changing to FastWriteStringUTF8, got ${ashex}, expected ${expected}`);
    break;
  }
}

if (i < 1_000_000) {
  console.error("FAILED after %d iterations", i);
} else {
  console.log("PASSED after %d iterations", i);
}

avivkeller · 2024-08-26T13:05:57Z

I'd like to remind everyone that "me too" comments only add noise to this already noisy topic. Please refrain from commenting until you have something to add to the conversation

Edit: this isn't directed at any comments. This is meant to deter future "me too" comments, as they occur often with issues like this.

avivkeller · 2024-08-26T22:03:15Z

I feel like one of the patches in v22.8.0 (#54560) will resolve this issue when it lands. Once it does, please comment on whether it resolves this issue. Given its current state, that could take a few days.

aapoalas · 2024-08-27T10:03:23Z

I assume you have already tracked this down, but I believe the issue is basically:

1. After enough calls to some string-to-buffer writing API, V8 optimizes the call to use the Fast API path.
2. Node.js' FastWriteString (I'm only guessing this is the API in question, but it is likely the one) receives the call and incorrectly assumes that the v8::FastOneByteString it is given is ASCII. This is not the case: a one-byte string is Latin-1 encoded.
3. The function then directly copies the data into the destination buffer here. The buffer is now assumed to contain the string's data as UTF-8, but instead contains it as Latin-1.
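The mechanism described above can be simulated in plain JavaScript. This is a hedged sketch, not Node's C++ implementation or the actual fix: the hypothetical `buggyWrite` copies the Latin-1 bytes verbatim, as in the last step, while the hypothetical `correctWrite` transcodes every byte at or above 0x80 into a two-byte UTF-8 sequence. Both assume the input contains only characters up to U+00FF.

```javascript
// Hypothetical simulation of two write paths for a one-byte (Latin-1) string.
// Assumes every character in `str` is <= U+00FF.

// Bug: treat the Latin-1 bytes as if they were already UTF-8 and copy them verbatim.
function buggyWrite(str) {
  return Buffer.from(str, "latin1");
}

// Correct: Latin-1 -> UTF-8. Bytes below 0x80 are ASCII and stay as-is;
// every other byte b becomes the two-byte sequence 110xxxxx 10xxxxxx.
function correctWrite(str) {
  const latin1 = Buffer.from(str, "latin1");
  const out = [];
  for (const b of latin1) {
    if (b < 0x80) {
      out.push(b);
    } else {
      out.push(0xc0 | (b >> 6), 0x80 | (b & 0x3f));
    }
  }
  return Buffer.from(out);
}

const sample = "éñüçßÆ";
console.log(correctWrite(sample).toString("utf8")); // éñüçßÆ
// Decoding the verbatim Latin-1 bytes as UTF-8 yields U+FFFD replacement
// characters, the same kind of output as the corrupted API response.
console.log(buggyWrite(sample).toString("utf8"));
```

Since every non-ASCII Latin-1 character needs two bytes in UTF-8, copying one-byte strings verbatim is only correct for pure ASCII, which fits the report that "aa" kept working while "¿" and "éñüçßÆ" broke.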

avivkeller · 2024-08-27T15:06:12Z

#54565 will fix this issue for the v22.8.0 release. Then, #54526 (and similar) will be evaluated for a future release.

When #54565 lands, I'll close this issue

devronhansen · 2024-08-28T07:15:41Z

This kept us up a couple of nights. Thank you for fixing it!

Since Chainguard only has the latest Node, and as a consequence of this bug (nodejs/node#54543), we have concluded that we no longer want to be on Chainguard. Co-authored-by: Thomas Dufourd <[email protected]>

adriano-tirloni · 2024-08-28T16:20:09Z

I am not familiar with the internals of node, but I just lost a few days because of this.
I could not replicate efficiently because it takes a while to happen.

What gave it away was that, within the same request lifecycle, an intact UTF-8 string was returned to the browser while a corrupted one was sent to the logger service, which spins up a new worker for log transport.

avivkeller · 2024-08-29T10:07:29Z

When #54565 lands, I'll close this issue

This PR has landed. Expect the release to follow shortly:

SimonX200 · 2024-08-31T15:50:50Z

I hope that the test coverage gets improved with the fix in 22.8

nicholas-long · 2024-11-05T17:38:25Z

If this broke your data when saving to MongoDB, here is a script to help fix it: https://github.com/nicholas-long/mongo-node-fix-54543
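The linked script's approach isn't described in this thread. As a separate, hedged sketch: if you still have the raw bytes that were written (not text that was already decoded into U+FFFD replacement characters, which cannot be recovered), bytes that fail UTF-8 validation can be reinterpreted as Latin-1, which undoes the verbatim copy described earlier when the whole value was affected. `repairBytes` is a hypothetical name, and the heuristic can misfire on Latin-1 text that happens to form valid UTF-8.

```javascript
const { isUtf8 } = require("node:buffer");

// Hypothetical repair heuristic for raw bytes written while the bug was active.
// Valid UTF-8 is decoded normally; anything else is assumed to be Latin-1
// bytes that were copied verbatim, and is decoded as Latin-1 instead.
// Caveat: text already decoded with U+FFFD replacement characters is lost.
function repairBytes(buf) {
  return isUtf8(buf) ? buf.toString("utf8") : buf.toString("latin1");
}

const corrupted = Buffer.from([0xe9, 0xf1, 0xfc, 0xe7, 0xdf, 0xc6]); // "éñüçßÆ" as Latin-1
const intact = Buffer.from("éñüçßÆ", "utf8");

console.log(repairBytes(corrupted)); // éñüçßÆ
console.log(repairBytes(intact));    // éñüçßÆ
```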

An [important UTF-8 bug](nodejs/node#54543) was discovered in v22.7 and fixed in v22.8. We should only allow v22.8 to avoid this issue for end users. Also updates Node version used by various CI tooling to be compliant with the new setting
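A hedged sketch of the same idea enforced at runtime instead of through CI settings: fail fast on 22.7.x, the release this thread reports as affected (22.6 works, and the fix shipped in 22.8). The function name is hypothetical; `process.version` is Node's real version string, for example `v22.7.0`.

```javascript
// Hypothetical startup guard against nodejs/node#54543.
// Per this thread, 22.6.x is unaffected and the fix shipped in 22.8.0.
function isAffectedNodeVersion(version = process.version) {
  // process.version looks like "v22.7.0"; strip the "v" and parse major/minor.
  const [major, minor] = version.replace(/^v/, "").split(".").map(Number);
  return major === 22 && minor === 7;
}

if (isAffectedNodeVersion()) {
  console.error(`Node.js ${process.version} is affected by nodejs/node#54543; use 22.8.0 or later.`);
  process.exit(1);
}
```

Pinning the version in CI or through the `engines` field in `package.json` catches the problem earlier; a runtime check like this only protects environments that ignore those settings.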

nikhilro mentioned this issue Aug 24, 2024

Please allow minor versions on Node 22 nikolaik/docker-python-nodejs#222

Open

superfan-dobri mentioned this issue Aug 25, 2024

Suddenly, elastic transport cannot handle unicode characters. elastic/elastic-transport-js#130

Closed

ericrange mentioned this issue Aug 25, 2024

ERR_BUFFER_OUT_OF_BOUNDS error with Google Vision package on Node 22.7.0 #54518

Closed

avivkeller added the confirmed-bug, regression, v22.x, and buffer labels Aug 26, 2024

Nicd mentioned this issue Aug 27, 2024

# Bug Report: Encoding Version Changes on Server Overload nodejs/docker-node#2135

Closed

avivkeller closed this as completed Aug 29, 2024

avivkeller mentioned this issue Aug 29, 2024

Textencoder decode to UTF8 with fatal true throws exception in version 22.7 #54628

Closed

NullVoxPopuli mentioned this issue Aug 29, 2024

Malformed HTML Entities in Ember Templates emberjs/ember.js#20736

Closed

andershermansen mentioned this issue Aug 29, 2024

Node 22.7.0 Compatibility issues payloadcms/payload#7965

Closed

lpinca mentioned this issue Aug 30, 2024

Error: Invalid WebSocket frame: invalid UTF-8 sequence using Node 22.7 websockets/ws#2252

Closed


zntb mentioned this issue Oct 5, 2024

[Snyk] Upgrade mongodb from 6.8.1 to 6.9.0 zntb/mern-food-ordering-app-backend#17

Merged

ranjith221132 mentioned this issue Oct 5, 2024

[Snyk] Upgrade mongodb from 6.3.0 to 6.9.0 ranjith221132/Leavemanagement#15

Open

arenault-pass mentioned this issue Oct 5, 2024

[Snyk] Upgrade mongodb from 6.3.0 to 6.9.0 caxewsh/Drinkup#254

Closed

MoaidHashem3 mentioned this issue Oct 5, 2024

[Snyk] Upgrade mongodb from 6.8.1 to 6.9.0 MoaidHashem3/UMS-University-Management-System-Backend#19

Open

mkusztal mentioned this issue Oct 5, 2024

[Snyk] Upgrade mongodb from 6.3.0 to 6.9.0 mkusztal/portfolioAboutMe#181

Open

arenault-pass mentioned this issue Oct 6, 2024

[Snyk] Upgrade mongodb from 6.3.0 to 6.9.0 caxewsh/Drinkup#261

Closed

tjconcept mentioned this issue Oct 12, 2024

Overlapping LTS releases denoland/deno#26192

Open

ranjith221132 mentioned this issue Oct 25, 2024

[Snyk] Upgrade mongodb from 6.3.0 to 6.9.0 ranjith221132/Leavemanagement#25

Open

Abhishekh669 mentioned this issue Oct 25, 2024

[Snyk] Upgrade mongodb from 6.8.0 to 6.9.0 Abhishekh669/next-auth#5

Open

aliakseikrauchanka mentioned this issue Oct 25, 2024

[Snyk] Upgrade mongodb from 6.8.0 to 6.9.0 aliakseikrauchanka/lifeis#32

Open

parseplatformorg mentioned this issue Oct 26, 2024

refactor: Upgrade mongodb from 4.10.0 to 6.9.0 parse-community/parse-server#9377

Closed

Xcaciv mentioned this issue Oct 26, 2024

[Snyk] Upgrade mongodb from 6.7.0 to 6.9.0 Xcaciv/appsec-test-2024#1

Open

MohammadGhajari mentioned this issue Oct 28, 2024

[Snyk] Upgrade mongodb from 6.5.0 to 6.9.0 MohammadGhajari/information-security#62

Open

lydmoon mentioned this issue Oct 28, 2024

[Snyk] Upgrade mongodb from 6.2.0 to 6.9.0 lydmoon/Lydia-MoonFullStackBankingApplication#77

Open

xeni09 mentioned this issue Oct 28, 2024

[Snyk] Upgrade mongodb from 6.8.0 to 6.9.0 xeni09/e-learning-red-panda-school-for-deployment#50

Merged

abaum65 mentioned this issue Oct 29, 2024

[Snyk] Upgrade mongodb from 6.7.0 to 6.9.0 abaum65/saintcon-appsec-challenge-2024#2

Open

prosenjit07 mentioned this issue Nov 8, 2024

[Snyk] Upgrade mongodb from 6.8.0 to 6.9.0 prosenjit07/PathOptimzer#6

Open

parseplatformorg mentioned this issue Nov 9, 2024

refactor: Upgrade mongodb from 4.10.0 to 6.9.0 parse-community/parse-server#9419

Closed

shafeeqd959 mentioned this issue Nov 11, 2024

[Snyk] Upgrade mongodb from 6.9.0 to 6.10.0 contentstack/datasync-mongodb-sdk#34

Closed

TartejBrothers mentioned this issue Nov 12, 2024

[Snyk] Upgrade mongodb from 6.9.0 to 6.10.0 TartejBrothers/TodoApp-MongoDB#46

Merged

akanchhaS mentioned this issue Nov 12, 2024

[Snyk] Upgrade mongodb from 3.5.9 to 6.10.0 akanchhaS/github-goof-npm#1366

Open

zntb mentioned this issue Nov 12, 2024

[Snyk] Upgrade mongodb from 6.9.0 to 6.10.0 zntb/mern-food-ordering-app-backend#24

Closed

skanda890 mentioned this issue Nov 12, 2024

[Snyk] Upgrade mongodb from 6.9.0 to 6.10.0 skanda890/CodePark#154

Merged

thnbih mentioned this issue Nov 12, 2024

[Snyk] Upgrade mongodb from 6.5.0 to 6.10.0 thnbih/Harmonious-Living-Website#3

Open

This was referenced Nov 13, 2024

[Snyk] Upgrade mongodb from 6.9.0 to 6.10.0 zntb/Dynamic-Site-Test#11

Merged

[Snyk] Upgrade mongodb from 6.9.0 to 6.10.0 zntb/property-pulse#49

Closed

tyteen4a03 mentioned this issue Nov 26, 2024

chore: ensure node 22 version is 22.8 or above payloadcms/payload#9548

Closed
