Labels
performance: Issues and PRs related to the performance of Node.js.
Description
Correctness
Encodings that return invalid results:
- Single-byte: `ibm866` (fails even on ASCII input), `koi8-u`, `windows-874`, `windows-1252`, `windows-1253`, `windows-1255`
- Multi-byte (all except `gb18030`): `gbk` (should be identical to `gb18030`, but is instead broken), `big5`, `euc-jp`, `iso-2022-jp`, `shift_jis` (fails even on ASCII input), `euc-kr`
Unimplemented encodings that throw:
- `iso-8859-16`
- `x-user-defined`
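The failures listed above can be spot-checked with a small probe. This is a minimal sketch, not the code from the linked test suite: `probeLabel` is a hypothetical helper, and it relies on the fact that printable ASCII bytes (0x20-0x7E) decode to themselves in every WHATWG encoding except UTF-16LE/BE. Results depend on the Node.js version and ICU build, so no output is shown.

```javascript
// Hypothetical probe for TextDecoder labels; not part of Node.js or the linked tests.
// Printable ASCII (0x20-0x7E) must decode to itself in every WHATWG encoding
// except UTF-16LE/BE, so any other result points at a broken decoder.
const PRINTABLE_ASCII = Uint8Array.from({ length: 0x7f - 0x20 }, (_, i) => 0x20 + i);
const EXPECTED = String.fromCharCode(...PRINTABLE_ASCII);

function probeLabel(label) {
  let decoder;
  try {
    decoder = new TextDecoder(label);
  } catch {
    // Unknown or unimplemented encodings make the constructor throw a RangeError.
    return 'throws';
  }
  // UTF-16 is not ASCII-compatible, so this check does not apply to it.
  if (decoder.encoding.startsWith('utf-16')) return 'skipped';
  return decoder.decode(PRINTABLE_ASCII) === EXPECTED ? 'ok' : 'invalid';
}

for (const label of ['utf-8', 'ibm866', 'shift_jis', 'iso-8859-16', 'x-user-defined']) {
  console.log(label, probeLabel(label));
}
```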
If Node.js is built without ICU, the `utf-16le` encoding also returns invalid results:
> new TextDecoder('utf-16le').decode(Uint16Array.of(0xd800))
'�' // correct
'\ud800' // no ICU
Performance
- `utf-8` (aka default) `TextDecoder` is much slower on ASCII input than it can and should be:
  - `1.3x` on 4096 bytes, `~3x` on 1 MiB input
  - The above applies to `buffer.toString()` too
  - It's much slower on ASCII input than a checked JS impl (the same `1.3x`-`3x`)
- `windows-1252` aka `new TextDecoder('ascii')` aka `new TextDecoder('latin1')` is ~2x-4x slower than an optimized impl on ASCII input
- `windows-1252` aka `new TextDecoder('latin1')` is ~6x-12x slower than an optimized impl on Latin-1 input
- `windows-1252` is ~7x-12x slower than an optimized JS impl
- Other single-byte encodings that are significantly slower than the JS impl even on non-ASCII input: `iso-8859-3`, `iso-8859-6`, `iso-8859-7`, `iso-8859-8`, `iso-8859-8-i`, `windows-1253`, `windows-1255`, `windows-1257`
- None of the single-byte encodings are faster than the JS impl, even on non-ASCII input
- All of the single-byte encodings except `windows-1252` are >=10x slower than the JS impl on ASCII input (`windows-1252` is only ~2-4x slower)
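The JS comparisons above rest on single-byte decoders being plain byte-to-code-point mappers. A minimal sketch of such a mapper with an ASCII fast path follows; `makeSingleByteDecoder` and `codeUnitsToString` are hypothetical names, the `windows-1252` high half is written out from the WHATWG index, and this is an illustration, not the implementation that was benchmarked.

```javascript
// Hypothetical table-driven single-byte decoder; not Node's or the benchmarked code.
const CHUNK = 0x2000; // bound the argument count passed to String.fromCharCode

// Turn an array-like of UTF-16 code units into a string, chunk by chunk.
function codeUnitsToString(units) {
  let out = '';
  for (let i = 0; i < units.length; i += CHUNK) {
    out += String.fromCharCode.apply(null, units.subarray(i, i + CHUNK));
  }
  return out;
}

// highTable: 128 code points for bytes 0x80-0xFF (all BMP for WHATWG single-byte encodings).
function makeSingleByteDecoder(highTable) {
  if (highTable.length !== 128) throw new RangeError('highTable must have 128 entries');
  const table = new Uint16Array(256);
  for (let b = 0; b < 0x80; b++) table[b] = b; // ASCII bytes map to themselves
  table.set(highTable, 0x80);

  return function decode(bytes) {
    // Fast path: scan for the first non-ASCII byte; pure ASCII needs no lookups.
    let i = 0;
    while (i < bytes.length && bytes[i] < 0x80) i++;
    if (i === bytes.length) return codeUnitsToString(bytes);
    const units = new Uint16Array(bytes.length);
    units.set(bytes.subarray(0, i)); // the ASCII prefix copies through unchanged
    for (; i < bytes.length; i++) units[i] = table[bytes[i]];
    return codeUnitsToString(units);
  };
}

// windows-1252: bytes 0x80-0x9F differ from Latin-1, 0xA0-0xFF equal U+00A0-U+00FF.
const WIN1252_80_9F = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];
const win1252High = [...WIN1252_80_9F];
for (let cp = 0xa0; cp <= 0xff; cp++) win1252High.push(cp);
const decodeWindows1252 = makeSingleByteDecoder(win1252High);

console.log(decodeWindows1252(Uint8Array.of(0x48, 0x69, 0x80, 0xe9))); // "Hi€é"
```

Any other single-byte encoding would differ only in its 128-entry table, which is the sense in which the implementation for all of them is identical.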
References
None of the above requires any changes on the native side; I compared against a somewhat optimized JS implementation.
See https://docs.google.com/spreadsheets/d/1pdEefRG6r9fZy61WHGz0TKSt8cO4ISWqlpBN5KntIvQ/edit
See tests in https://github.com/ExodusOSS/bytes/blob/master/tests/encoding/mistakes.test.js (comment out the import, and that single file can then be run on Node.js without any dependencies)
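The JS implementation used for the comparison lives in the linked repository. As a rough illustration of what a "checked" ASCII fast path for UTF-8 means, a wrapper in front of `TextDecoder` could look like this; `decodeUtf8` is a hypothetical name, and whether it beats the native path depends on the engine and the input, so no timings are implied.

```javascript
// Hypothetical UTF-8 wrapper with a checked ASCII fast path; not Node's implementation.
const utf8 = new TextDecoder('utf-8');
const ASCII_CHUNK = 0x2000; // bound the argument count passed to String.fromCharCode

// bytes: a Uint8Array or Buffer.
function decodeUtf8(bytes) {
  // Check: any byte >= 0x80 means the input is not pure ASCII, so use the full decoder.
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0x80) return utf8.decode(bytes);
  }
  // Pure ASCII is valid UTF-8, and each byte is exactly one UTF-16 code unit.
  let out = '';
  for (let i = 0; i < bytes.length; i += ASCII_CHUNK) {
    out += String.fromCharCode.apply(null, bytes.subarray(i, i + ASCII_CHUNK));
  }
  return out;
}
```

A UTF-8 BOM starts with 0xEF, so inputs carrying one always take the `TextDecoder` path and keep its BOM handling.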
Suggestions
- Add a proper ASCII fast path to `buffer.toString()`
  - src: improve StringBytes::Encode perf on ASCII #61119
- Add a proper ASCII fast path to `new TextDecoder().decode(arg)`
  - src: improve StringBytes::Encode perf on ASCII #61119
- Perhaps replace the single-byte decoders with a JS impl and remove the native paths and lib usage. They are all just mappers; the implementation is identical for every one of them.
  Or at least replace the slow, unsupported, or invalid ones.
  - lib: implement all 1-byte encodings in js #61093
- Remove the `gbk` decoder path and make it do the same as `gb18030`, as the spec says
  - lib: gbk decoder is gb18030 decoder per spec #61099
- Fix or replace the implementations for `big5`, `euc-jp`, `iso-2022-jp`, `shift_jis`, `euc-kr`
- For UTF-16, decode optimistically using the existing fast APIs, then check the string for validity
- Fix bugs in the non-ICU codepath
- To fix the legacy multi-byte decoders, attempt to re-use what Chromium has