
Avoid using the terms "latin1" and "iso-8859-1" for isomorphic encoding/decoding #55

@domenic

09878c7 made me aware that the functions latin1toString and latin1fromString propagate the confusion between "latin1" and "isomorphic" that we see in a lot of the JavaScript ecosystem, and which I have tried to help combat in whatwg/encoding@36fb4e7.

In short, the "latin1" encoding specified in the ISO-8859-1 spec does not assign any characters to the bytes 0x00 to 0x1F or 0x7F to 0x9F. So a proper latin1 encoder would never produce those bytes, and a proper latin1 decoder would throw when given those bytes.

In practice, nobody does this, and we have either:

  • Libraries following the windows-1252 mapping (TextEncoder/TextDecoder, the entire web platform, Node.js's modern standard library);
  • or libraries following the isomorphic decoding / encoding (a lot of C++ code, Node.js's old Buffer API).

This creates a lot of confusion when people expect one of these interpretations and get the other.
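
To make the distinction concrete, here is a minimal sketch of the strict reading next to the isomorphic one. The function names (strictLatin1Decode, isomorphicDecode, isomorphicEncode) are hypothetical and chosen for illustration; they are not the library's API, and the strict decoder simply follows the description above of which bytes are unassigned.

```javascript
// Hypothetical helpers for illustration only; not part of any library's API.

// Strict reading described above: bytes 0x00-0x1F and 0x7F-0x9F have no
// assigned character, so a strict decoder rejects them.
function strictLatin1Decode(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b <= 0x1f || (b >= 0x7f && b <= 0x9f)) {
      const hex = b.toString(16).padStart(2, "0");
      throw new RangeError(`byte 0x${hex} has no assigned character`);
    }
    out += String.fromCharCode(b);
  }
  return out;
}

// Isomorphic decode: every byte maps to the code point with the same value,
// so all 256 byte values are accepted.
function isomorphicDecode(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// Isomorphic encode: every code unit must be <= 0xFF and maps to the byte
// with the same value.
function isomorphicEncode(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError(`code unit at index ${i} is above 0xFF`);
    }
    bytes[i] = code;
  }
  return bytes;
}

// 0x80 is where the interpretations diverge.
const sample = new Uint8Array([0x41, 0x80, 0xe9]);
const decoded = isomorphicDecode(sample);
console.log(decoded.charCodeAt(1).toString(16)); // prints "80"
```

With the isomorphic functions, byte 0x80 round-trips as U+0080. The windows-1252 mapping instead assigns 0x80 to U+20AC (the euro sign), and the strict decoder sketched here throws on it, which is the kind of mismatch people run into when they expect one interpretation and get another.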

My strong suggestion is to never mention the terms latin1 or iso-8859-1 in public APIs, since they mean windows-1252 for people who read standards and mean something else (usually isomorphic encoding) for people who are coming from certain C++ codebases. (I think V8 is the original source of the confusion, at least in the Node.js ecosystem.) Instead, use the standard and non-overloaded term "isomorphic".

I realize this is a breaking change and might not be one you want to take on, but I thought I should file it, in the interest of making this the best encoding/decoding library for JS.
