This package contains the Charset codec family. These codecs are used to encode Strings of text into various standardised binary representations for I/O purposes.

Any Character can be encoded without error by the Unicode charsets (utf8 and utf16). The other charsets (ascii and iso_8859_1) support only a limited range of characters, so encoding with them may throw an EncodeException. All charsets can throw a DecodeException when decoding Bytes into Characters, because the valid binary format of each is strictly defined.
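The difference between the limited and the Unicode charsets can be sketched as follows. This is a minimal illustration, not a definitive usage pattern: the import paths (ceylon.buffer.charset and ceylon.buffer.codec) and the sample text are assumptions that do not appear in the description above.

```ceylon
// Assumed import paths; adjust them to wherever these declarations live.
import ceylon.buffer.charset { ascii, utf8 }
import ceylon.buffer.codec { EncodeException }

shared void run() {
    // Sample text containing a character outside the ASCII range.
    value text = "café";
    try {
        // ascii supports only a limited range of characters,
        // so this call is expected to fail for the accented letter.
        ascii.encode(text);
    } catch (EncodeException e) {
        print("ascii rejected the text: ``e.message``");
    }
    // utf8 can encode any Character without error.
    value bytes = utf8.encode(text);
    print(bytes.size);
}
```

Catching the exception and falling back to a Unicode charset is only one possible policy; whether it suits a given program depends on what the receiving side expects.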

To convert a String to an ASCII byte List:

List<Byte> bytes = ascii.encode("Hello, World!");

To decode it back into a String:

String string = ascii.decode(bytes);

Similarly, for a ByteBuffer:

ByteBuffer bytes = utf8.encodeBuffer("Clear Air Turbulence");
CharacterBuffer chars = utf8.decodeBuffer(bytes);

If you only know the name of a charset you can get its Charset with:

Charset? charset = charsetsByAlias["UTF-8"];
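Because the lookup yields an optional Charset?, the result has to be checked before use. A hedged sketch of one way to do that, assuming the ceylon.buffer.charset import path and using an arbitrary sample string:

```ceylon
// Assumed import path for charsetsByAlias.
import ceylon.buffer.charset { charsetsByAlias }

shared void run() {
    // The map lookup returns null when the alias is unknown,
    // so narrow the optional type with `exists` first.
    if (exists charset = charsetsByAlias["UTF-8"]) {
        value bytes = charset.encode("Hello");
        // Round-trip the bytes back into a String.
        print(charset.decode(bytes));
    } else {
        print("No charset registered under that alias");
    }
}
```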

By: Stéphane Épardaud, Alex Szczuczko
Values
shared ascii ascii

The ASCII character set, as defined by its specification.

By: Stéphane Épardaud
shared Map<String,Charset> charsetsByAlias

A mapping from alias to Charset for all supported character sets.

Currently this contains:

  • ASCII
  • ISO 8859-1
  • UTF-8
  • UTF-16
shared iso_8859_1 iso_8859_1

The ISO 8859-1 character set, as defined by its specification.

By: Stéphane Épardaud
shared utf16 utf16

The UTF-16 character set, as defined by its specification (http://www.ietf.org/rfc/rfc2781.txt).

Decoders for UTF-16 recognize the byte order mark (BOM) for both big-endian and little-endian encodings, but encoders generate big-endian UTF-16 without a BOM.
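The asymmetry between encoding and decoding can be illustrated with a short sketch. It assumes the ceylon.buffer.charset import path, and the byte values are hand-written sample input (the little-endian BOM FF FE followed by the letter "A"), not output taken from the library.

```ceylon
// Assumed import path for utf16.
import ceylon.buffer.charset { utf16 }

shared void run() {
    // Encoding: per the description above, the result is big-endian
    // and carries no byte order mark.
    value encoded = utf16.encode("A");
    print(encoded);

    // Decoding: hand-built little-endian input announced by a BOM.
    // FF FE is the little-endian BOM; 41 00 is "A" in little-endian order.
    value littleEndian = [#ff.byte, #fe.byte, #41.byte, #00.byte];
    print(utf16.decode(littleEndian));
}
```

Since the encoder emits no BOM, a consumer outside this package has to be told, or has to assume, that the data is big-endian.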

By: Stéphane Épardaud
shared utf8 utf8

The UTF-8 character set, as defined by its specification (http://tools.ietf.org/html/rfc3629).

By: Stéphane Épardaud
Interfaces
shared Charset

A character set, which allows you to convert characters to bytes and back.

You can look up a character set by its String alias using charsetsByAlias.

Classes
shared ascii

The ASCII character set, as defined by its specification.

shared iso_8859_1

The ISO 8859-1 character set, as defined by its specification.

shared utf16

The UTF-16 character set, as defined by its specification (http://www.ietf.org/rfc/rfc2781.txt).

Decoders for UTF-16 recognize the byte order mark (BOM) for both big-endian and little-endian encodings, but encoders generate big-endian UTF-16 without a BOM.

shared utf8

The UTF-8 character set, as defined by its specification (http://tools.ietf.org/html/rfc3629).