Class CharsetUtil
java.lang.Object
de.tomatengames.util.CharsetUtil
Provides utilities to work with character encodings.
- Since:
- 1.2
-
Method Summary
Modifier and Type | Method | Description
static int | decodeUTF8(InputStream in) | Decodes one UTF-8 code point from the InputStream.
static int | encodeUTF8(int codePoint) | Encodes the specified Unicode code point using UTF-8.
static int | encodeUTF8(int codePoint, byte[] out, int offset) | Encodes the specified Unicode code point using UTF-8 and writes the result into the output array at the specified offset.
static long | encodeUTF8(int codePoint, OutputStream out) | Encodes the specified Unicode code point using UTF-8 and writes the result into the OutputStream.
static int | encodeUTF8(String str, byte[] out, int offset) | Encodes the specified String using UTF-8 and writes the result into the byte array.
static long | encodeUTF8(String str, OutputStream out) | Encodes the specified String using UTF-8 and writes the result into the OutputStream.
static long | encodeUTF8(String str, OutputStream out, byte[] buf) | Encodes the specified String using UTF-8 and writes the result into the OutputStream, using buf to reduce the number of write calls.
static long | encodeUTF8(String str, OutputStream out, long maxOutput) | Encodes the specified String using UTF-8 and writes the result into the OutputStream, throwing a LimitException if more than maxOutput bytes would be needed.
static long | encodeUTF8(String str, OutputStream out, long maxOutput, byte[] buf) | Encodes the specified String using UTF-8 and writes the result into the OutputStream, using buf to reduce the number of write calls and throwing a LimitException if more than maxOutput bytes would be needed.
-
Method Details
-
encodeUTF8
public static int encodeUTF8(int codePoint)
Encodes the specified Unicode code point using UTF-8.
- Parameters:
codePoint - The code point.
- Returns:
- The encoded UTF-8 bytes combined into an int. Higher-order bytes that are 0x00 are not part of the encoded code point. The lowest-order byte is always part of it. For example, let enc be the returned value. If (enc & 0xFFFFFF00) == 0, the code point is encoded as a single byte. If (enc & 0xFFFF0000) == 0 and (enc & 0xFFFFFF00) != 0, it is encoded as two bytes. Likewise, if (enc & 0xFF000000) == 0 and (enc & 0xFFFF0000) != 0, it is encoded as three bytes. Otherwise it is encoded as four bytes.
- Throws:
IllegalArgumentException - If the code point is out of range.
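The packed int can be taken apart using only the masking rule above. The sketch below uses hypothetical helpers `pack`, `packedLength`, and `unpack`, which are not part of CharsetUtil. `pack` builds a reference value from the JDK's own UTF-8 encoder. The sketch assumes that the first UTF-8 byte occupies the highest non-zero byte of the int. The description above does not state this byte order explicitly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers illustrating the packed-int return format.
// Assumption: the first UTF-8 byte sits in the highest non-zero byte position.
final class PackedUtf8 {
    private PackedUtf8() {}

    // Reference packing built on the JDK encoder, not on CharsetUtil.
    static int pack(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)
                || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            throw new IllegalArgumentException("Invalid code point: " + codePoint);
        }
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        int enc = 0;
        for (byte b : bytes) {
            enc = (enc << 8) | (b & 0xFF); // earlier bytes move toward the high end
        }
        return enc;
    }

    // Number of encoded bytes, following the masking rule from the Returns text.
    static int packedLength(int enc) {
        if ((enc & 0xFFFFFF00) == 0) return 1;
        if ((enc & 0xFFFF0000) == 0) return 2;
        if ((enc & 0xFF000000) == 0) return 3;
        return 4;
    }

    // Extracts the bytes in stream order, assuming the byte layout described above.
    static byte[] unpack(int enc) {
        int len = packedLength(enc);
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = (byte) (enc >>> (8 * (len - 1 - i)));
        }
        return out;
    }
}
```

Because every byte of a multi-byte UTF-8 sequence is non-zero, `packedLength` returns the correct count regardless of the assumed byte order. Only `unpack` depends on that assumption.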
-
encodeUTF8
public static long encodeUTF8(int codePoint, OutputStream out) throws IOException
Encodes the specified Unicode code point using UTF-8 and writes the result into the OutputStream.
- Parameters:
codePoint - The code point that should be encoded.
out - The output stream. Must not be null.
- Returns:
- The number of bytes written: at least 1 and at most 4.
- Throws:
IOException - If an I/O error occurs.
IllegalArgumentException - If the code point is out of range.
- Implementation Note:
- Callers usually sum the return values. The return type is long rather than int so that large outputs can be summed without overflow.
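The implementation note describes summing per-code-point results into a long. The sketch below illustrates that pattern. `writeCodePoint` is a hypothetical stand-in for `encodeUTF8(int, OutputStream)` that delegates to the JDK encoder. It is not CharsetUtil's implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; writeCodePoint only mimics encodeUTF8(int, OutputStream).
final class SumExample {
    private SumExample() {}

    static long writeCodePoint(int codePoint, OutputStream out) throws IOException {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Out of range: " + codePoint);
        }
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        out.write(bytes);
        return bytes.length; // 1 to 4
    }

    // Sums the per-code-point byte counts in a long accumulator.
    static long writeAll(String str, OutputStream out) throws IOException {
        long total = 0;
        int i = 0;
        while (i < str.length()) {
            int cp = str.codePointAt(i);
            total += writeCodePoint(cp, out);
            i += Character.charCount(cp); // skip both chars of a surrogate pair
        }
        return total;
    }
}
```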
-
encodeUTF8
public static long encodeUTF8(String str, OutputStream out) throws IOException
Encodes the specified String using UTF-8 and writes the result into the OutputStream.
- Parameters:
str - The string that should be encoded. Must not be null.
out - The output stream. Must not be null.
- Returns:
- The number of bytes written.
- Throws:
IOException - If an I/O error occurs.
IllegalArgumentException - For example, if a code point is out of range.
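For a Java String, the usual source of invalid code points is an unpaired surrogate. The sketch below is a strict encoder built on the JDK's `CharsetEncoder`. It reports malformed input and maps the failure to IllegalArgumentException, the exception documented above. The class name `StrictStringEncoder` is hypothetical. Whether CharsetUtil treats unpaired surrogates in exactly this way is an assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical strict encoder; rejects unpaired surrogates instead of replacing them.
final class StrictStringEncoder {
    private StrictStringEncoder() {}

    static long encode(String str, OutputStream out) throws IOException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer encoded;
        try {
            encoded = encoder.encode(CharBuffer.wrap(str));
        } catch (CharacterCodingException e) {
            // Assumption: map malformed input to IllegalArgumentException, as documented above.
            throw new IllegalArgumentException("Invalid string: " + e.getMessage(), e);
        }
        byte[] bytes = new byte[encoded.remaining()];
        encoded.get(bytes);
        out.write(bytes);
        return bytes.length;
    }
}
```

This sketch encodes the whole string before writing anything, so an invalid string leaves the stream untouched. The trade-off is one temporary byte array as large as the encoded output.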
-
encodeUTF8
public static long encodeUTF8(String str, OutputStream out, long maxOutput) throws IOException
Encodes the specified String using UTF-8 and writes the result into the OutputStream.
Up to maxOutput+3 bytes may be written to the output. If more than maxOutput bytes would have to be written, a LimitException is thrown.
- Parameters:
str - The string that should be encoded. Must not be null.
out - The output stream. Must not be null.
maxOutput - The maximum output length in bytes. Must not be negative.
- Returns:
- The number of bytes written.
- Throws:
IOException - If an I/O error occurs.
IllegalArgumentException - For example, if a code point is out of range.
LimitException - If the maximum output length is exceeded.
- Since:
- 1.4
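A limit like maxOutput can be enforced by keeping a running byte count. The sketch below is one way to do it. It checks each code point's length before writing, so unlike the documented method (which may write up to maxOutput+3 bytes) it never writes more than maxOutput bytes. `LimitedEncoder` is hypothetical. `LimitExceededException` is a hypothetical stand-in for the library's LimitException, whose package and hierarchy are not shown here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a byte limit on UTF-8 output.
final class LimitedEncoder {
    private LimitedEncoder() {}

    // Hypothetical stand-in for LimitException.
    static final class LimitExceededException extends RuntimeException {
        LimitExceededException(String message) { super(message); }
    }

    static long encode(String str, OutputStream out, long maxOutput) throws IOException {
        if (maxOutput < 0) {
            throw new IllegalArgumentException("maxOutput must not be negative");
        }
        long written = 0;
        int i = 0;
        while (i < str.length()) {
            int cp = str.codePointAt(i);
            byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            // Check before writing, so this sketch stops at or below maxOutput.
            if (written + bytes.length > maxOutput) {
                throw new LimitExceededException("More than " + maxOutput + " bytes required");
            }
            out.write(bytes);
            written += bytes.length;
            i += Character.charCount(cp);
        }
        return written;
    }
}
```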
-
encodeUTF8
public static long encodeUTF8(String str, OutputStream out, byte[] buf) throws IOException
Encodes the specified String using UTF-8 and writes the result into the OutputStream. The buffer is used to reduce the number of write calls on the OutputStream.
- Parameters:
str - The string that should be encoded. Must not be null.
out - The output stream. Must not be null.
buf - The buffer. Must not be null. Its length must be at least 4. This method may overwrite any position in the buffer.
- Returns:
- The number of bytes written.
- Throws:
IOException - If an I/O error occurs.
IllegalArgumentException - For example, if a code point is out of range.
- Since:
- 1.7
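The 4-byte minimum for buf matches the longest UTF-8 sequence, so one code point always fits once the buffer is flushed. The sketch below shows this buffering technique. Encoded bytes accumulate in buf, and buf is flushed only when the next code point might not fit. `BufferedStringEncoder` is hypothetical. For brevity it encodes each code point with the JDK rather than in place.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of buffered UTF-8 output; not CharsetUtil's implementation.
final class BufferedStringEncoder {
    private BufferedStringEncoder() {}

    static long encode(String str, OutputStream out, byte[] buf) throws IOException {
        if (buf.length < 4) {
            throw new IllegalArgumentException("buf must hold at least 4 bytes");
        }
        long total = 0;
        int pos = 0; // number of pending bytes in buf
        int i = 0;
        while (i < str.length()) {
            int cp = str.codePointAt(i);
            i += Character.charCount(cp);
            byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            // Flush first if the next sequence (at most 4 bytes) does not fit.
            if (buf.length - pos < bytes.length) {
                out.write(buf, 0, pos);
                pos = 0;
            }
            System.arraycopy(bytes, 0, buf, pos, bytes.length);
            pos += bytes.length;
            total += bytes.length;
        }
        if (pos > 0) {
            out.write(buf, 0, pos); // flush the remainder
        }
        return total;
    }
}
```

A larger buffer means fewer calls to `out.write`. The minimum of 4 bytes only guarantees progress.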
-
encodeUTF8
public static long encodeUTF8(String str, OutputStream out, long maxOutput, byte[] buf) throws IOException
Encodes the specified String using UTF-8 and writes the result into the OutputStream. The buffer is used to reduce the number of write calls on the OutputStream.
Up to maxOutput bytes may be written to the output. If more than maxOutput bytes would have to be written, a LimitException is thrown.
- Parameters:
str - The string that should be encoded. Must not be null.
out - The output stream. Must not be null.
maxOutput - The maximum output length in bytes. Must not be negative.
buf - The buffer. Must not be null. Its length must be at least 4. This method may overwrite any position in the buffer.
- Returns:
- The number of bytes written.
- Throws:
IOException - If an I/O error occurs.
IllegalArgumentException - For example, if a code point is out of range.
LimitException - If the maximum output length is exceeded.
- Since:
- 1.7
-
encodeUTF8
public static int encodeUTF8(int codePoint, byte[] out, int offset)
Encodes the specified Unicode code point using UTF-8 and writes the result into the output array at the specified offset. Depending on the code point, 1 to 4 bytes are written.
- Parameters:
codePoint - The Unicode code point that should be encoded.
out - The output array into which the encoded data should be written. Must not be null. Must be long enough to store the encoded data. Its length should be at least offset+4.
offset - The start position in the output array. Must not be negative.
- Returns:
- The number of bytes written: at least 1 and at most 4.
- Throws:
IndexOutOfBoundsException - If the offset is negative or the output array is too short to store the encoded code point.
IllegalArgumentException - If the code point is out of range. In this case, no bytes are written.
- Since:
- 1.3
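The standard UTF-8 bit layout determines the 1 to 4 bytes described above. The sketch below writes that layout directly into an array. It validates the range and the array bounds before writing any byte, so a failed call leaves the array unchanged, as the Throws section describes for out-of-range code points. The class `ArrayCodePointEncoder` is hypothetical. Like the documented range check, it only checks 0 to 0x10FFFF and does not reject surrogate code points.

```java
// Hypothetical stand-in for encodeUTF8(int codePoint, byte[] out, int offset).
final class ArrayCodePointEncoder {
    private ArrayCodePointEncoder() {}

    static int encode(int codePoint, byte[] out, int offset) {
        // Validate everything before writing, so a failure leaves out untouched.
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("Code point out of range: " + codePoint);
        }
        int len = codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
        if (offset < 0 || offset > out.length - len) {
            throw new IndexOutOfBoundsException(
                    "offset=" + offset + ", needed=" + len + ", length=" + out.length);
        }
        switch (len) {
            case 1:
                out[offset] = (byte) codePoint;                                // 0xxxxxxx
                break;
            case 2:
                out[offset] = (byte) (0xC0 | (codePoint >> 6));                // 110xxxxx
                out[offset + 1] = (byte) (0x80 | (codePoint & 0x3F));          // 10xxxxxx
                break;
            case 3:
                out[offset] = (byte) (0xE0 | (codePoint >> 12));               // 1110xxxx
                out[offset + 1] = (byte) (0x80 | ((codePoint >> 6) & 0x3F));
                out[offset + 2] = (byte) (0x80 | (codePoint & 0x3F));
                break;
            default:
                out[offset] = (byte) (0xF0 | (codePoint >> 18));               // 11110xxx
                out[offset + 1] = (byte) (0x80 | ((codePoint >> 12) & 0x3F));
                out[offset + 2] = (byte) (0x80 | ((codePoint >> 6) & 0x3F));
                out[offset + 3] = (byte) (0x80 | (codePoint & 0x3F));
                break;
        }
        return len;
    }
}
```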
-
encodeUTF8
public static int encodeUTF8(String str, byte[] out, int offset)
Encodes the specified String using UTF-8 and writes the result into the byte array.
- Parameters:
str - The string that should be encoded. Must not be null.
out - The output array into which the encoded data should be written. Must not be null. Must be long enough to store the encoded data. Its length should be at least offset+4*str.length().
offset - The start position in the output array. Must not be negative.
- Returns:
- The number of bytes written.
- Throws:
IndexOutOfBoundsException - If the offset is negative or the output array is too short to store the encoded string.
IllegalArgumentException - If a code point is out of range.
- Since:
- 1.4
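The recommended array length of offset+4*str.length() is an upper bound, so the returned count is what tells the caller how much of the array was filled. The sketch below allocates by that bound and then trims the result. `encodeInto` is a hypothetical stand-in for `encodeUTF8(String, byte[], int)`. It delegates to `String.getBytes`, which replaces unpaired surrogates with '?' instead of throwing IllegalArgumentException as the documented method does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helpers illustrating array sizing; not CharsetUtil's implementation.
final class ArrayStringEncoder {
    private ArrayStringEncoder() {}

    static int encodeInto(String str, byte[] out, int offset) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        if (offset < 0 || offset > out.length - bytes.length) {
            throw new IndexOutOfBoundsException("Output array too short");
        }
        System.arraycopy(bytes, 0, out, offset, bytes.length);
        return bytes.length;
    }

    // Allocates 4 * str.length() bytes, as recommended, then trims to the bytes actually written.
    static byte[] encodeToArray(String str) {
        byte[] buf = new byte[4 * str.length()];
        int n = encodeInto(str, buf, 0);
        return Arrays.copyOf(buf, n);
    }
}
```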
-
decodeUTF8
Decodes one UTF-8 code point from the InputStream.
- Parameters:
in - The InputStream from which the code point should be read. Must not be null. Depending on the code point, 1 to 4 bytes are read.
- Returns:
- The decoded code point in the range 0 to 0x10FFFF, or -1 if the end of the stream has been reached.
- Throws:
CharacterDecodeException - If the UTF-8 code point cannot be decoded.
IOException - If an I/O error occurs.
- Since:
- 1.6
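The 1 to 4 bytes per call follow from the lead byte, which announces the sequence length. The sketch below is one way to decode a single code point from a stream. It rejects stray continuation bytes, truncated sequences, overlong forms, surrogates, and values above 0x10FFFF. `Utf8StreamDecoder` is hypothetical. The JDK's `java.nio.charset.MalformedInputException` stands in for CharacterDecodeException. Which of these cases CharsetUtil itself rejects is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.MalformedInputException;

// Hypothetical decoder sketch; MalformedInputException replaces CharacterDecodeException.
final class Utf8StreamDecoder {
    private Utf8StreamDecoder() {}

    static int decode(InputStream in) throws IOException {
        int b0 = in.read();
        if (b0 < 0) {
            return -1; // end of stream
        }
        if (b0 < 0x80) {
            return b0; // single-byte sequence
        }
        int len;
        int cp;
        int min; // smallest code point valid for this length; rejects overlong forms
        if ((b0 & 0xE0) == 0xC0) {
            len = 2; cp = b0 & 0x1F; min = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) {
            len = 3; cp = b0 & 0x0F; min = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) {
            len = 4; cp = b0 & 0x07; min = 0x10000;
        } else {
            throw new MalformedInputException(1); // continuation byte or invalid lead byte
        }
        for (int i = 1; i < len; i++) {
            int b = in.read();
            if (b < 0 || (b & 0xC0) != 0x80) {
                throw new MalformedInputException(i); // truncated or bad continuation byte
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new MalformedInputException(len);
        }
        return cp;
    }
}
```

Callers typically loop until the method returns -1, as in the usual `InputStream.read` idiom.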
-