Class CharsetUtil

java.lang.Object
de.tomatengames.util.CharsetUtil

public class CharsetUtil extends Object
Provides utilities to work with character encodings.
Since:
1.2
  • Method Details

    • encodeUTF8

      public static int encodeUTF8(int codePoint)
      Encodes the specified Unicode code point using UTF-8.
      Parameters:
      codePoint - The code point.
      Returns:
      The encoded UTF-8 bytes packed into a single int. High-order bytes that are 0x00 are not part of the encoded code point; the lowest-order byte always is. For example, let enc be the returned value. If (enc & 0xFFFFFF00) == 0, the code point is encoded as a single byte. If (enc & 0xFFFF0000) == 0 and (enc & 0xFFFFFF00) != 0, the code point is encoded as two bytes.
      Throws:
      IllegalArgumentException - If the code point is out of range.
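      The mask checks described under Returns can be turned into a length test and an unpacking loop. The following is a minimal sketch that does not call CharsetUtil: pack is a hypothetical stand-in for the documented method, and it assumes the first encoded byte ends up in the highest used byte position (the description above does not state the byte order).

```java
final class PackedUtf8Sketch {
    // Hypothetical stand-in for encodeUTF8(int). Assumption: the first
    // encoded byte is stored in the highest used byte of the result.
    static int pack(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("code point out of range");
        }
        if (codePoint < 0x80) {
            return codePoint;
        }
        if (codePoint < 0x800) {
            return ((0xC0 | (codePoint >> 6)) << 8)
                    | (0x80 | (codePoint & 0x3F));
        }
        if (codePoint < 0x10000) {
            return ((0xE0 | (codePoint >> 12)) << 16)
                    | ((0x80 | ((codePoint >> 6) & 0x3F)) << 8)
                    | (0x80 | (codePoint & 0x3F));
        }
        return ((0xF0 | (codePoint >> 18)) << 24)
                | ((0x80 | ((codePoint >> 12) & 0x3F)) << 16)
                | ((0x80 | ((codePoint >> 6) & 0x3F)) << 8)
                | (0x80 | (codePoint & 0x3F));
    }

    // Number of encoded bytes, derived from the masks in the description.
    static int length(int enc) {
        if ((enc & 0xFFFFFF00) == 0) return 1;
        if ((enc & 0xFFFF0000) == 0) return 2;
        if ((enc & 0xFF000000) == 0) return 3;
        return 4;
    }

    // Unpacks the used bytes, highest-order used byte first.
    static byte[] unpack(int enc) {
        int n = length(enc);
        byte[] bytes = new byte[n];
        for (int i = 0; i < n; i++) {
            bytes[i] = (byte) (enc >>> (8 * (n - 1 - i)));
        }
        return bytes;
    }
}
```

      Because a UTF-8 lead byte is never 0x00, the highest used byte of the packed value is always non-zero, which is why the mask tests are sufficient to recover the length.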
    • encodeUTF8

      public static long encodeUTF8(int codePoint, OutputStream out) throws IOException
      Encodes the specified Unicode code point using UTF-8 and writes the result into the OutputStream.
      Parameters:
      codePoint - The code point that should be encoded.
      out - The output stream. Must not be null.
      Returns:
      The number of bytes written, between 1 and 4.
      Throws:
      IOException - If an I/O error occurs.
      IllegalArgumentException - If the code point is out of range.
      Implementation Note:
      Callers usually add the return values of several calls together. To make it easier to sum large outputs, the return type is long instead of int.
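      The summing pattern mentioned in the implementation note might look like the sketch below. It does not call CharsetUtil: writeCodePoint is a hypothetical stand-in that delegates to the JDK's own UTF-8 encoder and, like the documented method, returns the byte count as a long.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class SummingSketch {
    // Hypothetical stand-in for encodeUTF8(int, OutputStream).
    static long writeCodePoint(int codePoint, OutputStream out) throws IOException {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("code point out of range: " + codePoint);
        }
        byte[] encoded = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        out.write(encoded);
        return encoded.length;
    }

    // Because each call returns a long, the running total needs no casts.
    static long writeAll(int[] codePoints, OutputStream out) throws IOException {
        long total = 0;
        for (int cp : codePoints) {
            total += writeCodePoint(cp, out);
        }
        return total;
    }

    // In-memory convenience wrapper so callers need no IOException handling.
    static long totalFor(int... codePoints) {
        try {
            return writeAll(codePoints, new ByteArrayOutputStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```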
    • encodeUTF8

      public static long encodeUTF8(String str, OutputStream out) throws IOException
      Encodes the specified String using UTF-8 and writes the result into the OutputStream.
      Parameters:
      str - The string that should be encoded. Not null.
      out - The output stream. Not null.
      Returns:
      The number of bytes written.
      Throws:
      IOException - If an I/O error occurs.
      IllegalArgumentException - e.g. if a code point is out of range.
    • encodeUTF8

      public static long encodeUTF8(String str, OutputStream out, long maxOutput) throws IOException
      Encodes the specified String using UTF-8 and writes the result into the OutputStream.

      Up to maxOutput+3 bytes may be written to the output. If the encoded string is longer than maxOutput bytes, a LimitException is thrown.

      Parameters:
      str - The string that should be encoded. Not null.
      out - The output stream. Not null.
      maxOutput - The maximum output byte length. Must not be negative.
      Returns:
      The number of bytes written.
      Throws:
      IOException - If an I/O error occurs.
      IllegalArgumentException - e.g. if a code point is out of range.
      LimitException - If the maximum output length is exceeded.
      Since:
      1.4
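      One way the maxOutput+3 bound can arise is sketched below. This is an illustration under stated assumptions, not the library's implementation: it assumes the limit is checked before each code point is written, so a final 4-byte code point written when one byte of budget remains overshoots by 3. LimitExceeded is a hypothetical stand-in for LimitException, made unchecked here only for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class LimitedEncodeSketch {
    // Hypothetical stand-in for the library's LimitException.
    static final class LimitExceeded extends RuntimeException {
        LimitExceeded(String message) {
            super(message);
        }
    }

    static long encode(String str, OutputStream out, long maxOutput) throws IOException {
        if (maxOutput < 0) {
            throw new IllegalArgumentException("maxOutput must not be negative");
        }
        long total = 0;
        for (int i = 0; i < str.length(); ) {
            // Assumed check order: test the budget before writing, so at most
            // maxOutput-1 bytes are out when a code point of up to 4 bytes follows.
            if (total >= maxOutput) {
                throw new LimitExceeded("more than " + maxOutput + " bytes required");
            }
            int cp = str.codePointAt(i);
            byte[] encoded = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
            out.write(encoded);
            total += encoded.length;
            i += Character.charCount(cp);
        }
        if (total > maxOutput) {
            throw new LimitExceeded("more than " + maxOutput + " bytes required");
        }
        return total;
    }

    // Returns how many bytes reached the stream, whether or not the limit was hit.
    static int bytesReachingStream(String str, long maxOutput) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            encode(str, out, maxOutput);
        } catch (LimitExceeded e) {
            // Expected when the encoded form is longer than maxOutput.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }
}
```

      The practical consequence is that a caller who catches the exception should size any downstream storage for maxOutput+3 bytes rather than maxOutput.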
    • encodeUTF8

      public static long encodeUTF8(String str, OutputStream out, byte[] buf) throws IOException
      Encodes the specified String using UTF-8 and writes the result into the OutputStream. The buffer is used to reduce the write operations to the OutputStream.
      Parameters:
      str - The string that should be encoded. Not null.
      out - The output stream. Not null.
      buf - The buffer. Not null. Its length must be at least 4. This method may overwrite any position of the buffer.
      Returns:
      The number of bytes written.
      Throws:
      IOException - If an I/O error occurs.
      IllegalArgumentException - e.g. if a code point is out of range.
      Since:
      1.7
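      The minimum buffer length of 4 follows from the largest possible encoding of one code point. A minimal sketch of buffer-backed encoding, assuming the buffer is flushed whenever fewer than 4 free bytes remain, could look like this. It does not call CharsetUtil; encode and toBytes are hypothetical names, and the per-code-point encoding is delegated to the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BufferedEncodeSketch {
    static long encode(String str, OutputStream out, byte[] buf) throws IOException {
        if (buf.length < 4) {
            throw new IllegalArgumentException("buffer length must be at least 4");
        }
        long total = 0;
        int pos = 0; // number of pending bytes in buf
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            i += Character.charCount(cp);
            // Flush first if the next code point (up to 4 bytes) might not fit.
            if (buf.length - pos < 4) {
                out.write(buf, 0, pos);
                total += pos;
                pos = 0;
            }
            byte[] encoded = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
            System.arraycopy(encoded, 0, buf, pos, encoded.length);
            pos += encoded.length;
        }
        if (pos > 0) {
            out.write(buf, 0, pos); // write whatever is still pending
            total += pos;
        }
        return total;
    }

    // In-memory convenience wrapper using a scratch buffer of the given size.
    static byte[] toBytes(String str, int bufferSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            encode(str, out, new byte[bufferSize]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

      A larger buffer means fewer calls to the stream's write method; the buffer contents after the call carry no meaning for the caller.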
    • encodeUTF8

      public static long encodeUTF8(String str, OutputStream out, long maxOutput, byte[] buf) throws IOException
      Encodes the specified String using UTF-8 and writes the result into the OutputStream. The buffer is used to reduce the write operations to the OutputStream.

      Up to maxOutput bytes may be written to the output. If the encoded string is longer than maxOutput bytes, a LimitException is thrown.

      Parameters:
      str - The string that should be encoded. Not null.
      out - The output stream. Not null.
      maxOutput - The maximum output byte length. Must not be negative.
      buf - The buffer. Not null. Its length must be at least 4. This method may overwrite any position of the buffer.
      Returns:
      The number of bytes written.
      Throws:
      IOException - If an I/O error occurs.
      IllegalArgumentException - e.g. if a code point is out of range.
      LimitException - If the maximum output length is exceeded.
      Since:
      1.7
    • encodeUTF8

      public static int encodeUTF8(int codePoint, byte[] out, int offset)
      Encodes the specified Unicode code point using UTF-8 and writes the result into the output array at the specified offset. Depending on the code point, 1 to 4 bytes are written.
      Parameters:
      codePoint - The Unicode code point that should be encoded.
      out - The output array into which the encoded data should be written. Must not be null. Must be long enough to store the encoded data. It is recommended that the length is at least offset+4.
      offset - The start position in the output array to be written. Must not be negative.
      Returns:
      The number of bytes written, between 1 and 4.
      Throws:
      IndexOutOfBoundsException - If the offset is negative or the output array is too short to store the encoded code point.
      IllegalArgumentException - If the code point is out of range. In this case no bytes are written.
      Since:
      1.3
    • encodeUTF8

      public static int encodeUTF8(String str, byte[] out, int offset)
      Encodes the specified String using UTF-8 and writes the result into the byte array.
      Parameters:
      str - The string that should be encoded. Not null.
      out - The output array into which the encoded data should be written. Must not be null. Must be long enough to store the encoded data. It is recommended that the length is at least offset+4*str.length().
      offset - The start position in the output array to be written. Must not be negative.
      Returns:
      The number of bytes written.
      Throws:
      IndexOutOfBoundsException - If the offset is negative or the output array is too short to store the encoded string.
      IllegalArgumentException - If a code point is out of range.
      Since:
      1.4
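      The two array-based overloads share one pattern: write at an offset, then advance by the returned byte count. The sketch below illustrates that pattern and the recommended sizing of offset+4*str.length(). It does not call CharsetUtil; both put methods are hypothetical stand-ins, and checking the bounds before writing anything is an assumption of this sketch.

```java
final class ArrayEncodeSketch {
    // Hypothetical stand-in for encodeUTF8(int, byte[], int).
    static int put(int cp, byte[] out, int offset) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("code point out of range");
        }
        int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (offset < 0 || offset > out.length - len) {
            throw new IndexOutOfBoundsException("offset " + offset + ", need " + len + " bytes");
        }
        switch (len) {
            case 1:
                out[offset] = (byte) cp;
                break;
            case 2:
                out[offset] = (byte) (0xC0 | (cp >> 6));
                out[offset + 1] = (byte) (0x80 | (cp & 0x3F));
                break;
            case 3:
                out[offset] = (byte) (0xE0 | (cp >> 12));
                out[offset + 1] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[offset + 2] = (byte) (0x80 | (cp & 0x3F));
                break;
            default:
                out[offset] = (byte) (0xF0 | (cp >> 18));
                out[offset + 1] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[offset + 2] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[offset + 3] = (byte) (0x80 | (cp & 0x3F));
                break;
        }
        return len;
    }

    // Hypothetical stand-in for encodeUTF8(String, byte[], int).
    static int put(String str, byte[] out, int offset) {
        int pos = offset;
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            pos += put(cp, out, pos); // advance by the bytes just written
            i += Character.charCount(cp);
        }
        return pos - offset;
    }

    // Usage: size the array as recommended, then trim to the bytes written.
    static byte[] encodeAt(String str, int offset) {
        byte[] out = new byte[offset + 4 * str.length()];
        int written = put(str, out, offset);
        return java.util.Arrays.copyOfRange(out, offset, offset + written);
    }
}
```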
    • decodeUTF8

      public static int decodeUTF8(InputStream in) throws IOException, CharacterDecodeException
      Decodes one UTF-8 code point from the InputStream.
      Parameters:
      in - The InputStream from which the code point should be read. Not null. Between 1 and 4 bytes are read, depending on the code point.
      Returns:
      The decoded code point in the range from 0 to 0x10FFFF, or -1 if the end of the stream is reached.
      Throws:
      CharacterDecodeException - If the UTF-8 code point cannot be decoded.
      IOException - If an I/O error occurs.
      Since:
      1.6
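      The read loop that the -1 return value enables can be sketched as follows. This does not call CharsetUtil: decode is a hypothetical stand-in and MalformedInput is a hypothetical replacement for CharacterDecodeException. The sketch does not reject overlong forms or surrogates; whether the library does is not stated above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class DecodeSketch {
    // Hypothetical stand-in for the library's CharacterDecodeException.
    static final class MalformedInput extends Exception {
        MalformedInput(String message) {
            super(message);
        }
    }

    // Reads one code point, or returns -1 if the stream is already at its end.
    static int decode(InputStream in) throws IOException, MalformedInput {
        int first = in.read();
        if (first < 0) {
            return -1;
        }
        if (first < 0x80) {
            return first; // single-byte form
        }
        int extra; // number of continuation bytes that must follow
        int cp;
        if ((first & 0xE0) == 0xC0) {
            extra = 1;
            cp = first & 0x1F;
        } else if ((first & 0xF0) == 0xE0) {
            extra = 2;
            cp = first & 0x0F;
        } else if ((first & 0xF8) == 0xF0) {
            extra = 3;
            cp = first & 0x07;
        } else {
            throw new MalformedInput("invalid lead byte");
        }
        for (int i = 0; i < extra; i++) {
            int next = in.read();
            if (next < 0 || (next & 0xC0) != 0x80) {
                throw new MalformedInput("invalid or missing continuation byte");
            }
            cp = (cp << 6) | (next & 0x3F);
        }
        if (cp > 0x10FFFF) {
            throw new MalformedInput("code point above 0x10FFFF");
        }
        return cp;
    }

    // Decodes until -1 is returned; wraps failures in an unchecked exception.
    static int[] decodeAll(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        int[] result = new int[data.length];
        int count = 0;
        try {
            for (int cp = decode(in); cp != -1; cp = decode(in)) {
                result[count++] = cp;
            }
        } catch (IOException | MalformedInput e) {
            throw new IllegalStateException(e);
        }
        return Arrays.copyOf(result, count);
    }
}
```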