Friday, December 16, 2005

Java Unicode Support

See JSR-204 for the details of supplementary character support in Java: JSR-204

Supplementary Character Support Approach
  • Use the primitive type int to represent code points in low-level APIs, such as the static methods of the Character class.

  • Interpret char sequences in all forms (char[], implementations of java.lang.CharSequence, implementations of java.text.CharacterIterator) as UTF-16 sequences, and promote their use in higher-level APIs.

  • Provide APIs to easily convert between various char and code point based representations.
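Here is a rough illustration of these three points. It uses only standard J2SE 5.0 methods (Character.toChars, Character.charCount, Character.isSupplementaryCodePoint, String.codePointAt, String.codePointCount and the String(int[], int, int) constructor). The class and helper names are hypothetical, and U+1D11E (MUSICAL SYMBOL G CLEF) is just an arbitrary character from outside the BMP.

```java
// Sketch (hypothetical names): int code points in low-level APIs,
// String as a UTF-16 char sequence, and conversion between the two.
class CodePointDemo {

    // Encode one code point as its UTF-16 form (1 or 2 chars).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Decode a UTF-16 string into an array of code points.
    static int[] toCodePoints(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int i = 0;
        for (int offset = 0; offset < s.length(); ) {
            int cp = s.codePointAt(offset);       // reads a surrogate pair as one int
            result[i++] = cp;
            offset += Character.charCount(cp);    // advance by 1 or 2 chars
        }
        return result;
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // MUSICAL SYMBOL G CLEF, a supplementary character
        System.out.println(Character.isSupplementaryCodePoint(gClef)); // true
        System.out.println(Character.charCount(gClef));                // 2

        String s = "a" + fromCodePoint(gClef) + "b";
        System.out.println(s.length());                      // 4 chars (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 3 code points

        int[] cps = toCodePoints(s);
        String back = new String(cps, 0, cps.length);        // code points back to UTF-16
        System.out.println(back.equals(s));                  // true
    }
}
```

Note that String.length() counts chars, not characters; code that needs a character count has to use codePointCount instead.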


A good blog post on Unicode support in J2SE 5.0: John Conner blog
Highlights:
  • char is a UTF-16 code unit, not a code point.
  • New low-level APIs use an int to represent a Unicode code point.
  • High-level APIs have been updated to understand surrogate pairs.
  • char sequence APIs are preferred over char-based methods.
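Here is a minimal sketch of why the first highlight matters. It assumes an arbitrary supplementary letter, U+1D400 (MATHEMATICAL BOLD CAPITAL A), stored in a String as the surrogate pair D835 DC00. Walking the string char by char sees two surrogates, and neither one is a letter. Walking by code point with the int overloads sees one letter. The class and method names are hypothetical.

```java
// Sketch (hypothetical names): char-based vs. code-point-based iteration.
class SurrogateDemo {

    // Naive: treats every char (UTF-16 code unit) as a character.
    static int countLettersByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) n++;  // a lone surrogate is never a letter
        }
        return n;
    }

    // Surrogate-aware: steps through the string one code point at a time.
    static int countLettersByCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) n++;           // int overload added in J2SE 5.0
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        // "x" followed by MATHEMATICAL BOLD CAPITAL A as a surrogate pair
        String s = "x\uD835\uDC00";
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(countLettersByChar(s));      // 1
        System.out.println(countLettersByCodePoint(s)); // 2
        System.out.println(s.indexOf(0x1D400));         // 1, indexOf(int) understands surrogate pairs
    }
}
```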
