Archive for January, 2014

Published by Pui-chor on 23 Jan 2014

unicode and javascript

There is a good website explaining how Unicode is used in JavaScript:

This explains why the character 𨂾 is represented with 4 bytes: it is stored as two 16-bit units called “surrogates”. The character's Unicode code point is 164030, which exceeds 0xFFFF (65535), the largest value a single 16-bit unit can hold. In hex, 164030 is 0x280BE, which needs 5 hex digits rather than 4.
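The arithmetic behind the surrogates can be checked with a short sketch. It uses the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves); `toSurrogatePair` is a hypothetical helper name made up for this illustration, not something from the website mentioned above.

```javascript
// Split a code point above 0xFFFF into its two UTF-16 surrogates.
// toSurrogatePair is a hypothetical helper written only for this example.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;      // leaves a 20-bit value
  const high = 0xD800 + (offset >> 10);    // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits -> low surrogate
  return [high, low];
}

const codePoint = 164030;
const hex = codePoint.toString(16).toUpperCase();
const [high, low] = toSurrogatePair(codePoint);

// Build the string both ways and compare them.
const fromPair = String.fromCharCode(high, low);
const direct = String.fromCodePoint(codePoint);

console.log(hex);                                  // 280BE
console.log(high.toString(16), low.toString(16));  // d860 dcbe
console.log(fromPair === direct, direct.length);   // true 2
```

The string length of 2 reflects the two 16-bit units, which is where the 4 bytes come from.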

What remains unexplained is that the numeric code 62539 generates the character 肾 with JavaScript’s fromCharCode() function, and that it is displayed as 𨂾 in text but as  in the web page.

I still need to find out why the display differs, but displaying 𨂾 is certainly wrong, since that character is defined with the numeric value 164030, not 62539.
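Two facts may be relevant to the discrepancy, though neither is confirmed here as its cause. First, 62539 is 0xF44B, which lies in the Unicode Private Use Area (U+E000 to U+F8FF), where the glyph shown depends entirely on the font. Second, fromCharCode() keeps only the low 16 bits of its argument, so passing 164030 yields U+80BE (肾) rather than 𨂾, while fromCodePoint() accepts the full value. The sketch below only demonstrates these two facts; `isPrivateUse` is a hypothetical helper.

```javascript
// Range of the Private Use Area in the Basic Multilingual Plane.
const PUA_START = 0xE000;
const PUA_END = 0xF8FF;

// Hypothetical helper: true if the code falls in the BMP Private Use Area.
function isPrivateUse(code) {
  return code >= PUA_START && code <= PUA_END;
}

// fromCharCode() truncates its argument to 16 bits.
const viaCharCode = String.fromCharCode(164030);
const truncated = viaCharCode.charCodeAt(0);        // 164030 % 65536

// fromCodePoint() handles code points above 0xFFFF.
const viaCodePoint = String.fromCodePoint(164030);

console.log((62539).toString(16), isPrivateUse(62539));         // f44b true
console.log(truncated.toString(16), viaCharCode.length);        // 80be 1
console.log(viaCodePoint.codePointAt(0), viaCodePoint.length);  // 164030 2
```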

Published by Pui-chor on 22 Jan 2014

UTF-8 Unicode

I just discovered a problem when using UTF-8. UTF-8 is a variable-length encoding of Unicode, which can represent any symbol of any language around the world. I came across two Han characters,  and 𨂾. They are displayed exactly the same in an INPUT element as well as in a TEXTAREA element, but they are encoded completely differently: the first is a single 16-bit code unit (2 bytes in UTF-16, 3 bytes in UTF-8), while the second takes 4 bytes in either encoding.
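The byte counts can be measured with the standard TextEncoder API, which is available in browsers and in Node.js. This is a minimal sketch that assumes the first character is the code 62539 (U+F44B) mentioned in the 23 Jan post; `byteLengths` is a hypothetical helper. Under that assumption the first character takes 3 bytes in UTF-8 and 2 bytes in UTF-16, while 𨂾 takes 4 bytes in both.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: encoded size of a string in UTF-8 and UTF-16.
function byteLengths(s) {
  return {
    utf8: encoder.encode(s).length,  // number of UTF-8 bytes
    utf16: s.length * 2,             // each UTF-16 code unit is 2 bytes
  };
}

const privateUse = "\uF44B";        // assumption: the character with code 62539
const supplementary = "\u{280BE}";  // the character with code 164030

const a = byteLengths(privateUse);
const b = byteLengths(supplementary);

console.log(a);  // { utf8: 3, utf16: 2 }
console.log(b);  // { utf8: 4, utf16: 4 }
```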

UTF-8 funny outcome

You can see in the blog that the two characters are displayed differently within the web page, that is, when viewed with an HTML browser. But when they are viewed in a text file, the one that cannot be seen is displayed properly, while the other character becomes unrecognizable. I hope to find out the reason behind this.