Archive for April, 2013

Published by Pui-chor on 16 Apr 2013

Sorting Chinese Characters by their Stroke Count

I finally came up with a Perl script that sorts the given characters by their stroke counts:

use Encode qw(decode);

my $utf_line  = decode("UTF-8", $line);   # $line holds the given characters
my $orderLine = orderList($line);         # call the function that orders the characters
$line  = $utf_line;                       # original line
$line .= "<br>$orderLine";                # plus the sorted line
my @group = ();   # characters that share a stroke count
my @id    = ();   # the stroke counts themselves
# Re-organize the sorted line, which alternates a stroke count and its characters
while ($orderLine =~ /(\d+)([^\d]+)/g) {
    push @id,    $1;   # the extracted stroke count
    push @group, $2;   # the characters with this stroke count
}
$line = '';   # rebuild the line, formatted with the stroke counts
for (my $i = 0; $i <= $#id; $i++) {
    $line .= "<span class=label>$id[$i]</span>";
    my $charLine = $group[$i];
    my @chars = split(//, $charLine);   # one entry per character in this group
    foreach my $char (@chars) {
        if ($char && $char !~ /\s/) {
            my $class = $freq{$char} ? ' freq' : '';   # mark frequently used characters
            $line .= "<button class='final$class' onclick=\"window.location='$URL'\">$char</button>";
        } else {
            # (the rest of this branch is not shown in the post)
        }
    }
}


This routine formats the characters once they have been ordered by stroke count.
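To make the grouping step concrete, here is a minimal sketch of what the while loop does with its regular expression. It assumes the ordered line alternates a stroke count with the characters that have that count; the sample string is made up for illustration and is not taken from the script's real output.

```perl
use strict;
use warnings;
use utf8;                        # the sample string contains literal Chinese characters

# Hypothetical ordered line: a stroke count, then the characters with that count.
my $orderLine = "3三上下4中文";

my (@id, @group);
while ($orderLine =~ /(\d+)([^\d]+)/g) {
    push @id,    $1;             # the stroke count
    push @group, $2;             # the characters with that stroke count
}

binmode(STDOUT, ':encoding(UTF-8)');
for my $i (0 .. $#id) {
    print "$id[$i]: $group[$i]\n";
}
```

Each match consumes one run of digits and the run of non-digits after it, so @id and @group stay aligned index by index.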

The actual ordering routine is here:

sub orderList {
    my $text   = shift;                 # get the character list
    my $sorted = sortByStroke($text);   # call the sorting routine
    my @sorted = split(',', $sorted);   # break up the sorted character line
    my $orderIt = '';   # ordered format: stroke count, then its characters, and so on
    for (my $i = 0; $i <= $#sorted; $i++) {
        my $ch    = $sorted[$i];        # the first character has the smallest stroke count
        my $start = $ch_hash{$ch};      # look up its stroke count
        # previous character (the first character is compared with itself)
        my $prevc = $i == 0 ? $sorted[0] : $sorted[$i - 1];
        my $prev  = $ch_hash{$prevc};   # the stroke count of the previous character
        if ($i == 0 || $start != $prev) {
            # the stroke count changed, so add the new starting stroke count
            $orderIt .= $start;         # reconstructed: the post omits this statement
        }
        $orderIt .= $ch;                # reconstructed: assumed from the count-then-characters format
    }
    return $orderIt;
}
sub sortByStroke {
    my $text = shift;                         # get the character string
    $text =~ s/,//g;                          # remove the comma separators
    my $decodedstr = decode("utf8", $text);   # decode the bytes into characters
    my @chars = split(//, $decodedstr);       # split into individual characters
    %ch_hash = ();                            # character => stroke count
    foreach my $ch (@chars) {
        my $hex = dec_to_hex(ord($ch));       # the character's code point in hex
        $ch_hash{$ch} = $strokecount{$hex};   # save its stroke count for sorting
    }
    my @sorted = sort { $ch_hash{$a} <=> $ch_hash{$b} } keys %ch_hash;   # sort by stroke count
    return join(',', @sorted);
}

The hash %strokecount is built earlier for all Chinese characters in the Unicode table and maps each character's hex code to its stroke count. It is not shown here. The sub dec_to_hex is not shown either; it should not be hard to implement if you know Perl.
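Since neither helper is shown, here is one possible sketch of dec_to_hex together with a tiny stand-in for %strokecount. It assumes the hash is keyed by the uppercase hexadecimal code point without the "U+" prefix; the post does not say which key format the real table uses, so adjust the sprintf format to match.

```perl
use strict;
use warnings;

# Assumption: keys of %strokecount are uppercase hex code points, e.g. "4E09".
sub dec_to_hex {
    my $dec = shift;
    return sprintf("%X", $dec);
}

# Hand-made stand-in for the real table, which covers every CJK character.
my %strokecount = (
    '4E00' => 1,
    '4E09' => 3,
    '4E2D' => 4,
);

# Three characters given by code point, deliberately out of order.
my @chars  = map { chr(hex($_)) } ('4E2D', '4E00', '4E09');
my @sorted = sort {
    $strokecount{ dec_to_hex(ord($a)) } <=> $strokecount{ dec_to_hex(ord($b)) }
} @chars;

print join(',', map { 'U+' . dec_to_hex(ord($_)) } @sorted), "\n";
# prints U+4E00,U+4E09,U+4E2D
```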

Published by Pui-chor on 15 Apr 2013


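Thanks to The Unicode Consortium for establishing a standard encoding for Chinese characters, even though many characters are duplicated. By "duplicated" I mean in terms of the meaning a character formally represents: characters that arose in different eras and different regions may be written and pronounced differently, yet the root of their basic meaning is the same. Where a character's meaning has shifted somewhat over time, the change can still be traced.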

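Simplified characters are derived from traditional characters: the same character carries the same meaning. It is the words that have grown richer! Words are combinations of characters, formed to express new things and new ideas.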


Unicode for CJK - sample


U+3400    kCangjie    TM
U+3400    kTotalStrokes    5
U+3401    kCangjie    MOW
U+3401    kCihaiT    37.103
U+3401    kTotalStrokes    6
U+3402    kCangjie    PPP
U+3402    kTotalStrokes    6
U+3403    kCangjie    OML
U+3403    kTotalStrokes    3
U+3404    kCangjie    JV
U+3404    kTotalStrokes    3

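U+3404  kTotalStrokes 3 means that the character U+3404 has three strokes.

U+4E09  kTotalStrokes 3 means that the character U+4E09 has three strokes. As seen in the figure, U+4E09 is the character 三.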
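Lines in this format are easy to turn into a stroke count lookup table. The following is a minimal sketch, not the code actually used on this site: it reads a few of the sample lines from an in-memory list (a real script would read them from the Unihan data file) and keeps only the kTotalStrokes field. The hash name and the choice to drop the "U+" prefix from the keys are assumptions.

```perl
use strict;
use warnings;

# A few lines in the format of the sample above.
my @lines = (
    "U+3400\tkCangjie\tTM",
    "U+3400\tkTotalStrokes\t5",
    "U+3401\tkTotalStrokes\t6",
    "U+3404\tkTotalStrokes\t3",
);

my %strokecount;
foreach my $line (@lines) {
    # Splitting on any whitespace handles both tab- and space-separated copies.
    my ($code, $field, $value) = split(/\s+/, $line, 3);
    next unless defined $value && $field eq 'kTotalStrokes';
    $code =~ s/^U\+//;               # keep just the hex digits, e.g. "3400"
    $strokecount{$code} = $value;
}

foreach my $code (sort keys %strokecount) {
    print "$code => $strokecount{$code}\n";
}
```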

Published by Pui-chor on 09 Apr 2013

CSS margins specification

Margins can be specified as:

margin:25px 50px 75px 100px;
top margin is 25px
right margin is 50px
bottom margin is 75px
left margin is 100px
margin:25px 50px 75px;
top margin is 25px
right and left margins are 50px
bottom margin is 75px
margin:25px 50px;
top and bottom margins are 25px
right and left margins are 50px
margin:25px;
all four margins are 25px
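The rules above can be written down as a small function. This is only an illustrative sketch in Perl (expand_margin is a made-up name, not part of any library): it takes the value of a margin shorthand, including the one-value form margin:25px;, and returns the top, right, bottom and left margins in that order.

```perl
use strict;
use warnings;

# Expand a margin shorthand value into (top, right, bottom, left).
sub expand_margin {
    my ($value) = @_;
    my @v = split ' ', $value;
    if    (@v == 1) { return ($v[0], $v[0], $v[0], $v[0]); }   # all four sides
    elsif (@v == 2) { return ($v[0], $v[1], $v[0], $v[1]); }   # top/bottom, right/left
    elsif (@v == 3) { return ($v[0], $v[1], $v[2], $v[1]); }   # top, right/left, bottom
    elsif (@v == 4) { return @v; }                             # top, right, bottom, left
    die "expected 1 to 4 values\n";
}

for my $value ('25px 50px 75px 100px', '25px 50px 75px', '25px 50px', '25px') {
    my ($top, $right, $bottom, $left) = expand_margin($value);
    print "margin:$value; => top $top, right $right, bottom $bottom, left $left\n";
}
```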

Each top, left, right, bottom margin can be specified using: