[fpc-devel] Unicode conversion routines
JoshyFun
joshyfun at gmail.com
Sun Nov 23 17:50:09 CET 2008
Hello Florian,
Sunday, November 23, 2008, 5:21:29 PM, you wrote:
>> It is not so bad; technically, all of those issues are easy to solve.
>> As you saw in the bug report, Delphi has the same problems, and they
>> have not been fixed, to keep Delphi compatibility.
FK> AFAIK we decided to apply the patch; however, I haven't had time to do so yet.
Well, maybe I should have said "not fixed so far, for Delphi
compatibility". I think that once UnicodeString is available in stable
releases, many functions will need to be fixed, overloaded, or newly written.
>> Also, I wish to know which basic Unicode functions will be supported
>> by FPC: only upper/lower, or maybe more, like decompose,
>> normalize, and char/word/line/paragraph iterators... I have some of them
>> written if the FPC team wants them.
FK> It mainly depends on whether it needs external libs or huge tables.
My functions do not use any external libs, but the tables are quite big.
Most tables were generated by Pascal code that balances explicit
"if/case" ranges against binary search on tables. Even so, the tables
are still big, but I don't think they can be made much shorter without
losing functionality. Examples:
------------------------------------------------
//LowerCase arrays size: 2376 bytes
const UnicodeLowerCaseArraySource: array [0..593] of WORD=(
//UpperCase arrays size: 2408 bytes
const UnicodeUpperCaseArraySource: array [0..601] of WORD=(
//TitleCase arrays size: 2424 bytes
const UnicodeTitleCaseArraySource: array [0..605] of WORD=(
type TUnicodeWordBreakEntry=packed record
BeginRange: LongInt;
EndRange: LongInt;
WordBreakClass: TUnicodeWordBreak;
end;
const TUnicodeWordBreakArray: array [0..737] of TUnicodeWordBreakEntry =(
------------------------------------------------
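The word-break table stores code point ranges rather than single values, so a lookup has to find the range that contains a given code point. A minimal sketch of such a lookup, assuming the entries are sorted by BeginRange and do not overlap. The enumeration values, DemoWordBreak and UnicodeWordBreakClass are hypothetical (the real TUnicodeWordBreak type and table contents are not shown above); the four demo ranges use the classes UAX #29 assigns to the comma, ASCII digits and ASCII letters.

```pascal
program WordBreakDemo;
{$mode objfpc}

type
  // Hypothetical subset of the word-break classes; the real
  // TUnicodeWordBreak enumeration is not shown in the message.
  TUnicodeWordBreak = (wbOther, wbALetter, wbNumeric, wbMidNum);

  TUnicodeWordBreakEntry = packed record
    BeginRange: LongInt;
    EndRange: LongInt;
    WordBreakClass: TUnicodeWordBreak;
  end;

const
  // Tiny excerpt, sorted by BeginRange, ranges do not overlap (assumption).
  DemoWordBreak: array [0..3] of TUnicodeWordBreakEntry = (
    (BeginRange: $002C; EndRange: $002C; WordBreakClass: wbMidNum),  // comma
    (BeginRange: $0030; EndRange: $0039; WordBreakClass: wbNumeric), // 0-9
    (BeginRange: $0041; EndRange: $005A; WordBreakClass: wbALetter), // A-Z
    (BeginRange: $0061; EndRange: $007A; WordBreakClass: wbALetter)  // a-z
  );

// Binary search for the range containing CP; code points that are not
// covered by any range fall back to wbOther.
function UnicodeWordBreakClass(CP: LongInt;
  const Table: array of TUnicodeWordBreakEntry): TUnicodeWordBreak;
var
  Lo, Hi, Mid: Integer;
begin
  Result := wbOther;
  Lo := 0;
  Hi := High(Table);
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) div 2;
    if CP < Table[Mid].BeginRange then
      Hi := Mid - 1
    else if CP > Table[Mid].EndRange then
      Lo := Mid + 1
    else
      Exit(Table[Mid].WordBreakClass); // BeginRange <= CP <= EndRange
  end;
end;

begin
  // Digit '5' lies in the $0030..$0039 range, so this compares equal
  WriteLn(UnicodeWordBreakClass($0035, DemoWordBreak) = wbNumeric);
  // Space is not listed, so it falls back to wbOther
  WriteLn(UnicodeWordBreakClass($0020, DemoWordBreak) = wbOther);
end.
```

Storing ranges keeps the table short for scripts where long runs of code points share one class; the cost is that a lookup compares against both ends of each probed entry.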
These tables also need some functions, like:
function UnicodeUCS4ToLowerCase(const UC: integer): integer;
begin
  // Ranges where lowercase = uppercase + constant offset are handled directly
  case UC of
    $000041..$00005A: Result:=UC+32;    // Basic Latin A..Z
    $0000C0..$0000D6: Result:=UC+32;    // Latin-1 Supplement
    $0000D8..$0000DE: Result:=UC+32;
    $000391..$0003A1: Result:=UC+32;    // Greek
    $0003A3..$0003AB: Result:=UC+32;
    $000400..$00040F: Result:=UC+80;    // Cyrillic
    $000410..$00042F: Result:=UC+32;
    $000531..$000556: Result:=UC+48;    // Armenian
    $0010A0..$0010C5: Result:=UC+7264;  // Georgian
    $001F08..$001F0F: Result:=UC-8;     // Greek Extended
    $001F18..$001F1D: Result:=UC-8;
    $001F28..$001F2F: Result:=UC-8;
    $001F38..$001F3F: Result:=UC-8;
    $001F48..$001F4D: Result:=UC-8;
    $001F68..$001F6F: Result:=UC-8;
    $001F88..$001F8F: Result:=UC-8;
    $001F98..$001F9F: Result:=UC-8;
    $001FA8..$001FAF: Result:=UC-8;
    $002160..$00216F: Result:=UC+16;    // Roman numerals
    $0024B6..$0024CF: Result:=UC+26;    // Circled Latin letters
    $002C00..$002C2E: Result:=UC+48;    // Glagolitic
    $00FF21..$00FF3A: Result:=UC+32;    // Fullwidth Latin
    $010400..$010427: Result:=UC+40;    // Deseret
    // The offset ranges above cover 406 of the 1023 mappings
  else
    begin
      // Remaining mappings: binary search in the sorted source table, then
      // take the lowercase value at the same index in the target table
      Result:=UnicodeCaseBS(UnicodeLowerCaseArraySource,UC,0,High(UnicodeLowerCaseArraySource));
      if Result=-1 then Result:=UC else Result:=UnicodeLowerCaseArrayTarget[Result];
    end;
  end;
end;
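UnicodeCaseBS and UnicodeLowerCaseArrayTarget are used above but not shown in the message. A minimal sketch of what such a helper could look like, assuming the Source array is sorted ascending and Target[i] holds the mapping for Source[i]. This is not the author's code; DemoSource, DemoTarget and DemoToLower are hypothetical, and the four demo entries are real Latin Extended-A case pairs (U+0100..U+0107).

```pascal
program CaseBSDemo;
{$mode objfpc}

// Hypothetical sketch of the UnicodeCaseBS helper: a plain binary search
// over a sorted array of code points. Returns the index of UC in
// Arr[Lo..Hi], or -1 if UC is not present.
function UnicodeCaseBS(const Arr: array of Word; UC, Lo, Hi: Integer): Integer;
var
  Mid: Integer;
begin
  Result := -1;
  // Code points outside the BMP can never match a Word entry.
  if (UC < 0) or (UC > $FFFF) then
    Exit;
  while Lo <= Hi do
  begin
    Mid := Lo + (Hi - Lo) div 2;
    if Arr[Mid] = UC then
      Exit(Mid)
    else if Arr[Mid] < UC then
      Lo := Mid + 1
    else
      Hi := Mid - 1;
  end;
end;

const
  // Parallel tables (assumption): Source sorted ascending,
  // Target[i] is the lowercase form of Source[i].
  DemoSource: array [0..3] of Word = ($0100, $0102, $0104, $0106);
  DemoTarget: array [0..3] of Word = ($0101, $0103, $0105, $0107);

// Same fallback pattern as the else branch above.
function DemoToLower(UC: Integer): Integer;
var
  Idx: Integer;
begin
  Idx := UnicodeCaseBS(DemoSource, UC, 0, High(DemoSource));
  if Idx = -1 then
    Result := UC
  else
    Result := DemoTarget[Idx];
end;

begin
  WriteLn(HexStr(DemoToLower($0104), 4)); // 0105
  WriteLn(HexStr(DemoToLower($0041), 4)); // 0041 (not in the demo table)
end.
```

Because the tables are WORD arrays, this fallback can only cover BMP code points; anything above $FFFF has to be handled by explicit ranges, as the Deseret case above is.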
--
Best regards,
JoshyFun