[fpc-devel] Unicode conversion routines

JoshyFun joshyfun at gmail.com
Sun Nov 23 17:50:09 CET 2008


Hello Florian,

Sunday, November 23, 2008, 5:21:29 PM, you wrote:

>> It is not so bad; technically it is easy to solve all of those issues.
>> As you have seen in the bug report, Delphi has the same problems too,
>> and it has not been fixed, to keep Delphi compatibility.

FK> Afaik we decided to apply the patch, however, I'd no time yet to do so.

Well, maybe I should say "not fixed till now for Delphi
compatibility". I think that once UnicodeString is part of stable
releases, many functions will have to be fixed, overloaded, written, etc.

>> Also I wish to know which basic Unicode functions will be supported
>> by FPC: only upper/lower, or maybe some more like decompose,
>> normalize, char-word-line-paragraph iterators... I have some of them
>> written if the FPC team wants them.

FK> It mainly depends if it needs external libs or huge tables.

My functions do not use any external library, but the tables are quite
big. Most tables were produced by Pascal code that balances a few
"if/case" lines against binary search on tables. The tables are still
big, but I do not think they can get much shorter and keep their
functionality. Examples:

------------------------------------------------
//LowerCase arrays size: 2376 bytes
const UnicodeLowerCaseArraySource: array [0..593] of WORD=(
//UpperCase arrays size: 2408 bytes
const UnicodeUpperCaseArraySource: array [0..601] of WORD=(
//TitleCase arrays size: 2424 bytes
const UnicodeTitleCaseArraySource: array [0..605] of WORD=(

type TUnicodeWordBreakEntry=packed record
  BeginRange: LongInt;
  EndRange: LongInt;
  WordBreakClass: TUnicodeWordBreak;
end;
const TUnicodeWordBreakArray: array [0..737] of TUnicodeWordBreakEntry =(
------------------------------------------------
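
A range table like TUnicodeWordBreakArray could be queried with a binary
search over the ranges. The following is only a rough sketch under
assumptions: the real TUnicodeWordBreak type and table contents are not
shown in this message, so TDemoWordBreak, DemoWordBreakArray and
DemoWordBreakOf are hypothetical names with three illustrative entries,
and the entries are assumed to be sorted by BeginRange and not to overlap.

```pascal
program WordBreakLookupDemo;
{$mode objfpc}

type
  // Hypothetical subset; the real TUnicodeWordBreak is not shown in the message.
  TDemoWordBreak = (wbOther, wbALetter, wbNumeric);

  // Same layout as the record in the message, with the demo enum.
  TDemoWordBreakEntry = packed record
    BeginRange: LongInt;
    EndRange: LongInt;
    WordBreakClass: TDemoWordBreak;
  end;

const
  // Illustrative ranges only, sorted by BeginRange and non-overlapping.
  DemoWordBreakArray: array [0..2] of TDemoWordBreakEntry = (
    (BeginRange: $0030; EndRange: $0039; WordBreakClass: wbNumeric),
    (BeginRange: $0041; EndRange: $005A; WordBreakClass: wbALetter),
    (BeginRange: $0061; EndRange: $007A; WordBreakClass: wbALetter)
  );

// Binary search for the range containing UC; wbOther when no range matches.
function DemoWordBreakOf(const UC: LongInt): TDemoWordBreak;
var
  L, H, M: Integer;
begin
  Result := wbOther;
  L := Low(DemoWordBreakArray);
  H := High(DemoWordBreakArray);
  while L <= H do
  begin
    M := L + (H - L) div 2;
    if UC < DemoWordBreakArray[M].BeginRange then
      H := M - 1
    else if UC > DemoWordBreakArray[M].EndRange then
      L := M + 1
    else
    begin
      Result := DemoWordBreakArray[M].WordBreakClass;
      Exit;
    end;
  end;
end;

begin
  WriteLn(DemoWordBreakOf($0035) = wbNumeric);  // digit, inside first range
  WriteLn(DemoWordBreakOf($0062) = wbALetter);  // letter, inside last range
  WriteLn(DemoWordBreakOf($0020) = wbOther);    // space, in no range
end.
```

With 738 entries such a search needs about ten comparisons per code
point, which is why the table can stay a flat sorted array.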

These tables also need some functions, like:


function UnicodeUCS4ToLowerCase(const UC: integer): integer;
begin
  case UC of
  $000041..$00005A: Result:=UC+32;
  $0000C0..$0000D6: Result:=UC+32;
  $0000D8..$0000DE: Result:=UC+32;
  $000391..$0003A1: Result:=UC+32;
  $0003A3..$0003AB: Result:=UC+32;
  $000400..$00040F: Result:=UC+80;
  $000410..$00042F: Result:=UC+32;
  $000531..$000556: Result:=UC+48;
  $0010A0..$0010C5: Result:=UC+7264;
  $001F08..$001F0F: Result:=UC-8;
  $001F18..$001F1D: Result:=UC-8;
  $001F28..$001F2F: Result:=UC-8;
  $001F38..$001F3F: Result:=UC-8;
  $001F48..$001F4D: Result:=UC-8;
  $001F68..$001F6F: Result:=UC-8;
  $001F88..$001F8F: Result:=UC-8;
  $001F98..$001F9F: Result:=UC-8;
  $001FA8..$001FAF: Result:=UC-8;
  $002160..$00216F: Result:=UC+16;
  $0024B6..$0024CF: Result:=UC+26;
  $002C00..$002C2E: Result:=UC+48;
  $00FF21..$00FF3A: Result:=UC+32;
  $010400..$010427: Result:=UC+40;
// In offsets calculated 406 items of 1023
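  else begin
    // Everything not covered by an offset range: binary search in the source table
    Result:=UnicodeCaseBS(UnicodeLowerCaseArraySource,UC,0,High(UnicodeLowerCaseArraySource));
    // -1 is treated as "no mapping": UC is returned unchanged
    if Result=-1 then Result:=UC else Result:=UnicodeLowerCaseArrayTarget[Result];
  end;
  end;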
end;
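
UnicodeCaseBS itself is not shown in this message. As a minimal sketch,
assuming it is an ordinary binary search over a sorted array of WORD that
returns the index of the match or -1 (which is what the calling code
implies), it could look like the program below. CaseBS, DemoSource,
DemoTarget and DemoToLower are made-up names, and the three-entry tables
are for illustration only, not the real data.

```pascal
program CaseBSDemo;
{$mode objfpc}

// Hypothetical stand-ins for the real tables: three sorted code points
// and their lowercase counterparts at the same index.
const
  DemoSource: array [0..2] of Word = ($0100, $0102, $0104);
  DemoTarget: array [0..2] of Word = ($0101, $0103, $0105);

// Assumed behaviour: binary search for UC in A[L..H], which must be sorted
// ascending; returns the index of UC, or -1 when it is not present.
function CaseBS(const A: array of Word; const UC: Integer; L, H: Integer): Integer;
var
  M: Integer;
begin
  Result := -1;
  while L <= H do
  begin
    M := L + (H - L) div 2;
    if A[M] = UC then
    begin
      Result := M;
      Exit;
    end
    else if A[M] < UC then
      L := M + 1
    else
      H := M - 1;
  end;
end;

// Same lookup pattern as the else branch above, with a separate index variable.
function DemoToLower(const UC: Integer): Integer;
var
  Idx: Integer;
begin
  Idx := CaseBS(DemoSource, UC, 0, High(DemoSource));
  if Idx = -1 then
    Result := UC
  else
    Result := DemoTarget[Idx];
end;

begin
  WriteLn(DemoToLower($0102) = $0103);  // present in the source table
  WriteLn(DemoToLower($0103) = $0103);  // absent, so returned unchanged
end.
```

Since the tables hold WORD values, code points above $FFFF cannot be
looked up this way and would have to be handled in the case statement,
as the $010400..$010427 range is.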

-- 
Best regards,
 JoshyFun



