[fpc-devel] Unit for handling UTF-8 strings

Michael Van Canneyt michael at freepascal.org
Tue Apr 9 08:55:15 CEST 2013



On Tue, 9 Apr 2013, Mattias Gaertner wrote:

> On Tue, 09 Apr 2013 08:24:11 +0200
> Michael Schnell <mschnell at lumino.de> wrote:
>
>> On 04/08/2013 07:02 PM, Mattias Gaertner wrote:
>>> I guess, you mean encoded string types.
>>
>> AFAIK, you can just create string variables of the appropriate coding
>> type and an assignment will do auto-conversion.
>
> Yes.
> But how do you examine the characters?
> If I understand Michael right, there will be some "implicit functions"
> for that. I wonder how they work.

See the character unit, which lets you examine a UnicodeString either one UnicodeChar at a time or by string index:

  // flat functions
   function ConvertFromUtf32(AChar : UCS4Char) : UnicodeString;
   function ConvertToUtf32(const AString : UnicodeString; AIndex : Integer) : UCS4Char; overload;
   function ConvertToUtf32(const AString : UnicodeString; AIndex : Integer; out ACharLength : Integer) : UCS4Char; overload;
   function ConvertToUtf32(const AHighSurrogate, ALowSurrogate : UnicodeChar) : UCS4Char; overload;
   function GetNumericValue(AChar : UnicodeChar) : Double; overload;
   function GetNumericValue(const AString : UnicodeString; AIndex : Integer) : Double; overload;
   function GetUnicodeCategory(AChar : UnicodeChar) : TUnicodeCategory; overload;
   function GetUnicodeCategory(const AString : UnicodeString; AIndex : Integer) : TUnicodeCategory; overload;
   function IsControl(AChar : UnicodeChar) : Boolean; overload;
   function IsControl(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsDigit(AChar : UnicodeChar) : Boolean; overload;
   function IsDigit(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsSurrogate(AChar : UnicodeChar) : Boolean; overload;
   function IsSurrogate(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsHighSurrogate(AChar : UnicodeChar) : Boolean; overload;
   function IsHighSurrogate(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsLowSurrogate(AChar : UnicodeChar) : Boolean; overload;
   function IsLowSurrogate(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsSurrogatePair(const AHighSurrogate, ALowSurrogate : UnicodeChar) : Boolean; overload;
   function IsSurrogatePair(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsLetter(AChar : UnicodeChar) : Boolean; overload;
   function IsLetter(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsLetterOrDigit(AChar : UnicodeChar) : Boolean; overload;
   function IsLetterOrDigit(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsLower(AChar : UnicodeChar) : Boolean; overload;
   function IsLower(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsNumber(AChar : UnicodeChar) : Boolean; overload;
   function IsNumber(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsPunctuation(AChar : UnicodeChar) : Boolean; overload;
   function IsPunctuation(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsSeparator(AChar : UnicodeChar) : Boolean; overload;
   function IsSeparator(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsSymbol(AChar : UnicodeChar) : Boolean; overload;
   function IsSymbol(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsUpper(AChar : UnicodeChar) : Boolean; overload;
   function IsUpper(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function IsWhiteSpace(AChar : UnicodeChar) : Boolean; overload;
   function IsWhiteSpace(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
   function ToLower(AChar : UnicodeChar) : UnicodeChar; overload;
   function ToLower(const AString : UnicodeString) : UnicodeString; overload;
   function ToUpper(AChar : UnicodeChar) : UnicodeChar; overload;
   function ToUpper(const AString : UnicodeString) : UnicodeString; overload;
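
As a rough illustration of how the string/index overloads above might be used to walk a UnicodeString, here is a minimal sketch. The program name and the helpers CountCodePoints and CountLetters are made up for this example and are not part of the character unit. It assumes the unit is available as "character" in your FPC version, and that the string overloads may reject a lone surrogate, which is why those are skipped before calling IsLetter.

```pascal
program ExamineChars;
{$mode objfpc}{$H+}

uses
  character;

// Hypothetical helper: counts code points in a UTF-16 string,
// treating a valid high/low surrogate pair as one code point.
function CountCodePoints(const S: UnicodeString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if IsSurrogatePair(S, I) then
      Inc(I, 2)   // pair: two UTF-16 code units
    else
      Inc(I);     // BMP character (or a lone surrogate)
    Inc(Result);
  end;
end;

// Hypothetical helper: counts code points classified as letters.
function CountLetters(const S: UnicodeString): Integer;
var
  I, Len: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if IsSurrogatePair(S, I) then
      Len := 2
    else
      Len := 1;
    // Assumption: skip lone surrogates rather than pass them to IsLetter.
    if (Len = 2) or not IsSurrogate(S[I]) then
      if IsLetter(S, I) then
        Inc(Result);
    Inc(I, Len);
  end;
end;

var
  S: UnicodeString;
begin
  // U+1D400 lies outside the BMP, so it is stored as a surrogate pair.
  S := UnicodeString('a1 ') + ConvertFromUtf32($1D400);
  WriteLn('UTF-16 code units: ', Length(S));
  WriteLn('Code points:       ', CountCodePoints(S));
  WriteLn('Letters:           ', CountLetters(S));
end.
```

The point is only that the index has to advance by one or two code units depending on IsSurrogatePair. ConvertToUtf32 with the ACharLength out parameter looks like another way to get the same step size.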


