[fpc-devel] Unicode resource strings

Mon Aug 20 23:00:47 CEST 2012

Hans-Peter Diettrich wrote:
> Mark Morgan Lloyd schrieb:
> 
>> I've got a couple of terminal emulators using WideChar and WideString 
>> for internal manipulation, what /should/ I be using? and where does it 
>> leave things like Sorokin's regex unit, which similarly use WideChar 
>> and WideString?
> 
> It depends on which libraries you use. AFAIK SBCS RegEx works for both
> Ansi and UTF-8 strings, so a UTF-16 library is optional. For the
> terminal emulators I'd think it's sufficient to introduce an
> internal string type that allows switching between UTF-8 and UTF-16, so
> that the (different?) behaviour can be tested. If there are
> differences, this indicates that the WideString emulators *only* handle
> Unicode BMP characters, not surrogate pairs, and you have to decide
> whether this restriction is okay for you.
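To illustrate the surrogate-pair point: a 16-bit WideChar holds exactly one UTF-16 code unit, so any character above U+FFFF (outside the BMP) needs two of them. The sketch below is in Python purely for illustration, not FPC code. It compares a BMP character (APL rho, U+2374) with a non-BMP one (U+1D400) and computes the surrogate pair by hand.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2

def utf8_bytes(s: str) -> int:
    """Number of bytes needed to store s as UTF-8."""
    return len(s.encode("utf-8"))

def to_utf16_units(cp: int) -> list:
    """Split one code point into UTF-16 code units (a surrogate pair above U+FFFF)."""
    if cp <= 0xFFFF:
        return [cp]                      # fits in a single WideChar-sized unit
    v = cp - 0x10000                     # 20-bit value spread over two units
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]

bmp = "\u2374"          # APL FUNCTIONAL SYMBOL RHO, inside the BMP
astral = "\U0001D400"   # MATHEMATICAL BOLD CAPITAL A, outside the BMP

for label, s in (("BMP", bmp), ("non-BMP", astral)):
    # code points, UTF-16 units, UTF-8 bytes, manual surrogate split
    print(label, len(s), utf16_units(s), utf8_bytes(s),
          [hex(u) for u in to_utf16_units(ord(s))])
```

Both strings are one code point long, but the second needs two 16-bit units (0xd835, 0xdc00). Code that assumes one WideChar per character would miscount it, while UTF-8 code has to handle multi-byte sequences for both.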

I think I need to clarify. The terminal emulators are not for a standard
encoding such as UTF-8; they accept a non-standard byte sequence over e.g.
a telnet or serial connection and convert it to a particular set of
characters, to emulate e.g. an IBM Selectric APL golfball.
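A table-driven translation of that kind might look like the following Python sketch. The byte values, the glyph assignments, the pass-through of printable ASCII and the use of U+FFFD for unknown bytes are all hypothetical; this is not the real Selectric APL golfball layout. It also shows that the APL glyphs chosen here lie in the BMP, so each fits in a single 16-bit WideChar without surrogate pairs.

```python
# Hypothetical byte -> glyph table; NOT the real Selectric APL layout.
APL_TABLE = {
    0x01: "\u2374",  # APL FUNCTIONAL SYMBOL RHO
    0x02: "\u2373",  # APL FUNCTIONAL SYMBOL IOTA
    0x03: "\u2207",  # NABLA
    0x04: "\u2206",  # INCREMENT
    0x05: "\u2190",  # LEFTWARDS ARROW
}
REPLACEMENT = "\uFFFD"  # assumption: unknown bytes become the replacement character

def translate(data: bytes) -> str:
    """Map each incoming byte to a glyph; printable ASCII passes through (assumed)."""
    out = []
    for b in data:
        if b in APL_TABLE:
            out.append(APL_TABLE[b])
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append(REPLACEMENT)
    return "".join(out)

text = translate(b"\x01A\x02\xff")
print([hex(ord(ch)) for ch in text])
# Every glyph in the table fits in one UTF-16 code unit (no surrogate pairs).
print(all(ord(ch) <= 0xFFFF for ch in APL_TABLE.values()))
```

Because the mapping is per byte and the target glyphs are all in the BMP, the surrogate-pair restriction discussed above would not arise for a table like this.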

Sorokin's regex unit is a separate issue, and one that also applies to
FPC's regexpr package, which uses WideChar; I don't know whether this
would be problematic on Windows.

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]