>> I am only considering in memory representation being UTF-32 (or
>> UCS-4).
>
> What do you mean with 'memory representation'?

I mean that each char in a string in memory would occupy 4 bytes (or more), whereas a string saved to disk (or transmitted across the net, etc.) would be converted to the more compact UTF-8. IOW, no compression is applied to in-memory strings.
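To make the split concrete, here is a minimal sketch in Python. It assumes the "in-memory" form is modeled explicitly as a UTF-32LE byte buffer (exactly 4 bytes per code point, no BOM), and it converts to UTF-8 only at the I/O boundary. The helper names `to_memory`, `char_at` and `to_wire` are hypothetical, chosen for illustration only.

```python
# Sketch: fixed-width UTF-32 "in memory", UTF-8 only when saving/sending.
# Helper names are hypothetical, for illustration only.

def to_memory(text: str) -> bytes:
    """Encode as UTF-32 little-endian without a BOM: 4 bytes per code point."""
    return text.encode("utf-32-le")

def char_at(buf: bytes, i: int) -> str:
    """Constant-time indexing: code point i starts at byte offset 4 * i."""
    return buf[4 * i : 4 * i + 4].decode("utf-32-le")

def to_wire(buf: bytes) -> bytes:
    """Convert the in-memory UTF-32 buffer to UTF-8 for disk or network."""
    return buf.decode("utf-32-le").encode("utf-8")

text = "na\u00efve \u20ac"  # "naïve €": 7 code points
mem = to_memory(text)
wire = to_wire(mem)
print(len(text), len(mem), len(wire))  # 7 code points, 28 bytes, 10 bytes
print(char_at(mem, 2))                 # the precomposed "ï"
```

The design choice is the trade-off discussed above: the fixed width makes indexing trivial at the cost of memory, while UTF-8 never uses more than 4 bytes per code point and only 1 byte for ASCII. Note that CPython's own `str` does not store text this way; since Python 3.3 (PEP 393) it picks 1, 2 or 4 bytes per character depending on the widest character in the string.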