[fpc-devel] Memory consumed by strings

Sun Nov 23 09:31:39 CET 2008

On 2008-11-23 10:19, Mattias Gaertner wrote:
> On Sat, 22 Nov 2008 23:05:43 +0200
> listmember <listmember at letterboxes.org> wrote:
>
>> Is there a way to determine how much memory is consumed by strings by
>> a running application?
>>
>> I'd like to know this, in particular, for FPC and Lazarus --to begin
>> with.
>>
>> And, the reason I'd like to know this is this: Whenever I suggest
>> that char size be increased to 4 bytes, the idea gets opposed on the
>> grounds that it will need huge memory --4 times as much.
>>
>> There's of course some merit in that argument, but I have no idea
>> what it is '4 times' of.
>>
>> This is not very engineer-like --it being unmeasured.
>>
>> Can anyone suggest a way to measure the memory load caused by strings?
>
> The exact amount depends on the application, but think about loading
> text files of 100 MB into strings. This will need at least the
> 100 MB plus the overhead for each string (at least 12 bytes). With 2-byte
> chars an extra 100 MB would be needed, and with 4-byte chars 300 MB
> of additional memory would be needed.
>
> For example, the Lazarus IDE typically holds 50 to 200 MB of sources in
> memory. If this were changed to UnicodeString (2 bytes per char), then
> the IDE would need 50 to 200 MB more memory. And because many
> time-consuming tasks are already bound by the memory bandwidth of current
> computers, the IDE would become twice as slow. Do the math for 4 bytes
> per char.
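
The arithmetic quoted above can be written out as a small sketch. It is in
Python only for brevity, the function name is hypothetical, and it assumes
the text currently takes exactly 1 byte per char and ignores the per-string
overhead:

```python
def extra_memory_mb(text_mb, bytes_per_char, baseline_bytes_per_char=1):
    """Additional MB needed compared with the baseline char size.

    Assumption: every char has the same fixed width, and the per-string
    overhead (at least 12 bytes each) is left out of the estimate.
    """
    return text_mb * (bytes_per_char - baseline_bytes_per_char)


if __name__ == "__main__":
    # 100 MB of text, and the 50 to 200 MB range quoted for the IDE.
    for text_mb in (50, 100, 200):
        for width in (2, 4):
            extra = extra_memory_mb(text_mb, width)
            print(f"{text_mb} MB of text, {width}-byte chars: +{extra} MB")
```

This only restates the estimate; it does not measure what a running
application actually holds in strings.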

What I had in mind wasn't to store the string data in UTF-32 (or UCS-4); 
it would still be UTF-8 or whatever.

I am only considering the in-memory representation being UTF-32 (or UCS-4).

This way, loading and saving would hardly be affected, yet in-memory
operations would be a lot faster and simpler.
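
To make the idea concrete, here is a minimal sketch of keeping the stored
data in UTF-8 while holding it in memory as UTF-32. It is written in Python
purely as an illustration; the sample text and the helper names are made up,
and it says nothing about how FPC would implement this:

```python
def load_as_utf32(utf8_bytes):
    """Decode stored UTF-8 into a UTF-32 buffer (little-endian, no BOM)."""
    return utf8_bytes.decode("utf-8").encode("utf-32-le")


def save_as_utf8(utf32_bytes):
    """Encode the in-memory UTF-32 buffer back to UTF-8 for storage."""
    return utf32_bytes.decode("utf-32-le").encode("utf-8")


def code_point_at(utf32_bytes, index):
    """Fetch the code point at `index` with plain offset arithmetic.

    With a fixed 4 bytes per code point there is no need to scan from the
    start of the buffer, which a variable-width encoding would require.
    """
    start = 4 * index
    return chr(int.from_bytes(utf32_bytes[start:start + 4], "little"))


if __name__ == "__main__":
    sample = "Grüße, мир"  # made-up sample text with non-ASCII characters
    stored = sample.encode("utf-8")
    in_memory = load_as_utf32(stored)
    print("stored bytes:   ", len(stored))
    print("in-memory bytes:", len(in_memory))
    print("char at index 2:", code_point_at(in_memory, 2))
```

The stored size is unchanged, the in-memory buffer takes 4 bytes per code
point, and indexing becomes a fixed offset calculation. Whether that makes
real workloads faster would still have to be measured.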