[fpc-devel] Extended($FFFFFFFFFFFFFFFF) = -1?

Mon Mar 3 01:08:06 CET 2014

On 03 Mar 2014, at 00:29, Hans-Peter Diettrich wrote:

> Ewald schrieb:
> 
>> It seems like sticking to one principle (signed integer as much as
>> possible) actually breaks another principle (bitpattern).
> 
> Wirth and his Pascal language are well designed with signed types above all, and unsigned types being subranges. In so far one could consider hex constants with the sign bit set as syntactical errors.
> 

Well, a warning or something like that (note, hint?) would be welcome indeed. Something like `Your constant will probably not be interpreted like you expect it due to [....]`.

>>> You do care about the signedness, because the only way to represent
>>> int64(-1) in hexadecimal is as $ffffffffffffffff.
> 
> Negative numbers never should be expressed in hex.

Agreed, if one wants a negative number, one should put a `-` in front of it.

> 
>> And what about -$1? Or is that too far fetched?
> 
> That's correct, because -$1 is -1 is a valid integral expression, without signedness problems.

Yeah, that is what I thought as well...
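
A minimal sketch of this point, assuming Free Pascal's usual handling of integer constants: the minus sign negates the value of the hex literal, so the sign bit never has to be spelled out in the literal itself. The program name is made up for illustration.

```pascal
program NegHexSketch;  // hypothetical name, for illustration only
begin
  // The leading '-' negates the value of the hex literal.
  WriteLn(-$1 = -1);                              // TRUE
  // The most negative Int64, written without setting the sign bit in hex.
  WriteLn(-$7FFFFFFFFFFFFFFF - 1 = Low(Int64));   // TRUE
end.
```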

> 
>> This highest bit then reflects the sign.
> 
> The sign representation is machine specific, as you know. On 1's complement machines there exist two representations of zero, +0 and -0, and you cannot express both as hexadecimal constants in a portable way.

Yes, but Jonas wrote that:

`The internal representation of a machine is unrelated to how values in the source code are interpreted. Just like 'a' in an ASCII source file will always mean the character 'a', even if that source file is compiled for a machine that uses EBCDIC. Numbers in FPC source files will always be interpreted as two's complement.`

This means that the sign representation must be portable because it is always 2's complement. I suppose that the compiler will convert the representation of the sign to the appropriate notation for the machine/architecture it is compiling to.

Hence my 2's complement notation for a _potential_ type.

> That's why high level languages, like Pascal, forbid hex representations of (possibly) negative numerical values.

That would indeed be a solution, but Free Pascal doesn't forbid it. Actually, I am a bit opposed to forbidding things, so a simple warning of some sort would do just as well for me.

> 
>> `-1` would then be $1 FFFF FFFF FFFF FFFF,
>> whereas $FFFF FFFF FFFF FFFF would be $0 FFFF FFFF FFFF FFFF. It
>> really is quite easy to store it like that and `fix` things [picking
>> a fitting datatype] afterwards.
> 
> The datatype has to be constructed/determined first, and *if* there exists a type with more than 64 bits, then it will be a signed type, with a 65 bit (unsigned) subrange matching your needs. But if no such type exists, you are lost.

Yes, that is true, but there is always a 64-bit signed/unsigned type (perhaps not native). On machines where, for example, only 32-bit wide datatypes are allowed, the virtual subrange should be 33 instead of 65 bits.

Anyway, that is how I parse constants. The important rule here is that you don't need the full 65 bits in the final representation. The signedness of the chosen type makes up for the loss of that one bit.
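
The scheme described above can be sketched roughly as follows. This is only an illustration of the idea, not how FPC's scanner works: the record `TParsedConst` and the functions `ParseHexConst` and `FittingType` are hypothetical names, and the parser assumes at most 16 hex digits and does no error checking.

```pascal
program ConstSketch;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

type
  // Intermediate form: one sign flag plus a 64-bit magnitude,
  // i.e. the "65-bit" value that only exists while parsing.
  TParsedConst = record
    Negative: Boolean;
    Magnitude: QWord;
  end;

// Reads an optional '-' followed by '$' and up to 16 hex digits.
// Assumption: well-formed input; anything that is not a hex digit is skipped.
function ParseHexConst(const S: string): TParsedConst;
var
  i: Integer;
  Digit: QWord;
begin
  Result.Negative := (Length(S) > 0) and (S[1] = '-');
  Result.Magnitude := 0;
  for i := 1 to Length(S) do
  begin
    case S[i] of
      '0'..'9': Digit := Ord(S[i]) - Ord('0');
      'A'..'F': Digit := Ord(S[i]) - Ord('A') + 10;
      'a'..'f': Digit := Ord(S[i]) - Ord('a') + 10;
    else
      Continue;  // skips '-', '$' and blanks
    end;
    Result.Magnitude := (Result.Magnitude shl 4) or Digit;
  end;
end;

// Picks a 64-bit type afterwards; the signedness of that type
// absorbs the extra bit carried by the sign flag.
function FittingType(const C: TParsedConst): string;
var
  Limit: QWord;
begin
  Limit := QWord(High(Int64));   // 2^63 - 1
  if C.Negative then
    Inc(Limit);                  // Int64 reaches down to -2^63
  if C.Magnitude <= Limit then
    Result := 'Int64'
  else if not C.Negative then
    Result := 'QWord'
  else
    Result := 'no 64-bit type fits';
end;

begin
  WriteLn('$FFFFFFFFFFFFFFFF -> ', FittingType(ParseHexConst('$FFFFFFFFFFFFFFFF')));
  WriteLn('-$1 -> ', FittingType(ParseHexConst('-$1')));
end.
```

Under these assumptions, the all-ones constant ends up as `QWord` and `-$1` as `Int64`, so neither needs more than 64 bits once the type has been chosen.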

> 
>> Anyway, then you have got backwards
>> compatibility to take care of, since there will be someone out there
>> whose code actually depends on this behaviour.
> 
> When we agree that a bitpattern of $FFFF FFFF FFFF FFFF can be interpreted differently on different 32 bit machines, as -1 or -MaxInt,

Why `on 32 bit machines`? I'm fairly confident that this particular constant on this particular compiler version will generate the same outcome on every possible architecture out there (just change the `extended` to `single` in the original example, because extended tends to vary).

> then it's obvious that such a textual representation should cause a compilation error "not portable...". We know that such an error message has not yet been implemented, but if you insist on writing unportable code... :-]

I insist on using a constant that is:
	- 64 bit wide
	- Only contains 1's
	- Is interpreted as an unsigned number wherever mathematical operations are performed.

Those demands are quite portable, no?

My original problem was easily solved with a typecast QWord(<gargantuan constant goes here>), so that was no longer an issue. What baffled me, though, was the fact that this (in my opinion) mis-parsing of certain constants is by design.
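
A minimal sketch of that workaround, assuming Free Pascal and using `Single` instead of `Extended` as suggested earlier; the program and constant names are made up for illustration:

```pascal
program AllOnesSketch;  // hypothetical name, for illustration only
{$mode objfpc}

const
  // The explicit typecast makes the constant an unsigned 64-bit value,
  // however the bare hex literal would have been interpreted.
  AllOnes64 = QWord($FFFFFFFFFFFFFFFF);

var
  x: Single;

begin
  x := AllOnes64;                       // converted as an unsigned number
  WriteLn(AllOnes64 = High(QWord));     // TRUE: 64 bits, all ones
  WriteLn(x > 0);                       // TRUE: positive, not -1
end.
```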

--
Ewald