[fpc-devel] Extended($FFFFFFFFFFFFFFFF) = -1?

Sat Mar 1 01:19:20 CET 2014

On 28 Feb 2014, at 23:43, Jonas Maebe wrote:

> 
> On 28 Feb 2014, at 21:07, Ewald wrote:
> 
>> 
>> On 28 Feb 2014, at 20:39, Jonas Maebe wrote:
>> 
>>> All hexadecimal constants are (conceptually) parsed as int64, so this is by design. int64($00000000ffffffff) is not -1.
>> 
>> So all numeric constants that are not floats are parsed as Int64's?
> 
> They are parsed as the smallest integer type that can represent them.

Ha, but an Int64 cannot represent this number. The maximum positive value of an Int64 is 2**63 - 1. The constant I used was 2**64 - 1. It doesn't fit. So this brings us to another issue: the constant cannot be represented, yet no error or even a warning is raised.
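
To make the arithmetic concrete, here is a minimal sketch in Python (chosen only because its integers are unbounded; it is not FPC code and says nothing about how the compiler is implemented). The helper `as_signed_64` is a made-up name for illustration; it reinterprets a 64-bit pattern as a two's complement signed value.

```python
INT64_MAX = 2**63 - 1
constant = 0xFFFFFFFFFFFFFFFF  # the constant from the subject line


def as_signed_64(bits):
    """Reinterpret a 64-bit pattern as a two's complement signed value."""
    return bits - 2**64 if bits >= 2**63 else bits


print(constant == 2**64 - 1)   # True: the written magnitude
print(constant > INT64_MAX)    # True: it does not fit in a signed 64-bit type
print(as_signed_64(constant))  # -1: what the same bits mean when read as Int64
```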

> Because of the (unPascalish) decision to have an unsigned version of the largest supported integer type, there are indeed some cases that require decisions to define the behaviour

That is perfectly true. But shouldn't the most basic behaviour of a language be at the very least intuitive? There is an unsigned type that can hold this constant, yet it is not used unless you add a typecast.

> and regardless of what decision you make, some people won't like it.

I don't believe that anyone ever expected the $FXXX XXXX XXXX XXXX to become less than zero (X denotes anything from 0 to F).

It's not about `liking` a decision. When we're talking about multiple inheritance, interfaces, strings, generics, we are talking about liking IMHO: everyone has his/her own coding style/habits/preferences. I have yet to meet someone who likes $FF.....FFFF to be -1.

> 
>> Isn't that view about numeric constants a bit limited (why an Int64 for example, you could've picked a virtual Int256 just as well)?
> 
> Supporting larger constant values in the compiler than what we support in the language itself would be very counter-intuitive.

In a way, yes; in another way, it allows the compiler to just interpret the number and see where it ends up. The thing is that you wouldn't get this weird behaviour at all, since it only happens when the constant is about as large as the largest constant the compiler can represent internally. Increase this size and there would be no issue. Take, for example, the number -1. This could be represented with:
$1 FFFF FFFF FFFF FFFF	(take a word length of 65 bits)

Then take the other constant $FFFF FFFF FFFF FFFF, this could be represented as
$0 FFFF FFFF FFFF FFFF

No data is lost in this process.

This mimics the exact behaviour of 32-bit constants. The exact datatype size can afterwards be resolved by removing either all 0's or all 1's from the left hand side of this representation. If you removed zeros, pick an unsigned datatype. Extend to the right size (1, 2, 4 or 8 bytes for the time being) with 0's. If you removed ones, it was signed. You thus need one extra bit. Pick the right signed datatype and fill with 1's on the left hand side.

The only numbers that cannot be represented are -$FXXX XXXX XXXX XXXX. This could be fixed by going to a 66 bit internal datatype, but it would be pointless since there is no datatype that can represent this constant.

Really, all I'm asking is one extra bit ;-)
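
The 65-bit scheme described above can be sketched as follows. This is only an illustration in Python under stated assumptions, not compiler code: `to_pattern` and `pick_type` are hypothetical helpers, and the returned type names are placeholders rather than FPC identifiers.

```python
INTERNAL_BITS = 65                 # one bit more than the largest real type
MASK = (1 << INTERNAL_BITS) - 1
WIDTHS = (8, 16, 32, 64)           # the 1, 2, 4 and 8 byte types


def to_pattern(value):
    """Encode value as a 65-bit two's complement bit pattern."""
    if not -(1 << 64) <= value < (1 << 64):
        raise OverflowError("does not fit in the 65-bit internal type")
    return value & MASK


def pick_type(pattern):
    """Strip leading 0's or 1's and return the smallest type that fits."""
    if pattern >> (INTERNAL_BITS - 1) == 0:
        # Leading zeros were removed: the value is unsigned.
        needed = pattern.bit_length()
        prefix = "uint"
    else:
        # Leading ones were removed: the value is signed, so one extra
        # bit is needed for the sign. Inverting turns leading ones into
        # leading zeros, which bit_length() then ignores.
        needed = (pattern ^ MASK).bit_length() + 1
        prefix = "int"
    for width in WIDTHS:
        if needed <= width:
            return f"{prefix}{width}"
    raise OverflowError("no datatype can represent this constant")


print(pick_type(to_pattern(0xFFFFFFFFFFFFFFFF)))  # uint64
print(pick_type(to_pattern(-1)))                  # int8
print(pick_type(to_pattern(762)))                 # uint16
```

With these assumptions, $FFFF FFFF FFFF FFFF comes out as an unsigned 8-byte type and -1 as a signed 1-byte type, while -$8000 0000 0000 0001 is rejected because it would need a 65-bit signed type.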

> 
>> Especially if you have a data type that can contain the number in its original intention?
> 
> Hexadecimal numbers have no "intention" as far as signedness is concerned.

That's debatable (but, as you mention, classified as part of the language definition), but a constant has a certain magnitude. It is IMO this magnitude that is the point. I don't expect a number with a lot of digits (2**64 - 1) to become rather small in magnitude (1). That is what I mean by `intention`. With this I haven't said anything about signedness: I don't care if 762 is interpreted as a signed 32 bit integer or unsigned 16 bit integer, as long as the meaning remains.

If I don't want this `intention`, when performing `evil bit level hacks` (algorithms like the famous 0x5f3759df algorithm), I use a typecast to instruct the compiler how to store it.
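
As an illustration of such an explicit reinterpretation, here is a sketch of the well-known 0x5f3759df inverse square root trick, written in Python with `struct` standing in for the typecast. The function names are invented for this example, and it assumes a positive, normal single-precision input.

```python
import struct


def float_bits(x):
    """Return the 32-bit pattern of x stored as a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_float(i):
    """Return the single-precision float whose 32-bit pattern is i."""
    return struct.unpack("<f", struct.pack("<I", i))[0]


def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for a positive, normal x."""
    i = 0x5F3759DF - (float_bits(x) >> 1)  # integer arithmetic on float bits
    y = bits_float(i)                      # read the result back as a float
    return y * (1.5 - 0.5 * x * y * y)     # one Newton-Raphson refinement


print(fast_inv_sqrt(4.0))  # close to 0.5
```

Here the magnitude of the constant is irrelevant; only its bit pattern matters, which is exactly why the reinterpretation is requested explicitly.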

> How they are interpreted is up to the language definition.
> 
>> Delphi compatibility I read in the bug report you mentioned, and I understand that in mode delphi (see below though for a bit of `issues`), but the example program is in mode fpc (or how is it called?). Can that at least be called a bug (in the documentation at the very least)?
> 
> No, unless the documentation states that the behaviour is different for mode fpc. The behaviour, in both FPC and Delphi modes, is by design.

I was looking at the wrong page (I was looking at `Numbers` instead of `Ordinal Types`). My mistake.

> 
>> By the way, what do you do when you want to port fpc to a one's complement machine (if they still exist)? Is $FFFF FFFF FFFF FFFF equal to 0 then?
> 
> The internal representation of a machine is unrelated to how values in the source code are interpreted. Just like 'a' in an ASCII source file will always mean the character 'a', even if that source file is compiled for a machine that uses EBCDIC. Numbers in FPC source files will always be interpreted as two's complement.

Good point.
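
For reference, the difference between the two readings of the all-ones pattern can be sketched like this (Python, with hypothetical helper names; it only illustrates the two encodings and is not a statement about FPC internals):

```python
def twos_complement(bits, width=64):
    """Value of a bit pattern when read as two's complement."""
    sign = 1 << (width - 1)
    return bits - (1 << width) if bits & sign else bits


def ones_complement(bits, width=64):
    """Value of a bit pattern when read as one's complement."""
    sign = 1 << (width - 1)
    mask = (1 << width) - 1
    # A set sign bit means the value is the negated bitwise inverse.
    return -(~bits & mask) if bits & sign else bits


all_ones = 0xFFFFFFFFFFFFFFFF
print(twos_complement(all_ones))  # -1
print(ones_complement(all_ones))  # 0 (the "negative zero" pattern)
```

Since source constants are always read as two's complement, only the first interpretation applies, whatever the target machine uses internally.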

> 
>> And when you have a CPU that has a native integer size of 128 bit, how do you do the transformation then? Just truncate the constant to a 64 bit wide integer? Admittedly, these are rare cases...
> 
> If FPC is ever ported to such a machine and if a built-in integer type of 128 bit would then be added for constants (the second would not automatically follow from the first), then the behaviour /might/ change. Just like it did from Turbo Pascal to Delphi ($ffffffff is interpreted as -1 in TP, but as high(cardinal) in Delphi). Such a change would probably be done based on a modeswitch, rather than by default in all syntax modes.

Here you have it: it might change. What I am arguing is that if the compiler interpreted the constant as it is written in the code:
	- A minus (`-`) before it means negative, nothing before it means positive
	- A lot of digits = A large number in magnitude
	- The larger the most significant digit of an equally long constant (#digits), the larger it is in magnitude

..., then the compiler doesn't need a change, it simply needs to be notified of the new datatype.

The subject of this thread violates all three intuitive interpretations [intuitive to my judgement -- this is how I learned numbers in basic school].

If the compiler doesn't find a fitting datatype for the constant, it should stop compiling and give an error IMHO - this is what it does, as you know, when you enter `$1 FFFF FFFF FFFF FFFF` as a constant. Actually, until today, I always thought it did just that.

--
Ewald