Rouge one : an integer representation story

Integers in C# come in two flavors : signed (int) and unsigned (uint). There are of course native integers (again signed nint and unsigned nuint), but I won’t be covering those.

Unsigned integer can hold an integer in range from 0 (uint.MinValue) to 4294967295 (uint.MaxValue). Signed integer (we are going to stick to 32-bit integers for the rest of this article) can hold an integer in range from -2147483648 (int.MinValue) to 2147483647 (int.MaxValue).

But wait, why the int.MinValue is not equal to-int.MaxValue? Where does this rouge one comes from?

To penetrate into this mystery we must first learn how does the computer store integer values.

Show us your bits!

Let’s start with positive 32-bit integer. We have 32 bits of space, thus we can store 2^32-1 different integer values.

// min unsigned int value
0             = 0000_0000_0000_0000_0000_0000_0000_0000

// max unsigned int value
4_294_967_295 = 1111_1111_1111_1111_1111_1111_1111_1111

Now, how do we store the negative number in the same 32-bit space? More specific, where do we store the sign?

The most obvious answer would be – in the most significant bit (MSB, the left one). This gives us the following logic.

 0 = 0000_0000_0000_0000_0000_0000_0000_0000
 1 = 0000_0000_0000_0000_0000_0000_0000_0001
-1 = 1000_0000_0000_0000_0000_0000_0000_0001

Let’s see what minimal and maximal numbers we end up with.

 2_147_483_647 = 0111_1111_1111_1111_1111_1111_1111_1111
-2_147_483_647 = 1111_1111_1111_1111_1111_1111_1111_1111

-2147483648 is nowhere to bee seen. Furthermore, we get two different representations of a decimal zero – positive one and a negative one.

 0 = 0000_0000_0000_0000_0000_0000_0000_0000
-0 = 1000_0000_0000_0000_0000_0000_0000_0000 

Obviously, the actual state of things is a bit more complex than our initial idea.

Lat’s think how we can alter our way of storing a sign to get -2147483648.

If we forget about the fact that MSB is a sign bit for a moment, then binary representation of 2147483648 (that’s 2^31) would be.

2_147_483_648 = 1000_0000_0000_0000_0000_0000_0000_0000

Then to get the -2147483648 we only need to agree that the MSB represents not only a sign, but a value of -1*2^31.

This gives us the following decimal to binary conversion formula.

1000_0000_0000_0000_0000_0000_0000_0000 -> -1 * 2^31 + 0 * 2^30 + ... + 0 * 2^0

Let’s see how we can use this formula to encode some other negative number. To get, say -127, we need to find the binary representation that will give us -127 = -2147483648 + <some number>. That number is 2_147_483_775 with the following binary representation.

2_147_483_775 = 1000_0000_0000_0000_0000_0000_0111_1111

Let’s observe some curious thing.

2_147_483_775 = 1000_0000_0000_0000_0000_0000_0111_1111
127           = 0000_0000_0000_0000_0000_0000_0111_1111

Unsigned 127 differs from unsigned 2_147_483_775 by MSB only.

Now lets encode -127 would in binary where MSB that encodes -1*2^31.

-127 = 1111_1111_1111_1111_1111_1111_1000_0001

And that one also looks suspiciously like unsigned 127 with all bits flipped, except for the first (least significant bit, LSB)!

Let’s take the next integer -128 and perform the same steps.

// unsigned
 128 = 0000_0000_0000_0000_0000_0000_1000_0000

// signed 
-128 = 1111_1111_1111_1111_1111_1111_1000_0000

If we just flip all the bits of the unsigned 128, the value we get will differ from the resulting encoded value by just 1!

// unsigned 
 128  = 0000_0000_0000_0000_0000_0000_1000_0000

// flipped 
 128' = 1111_1111_1111_1111_1111_1111_0111_1111

// signed 
-128  = 1111_1111_1111_1111_1111_1111_1000_0000

Therefore, to get -128 from unsigned 128 all we need to do is flip all the bits and add 1. Same goes for the previous example with 127.

And that’s exactly how the things actually work in memory and it’s called two’s complement! One important thing to note is that when adding 1 we ignore integer overflow, otherwise bad things would happen.

Before we dive into a wee bit of formal maths, lets perform one more conversion, but in revere.

What value is encoded as all bits set to 1?

// MSB is 1 so this is a negative number
x      = 1111_1111_1111_1111_1111_1111_1111_1111

// subtract 1
x-1    = 1111_1111_1111_1111_1111_1111_1111_1110

// flip the bits
(x-1)' = 0000_0000_0000_0000_0000_0000_0000_0001 // 1!

Yep, the humble -1 is ones all the way down.

We have the way of representing negative numbers, now let’s formalize things a bit and see how this helps us to simplify the subtraction.

Take it with a grain of maths

  • Suppose f(n) is a function that flips all the bits in n (e.g. f(1001_0111) = 0110_1000) then
  • -n = f(n) + 1
  • Therefore a calculation a = b - c
  • can be rewritten as a = b + (-c) = b + f(c) + 1

We have just substituted the subtraction with addition, and all the CPU needs to know to subtract two numbers is how to get two’s complement and add. Since the numbers are stored in two’s complement anyway – it just adds the numbers and calls it a day!

Next time we will dive into bitwise operators and calculate integer and unsigned integer minimal and maximal values ourselves using only those. Stay tuned!

1 thought on “Rouge one : an integer representation story

Leave a Reply

Your email address will not be published. Required fields are marked *