Yesterday, someone showed me this code:
    #include <stdio.h>

    int main(void)
    {
        unsigned long foo = 506097522914230528;
        for (int i = 0; i < sizeof(unsigned long); ++i)
            printf("%u ", *(((unsigned char *) &foo) + i));
        putchar('\n');
        return 0;
    }
That results in:
0 1 2 3 4 5 6 7
I am very confused, mainly by the line inside the for loop. From what I can tell, &foo is being cast to an unsigned char * and then i is added to it. I think *(((unsigned char *) &foo) + i) is a more verbose way of writing ((unsigned char *) &foo)[i], but this makes it seem like foo, an unsigned long, is being indexed. If so, why? The rest of the loop looks like the usual way of printing every element of an array, so everything seems to point to this being true. The cast to unsigned char * confuses me further. I tried searching Google for casting integer types to char * specifically, but my research got stuck on unhelpful results about casting int to char, itoa(), and so on. 506097522914230528 specifically prints 0 1 2 3 4 5 6 7, but other numbers appear to produce their own unique eight numbers in the output, and bigger numbers seem to fill in more zeroes.
As a preface, this program will not necessarily produce the same output as in the question, because it relies on implementation-defined behavior. Tweaking the program slightly can also cause undefined behavior. More information on this is at the end.
The first line of the main function defines an unsigned long foo as 506097522914230528. This looks arbitrary in decimal, but in hexadecimal it is 0x0706050403020100.
This number consists of the following bytes, from most significant to least significant: 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00. By now, you can probably see its relation to the output. If you're still unsure how this translates into the output, take a look at the for loop.
    for (int i = 0; i < sizeof(unsigned long); ++i)
        printf("%u ", *(((unsigned char *) &foo) + i));
Assuming an unsigned long is 8 bytes, this loop runs eight times, once per byte. (Two hex digits are enough to represent every possible value of a byte, so the 16 hex digits of the number correspond to 16 / 2 = 8 bytes.) The really confusing part is the second line. Think about it this way: since two hex digits describe one byte, isolating the first two digits of the number (07) gives a byte with the value seven, and isolating the last two (00) gives a byte with the value zero. Now, assume the long is actually an array of bytes. On a little-endian machine, which stores the least significant byte first, it looks like this:
{00, 01, 02, 03, 04, 05, 06, 07}
We get the address of foo with &foo, cast it to an unsigned char * so that each access reads a single byte (two hex digits), then use pointer arithmetic to get what would be foo[i] if foo were an array of eight bytes. As I mentioned in my question, this probably looks less confusing as ((unsigned char *) &foo)[i].
A bit of a warning: this program relies on implementation-defined behavior, which means it will not necessarily give the same output on every implementation of C. Not only is a long 32 bits in some implementations, but the order in which the bytes of 0x0706050403020100 are stored in the unsigned long (its endianness) is also implementation-defined. Credit to @philipxy for pointing out the implementation-defined behavior first. This type punning raises another issue, which @Ruslan pointed out: if the address of the long is cast to a pointer to some other, incompatible type instead of a character type (char *, signed char * or unsigned char *) and then dereferenced, C's strict aliasing rule comes into play and you get undefined behavior (credit for the link goes to @Ruslan as well). More detail on these two points is in the comment section.