I need to clear a chunk of memory (32-bit floats) to zero, and I use my_set():
static inline void my_set(float *dst, float v, int n)
{
    while (n-- > 0)
        *(dst++) = v;
}

#define MY_SIZE 1024

int main()
{
    float my_mem[MY_SIZE];
    my_set(my_mem, 0.0f, MY_SIZE); /* pass the array itself (decays to float *), not &my_mem */
}
Should I use memset() instead? Will it perform better on a platform with limited resources? Will GCC optimize my_set to use memset?
Seems that it will: https://godbolt.org/z/hP8jr1odP
Seems that it depends on architecture and array size, but will usually be optimized to memset: https://godbolt.org/z/YqnrEfPGY
#define MY_SIZE (1024 * 1024) /* parenthesized so the macro expands safely inside expressions */

static inline void my_set(float *dst, float v, int n)
{
    while (n-- > 0)
        *(dst++) = v;
}

float my_mem[MY_SIZE];

int main()
{
    my_set(my_mem, 0.0f, MY_SIZE);
}
Array size \ Architecture | x86_64 | arm | arm64 | risc-v |
---|---|---|---|---|
1024 floats (4 KiB) | loop | memset | memset | memset |
1024 * 1024 floats (4 MiB) | memset | memset | memset | memset |
Still, I'd trust the compiler to know what's best and not worry too much.
Should I use memset() instead?
Yes.
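A minimal sketch of what the memset() version could look like for the array from the question. It assumes the platform uses IEEE 754 floats, where the all-bits-zero pattern is 0.0f; the helper name clear_my_mem is made up for illustration. Note that the third argument is a size in bytes, not an element count.

```c
#include <string.h>

#define MY_SIZE 1024

static float my_mem[MY_SIZE];

/* Hypothetical helper: zero the whole array.
 * sizeof my_mem is the array size in bytes (MY_SIZE * sizeof(float)).
 * Assumes all-bits-zero represents 0.0f, as in IEEE 754. */
static void clear_my_mem(void)
{
    memset(my_mem, 0, sizeof my_mem);
}
```

Using sizeof on the array itself avoids the common mistake of passing the element count where memset() expects bytes. It only works where my_mem is a real array, not a pointer.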
Will it perform better on a platform with limited resources?
Yes. Or it will at least not perform worse.
Will GCC optimize my_set to use memset?
Yes, no, maybe. When compiling with -ffreestanding (embedded systems target), it tries not to include any library calls and then discards any memset calls. Otherwise, in PC-like environments, the machine code seems to boil down to a memset call.
Please note that explicitly setting something to zero and setting it to some other value will generate very different machine code. memset and similar library functions are optimized to perform well on the data width of the CPU, which is not necessarily the same as the data width of a float (typically 32 bits).
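To illustrate why zero is a special case: memset() repeats a single byte, so it can produce 0.0f (all bytes zero, assuming IEEE 754 floats) but not an arbitrary float value. A rough sketch of a non-zero fill, with the hypothetical name fill_value, might look like this:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill n floats with an arbitrary value.
 * memset() cannot do this for non-zero v, because it writes the same
 * byte everywhere; memset(dst, 1, ...) yields floats with the bit
 * pattern 0x01010101, which is a tiny positive number, not 1.0f. */
static void fill_value(float *dst, float v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = v;
}
```

For v == 0.0f the compiler is free to turn this loop into a memset() call, as the links above suggest; for other values it has to emit stores of the full 32-bit pattern, possibly vectorized.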