c - Performance penalty on misaligned data
As a CS student, I am trying to understand the basics of how a computer works. After I stumbled across an article on misaligned data access, I wanted to test the performance penalty on my own. I think I understand what the author is talking about and why this happens / should happen.
Anyway, this is the code I wrote to call the test functions that he wrote:
    int main(void) {
        int i = 0;
        uint8_t alignment = 0;
        uint8_t size = 1024 * 1024 * 10; // 10MiB
        uint8_t *block = malloc(size);
        for (alignment = 0; alignment <= 17; alignment++) {
            start_t = clock();
            for (i = 0; i < 100000; i++)
                Munge8(block + alignment, size);
            end_t = clock();
            printf("%i\n", end_t - start_t);
        }
        // Repeat, but next time with Munge16, Munge32, Munge64
    }
I don't know whether my CPU & RAM are just very fast, but the output for all four functions (Munge8, Munge16, Munge32 and Munge64) is always 3 or 4 (random, no pattern).
Is this possible? 100,000 iterations should be a lot of work, or am I wrong? I'm working on Windows 7 Enterprise x64 with an Intel Core i7-4600U CPU @ 2.10GHz. All compiler optimizations are turned off (/Od).
None of the related questions on SO explain why my solution is not working.
What am I doing wrong? Any help is greatly appreciated.
Edit: First of all: thank you very much for your help. I changed the type of size from uint8_t to uint32_t.
After that, I changed the inner loop of all the test functions, which could cause undefined behavior, by splitting it into two separate statements:
    while (data32 != data32End) {
        data32++;
        *data32 = -(*data32);
    }
Now I get a relatively stable output of 25/26, 12/13, 6 and 3, computed as the average over 100 iterations. Is this a logical result? Does this mean that my architecture handles unaligned access just as fast (or slow) as aligned access? Am I measuring the time incorrectly? Or is there a problem with precision when dividing by 10? My new code:
    int main(void) {
        int i = 0;
        uint8_t alignment = 0;
        uint64_t size = 1024 * 1024 * 10; // 10MiB
        uint8_t *block = malloc(size);
        printf("%i\n\n", CLOCKS_PER_SEC); // yields 1000, just to compare how fast the 'ticks' are on my machine
        for (alignment = 0; alignment <= 17; alignment++) {
            start_t = clock();
            for (i = 0; i < 100; i++)
                singlet(block + alignment, size);
            end_t = clock();
            printf("%i\n", (end_t - start_t) / 100);
        }
        // Again, repeat with all the different functions
    }
General criticism is definitely also appreciated.
This fails due to integer overflow: 10485760 does not fit in 8 bits, so the stored value is truncated (here to 0):
    uint8_t size = 1024 * 1024 * 10; // 10MiB
should be:
const size_t size = 1024 * 1024 * 10; // 10MiB
I don't know why you would use an 8-bit quantity to hold that value.
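To see the truncation concretely, here is a minimal sketch. The helper names (`size_as_uint8`, `size_as_size_t`, `print_sizes`) are made up for illustration and are not part of the question's code. It assumes `int` is at least 32 bits wide, as it is on the platform described in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BYTES (1024 * 1024 * 10) /* 10485760 = 0xA00000 */

/* What an 8-bit variable actually holds after the assignment. */
static uint8_t size_as_uint8(void) {
    return (uint8_t)BLOCK_BYTES; /* keeps only the low 8 bits, which are all zero */
}

/* The same value stored in a type that is wide enough for it. */
static size_t size_as_size_t(void) {
    return (size_t)BLOCK_BYTES;
}

/* Hypothetical helper: prints both values side by side. */
static void print_sizes(void) {
    /* Sanity check for the assumption that int has at least 32 bits. */
    assert(size_as_size_t() == 10485760u);
    printf("uint8_t: %u\n", (unsigned)size_as_uint8());
    printf("size_t:  %lu\n", (unsigned long)size_as_size_t());
}
```

Calling `print_sizes()` from `main` shows 0 for the 8-bit variable. If the test functions loop over `size` bytes, each call then has nothing to process, which would explain why every measurement came out as only 3 or 4 ticks.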
Check how to enable all warnings for your compiler.
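On the precision question from the edit: `(end_t - start_t) / 100` is an integer division when `clock_t` is an integer type, so fractions of a tick are discarded. One way to avoid that is to do the conversion in floating point. This is only a sketch: `avg_ms_per_run`, `negate_bytes` and `time_runs` are hypothetical names, and `negate_bytes` is a stand-in, not one of the original test functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Converts a tick interval to the average milliseconds per run, in floating point. */
static double avg_ms_per_run(clock_t start, clock_t end, int runs) {
    assert(runs > 0);
    double total_ms = 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC;
    return total_ms / (double)runs;
}

/* Hypothetical stand-in for one of the test functions: negates every byte. */
static void negate_bytes(void *data, size_t size) {
    uint8_t *p = (uint8_t *)data;
    uint8_t *end = p + size;
    while (p != end) {
        *p = (uint8_t)(-*p);
        p++;
    }
}

/* Times `runs` calls of fn(buf, size) and returns the average in milliseconds. */
static double time_runs(void (*fn)(void *, size_t), void *buf, size_t size, int runs) {
    clock_t start = clock();
    for (int i = 0; i < runs; i++)
        fn(buf, size);
    clock_t end = clock();
    return avg_ms_per_run(start, end, runs);
}
```

A call such as `time_runs(negate_bytes, block + alignment, size, 100)` returns a `double` that can be printed with `%f`, so small differences between alignments are not rounded away. Note that `clock()` itself may still tick more coarsely than the value of `CLOCKS_PER_SEC` suggests.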