c - Performance penalty on misaligned data


As a CS student, I am trying to understand the basics of computers. When I stumbled across an article on data alignment, I wanted to test the performance penalty for misaligned access on my own. I think I understand what the author is talking about and why this happens (or should happen).

Anyway, this is the code I wrote to test the functions the author describes:

  int main(void)
  {
      int i = 0;
      uint8_t alignment = 0;
      uint8_t size = 1024 * 1024 * 10; // 10MiB
      uint8_t *block = malloc(size);

      for (alignment = 0; alignment <= 17; alignment++)
      {
          start_t = clock();
          for (i = 0; i < 100000; i++)
              Munge8(block + alignment, size);
          end_t = clock();
          printf("%i\n", end_t - start_t);
      }

      // repeat, but next time with Munge16, Munge32, Munge64
  }
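A side note on the loop above: block + alignment starts alignment bytes into the buffer, so with malloc(size) each call also touches up to 17 bytes past the end of the allocation. A minimal sketch, assuming two hypothetical helpers (alloc_with_slack and misalignment, not from the article), of allocating enough slack and checking which offsets are actually misaligned for a given access width:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates size bytes plus max_offset bytes of
   slack, so block + offset .. block + offset + size - 1 stays inside
   the allocation for every offset from 0 to max_offset. */
uint8_t *alloc_with_slack(size_t size, size_t max_offset)
{
    return malloc(size + max_offset);
}

/* Hypothetical helper: returns how far p is past the previous multiple
   of width; 0 means p is aligned for accesses of that width.
   width must be a power of two (2, 4, 8, ...). */
size_t misalignment(const void *p, size_t width)
{
    return (size_t)((uintptr_t)p & (width - 1));
}
```

Since malloc returns memory aligned for any fundamental type, offset 0 is aligned for every width, and an offset of k is misaligned by k mod width bytes.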

I don't know whether my CPU and RAM are just very fast, but the output for all four functions (Munge8, Munge16, Munge32 and Munge64) is always 3 or 4 (random, no pattern).

Is this possible? 100,000 iterations should take much longer, or am I wrong? I'm working on Windows 7 Enterprise x64 with an Intel Core i7-4600U CPU @ 2.10GHz. All compiler optimizations are disabled (/Od).

None of the existing questions on SO explain why my solution is not working.

What am I doing wrong? Any help is greatly appreciated.

Edit: First of all, thank you very much for your help! I changed the type of size from uint8_t to uint32_t. After that, I replaced the inner loops of all the test functions, which invoked undefined behavior, with two separate statements:

  while (data32 != data32End)
  {
      *data32 = -(*data32);
      data32++;
  }
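For comparison, a loop body of the form *data32++ = -*data32; (assumed here to be roughly what the original test functions contained) both modifies data32 and reads it in one unsequenced expression, which is undefined behavior in C. A minimal well-defined version might look like this; the name munge32 and its (void *, size_t) signature are assumptions for illustration, not necessarily what the article used:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Munge32-style function: negates every 32-bit word in
   [data, data + size). size is in bytes; trailing bytes that do not
   fill a whole word are left untouched.
   Note: dereferencing a uint32_t * that is not 4-byte aligned is
   undefined in ISO C; x86 hardware tolerates it, which is exactly
   what this kind of benchmark relies on. */
void munge32(void *data, size_t size)
{
    uint32_t *data32 = data;
    uint32_t *data32End = data32 + size / sizeof *data32;

    while (data32 != data32End) {
        /* Read, negate and store first, then advance: two separate
           statements, so there is no unsequenced access. */
        *data32 = -(*data32);
        data32++;
    }
}
```

Negating before incrementing also matters: incrementing first would skip the first word and write to data32End, one past the last word processed.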

Now I am getting a relatively stable output of 25/26, 12/13, 6 and 3, averaged over 100 iterations. Is this a plausible result? Does it mean that my architecture handles unaligned access as fast (or as slow) as aligned access? Am I measuring the time incorrectly? Or is there a precision problem when dividing by 100? My new code:

  int main(void)
  {
      int i = 0;
      uint8_t alignment = 0;
      uint64_t size = 1024 * 1024 * 10; // 10MiB
      uint8_t *block = malloc(size);

      printf("%i\n\n", CLOCKS_PER_SEC); // yields 1000, just to compare how fast 'ticks' are on my machine

      for (alignment = 0; alignment <= 17; alignment++)
      {
          start_t = clock();
          for (i = 0; i < 100; i++)
              Munge8(block + alignment, size);
          end_t = clock();
          printf("%i\n", (end_t - start_t) / 100);
      }

      // again, repeat with all the different functions
  }
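On the precision question: (end_t - start_t) / 100 is integer division, so any fraction of a tick is discarded, and with CLOCKS_PER_SEC at 1000 one tick is 1 ms. A minimal timing sketch that does the arithmetic in double; munge_fn, munge8, time_munge and time_all_offsets are hypothetical names chosen for illustration, not the article's code:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical signature shared by the Munge8/16/32/64 test functions. */
typedef void (*munge_fn)(void *data, size_t size);

/* Hypothetical byte-wise test function: negates every byte in
   [data, data + size). Single bytes have no alignment requirement. */
void munge8(void *data, size_t size)
{
    uint8_t *p = data;
    for (size_t i = 0; i < size; i++)
        p[i] = (uint8_t)-p[i];
}

/* Calls fn(data, size) reps times and returns the average time per call
   in milliseconds. Because the division happens in double, an average
   below one clock tick is not truncated to 0. */
double time_munge(munge_fn fn, void *data, size_t size, int reps)
{
    clock_t start_t = clock();
    for (int i = 0; i < reps; i++)
        fn(data, size);
    clock_t end_t = clock();
    return (double)(end_t - start_t) * 1000.0 / CLOCKS_PER_SEC / reps;
}

/* Times fn at every offset 0..max_offset into block, which must have
   room for size + max_offset bytes. */
void time_all_offsets(munge_fn fn, uint8_t *block, size_t size,
                      size_t max_offset, int reps)
{
    for (size_t offset = 0; offset <= max_offset; offset++)
        printf("offset %2zu: %.3f ms\n", offset,
               time_munge(fn, block + offset, size, reps));
}
```

For example, with block = malloc(size + 17), calling time_all_offsets(munge8, block, size, 17, 100) prints one line per offset. Printing a double with %.3f also sidesteps passing a clock_t to %i, whose type need not match int.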

General criticism is also very welcome.

This value does not fit: a uint8_t can only hold 0 to 255, so the assignment wraps around:

  uint8_t size = 1024 * 1024 * 10; // 10MiB

should be:

  const size_t size = 1024 * 1024 * 10; // 10MiB  

I don't know why you would use an 8-bit type to hold that value.
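To see what the original line actually stores: 10 MiB is 10485760, which is 0xA00000, and converting to an unsigned 8-bit type keeps only the value modulo 256, i.e. the low byte 0x00. So size ends up as 0, and every Munge call processes zero bytes, which would explain the constant 3 or 4 ticks regardless of alignment. A small sketch contrasting the two declarations (the wrapper function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* What `uint8_t size = 1024 * 1024 * 10;` actually stores. Converting an
   out-of-range value to an unsigned type is well defined in C: the value
   is reduced modulo 2^8, so only the low 8 bits survive. */
uint8_t size_as_uint8(void)
{
    uint8_t size = 1024 * 1024 * 10; /* 10485760 == 0xA00000 -> 0 */
    return size;
}

/* The suggested fix: size_t is the type malloc() and sizeof use, and it
   is wide enough for 10 MiB on any 32- or 64-bit platform.
   (1024 * 1024 * 10 is computed as int, which fits in a 32-bit int.) */
size_t size_as_size_t(void)
{
    const size_t size = 1024 * 1024 * 10;
    return size;
}
```

Compiling the first function with warnings enabled should already report the truncation.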

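Check how to enable all warnings for your compiler (for example /W4 with MSVC, or -Wall -Wextra with GCC and Clang); it should flag this truncation.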

