Name: Colin Percival
eMail: Colin_Percival@sfu.ca
Date: June 25, 1998 at 12:02:25
Subject: Possible P6 bug
Text: I am posting this on behalf of Colin. He suggested it, but could
not post it himself because he uses Lynx. I am looking into setting up
an eMail alias that would be routed to the discussion forum.

--
CL

---------------------------------------------------------------------

About a month ago I started to work on optimizing some of my code for
the P6 core. I quickly found, however, that my code was taking very
strange numbers of clock cycles. The following loop is the simplest
example I have found which exhibits this:

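        mov  eax,0
        cpuid                ; serializing instruction, executed once before the loop
        mov  ecx,10000       ; iteration count
l1:     add  eax,eax         ; first dependency chain (eax)
        add  ebx,ebx         ; second dependency chain (ebx)
        add  eax,eax
        add  ebx,ebx
        add  eax,eax
        add  ebx,ebx
        dec  ecx
        jne  l1              ; repeat until ecx reaches 0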

Each iteration of this loop should take exactly 4 clock cycles. However,
depending upon the alignment of the loop, an iteration takes between
4.00 and 5.33 clock cycles. To be specific:

Alignment of l1    Cycles per iteration
 0 mod 16          5.29
 1 mod 16          5.31
 2 mod 16          5.00
 3 mod 16          4.00
 4 mod 16          5.31
 5 mod 16          4.00
 6 mod 16          5.31
 7 mod 16          4.00
 8 mod 16          4.93
 9 mod 16          5.28
10 mod 16          5.31
11 mod 16          5.33
12 mod 16          5.26
13 mod 16          5.32
14 mod 16          5.32
15 mod 16          5.29
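
To make the pattern in these measurements easier to see, here is a rough
Python sketch that tabulates them and checks one obvious candidate
explanation: whether the loop body straddles a 16-byte boundary. It is an
illustration only, not an analysis from the original post. The loop size
of 15 bytes is an assumption (32-bit encodings, 2 bytes per add, 1 byte
for dec ecx, 2 bytes for a short jne), as is the reading of "alignment"
as the address of l1 mod 16. The function names are made up for this
example.

```python
# Measured cycles per iteration, keyed by (address of l1) mod 16,
# copied from the table above.
CYCLES = {
    0: 5.29, 1: 5.31, 2: 5.00, 3: 4.00, 4: 5.31, 5: 4.00, 6: 5.31, 7: 4.00,
    8: 4.93, 9: 5.28, 10: 5.31, 11: 5.33, 12: 5.26, 13: 5.32, 14: 5.32, 15: 5.29,
}

# ASSUMPTION (not stated in the post): 32-bit encodings with a short jump.
# Six "add r32,r32" at 2 bytes each, "dec ecx" at 1 byte, "jne rel8" at 2 bytes.
LOOP_BYTES = 6 * 2 + 1 + 2  # 15 bytes

EXPECTED = 4.0  # cycles per iteration the post expects


def crosses_16_byte_boundary(offset, length=LOOP_BYTES):
    """True if `length` bytes starting at `offset` (mod 16) span two 16-byte blocks."""
    return offset + length > 16


def summarize(cycles=CYCLES):
    """Collect the fast alignments, the worst case, and the boundary-crossing check."""
    fast = sorted(a for a, c in cycles.items() if c == EXPECTED)
    worst = max(cycles.values())
    return {
        "fast_alignments": fast,
        "worst_cycles": worst,
        "worst_slowdown_pct": round(100 * (worst - EXPECTED) / EXPECTED, 2),
        # Fast alignments whose loop body still crosses a 16-byte boundary.
        "fast_that_cross": [a for a in fast if crosses_16_byte_boundary(a)],
        # Slow alignments whose loop body fits inside a single 16-byte block.
        "slow_that_fit": sorted(
            a for a, c in cycles.items()
            if c > EXPECTED and not crosses_16_byte_boundary(a)
        ),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

Under those assumptions, all three fast alignments (3, 5 and 7 mod 16)
cross a 16-byte boundary, while the two alignments that keep the loop
inside one block (0 and 1 mod 16) are slow. So a simple "does the loop
cross a 16-byte boundary" test does not appear to separate the fast cases
from the slow ones.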

Furthermore, if the cpuid is taken out, the times vary depending upon
what code is executed before the loop.

I posted to comp.lang.asm.x86 about this a few weeks ago, and, although
no-one there could explain it, one person (Yves Gallot) noticed that
adding the line

mov edx,mem_var

causes the code to revert to taking 4 clocks (although this was before I
looked into the different alignments, so this might be a red herring).

I have also communicated with Intel, but they are also mystified.

Do you have any idea why this is happening, or how to go about working
it out?

Thanks,

Colin Percival

PS. Again, it may be a red herring, but all the code samples I've
found so far which exhibit this behaviour have an overabundance of
integer instructions. Paul Hsieh speculated that a necessary condition
might be the decoders running faster than the execution units.

PPS. If you think it might help, please post this to your forum.
Unfortunately, Lynx doesn't handle it very well.
