discussion forum



message

Name: Christian Ludloff
eMail: ludloff@sandpile.org
Date: June 28, 1998 at 16:41:29
Subject: Re: Possible P6 bug
In Reply To: Re: Possible P6 bug by Paul Hsieh on June 28, 1998 at 03:19:22
Text: || Of course. Because CPUID is serializing, it will ensure that all the units
|| have finished their stuff. So your code won't be executed in parallel with
|| any code from before the CPUID. Which makes the core schedule differently.
| Given the way it's used in the code, I assume Colin realized this. He is
| making the point that the loop's steady state can be heavily influenced
| by the initial state of the pipeline.
| In theory everything should eventually flush out of the reservation
| station and some sort of repeating steady state dependent on only the
| contents of the loop should arise. However, Colin's investigations
| pretty much prove that this is not the case.

I can imagine that certain code sequences end up in a state where their
behavior repeats, but they only rarely (or even never) settle into a
"steady" state (i.e., depending on the loop count, you get a different
execution time on every measurement).

| Or you could figure it out yourself by using some of the MSRs that
| track things like "the number of RAT resource conflicts" or something
| like that. The RAT (Resource Allocation Table?) is a bit of a mystery
| to me (and to most, I think, given their weak documentation). I think
| this is where all the renaming and register loading goes on.

The RAT is the Register Alias Table. It determines whether the source
operands of uOPs are taken from the real registers or from the ROB
(reorder buffer). So if a previous uOP's destination register matches
the current uOP's source register, the RAT ensures that the current uOP
uses the previous uOP's ROB entry rather than the real register.
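A toy model of the renaming step described above, as a sketch only: the
class and method names are hypothetical, the ROB is modeled as an
ever-growing list instead of a circular buffer, and nothing here claims
to match Intel's actual RAT layout. It just shows sources being resolved
either to a ROB entry or to the real register, and the alias being
dropped once the writing uOP retires.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Uop:
    dest: Optional[str]   # architectural destination register, or None
    srcs: List[str]       # architectural source registers


class RegisterAliasTable:
    """Hypothetical, simplified RAT: register -> ROB index of newest in-flight writer."""

    def __init__(self) -> None:
        self.alias: Dict[str, int] = {}
        self.rob: List[Uop] = []  # simplification: grows forever, no wraparound

    def issue(self, uop: Uop) -> List[Tuple[str, str]]:
        # Resolve sources BEFORE updating the destination, so a uOP such as
        # "add eax, eax" still reads the previous value of eax.
        sources = []
        for reg in uop.srcs:
            if reg in self.alias:
                sources.append((reg, f"ROB[{self.alias[reg]}]"))  # in-flight result
            else:
                sources.append((reg, f"REG.{reg}"))  # committed real register
        rob_index = len(self.rob)
        self.rob.append(uop)
        if uop.dest is not None:
            self.alias[uop.dest] = rob_index  # later readers now see this ROB entry
        return sources

    def retire(self, rob_index: int) -> None:
        uop = self.rob[rob_index]
        # Only drop the alias if no younger uOP has renamed the same register.
        if uop.dest is not None and self.alias.get(uop.dest) == rob_index:
            del self.alias[uop.dest]


rat = RegisterAliasTable()
print(rat.issue(Uop("eax", ["ebx"])))  # [('ebx', 'REG.ebx')]
print(rat.issue(Uop("ecx", ["eax"])))  # [('eax', 'ROB[0]')]
rat.retire(0)
print(rat.issue(Uop("edx", ["eax"])))  # [('eax', 'REG.eax')]
```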

I also noticed that Intel claims that at most three uOPs can retire in a
single clock cycle, and that a taken branch must retire in the first of
those slots. Finally, according to Intel's docs, the branch will be
predicted taken by the static prediction on the first iteration, since
it is a backward conditional branch. After that the branch should be in
the BTB, but Intel still claims a penalty of "approximately" one cycle
even when this backward branch is predicted correctly.
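Taking the two retirement rules above at face value, a toy model shows
why the uOP count of a small loop can matter. This is a sketch under
loud assumptions: every uOP is already executed (only retirement
bandwidth is modeled), and uOPs following a taken branch may fill the
remaining slots of the same cycle. The function name is hypothetical.

```python
from typing import List


def retire_cycles(uops: List[bool], width: int = 3) -> int:
    """Cycles to retire `uops` in order; True marks a taken branch.

    Assumed rules: at most `width` uOPs per cycle, and a taken branch
    may only occupy slot 0 of a cycle.
    """
    cycles = 0
    i = 0
    while i < len(uops):
        cycles += 1
        slot = 0
        while i < len(uops) and slot < width:
            if uops[i] and slot != 0:
                break  # taken branch must wait for slot 0 of the next cycle
            i += 1
            slot += 1
    return cycles


# 10 iterations of a 3-uOP loop vs. a 4-uOP loop (branch is the last uOP).
print(retire_cycles([False, False, True] * 10))         # 11
print(retire_cycles([False, False, False, True] * 10))  # 20
```

In this model the 3-uOP loop settles at one clock per iteration, while
the 4-uOP loop needs two, because the branch keeps landing in a later
slot and wastes the rest of a cycle.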
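The static rule mentioned above (backward conditional branches predicted
taken when the branch is not yet in the BTB) can be sketched as below.
The BTB here is a hypothetical last-outcome table, much simpler than
the real predictor, and the model counts only mispredictions; it does
not capture the roughly one-cycle cost Intel lists for a correctly
predicted taken branch.

```python
from typing import Dict, List


def static_predict(branch_addr: int, target_addr: int, conditional: bool) -> bool:
    """Static prediction for a branch that misses in the BTB (simplified)."""
    if not conditional:
        return True                   # unconditional: predicted taken
    return target_addr < branch_addr  # backward: taken, forward: not taken


def count_mispredicts(branch_addr: int, target_addr: int, outcomes: List[bool]) -> int:
    btb: Dict[int, bool] = {}  # hypothetical BTB: branch address -> last outcome
    misses = 0
    for taken in outcomes:
        if branch_addr in btb:
            predicted = btb[branch_addr]
        else:
            predicted = static_predict(branch_addr, target_addr, conditional=True)
        if predicted != taken:
            misses += 1
        btb[branch_addr] = taken
    return misses


# Backward loop branch: taken 9 times, falls through on the 10th.
print(count_mispredicts(0x1010, 0x1000, [True] * 9 + [False]))  # 1 (the exit)
```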

| So I really don't think this is a P6 bug but just a side effect of their
| design trade-offs. The problem with all of this is that they are just
| theories. Intel hasn't written enough clear documentation to figure any
| of this out for sure. But I hope this discussion leads to some good
| analysis of the P6 from a performance point of view.

I agree.

--
CL

sandpile.org


