discussion forum
message
Name: Christian Ludloff
eMail: ludloff@sandpile.org
Date: June 28, 1998 at 16:41:29
Subject: Re: Possible P6 bug
In Reply To: Re: Possible P6 bug by Paul Hsieh on June 28, 1998 at 03:19:22
Text:
|| Of course. Because CPUID is serializing, it will ensure that all the units
|| have finished their stuff. So your code won't be executed in parallel with
|| any code from before the CPUID. Which makes the core schedule differently.

| Given the way it's used in the code, I assume Colin realized this. He is
| making the point that the loop's steady state can be heavily influenced
| by the initial state of the pipeline.

| In theory everything should eventually flush out of the reservation
| station and some sort of repeating steady state dependent on only the
| contents of the loop should arise. However, Colin's investigations
| pretty much prove that this is not the case.

I can imagine that certain code sequences end up in a state where they only repeat their behavior, but extremely rarely (or even never) reach a "steady" state (i.e. depending on the loop count you get different execution times on every measurement).

| Or you could figure it out yourself by using some of the MSR's that
| track things like "the number of RAT resource conflicts" or something
| like that. The RAT (Resource Allocation Table?) is a bit of a mystery
| to me (and to most I think given their weak documentation.) I think
| this is where all the renaming, and register loading goes on.

The RAT is the Register Alias Table. It determines whether the source operands of uOPs are taken from the real registers or from the ROB. So if a previous uOP's destination register matches the current uOP's source register, the RAT ensures that the current uOP uses the ROB entry of the previous uOP rather than the real register.

Also, I noticed that Intel claims that only three uOPs can retire in a single clock cycle, and that a taken branch must retire in the first of those slots.

Finally, according to Intel's docs, the branch will be predicted taken by the static prediction on the first iteration, since it is a conditional branch that goes backwards. After that the branch should end up in the BTB, but Intel still claims a penalty of "approximately" one cycle even when this negative (backward) branch is predicted correctly.

| So I really don't think this is a P6 bug but just a side effect of their
| design trade offs. The problem with all of this is that they are just
| theories. Intel hasn't written enough clear documentation to figure any
| of this out for sure. But I hope this discussion leads to some good
| analysis of the P6 from a performance point of view.

I agree.

-- CL
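To make the RAT description above a bit more concrete, here is a toy model in Python. It is only a sketch of the lookup rule as described in this message (a source operand comes from the ROB if an earlier, not yet retired uOP wrote that register, otherwise from the real register). The class name `ToyRAT`, its methods, and the unbounded ROB numbering are all assumptions made up for illustration; this is not how Intel implements it, and it ignores partial registers, ROB wrap-around, and flags.

```python
class ToyRAT:
    """Hypothetical toy Register Alias Table, for illustration only."""

    def __init__(self, registers):
        # None means "the current value lives in the real register".
        self.alias = {reg: None for reg in registers}
        self.next_rob = 0  # simplification: ROB entries are never reused

    def rename(self, dest, sources):
        """Allocate a ROB entry for one uOP and resolve its source operands."""
        resolved = []
        for src in sources:
            entry = self.alias[src]
            if entry is None:
                resolved.append(("reg", src))    # read the real register
            else:
                resolved.append(("rob", entry))  # read the producer's ROB entry
        rob_entry = self.next_rob
        self.next_rob += 1
        if dest is not None:
            # Later uOPs reading `dest` must now use this ROB entry.
            self.alias[dest] = rob_entry
        return rob_entry, resolved

    def retire(self, rob_entry):
        """On retirement the result is in the real register again."""
        for reg, entry in self.alias.items():
            if entry == rob_entry:
                self.alias[reg] = None


rat = ToyRAT(["eax", "ebx", "ecx"])
first = rat.rename("eax", ["ebx", "ecx"])  # e.g. eax <- ebx op ecx
second = rat.rename("ebx", ["eax"])        # depends on the first uOP
rat.retire(first[0])                       # first uOP retires
third = rat.rename("ecx", ["eax"])         # eax now comes from the real register
print(first, second, third)
```

The second uOP reads `eax` from ROB entry 0 because the first uOP has not retired yet; after `retire(0)` the third uOP reads `eax` from the real register.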
optional link: sandpile.org