| In general you are right. But with my special problem I
| incrementing/decrementing some reference counts, which is done by
| processors only. Unfortunately these reference counts are
| accessed frequently and I cannot afford the performance penalty.

Can you afford that your program may or may not work reliably on a future HTT-capable processor? I mean, the P4 might be capable of detecting this particular scenario, and as a result work fine even if you don't use the LOCK prefix. However, architecturally the LOCK prefix is required to ensure compatibility -- no ifs or buts. Performance isn't that good an excuse for writing implementation-dependent code.

Also, bundling more than one version of the code and selecting the desired variant at run-time requires that you come up with absolutely reliable processor/feature detection. Which you won't be able to do, unless you can foresee the future.

Are you absolutely certain that you can't improve the algorithm, to avoid this particular performance issue? Have you considered getting help from Intel's developer support team? Just curious.
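
For what it's worth, here is roughly what an architecturally safe reference count could look like without hand-writing the prefix. This is only a minimal sketch, assuming a C11 compiler with `<stdatomic.h>`; the `refcounted` type and the `ref_*` function names are made up for illustration and are not taken from your code. On x86, compilers typically turn these read-modify-write operations into LOCK-prefixed instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; the names are illustrative only. */
typedef struct {
    atomic_int refs;
} refcounted;

/* Start with a single owner. */
static void ref_init(refcounted *obj) {
    atomic_init(&obj->refs, 1);
}

/* Atomic increment: safe when several processors share the counter. */
static void ref_acquire(refcounted *obj) {
    atomic_fetch_add_explicit(&obj->refs, 1, memory_order_relaxed);
}

/* Atomic decrement; returns true when the caller dropped the last reference. */
static bool ref_release(refcounted *obj) {
    int prev = atomic_fetch_sub_explicit(&obj->refs, 1, memory_order_acq_rel);
    assert(prev > 0); /* releasing more often than acquiring is a bug */
    return prev == 1;
}
```

The caller that gets `true` back from `ref_release` is the one that frees the object. If the cost of the locked operation really is the bottleneck, the usual way out is to touch the shared counter less often (for example, by batching updates per thread), not to drop the atomicity.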