| However, architecturally
| the LOCK is required, to ensure compatibility -- no ifs or buts.

But.... ;-) Yes, you are absolutely right.

| Performance isn't that good an excuse for writing
| implementation-dependent code. Also, bundling more than one version
| of the code and selecting the desired variant at run-time requires
| that you come up with an absolutely reliable processor/feature
| detection. Which you won't be able to, unless you can foresee the
| future.

For this special application, the software will be bundled with the hardware, so I can either select a fitting processor or adjust the software for a specific system.

| Are you absolutely certain that you can't improve the algorithm,
| to avoid this particular performance issue?

I am sure that the algorithm could be rewritten to avoid this problem, but that would cost a lot of effort. (There are other options, e.g. I might pin the conflicting threads to one processor and let the other, independent threads run on the other processor, but that constrains the scheduler.)
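
To make the LOCK point above concrete, here is a minimal sketch in portable C11 rather than hand-written assembly. It is not the code from my application. The names `run_locked_counter`, `worker`, `NUM_THREADS` and `INCREMENTS` are made up for illustration, and I am assuming POSIX threads are available. On x86, compilers such as GCC and Clang typically translate `atomic_fetch_add` into a LOCK-prefixed instruction, so the locking is left to the compiler instead of being dropped for speed.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4        /* hypothetical number of conflicting threads */
#define INCREMENTS  100000   /* hypothetical number of updates per thread */

/* Counter shared by all workers; every update is an atomic read-modify-write. */
static atomic_long shared_counter;

/* Each worker increments the shared counter INCREMENTS times. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++)
        atomic_fetch_add(&shared_counter, 1);
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter value.
 * Returns -1 if not all threads could be created. */
long run_locked_counter(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;

    atomic_store(&shared_counter, 0);
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == NUM_THREADS ? atomic_load(&shared_counter) : -1;
}
```

With a plain `long` and `shared_counter++` instead, the final value may come out lower on a multiprocessor machine, because concurrent increments can overwrite each other. That is the compatibility problem the LOCK is there to prevent.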