The interfaces for atomic objects that we have been considering provide ordering constraints as part of the atomic operations themselves. This is consistent with the Java and C# volatile-based approach, and with our atomic_ops package, but inconsistent with some other atomic operation implementations, such as that in the Linux kernel, which often require the use of explicit memory fences.
Some rationale for this choice is provided as part of N2145 and N2047. Here we provide a somewhat expanded and updated version of that rationale.
Note that N2153 argues that explicit fences are still needed for maximum performance on certain applications and architectures. The arguments here do not preclude providing them as well.
Here we list our reasons for explicitly associating ordering semantics with atomic operations, and correspondingly providing different variants with different ordering constraints:
On x86 processors, the fence is redundant only if precisely the right kind of fence (for a store, one that prevents "LoadStore" and "StoreStore" reordering) is used. (N2153 does suggest such a fence.)
On Itanium, the fence, once requested, can generally not be optimized back to an st.rel instruction. To see this, consider the hypothetical lock release sequence:
    x = 1;                              // lock-protected assignment
    LoadStore_and_StoreStore_fence();   // hypothetical optimal fence
    lock.store_relaxed(0);              // atomically clear spinlock
    y.store_relaxed(42);                // unrelated atomic assignment

Note that this does not allow the assignments to x and y to be reordered. However, on Itanium, we really want to transform this to the equivalent of
    x = 1;                              // lock-protected assignment
    lock.store_release(0);              // atomically clear spinlock
    y.store_relaxed(42);                // unrelated atomic assignment

This is not safe, since this version does allow the assignments to x and y to be reordered. In most realistic contexts, it would be hard to determine that there is no subsequent assignment like the store to y. Hence I would not expect this transformation to be generally feasible.
This issue is admittedly largely Itanium-specific. But note that the reordering of the assignments to x and y should be allowed; the fence-based implementation adds a useless constraint. If we only have fences, we can't express the abstractly correct ordering constraint for even a spin-lock release.
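The two lock release sequences can be sketched side by side in compilable form. This is only an illustration under stated assumptions: it uses the std::atomic interface as later standardized in C++11 rather than the store_relaxed and store_release spellings used here, it treats a release fence as a rough stand-in for the hypothetical LoadStore_and_StoreStore_fence(), and the names lock_word, release_with_fence, release_with_store_release, and acquire_and_read are invented for the example.

```cpp
#include <atomic>

// All names below are hypothetical and chosen only for illustration.
int x = 0;                        // ordinary data protected by the spinlock
std::atomic<int> lock_word{1};    // 1 = held, 0 = free
std::atomic<int> y{0};            // unrelated atomic variable

// Fence-based release. The fence orders the store to x before BOTH
// later relaxed stores, so the store to y is constrained as well,
// even though nothing requires that.
void release_with_fence() {
    x = 1;                                                // lock-protected assignment
    std::atomic_thread_fence(std::memory_order_release);  // stand-in for the hypothetical fence
    lock_word.store(0, std::memory_order_relaxed);        // atomically clear spinlock
    y.store(42, std::memory_order_relaxed);               // unrelated atomic assignment
}

// Release attached to the store itself. Only the store to lock_word
// is ordered after the store to x; the store to y remains free to be
// reordered with it.
void release_with_store_release() {
    x = 1;                                                // lock-protected assignment
    lock_word.store(0, std::memory_order_release);        // atomically clear spinlock
    y.store(42, std::memory_order_relaxed);               // unrelated atomic assignment
}

// Acquire side: spin until the lock word is clear, then read x.
// The acquire load pairs with either release variant above.
int acquire_and_read() {
    while (lock_word.load(std::memory_order_acquire) != 0) {
        // spin
    }
    return x;
}
```

In both variants a thread that observes lock_word == 0 through the acquire load is guaranteed to see x == 1. The difference is in what is promised about y: the fence version additionally orders the store to x before the store to y, which is the extra constraint that prevents a compiler from collapsing the fence into a single release store.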