This is analogous to the corresponding theorem for Java, though some of the proof details differ.
Claim: If a program allows no data races on a given input, and the only synchronization operations are simple lock and unlock operations with acquire and release semantics respectively, then the program obeys sequential consistency rules, i.e. it behaves as though it had been executed by sequentially interleaving its actions, and lettting each load instruction see the last previous value stored to the same location in this interleaving.
Very informal Proof:
Consider a particular consistent execution on the given input. The corresponding happens-before relation is an irreflexive partial order, which we can extend to a total order T. Chose T such that each lock operation appears as late in the order as possible, and each unlock appears as early as possible. We then have to argue that:
Ordinary memory operations must appear in is-sequenced-by order, since that is one of the constraints on happens-before. Similarly lock operations must not appear after actions they are sequenced-before, and conversely for unlock actions. Thus we have to argue that our construction ensures that lock operations do not appear earlier, and unlock operations appear no later than in thread order.
Since we place lock operations as late as possible, they can be placed before their thread order only if this is required by other inter-thread happens-before relationships, which would have to be attributable to a communicates-with relationship. But a lock operation does not communicate-with anything, hence there can be no other ordering constraints on the lock operation. A corresponding argument applies to the ordering of unlock operations. Thus T contains the operations of each thread in is-sequenced-by order.
Now assume that a load operation L in T sees the value stored by Svisible but a store instruction Smiddle appears between them in T.
We know from the fact that T is an extension of happens-before that Smiddle does not happen-before Svisible. But since there are no data races and Smiddle and Svisible confict, one of them must happen-before the other, and hence Svisible must happen-before Smiddle. This means that Smiddle may not happen-before L, since that would otherwise violate the constraint on inter-thread visibility in the definition of consistent executions.
But we again know that T is an extension of happens-before, and hence L cannot happen-before Smiddle. It follows that there is a data race between L and Smiddle, and hence arrive at a contradiction.
Hence every load must see the last store in T that precedes it and stores to the same location. •
I believe this also holds if a data race is defined as a store occurring simultaneously with (or adjacent to in a sequential interleaving) a load or a store, so long as our notion of "simple lock" doesn't include a trylock primitive. See Boehm, "Reordering Constraints for Pthread-Style Locks" for details.
The above argument does not apply if we allow load_acquire and store_release operations. In particular, cinsider the following example from the strawman proposal:
r1 = load_acquire(&x);
r2 = load_acquire(&y);
This allows r1 = r2 = 0, where sequential consistency (and Java) do not.