This is analogous to the corresponding theorem for Java, though some of the proof details differ.

**Claim:**
If a program allows no data races on a given input,
and the only synchronization
operations are simple `lock` and `unlock` operations with
acquire and release semantics respectively, then the
program obeys sequential consistency rules, i.e. it behaves as though
it had been executed by sequentially interleaving its actions, and
letting each load instruction see the last previous value stored to
the same location in this interleaving.

Very informal **Proof**:

Consider a particular consistent execution on the given input.
The corresponding happens-before relation is an irreflexive
partial order, which we can extend to a total order *T*.
Choose *T* such that each `lock` operation appears
as late in the order as possible, and each `unlock`
appears as early as possible. We then have to argue that:

- The actions of each thread appear in *T* in thread order, i.e. in is-sequenced-before order.
- Each load sees the last preceding store in *T* that stores to the same location. (For present purposes we view bit-field stores as loading and then storing into the entire "location".)

Ordinary memory operations must appear in is-sequenced-before order, since
that is one of the constraints on happens-before. Similarly
`lock` operations must not appear after actions they
are sequenced-before, and conversely for `unlock` actions.
Thus we have to argue that our construction ensures that `lock`
operations appear no earlier, and `unlock` operations no later,
than their positions in thread order.

Since we place `lock` operations as late as possible, they
can be placed earlier than their thread-order position only if this is required
by other inter-thread happens-before relationships, which would
have to be attributable to a communicates-with relationship.
But a `lock` operation does not communicate-with
anything, hence there can be no other ordering constraints on
the `lock` operation. A corresponding argument
applies to the ordering of `unlock` operations.
Thus *T* contains the operations of each thread in
is-sequenced-before order.

Now assume that a load operation *L* in *T* sees the
value stored by *Svisible* but a store instruction *Smiddle*
appears between them in *T*.

We know from the fact that
*T* is an extension of happens-before that
*Smiddle* does not happen-before *Svisible*.
But since there are
no data races and *Smiddle* and *Svisible* conflict,
one of them must happen-before the other, and hence
*Svisible* must happen-before *Smiddle*.
This means that *Smiddle* cannot happen-before *L*, since
that would violate the constraint on inter-thread visibility
in the definition of consistent executions.

But we again know that *T* is an extension of happens-before,
and hence *L* cannot happen-before *Smiddle*.
It follows that there is a data race between *L* and
*Smiddle*, and we arrive at a contradiction.
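In summary, the chain of deductions is (writing →hb for happens-before; each line restates a step of the argument above):

```latex
\begin{align*}
&S_{\mathrm{visible}} \xrightarrow{hb} S_{\mathrm{middle}}
  && \text{(no races, the stores conflict, and $T$ extends happens-before)}\\
&S_{\mathrm{middle}} \not\xrightarrow{hb} L
  && \text{(else $L$ could not see $S_{\mathrm{visible}}$)}\\
&L \not\xrightarrow{hb} S_{\mathrm{middle}}
  && \text{($S_{\mathrm{middle}}$ precedes $L$ in $T$)}\\
&\therefore\ L \text{ and } S_{\mathrm{middle}} \text{ are conflicting and unordered: a data race.}
\end{align*}
```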

Hence every load must see the last store in *T* that precedes it
and stores to the same location. •

I believe this also holds if a data race is defined as a store occurring simultaneously with (or adjacent to in a sequential interleaving) a load or a store, so long as our notion of "simple lock" doesn't include a trylock primitive. See Boehm, "Reordering Constraints for Pthread-Style Locks" for details.

The above argument does not apply if we allow `load_acquire`
and `store_release` operations. In particular,
consider the following example from the strawman proposal:

| Thread1 | Thread2 |
|---------|---------|
| `store_release(&y, 1);` | `store_release(&x, 1);` |
| `r1 = load_acquire(&x);` | `r2 = load_acquire(&y);` |

This allows *r1* = *r2* = 0, where sequential consistency
(and Java) do not.