2 minute read

When will OCC be a win?

P(failure) x restart_cost < AVG(Locking delay per query)

Essentially boils down to P(failure).

Advantages:

  • No lock overhead.
  • No deadlocks.

Disadvantages:

  • Lock cost becomes restart cost.
  • May “starve”. Abort repeatedly.

How does OCC work?

tbegin > read phase > tend > validation phase > write phase

Read Phase

Such an unfortunate naming! It does midlead me from time to time. There are writes (and a lot) in read phase.

My Definition: A transaction does “temporary” work in read phase.

Because there’s no lock, transactions write into copies of objects during the read phase. Only after passing validation can the copies come into effect. (Isolation)

Validation Phase

Assigning transaction numbers (tn)

Validation must have a ground - that is, the order of transactions.

How to determine the order?

Principle (spoiler): We need to know the write set (which is constructed during the read phase) of all previous (in terms of tn) txns before we can enter our validation phase.

  • Should assign tn after this txn comes into existence.
  • Should assign tn before validation phase starts: or we enter validation phase w/o a tn. Sad!
  • Two choices: at the start/end of read phase
    • At the start: Remember the principle above, which means we have to wait for all previous txns to finish their read phases.
    • At the end: we don’t have to wait since we are sure that all previous txns have finished their read phases.

Requirements to pass the validation phase

One of the following three:

20210929105146

20210929105238

Serial Validation

In the paper, they maintain a global count tnc for generating tns. Since we are in a concurrent environment, tnc should be protected by a critical section.

Method 1

A very straightforward one.

Note how tnc is not updated immediately when the read phase ends. It’s updated only after validation succeeds. This way, if validation fails, this tn can still be used by other txns.

20210929111802

Method 2

mid_tn is obtained at the beginning of tend. t from start_tn + 1 to mid_tn are the transactions that passed validation before myself entered validation, so they are guaranteed to have left the critical section.

This method adds concurrency by cutting the size of the criticl section.

You can cut the critical section further by replacing the finish_tn below with mid_tn_2 and repeat the above process.

image-20210929111930485

Parallel Validation

active txns are those that have completed their read phase but have not yet completed their write phase.

This is just an additional check for validation condition (3).

20210929113241

Miscellaneous

  • OCC is suitable for mostly-read workloads since conflicts rarely occur. Validation phase is usually small, even not needed at all.
  • Use a hack to counter starvation: grant a lock to those who are starved and let them run to finish.
  • For main-memory databases, OCC is great since it only modifies local state instead of a global data structure as in locking schemes.
  • I should consider writing less next time cuz this is eating my time. Next time I’ll try to only include what I find difficult to comprehend.