











| Implementation: Snoop caches |
|------------------------------|
| Write races: |
| Cannot update cache until bus is obtained<br>• Otherwise, another processor may get bus first, and then write the same cache block! |
| Two-step process: |
| • Arbitrate for bus<br>• Place miss on bus and complete operation |
| If a miss occurs to the block while waiting for the bus, handle the miss (an invalidate may be needed) and then restart. |
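
The two-step process above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not a model of any real bus: the `Cache` and `Bus` classes, the first-come-first-served arbitration, and the `restarts` counter are all hypothetical names introduced for illustration.

```python
class Cache:
    """Hypothetical snooping cache holding blocks in state "M" (modified) or "I" (invalid)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}      # block -> (state, value)
        self.pending = None   # (block, value) while waiting for the bus
        self.restarts = 0     # how often a pending write had to restart

    def snoop_invalidate(self, block):
        """Another cache won the bus and wrote `block`."""
        if block in self.blocks:
            self.blocks[block] = ("I", None)
        # A miss hit the block we are waiting to write:
        # handle the invalidate, then restart the pending write.
        if self.pending is not None and self.pending[0] == block:
            self.restarts += 1


class Bus:
    """Hypothetical shared bus; assumed to grant requests in arrival order."""

    def __init__(self, caches):
        self.caches = caches
        self.waiting = []     # caches arbitrating for the bus
        self.order = []       # global order in which writes completed

    def request_write(self, cache, block, value):
        # Step 1: arbitrate for the bus. The cache is NOT updated yet.
        cache.pending = (block, value)
        self.waiting.append(cache)

    def run(self):
        while self.waiting:
            winner = self.waiting.pop(0)
            block, value = winner.pending
            # Step 2: place the miss on the bus and complete the operation.
            for other in self.caches:
                if other is not winner:
                    other.snoop_invalidate(block)
            winner.blocks[block] = ("M", value)
            winner.pending = None
            self.order.append((winner.name, block, value))


def demo():
    p1, p2 = Cache("P1"), Cache("P2")
    bus = Bus([p1, p2])
    bus.request_write(p1, "X", 1)   # both processors race to write block X
    bus.request_write(p2, "X", 2)
    bus.run()
    return bus.order, p1.blocks["X"], p2.blocks["X"], p2.restarts
```

Because neither cache updates its copy before it wins the bus, the two writes to block X are serialized in bus order, and the loser sees the winner's invalidation while it is still waiting.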



| MESI Protocol |
|---------------|
| <ul> <li>Simple protocol drawbacks: When writing a block, send invalidations even if the block is used privately</li> </ul> |
| <ul> <li>Add 4th state (MESI)</li> <li><u>M</u>odified (private, != Memory)</li> <li>e<u>X</u>clusive (private, = Memory)</li> <li><u>S</u>hared (shared, = Memory)</li> <li><u>I</u>nvalid</li> </ul> |
| Original Exclusive => Modified (dirty) or Exclusive (clean) |
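
The four states can be sketched as a transition function for a single cache line. This is a simplified illustration, not a complete protocol: the event names (`local_read`, `local_write`, `bus_read`, `bus_write`) and the `others_have_copy` flag are assumptions made for this sketch, and write-backs of dirty data are not modeled.

```python
def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line (simplified sketch)."""
    if event == "local_read":
        if state == "I":
            # A read miss loads the block as Shared if another cache holds it,
            # otherwise as Exclusive (private and equal to memory).
            return "S" if others_have_copy else "E"
        return state
    if event == "local_write":
        # Every local write ends in Modified (private, differs from memory).
        return "M"
    if event == "bus_read":
        # Another processor reads the block: any valid copy becomes Shared.
        return "S" if state in ("M", "E", "S") else "I"
    if event == "bus_write":
        # Another processor writes the block: our copy is invalidated.
        return "I"
    raise ValueError(f"unknown event: {event}")


def needs_invalidation(state):
    """True when a local write in `state` must go to the bus to invalidate other copies."""
    # In E or M the block is private, so the write stays silent.
    return state in ("S", "I")
```

The point of the added Exclusive state shows up in `needs_invalidation`: a block that was read privately can be written without any bus traffic.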



| Memory Consistency | | | |
|--------------------|---|---|---|
| Sequential memory access on uniprocessor execution: | | | |
| A ← 10; // First write to A | | | |
| A ← 20; // Last write to A | | | |
| Read A; // A will have value of 20 | | | |
| If "Read A" returns value 10, the execution is wrong! | | | |
| Memory consistency on multiprocessor: | | | |
| P1 | P2 | P3 | P4 |
| Initial: A=B=0; | | | |
| A ← 10; | A==10 | A==10 | A==0 |
| B ← 20; | B==20 | B==0 | B==20 |
| | (Right) | (Right) | (Wrong?!) |

| Sequential Consistency | |
|------------------------|---|
| Sequential consistency: All memory accesses are in program order and globally serialized, or<br>• All memory writes appear in the same order on all processors<br>• Any other processor perceives a write to A only when it reads A | |
| Programmer's view about consistency: how memory writes and reads are ordered on every processor | |
| <u>Programmer's view on P3</u> | <u>Programmer's view on P4</u> |
| A ← 10; | B ← 20; |
| Read A (A==10); | Read A (A==0); |
| Read B (B==0); | Read B (B==20); |
| B ← 20; | A ← 10; |
| (Consistent) | (Inconsistent!) |
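
The comparison of the two views can be sketched as a check on the order in which each processor perceives the writes. This is an illustrative sketch only: the event tuples and the helper names `perceived_writes`, `reads_match_view`, and `inconsistent_observers` are assumptions, and the reference order is taken to be P1's program order (A ← 10, then B ← 20).

```python
# Each view is the sequence of events as one processor perceives them.
# ("write", var, value) is a write becoming visible; ("read", var, value) is a local read.
VIEW_P3 = [("write", "A", 10), ("read", "A", 10), ("read", "B", 0), ("write", "B", 20)]
VIEW_P4 = [("write", "B", 20), ("read", "A", 0), ("read", "B", 20), ("write", "A", 10)]

PROGRAM_ORDER = [("A", 10), ("B", 20)]  # P1 writes A first, then B


def perceived_writes(view):
    """Extract the order in which writes became visible in this view."""
    return [(var, value) for kind, var, value in view if kind == "write"]


def reads_match_view(view, initial=0):
    """Check that every read returns the latest write already visible in the view."""
    mem = {}
    for kind, var, value in view:
        if kind == "write":
            mem[var] = value
        elif mem.get(var, initial) != value:
            return False
    return True


def inconsistent_observers(views, program_order=PROGRAM_ORDER):
    """Names of processors that perceive the writes in a different order."""
    return [name for name, view in views.items()
            if perceived_writes(view) != program_order]
```

Each view is internally plausible, since every read returns the latest value visible at that point. The problem is only that P4 perceives B ← 20 before A ← 10, the reverse of the order seen by P3.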

| Two processors: | | |
|-----------------|---|---|
| P1: A ← 0;<br>A ← 1;<br>L1: if (B == 0) | P2: B ← 0;<br>B ← 1;<br>L2: if (A == 0) | |
| Is there an explanation that L1 is true and L2 is false? | | |
| Global View                                                               | View from P1           | View from P2          |
| A ← 0                                                                     | A ← 0                  | A ← 0                 |
| B ← 0                                                                     | B ← 0                  | B ← 0                 |
| A ← 1                                                                     | A ← 1                  | A ← 1                 |
| P1 Reads B                                                                | L1: Read B==0          |                       |
| P2 Reads A                                                                |                        | L2: Read A==1         |
| B ← 1                                                                     | B ← 1                  | B ← 1                 |
|                                                                           |                        |                       |
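
One way to answer the question is to enumerate every global order that keeps each processor's program order and record the outcome of the two tests. The sketch below assumes both variables start at 0 (the initial A ← 0 and B ← 0) and models each processor as one write followed by one read. The operation tuples and function names are illustrative, not from the slides.

```python
from itertools import combinations

# Each processor: write 1 to its own variable, then test the other variable against 0.
P1 = [("write", "A", 1), ("read", "B", "L1")]
P2 = [("write", "B", 1), ("read", "A", "L2")]


def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that preserves each program's own order."""
    n = len(p1) + len(p2)
    for slots in combinations(range(n), len(p1)):
        i1, i2 = iter(p1), iter(p2)
        yield [next(i1) if k in slots else next(i2) for k in range(n)]


def run(schedule):
    """Execute one global order on a single shared memory; return (L1, L2)."""
    mem = {"A": 0, "B": 0}
    result = {}
    for op, var, arg in schedule:
        if op == "write":
            mem[var] = arg
        else:
            result[arg] = (mem[var] == 0)  # the test "if (var == 0)"
    return result["L1"], result["L2"]


def sc_outcomes():
    """All (L1, L2) outcomes reachable under sequential consistency."""
    return {run(s) for s in interleavings(P1, P2)}
```

Under these assumptions the reachable outcomes are (True, False), (False, True), and (False, False). L1 true with L2 false is explained by the global order in the table above, while both tests being true has no sequentially consistent explanation.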

| Sequential Consistency Overhead | | |
|---------------------------------|---|---|
| What could have been wrong if both L1 and L2 are true? | | |
| P1: A ← 0;<br>A ← 1;<br>L1: if (B == 0) | P2: B ← 0;<br>B ← 1;<br>L2: if (A == 0) | |
| • A's invalidation has not arrived at P2, and B's invalidation has not arrived at P1 | | |
| • Reading A or B happens before the writes | | |
| Solution I: Delay ANY following accesses (to the memory location or not) until an invalidation is ALL DONE. | | |
| Overhead: | | |
| What is the full lat                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                              | ency of                | invalidation?                                                                        |
| How frequent are in                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                              | nvalidati              | ons?                                                                                 |
| How about memory                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                              | level par              | rallelism?                                                                           |

| Memory Consistency Models | |
|---|---|
| Why should sequential consistency be the only correct one?<br>• It is just the simplest one<br>• It was defined by Lamport | |
| Memory consistency models: a contract between a multiprocessor builder and system programmers on how the programmers would reason about memory access ordering | |
| Relaxed consistency models: a memory consistency model that is weaker than sequential consistency<br>• Sequential consistency maintains some total ordering of reads and writes<br>• Processor consistency (total store ordering): maintains program order of writes from the same processor<br>• Partial store order: writes from the same processor might not be in program order | |
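
The definition of sequential consistency above (some total order of reads and writes that respects each processor's program order) can be checked by brute force on a tiny program. The sketch below is illustrative only: the helper names `interleavings` and `run_sc`, the tuple encoding of operations, and the choice of the two-processor test program are assumptions made for this example, not something the slides prescribe.

```python
from itertools import combinations

def interleavings(p1, p2):
    """Yield every total order of p1 + p2 that keeps each program's own order."""
    n, m = len(p1), len(p2)
    for slots in combinations(range(n + m), n):
        slot_set = set(slots)
        it1, it2 = iter(p1), iter(p2)
        yield [next(it1) if i in slot_set else next(it2) for i in range(n + m)]

def run_sc(order):
    """Execute one interleaving against a single shared memory; return both reads."""
    mem = {"A": 0, "B": 0}
    reads = {}
    for proc, op, var, *val in order:
        if op == "W":
            mem[var] = val[0]
        else:
            reads[proc] = mem[var]
    return reads["P1"], reads["P2"]

# Hypothetical encoding: (processor, "W" or "R", variable[, value]).
P1 = [("P1", "W", "A", 1), ("P1", "R", "B")]
P2 = [("P2", "W", "B", 1), ("P2", "R", "A")]

sc_outcomes = {run_sc(order) for order in interleavings(P1, P2)}
print(sorted(sc_outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

Under this model no interleaving lets both processors read 0, which is exactly the guarantee that relaxed models give up.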

| Memory Consistency Models | | |
|---|---|---|
| P1: A ← 0;<br>A ← 1;<br>L1: if (B == 0) ... | P2: B ← 0;<br>B ← 1;<br>L2: if (A == 0) ... | |
| Explain in processor consistency | | |
| View from P1 | View from P2 | Another view from P2 |
| A ← 0 | B ← 0 | A ← 0 |
| B ← 0 | B ← 1 | B ← 0 |
| A ← 1 | A ← 0 | L2: Read A==0 |
| L1: Read B==0 | L2: Read A==0 | A ← 1 |
| B ← 1 | A ← 1 | B ← 1 |
| (a) | (b) | (c) |
| • (b) Remote writes appear in a different order<br>• (c) Local reads bypass local writes (relax W→R order)<br>• Key point: programmers know how to reason about the shared memory | | |
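
One common way to picture view (c), where a local read bypasses an earlier local write, is to give each processor a FIFO store buffer. The sketch below explores every execution of such a machine. It is a simplified model under stated assumptions: the store-buffer mechanism, the `explore` function, and the operation encoding are illustrative choices, and the initial memory `A = B = 0` stands in for the `A ← 0` and `B ← 0` writes on the slide.

```python
def explore(programs):
    """Return every tuple of read values reachable when each processor owns a
    FIFO store buffer that its own later reads are allowed to bypass."""
    outcomes, seen = set(), set()

    def step(pcs, bufs, mem, reads):
        state = (pcs, bufs, mem, reads)
        if state in seen:
            return
        seen.add(state)
        done = True
        for p, prog in enumerate(programs):
            # Option 1: drain the oldest buffered write of processor p to memory.
            if bufs[p]:
                done = False
                (bvar, bval), rest = bufs[p][0], bufs[p][1:]
                new_mem = tuple(sorted({**dict(mem), bvar: bval}.items()))
                step(pcs, bufs[:p] + (rest,) + bufs[p + 1:], new_mem, reads)
            # Option 2: execute the next instruction of processor p.
            if pcs[p] < len(prog):
                done = False
                op, var, *val = prog[pcs[p]]
                new_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
                if op == "W":
                    # The write waits in the buffer; other processors cannot see it yet.
                    new_buf = bufs[p] + ((var, val[0]),)
                    step(new_pcs, bufs[:p] + (new_buf,) + bufs[p + 1:], mem, reads)
                else:
                    # Read own pending write if there is one, otherwise read memory.
                    pending = [v for (x, v) in bufs[p] if x == var]
                    value = pending[-1] if pending else dict(mem)[var]
                    step(new_pcs, bufs, mem, reads[:p] + (value,) + reads[p + 1:])
        if done:
            outcomes.add(reads)

    n = len(programs)
    step((0,) * n, ((),) * n, (("A", 0), ("B", 0)), (None,) * n)
    return outcomes

# Hypothetical encoding: ("W", variable, value) or ("R", variable).
P1 = [("W", "A", 1), ("R", "B")]   # A <- 1; L1: read B
P2 = [("W", "B", 1), ("R", "A")]   # B <- 1; L2: read A

outcomes = explore([P1, P2])
print((0, 0) in outcomes)  # True
```

Both reads can return 0 here because each write may still sit in its owner's buffer when the other processor reads, so both `if` conditions at L1 and L2 can hold in the same run.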

| Memory Consistency and ILP | | | |
|---|---|---|---|
| Speculate on loads, flush on possible violations<br>• With ILP and SC, what will happen on this? | | | |
| P1 code | P2 code | P1 exec | P2 exec |
| A = 1 | B = 1 | issue "store A" | issue "store B" |
| read B | read A | issue "load B" | issue "load A" |
| | | commit A, send inv (winner) | flush at load A<br>commit B, send inv |
| SC can be maintained, but expensive, so may also use TSO or PC<br>• Speculative execution and rollback can still improve performance | | | |
| Performance on contemporary multiprocessors: ILP + MC ~ Weak MC | | | |
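
The "speculate on loads, flush on possible violations" sequence in the table can be traced with a toy model. This is a rough sketch rather than a description of any real pipeline: the `SpeculativeCore` class, the `commit_store` helper, and the rule "squash and replay a speculative load when an invalidation hits its address" are assumptions made for illustration.

```python
class SpeculativeCore:
    """Toy core whose loads run early and are replayed if invalidated (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory      # shared dict standing in for coherent memory
        self.spec_loads = {}      # addr -> value read speculatively, not yet committed
        self.flushes = 0

    def issue_load(self, addr):
        # Execute the load ahead of this core's older, uncommitted store.
        self.spec_loads[addr] = self.memory[addr]

    def on_invalidate(self, addr):
        # Another core committed a store to addr, so a speculative value may be stale.
        if addr in self.spec_loads:
            del self.spec_loads[addr]
            self.flushes += 1
            self.issue_load(addr)  # replay the squashed load

    def commit_load(self, addr):
        return self.spec_loads.pop(addr)


def commit_store(memory, addr, value, others):
    """Make the store globally visible and send invalidations to the other cores."""
    memory[addr] = value
    for core in others:
        core.on_invalidate(addr)


memory = {"A": 0, "B": 0}
p1, p2 = SpeculativeCore(memory), SpeculativeCore(memory)

# Both cores issue their loads early, before either store has committed.
p1.issue_load("B")
p2.issue_load("A")

# P1 wins: it commits "store A" and sends the invalidation; P2 flushes "load A".
commit_store(memory, "A", 1, others=[p2])
r1 = p1.commit_load("B")
# P2 then commits "store B" and sends its own invalidation.
commit_store(memory, "B", 1, others=[p1])
r2 = p2.commit_load("A")

print(r1, r2, p2.flushes)  # 0 1 1
```

The committed result (P1 reads 0, P2 reads 1) is one that a sequentially consistent interleaving also allows, even though both loads were issued before either store became visible.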