# Hardware-Software Contracts for Secure Speculation

Joint work with Marco Guarnieri, Pepe Vila @ IMDEA Software, Madrid Boris Köpf @ Microsoft Research, Cambridge, UK

Supported by Intel Strategic Research Alliance (ISRA) "Information Flow Tracking across the Hardware-Software Boundary"





Spectre Attacks: Exploiting Speculative Execution – S&P 2019

#### Exploits speculative execution to leak sensitive information

#### Almost all modern processors are affected

P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, Y. Yarom — 2



# Countermeasures

Software-level countermeasures  $\checkmark$  $\checkmark$ 

#### Hardware-level countermeasures

- **Disabling speculation**  $\checkmark$ 
  - Delaying speculative loads
- Taint tracking of speculative data: STT & NDA

#### *Example*: insert "fences" to selectively terminate speculative execution Implemented in major compilers (Microsoft Visual C++, Intel ICC, Clang)





# Goals

### 1. Capture hardware-level countermeasures in a unifying framework

### 2. Introduce hardware-software contracts to capture their security guarantees

3. Requirements for secure programming based on hardware-software contracts



# Outline

- 1. Speculative Execution Attacks
- 2. Hardware-level Countermeasures
- 3. Hardware-Software Contracts

### 4. Requirements for Secure Programming

# 1. Speculative Execution Attacks



# Microarchitecture 101

#### Cache Fast but small memory storing recently accessed data across cores

Die shot of AMD "Barcelona" Quad Core CPU





## Background: Assembly language

#### if (x < A size) $y = \mathbf{B}[\mathbf{A}[\mathbf{X}]]$



L1: load t,  $\mathbf{A}$  + x END:

### $\mu$ Assembly = our "toy" assembly language



# **Background: Speculative execution**

Rollback changes if speculation was wrong

Only architectural (ISA, "logical") state, not microarchitectural state

Predict instructions' outcomes and speculatively continue execution





Predictions based on branch history & program structure

# Background: Branch prediction

- Key hardware data structure for out-of-order and speculative execution
- Keeps track of "in-flight instructions"
- Example:

. . .

. . .

Entry Instruction Control Dep. 0: c ← x < A size 1: beqz c, END 2: 3:

 $c \leftarrow x < A size$ beqz c, END L1: load t,  $\mathbf{A}$  + x load y, **B** + t END:

Speculative Instruction Fetch 

| Entry | Instruction          | Control Dep. |
|-------|----------------------|--------------|
| 0:    | c ← x < A_size       | _            |
| 1:    | beqz c, END          | _            |
| 2:    | load t, <b>A</b> + x | 2            |
| 3:    | _                    | _            |
|       | •••                  |              |

. . .



- Key hardware data structure for out-of-order and speculative execution
- Keeps track of "in-flight instructions"
- Example:

. . .

- - -

Entry Instruction Control Dep. 0: c ← x < A size 1: beqz c, END 2: load t,  $\mathbf{A}$  + x 2 3:

c ← x < A size beqz c, END L1: load t,  $\mathbf{A}$  + x load y, **B** + t END:

Speculative Instruction Fetch 

| Entry | Instruc | tion | Control    | Dep. |     |  |   |
|-------|---------|------|------------|------|-----|--|---|
| 0:    | C ←     | X    | < A_       | _s:  | ize |  | - |
| 1:    | beqz    | С,   | EN         | D    |     |  | _ |
| 2:    | load    | t,   | <b>A</b> - | + :  | X   |  | 2 |
| 3:    | load    | Уи   | <b>B</b> - | + -  | t   |  | 2 |
|       |         |      |            |      |     |  |   |

- - -

. . .

. . .



. . .

- Key hardware data structure for out-of-order and speculative execution
- Keeps track of "in-flight instructions"
- Example:

| Entry Instruction           | Control Dep. | Evaluate   | Entry | Instruction                       | Control Dep. |
|-----------------------------|--------------|------------|-------|-----------------------------------|--------------|
| 0: c ← x < A_si             | ze -         | x < A_size | 0:    | c ← 0                             | _            |
| 1: beqz c, <i>END</i>       | _            |            | 1:    | beqz 0, <i>END</i>                | _            |
| 2: load t, $\mathbf{A}$ + x | 2            |            | 2:    | load t, $\mathbf{A} + \mathbf{x}$ | 2            |
| 3: load y, <b>B</b> + t     | 2            |            | 3:    | load y, <b>B</b> + t              | 2            |
|                             |              |            |       |                                   |              |

c ← x < A size beqz c, END L1: load t,  $\mathbf{A}$  + x load y, **B** + t END:



- Key hardware data structure for out-of-order and speculative execution
- Keeps track of "in-flight instructions"
- Example:

. . .

. . .

| Entry | Instruc | tion |            |   | Control | Dep. | F    |
|-------|---------|------|------------|---|---------|------|------|
| 0:    | C ←     | 0    |            |   |         | _    | mis- |
| 1:    | beqz    | 0,   | END        |   |         | _    |      |
| 2:    | load    | t,   | <b>A</b> + | Х |         | 2    |      |
| 3:    | load    | Уи   | <b>B</b> + | t |         | 2    |      |
|       |         |      |            |   |         |      |      |

c ← x < A size beqz c, END L1: load t,  $\mathbf{A}$  + x load y, **B** + t END:

| Rollback    | Entry | Instruction        | Control Dep. |
|-------------|-------|--------------------|--------------|
| speculation | 0:    | c ← 0              | _            |
|             | 1:    | beqz 0, <i>END</i> | _            |
|             | 2:    | _                  | _            |
|             | 3:    | _                  | _            |
|             | •••   |                    |              |

. . .



- Key hardware data structure for out-of-order and speculative execution
- Keeps track of "in-flight instructions"
- Example:

. . .

. . .

| Entry | Instruc | tion |    |    |   | Control Dep. |
|-------|---------|------|----|----|---|--------------|
| 0:    | C ←     | 0    |    |    |   | _            |
| 1:    | beqz    | 0,   | El | VD |   | _            |
| 2:    | load    | t,   | A  | +  | Х | 2            |
| 3:    | load    | y,   | В  | +  | t | 2            |
|       |         |      |    |    |   |              |

 $c \leftarrow x < A size$ beqz c, END L1: load t,  $\mathbf{A}$  + x load y, **B** + t END:





. . .













# 2. Hardware-level Countermeasures



### A parametric speculative, out-of-order processor

- Hardware-level countermeasures restrict speculative execution
- Intended to work for arbitrary
  - (compliant) scheduler
  - memory hierarchy
  - branch predictor

- scheduler
- caches
- branch predictor

#### We introduce a processor model that is parametric in

#### A parametric speculative, out-of-order processor ... formalized as binary relation on hardware states

memory reorder registers buffer



#### A parametric speculative, out-of-order processor ... formalized as binary relation on hardware states

 $\langle m, a, buf, cs, bp \rangle \stackrel{d}{\Rightarrow} \langle m', a', buf', cs', bp' \rangle$  $d = next(sc) \qquad sc' = update(sc, buf'\downarrow)$ 



predecessor state

directive from scheduler, fetch, execute i, retire

 $\langle m, a, buf, cs, bp, sc \rangle \Rightarrow \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', cs', bp', sc' \rangle \leqslant \langle m', a', buf', bp', sc' \rangle \leqslant \langle m', a', buf', bp', sc' \rangle \leqslant \langle m', a', bp', sc' \rangle \otimes \langle m', a', bp', bp', sc' \rangle \otimes \langle m', a', bp', bp', sc' \rangle \otimes \langle m', a', bp', bp',$ 

SUCCESSO state



### Rules capture effect of directives **Example: Fetch**

applying reorder buffer to register state Fetch-Branch-Hit

instruction is a branch

#### branch predictor reorder buffer says l' is next is not full $\Rightarrow a' = apl(buf, a) \qquad |buf| < \mathbf{w} \qquad a'(\mathbf{pc}) \neq \bot$ $\Rightarrow p(a'(\mathbf{pc})) = \mathbf{beqz} \ x, \ell \qquad \ell' = predict(bp, a'(\mathbf{pc}))$ $access(cs, a'(\mathbf{pc})) = Hit$ $update(cs, a'(\mathbf{pc})) = cs' < \cdots$ $\langle m, a, buf, cs, bp \rangle \xrightarrow{\text{fetch}} \langle m, a, buf \cdot \mathbf{pc} \leftarrow \ell' @a'(\mathbf{pc}), cs', bp \rangle$ add change of pc cache state to reorder buffer 24





# Rules capture effect of directive

#### result of instruction at head of reorder buffer is resolved

# RETIRE-ASSIGNMENT $\Rightarrow buf = x \leftarrow v@\varepsilon \cdot buf' \quad v \in Vals$

 $\langle m, a, buf, cs, bp \rangle \xrightarrow{\text{retire}} \langle m, a[x \mapsto v], buf', cs, bp \rangle$ 

apply change to registers and remove entry from reorder buffer



# Capturing countermeasures

- Countermeasures restrict freedom of scheduler
- Defined for arbitrary cache and branch predictor

We consider three approaches:

- 1. Disabling speculation
- 2. Delaying speculative loads
- 3. Taint tracking of speculative values



# Disabling speculative execution

Constrain scheduler to

- 1. fetch
- 2. execute 1
- 3. retire
- 4. go to 1.

Obviously eliminates all speculative-execution attacks.

• Really slow.

### Delaying speculative loads (Sakalis et al., ISCA 2019)

**STEP-OTHERS** 

$$\langle m, a, buf, cs, bp \rangle \stackrel{d}{\Rightarrow} \langle m', a', buf', cs', bp' \rangle d = next(sc) sc' = update(sc, buf'\downarrow) \frac{d \in \{ \mathbf{fetch}, \mathbf{retire} \} \lor (d = \mathbf{execute} \ i \land buf|_i \neq \mathbf{load} \ x, e) }{\langle m, a, buf, cs, bp, sc \rangle \Rightarrow_{\mathbf{loadDelay}} \langle m', a', buf', cs', bp', sc' \rangle }$$
  
STEP-EAGER-DELAY  
$$\langle m, a, buf, cs, bp \rangle \stackrel{d}{\Rightarrow} \langle m', a', buf', cs', bp' \rangle d = next(sc) sc' = update(sc, buf'\downarrow) d = \mathbf{execute} \ i \\ buf|_i = \mathbf{load} \ x, e \forall \mathbf{pc} \leftarrow \ell @\ell' \in buf[0..i-1]. \ \ell' = \varepsilon \\ \langle m, a, buf, cs, bp, sc \rangle \Rightarrow_{\mathbf{loadDelay}} \langle m', a', buf', cs', bp', sc' \rangle$$

**Question**: Does this eliminate all speculative-execution attacks?

#### Non-loads can be executed arbitrarily.

Loads are only executed nonspeculatively.







### Delaying speculative loads (Sakalis et al., ISCA 2019) Spectre v1 Example



Will only be performed non-speculatively.  $\rightarrow$  Problem solved.



### Delaying speculative loads (Sakalis et al., ISCA 2019) Variant of Spectre v1 Example



. . .

 $c \leftarrow x < A size$ load t,  $\mathbf{A}$  + x beqz c, END L1: begz t, L2 END:

**Question**: How to capture its security guarantees?



Unlike entirely nonspeculative execution, leaks whether **A**[**x**] is 0!





### Taint-tracking speculative values (STT, Yu et al., MICRO 2019, NDA, Weisse et al. MICRO 2019)

- Allow speculative loads, but make sure the loaded values do not leak
- Difference:
  - STT: Prevent any "transmit" instruction on data derived from speculative loads
  - NDD: Prevent any propagation of speculatively loaded data (more conservative than STT)





# Taint-tracking speculative values (STT, Yu et al., MICRO 2019, NDA, Weisse et al. MICRO 2019)

STEP d = next(sc)  $\langle m, a, buf_{ul}, cs, bp \rangle$   $sc' = update(sc, buf'\downarrow)$   $\langle m, a, buf, cs, bp, sc \rangle$ "Richer" reorder buffer state captures "taint"

> Allows for more speculative and out-of-order execution. But how secure is it?





Taint-tracking speculative values





# (STT, Yu et al., MICRO 2019, NDA, Weisse et al. MICRO 2019)

"semantically equivalent" to Spectre v1 example





# 3. Hardware-Software Contracts

# Hardware-software contracts

Goals:

- Capture security guarantees in a simple, mechanism-independent manner
- Carve out differences between existing countermeasures
- Serve as a **basis for secure programming**

# Meaning of contracts

Contract assigns sequences of observations to architectural states

### States $\sigma, \sigma'$ are contractindistinguishable.

#### **Definition** 1 ( $\{\cdot\} \vdash [\cdot]\}$ ). A hardware semantics $\{\cdot\}$ satisfies a contract $\left[ \cdot \right]$ if, for all programs p and all initial arch. states $\sigma, \sigma'$ , if $[p](\sigma) = [p](\sigma')$ , then $\{p\}(\sigma) = \{p\}(\sigma')$ .

### States $\sigma, \sigma'$ are HWindistinguishable.

# Structure of contracts

# **Contract** = Execution Mod

"How are programs executed"

# Execution Mode · Observer Mode

# "What is visible about the execution?"



## Two execution modes

#### **Sequential (seq)** = non-speculative execution

**Speculative (spec)** = each branch is mispredicted and speculatively executed

# Observer modes

- **Constant time (ct)** = observer can see addresses of loads
  - stores

### Architectural (arch) = ct + observer can see values of loads

#### **Program counter (pc)** = observer can see addresses of instruction fetches

instruction fetches



# A Lattice of Contracts • [ • ] seq arch [ • ] seq-spec \_ct-pc spec

### "Leaks all data accessed nonspeculatively"

Spec

"Leaks addresses of non-spec. loads/stores/ instruction fetches"





 $\frac{1}{2} \left[ \cdot \right]_{ct}^{seq} \left[ \cdot \right]_{seq}^{seq} = \text{disabling speculation}$  $\begin{bmatrix} \cdot \end{bmatrix}_{ct-pc}^{seq-spec} & \left\{ \cdot \right\}_{loadDelay} = delaying$ speculative loads = taint tracking





## 4. Requirements for Secure Programming



# What is secure programming?

**Definition 3**  $(p \vdash NI(\pi, \llbracket \cdot \rrbracket))$ . Program p is *non-interferent* w.r.t. contract  $\llbracket \cdot \rrbracket$  and policy  $\pi$  if for all initial arch. states  $\sigma, \sigma': \sigma \simeq_L \sigma' \Rightarrow \llbracket p \rrbracket(\sigma) = \llbracket p \rrbracket(\sigma')$ .

"States agree on public data"

**Proposition 2.** *If*  $p \vdash NI(\pi, \llbracket \cdot \rrbracket)$  *and*  $\{ \cdot \} \vdash \llbracket \cdot \rrbracket$ *, then*  $p \vdash NI(\pi, \{ \cdot \} \cdot \rrbracket)$ *.* 

"Executions are indistinguishable under contract"

### What is secure programming? **Two variants**



#### "Constant-time programming" "Sandboxing" e.g. Javascript in the browser e.g. cryptographic code



# Sandboxing

may not access secret data non-speculatively.

Appropriate bounds check if (x < A\_size)  $\mathbf{V} = \mathbf{B}[\mathbf{A}[\mathbf{X}]]$ 

"Traditional" sandboxing mechanisms ensure that malicious code

- Ensures non-interference w.r.t. [ · ] <sup>seq</sup> arch .
  - HW-level countermeasures are adequate for sandboxing.







# **Constant-time programming**

"Traditional" constant-time programming ensures non-interference w.r.t. [[·]]<sup>seq</sup><sub>ct</sub>.





- HW-level countermeasures are not fully adequate for traditional constant-time programming.
- Need to apply additional SW-level countermeasures.

# Find out more in the paper: <u>https://arxiv.org/abs/2006.03841</u>

# Thank you for your attention!