## Provably Timing-Predictable Microarchitectures

Jan Reineke

joint work with Sebastian Hahn and Johannes Kahlen







#### COMPUTER SCIENCE

#### Context: Hard Real-Time Systems



Airbag Reaction in < 10 ms





## Embedded software must

- deliver correct control signals,
- within fixed time bounds.

## **Timing Analysis Problem**

```
// Perform the convolution.
for (int i=0; i<10; i++) {
  x[i] = a[i]*b[j-i];
  // Notify listeners.
  notify(x[i]);</pre>
```

Embedded Software



Microarchitecture



Timing Requirements

#### What does the execution time depend on?

• The input, determining which path is takener science through the program.



#### What does the execution time depend on?

- o The input, determining which path is taken w
- o The state of the hardware platform:
  - Due to caches, pipelining, speculation, etc.



#### Influence of Microarchitectural State



Best Case Worst Case

#### What does the execution time depend on?

• The input, determining which path is taken through the program.



- **o** The state of the hardware platform:
  - Due to caches, pipelining, speculation, etc.
- Interference from the environment:
  - External interference as seen from the analyzed task on shared busses, caches, memory.



Radojkovic et al. (ACM TACO, 2012) on Intel Atom and Intel Core 2 Quad:

> up to 14x slow-down due to interference on shared L2 cache and memory controller

## **Two Schools of Thought**



## **Timing Anomalies**



#### **Timing Anomalies: Example**

#### **Scheduling Anomaly**

![](_page_10_Figure_2.jpeg)

Bounds on multiprocessing timing anomalies RL Graham - SIAM Journal on Applied Mathematics, 1969 – SIAM (http://epubs.siam.org/doi/abs/10.1137/0117039)

## Timing Compositionality: By Example SAARLAND INIVERSITY

![](_page_11_Figure_1.jpeg)

#### Timing Compositionality =

Ability to simply sum up timing contributions by different components

Implicitly or explicitly assumed by (almost) all approaches to timing analysis for multi cores and cache-related preemption delays (CRPD).

## Timing Compositionality: Benefit

![](_page_12_Picture_1.jpeg)

![](_page_12_Figure_2.jpeg)

### Achieving Timing Compositionality

#### Textbook in-order pipeline + LRU caches

![](_page_13_Figure_3.jpeg)

#### **Bad News I:** Timing Anomalies

![](_page_14_Figure_1.jpeg)

We show such a pipeline has timing anomalies:

*Toward Compact Abstractions for Processor Pipelines* S. Hahn, J. Reineke, and R. Wilhelm. In Correct System Design, 2015. Maximal cost of an additional cache miss?

Intuitively: main memory latency

**Unfortunately:** ~ 2 times main-memory latency

- ongoing instruction fetch may block load
- ongoing load may block instruction fetch

### *Key Insight*: Anomalies Require Non-Monotonicity

![](_page_16_Figure_1.jpeg)

#### Monotonicity Enables Predictability 1/2

*Theorem 1 (Timing Anomalies):* Monotonicity implies absence of timing anomalies.

![](_page_17_Figure_2.jpeg)

#### Monotonicity Enables Predictability 1/2

*Theorem 2 (Timing Compositionality):* Monotonicity enables the derivation of sound penalties.

![](_page_18_Figure_2.jpeg)

#### How to Achieve Monotonicity?

![](_page_19_Figure_1.jpeg)

Definition (Strictly In-Order): We call a pipeline *strictly in-order* if each *resource* processes the instructions in program order.

- Enforce memory operations (instructions and data) in-order (common memory as resource)
- Block instruction fetch until no potential data accesses in the pipeline

Theorem 1 (Monotonicity): In the strictly in-order pipeline progress of an instruction is monotone in the progress of other instructions.

Corollary (Timing Anomalies and Timing Compositionality): In the strictly in-order pipeline

- does not have timing anomalies, and
- admits compositional analysis with natural penalties.

#### **Experimental Evaluation**

Performance:

Strictly in-order pipeline is about 6% slower than regular in-order pipeline.

 $\rightarrow$  Preserves most of the benefits of pipelining.

Predictability:

~4x faster single-core analysis

~32x faster multi-core analysis

### Automating the Predictability Proofs

## Formalization of Processor

![](_page_23_Picture_2.jpeg)

## Formalization of Property

## SMT Solver

![](_page_23_Picture_5.jpeg)

# Processor states map dynamic instruction instances to their progress $\rightarrow$ infinite state space

#### Suffices to consider a finite window:

![](_page_24_Figure_3.jpeg)

### Proof of Monotonicity: First Attempt

![](_page_25_Figure_1.jpeg)

However, the formula is **satisfiable**!

Need to capture reachable states...

### Proof of Monotonicity: Second Attempt

#### Define formula that is **unsatisfiable**, if *transition relation* is **monotonic**.

 $validPipelineState(p_1) \land validPipelineState(p_2) \land$ 

 $p_1 \sqsubseteq p_2 \land \mathbf{cycle}(p_1, p_1') \land \mathbf{cycle}(p_2, p_2') \land \neg p_1' \sqsubseteq p_2'$ 

proved correct via separate SMT queries

This formula is indeed **unsatisfiable**!

#### See paper: Sebastian Hahn, Jan Reineke: **Design and analysis of SIC: a provably timingpredictable pipelined processor core.** Real-Time Systems, November 2019

### Efficiency of SMT Proofs

| Proof Z3 Ru                           | ntime |
|---------------------------------------|-------|
| monotonicity of SIC                   | 7s    |
| non-monotonicity of textbook in-order | < 1s  |
| anomaly-freedom w.r.t. cache          | < 1s  |

*Compositionality w.r.t. instruction-cache and interrupts:* 

![](_page_28_Figure_3.jpeg)

#### **Conclusions and Future Work**

#### Key Insight: Monotonicity enables Timing Predictability

Strictly in-order pipeline is monotonic

Predictability proofs can be automated

- o Translation of model to SMT still manual
- o Need to capture relevant invariants manually
- Can we automate the process further?

![](_page_30_Picture_0.jpeg)

#### Sebastian Hahn, Jan Reineke: **Design and analysis of SIC: a provably timingpredictable pipelined processor core.** RTSS 2018 (best student paper award)

Sebastian Hahn, Jan Reineke: **Design and analysis of SIC: a provably timing predictable pipelined processor core.** Real-Time Systems, November 2019