



#### ACACES Summer School 2017 Fiuggi, Italy

What does the execution time of a
 program depend on, on a single-core machine?

#### Input-dependent control flow



#### Microarchitectural State





Measuring or simulating the execution time for all inputs and all hardware states is not feasible in practice:

- There are too many.
- We cannot control the initial hardware states.
- Need for approximation!

#### Requirements for WCET Analysis

- 1. Upper bounds must be safe, i.e. not underestimated.
- 2. Upper bounds should be tight, i.e. not far away from real execution times.
- 3. Analysis effort must be tolerable.



Standard WCET Analysis Approach Today: Separation of Concerns + Abstraction

#### o Value Analysis:

Determines invariants for the values in registers and in memory

#### o Separation:

1.

Depends on hardware

Depends on program semantics

- Bound possible microarchitectural executions using abstractions.
- 2. Determine constraints on control flow (e.g. loop bounds) through program by abstractions.

• Combination: combine 1 and 2 to bound execution time of the whole program.





#### Running Example







#### Value Analysis

Determines invariants on values of registers at different program points. Invariants are often in the form of enclosing intervals of all possible values.

Where is this information used?

- Microarchitectural Analysis
  - Pipeline Analysis
  - Cache Analysis
- Control-Flow Analysis
  - Detect infeasible paths
  - Derive loop bounds





![](_page_11_Figure_0.jpeg)

![](_page_12_Figure_0.jpeg)

![](_page_13_Figure_0.jpeg)

# Microarchitectural Analysis

Ideal 1970s world: one instruction = one cycle *Today*:

- Pipelining
- Branch prediction + speculative execution
- Caches
- DRAM-based main memory
- Execution time of individual instruction highly variable and dependent on state of microarchitecture
- Need to analyze in which states the microarchitecture may be in when executing an instruction

# Pipelining

- Instruction execution is split into several stages
- Several instructions can be executed in an overlapped fashion
- Some processors can start more than one instruction per cycle: VLIW, Superscalar
- Some processors can execute instructions outof-order

| Fetch   |
|---------|
|         |
| Decode  |
|         |
| Execute |
|         |
| Memory  |
|         |
| WB      |

#### ••• Hardware Features: Pipelines Inst 1 Inst 3 Inst 2 Inst 4 Fetch Decode Fetch Execute Decode Fetch WB Execute Decode Fetch WB Execute Decode WB Execute

*Ideal Case*: One Instruction per Cycle, but there are Hazards!

WB

#### Pipeline Hazards

- Data Hazards: Operands not yet available (Data Dependences)
- Resource Hazards: Consecutive instructions use same resource
- o Control Hazards: Conditional branch
- Instruction-Cache Hazards: Instruction fetch causes cache miss
- o Data-Cache Hazards: Load causes cache miss

Assuming worst case everywhere is not an option; it would be too pessimistic!

→ Have to statically analyze the possible microarchitectural behaviors.

![](_page_18_Figure_0.jpeg)

- *"parchitectural state" "program state"* Processor (pipeline, cache, registers, memory) viewed as a *big* state machine, performing transitions every clock cycle
- Starting in an initial state for an instruction, transitions are performed, until a final state is reached:
  - final state: instruction has left the pipeline
  - # transitions: execution time of instruction
- Transitions may be annotated with events indicating e.g. a bus access, or a cache miss.

![](_page_19_Figure_0.jpeg)

![](_page_20_Figure_0.jpeg)

#### Abstracted State Machine

State space is product of

- "microarchitectural state", i.e. pipeline and cache state, and
- "program state", i.e., register and memory contents including the program inputs

*First Abstraction:* 

Discard program state (which is dealt with in control-flow analysis)

Second Abstraction: Find abstract domains that compactly represent large sets of concrete microarchitectural states How to Achieve "Sound Approximation"?
 Abstract Interpretation in a Nutshell

1. Every abstract state s<sup>#</sup> represents a set of *conc(s<sup>#</sup>)* concrete states:

![](_page_22_Figure_2.jpeg)

How to Achieve "Sound Approximation"? Abstract Interpretation in a Nutshell

#### 2. Local Consistency:

The successors of the concretization of an abstract state s<sup>#</sup> are represented by s<sup>#</sup>'s successors:

![](_page_23_Figure_3.jpeg)

# How to Achieve "Sound Approximation"? Abstract Interpretation in a Nutshell

![](_page_24_Figure_1.jpeg)

#### Consequences of Abstraction: Nondeterminism

![](_page_25_Figure_1.jpeg)

Nondeterminism:

In contrast to the concrete model, in the abstract model, one state can have several successor states.

Each abstract state represents a set of concrete states, which may have different successor states.

E.g. one may result in a cache hit, the other in a cache miss.

Consequences:

→ The abstract execution graph includes spurious executions, which leads to overapproximation of the WCET

 $\rightarrow$  There is a tradeoff between analysis cost and precision

### Consequences of Abstraction: Cycles

![](_page_26_Figure_1.jpeg)

*Cyclicity:* The abstract model may have cycles.

This is due to abstraction from the "program state". E.g. abstract states do not capture the value of variables in a loop.

Consequences:

→ The abstract execution graph alone cannot be used to derive any WCET bound

 $\rightarrow$  Need to combine information with control-flow analysis results

![](_page_27_Figure_0.jpeg)

![](_page_28_Figure_0.jpeg)

- Determines a worst-case path and an upper bound on the WCET.
- o Formulated as integer linear program (ILP).

![](_page_28_Figure_3.jpeg)

![](_page_29_Figure_0.jpeg)

 $\dots$  + Restriction to integers = ILP.

#### LP is in polynomial time, yet, ILP is NP hard, but often efficiently solvable in practice.

Solvers (e.g. CPLEX) determine the maximal value of the objective function + corresponding valuation of variables.

Global Bound Analysis aka Path Analysis aka Implicit Path Enumeration

Determines a worst-case path through the abstract execution graph and an upper bound on the WCET:

- Introduce a variable for each edge in abstract execution graph to capture how often this edge is taken
- Encode structure of graph via linear constraints
- Encode loop bounds and other infeasible path information via linear constraints

![](_page_30_Figure_6.jpeg)

#### Integer linear program:

- $max x_{a} + x_{b} + x_{c} + \dots$
- s.t. Structural Constraints Infeasible Path Constraints Loop Bound Constraints

#### Global Bound Analysis: Small Example

![](_page_31_Figure_1.jpeg)

+ Loop Bound = 5

$$\begin{array}{c} max \ x_{a} + x_{b} + x_{c} + x_{d} + x_{e} + x_{f} \\ s.t. \ x_{a} = 1 \\ x_{a} = x_{b} \\ x_{b} + x_{f} = x_{c} \\ x_{c} = x_{d} \\ x_{d} = x_{e} + x_{f} \end{array}$$

$$\begin{array}{c} Structural \ Constraints \\ x_{c} <= 5 \\ x_{c} <= 5 \\ x_{c} > 0 \end{array}$$

Solution:  

$$x_a = x_b = x_e = 1$$
  
 $x_c = x_d = 5$   
 $x_f = 4$   
 $\rightarrow x_a + x_b + x_c + x_d + x_e + x_f = 17$ 

32

## Summary and Outlook

• Separate Analysis into SW and HW aspects:

- SW: Control-flow Analysis
- HW: Microarchitectural Analysis
- Combine results using Integer Linear Program
- Abstraction:
  - Employ sound abstractions to solve undecidable problems approximately
  - $\rightarrow$  will see such an abstraction for caches next

# Literature (very incomplete)

WCET Analysis:

- Li, Malik: Performance analysis of embedded software using implicit path enumeration, In: Proceedings LCTRTS, 1995
- Ferdinand et al.: Reliable and Precise WCET Determination for a Real-Life Processor, In: Proceedings EMSOFT, 2001
- Stephan Thesing. Safe and Precise WCET Determination by Abstract Interpretation of Pipeline Models. PhD thesis, Saarland University, 2004
- Ingmar Jendrik Stein. ILP-based Path Analysis on Abstract Pipeline State Graphs. PhD thesis, Saarland University, 2010

Loop Bounds:

- Cullmann, Martin: Data-flow based detection of loop bounds, In: Proceedings WCET, 2007
- Ermedahl, Sandberg, Gustafsson, Bygde, Lisper: Loop bound analysis based on a combination of program slicing, abstract interpretation, and invariant analysis, In: Proceedings WCET 2007
- De Michiel, Bonenfant, Casse, Sainrat: Static Loop Bound Analysis of C Programs Based on Flow Analysis and Abstract Interpretation, In: Proceedings RTCSA, 2008