Static Analysis

Static analysis

Examination of code without executing it

Static analysis aims to

  • identify potential errors
  • ensure adherence to coding standards
  • optimise performance using
  • early bug detection

Possible tools

  • linters & style checkers (check formatting/practices)
  • data flow analysis (reasoning about the set of values at different points in execution)
  • control-flow analysis (reasoning about potential order of execution)
  • type checking (verify variables are used consistently for their types)

Python

Python will attempt to execute unsafe programs, and fails mostly at runtime. Shows type error when it happens.

Language system puts burden of writing safe programs (with relevance to type safety) on programmers. Thus, a type system detecting type errors can act as a static analyzer.

Type system

Mechanism for distinguishing good programs from bad.

A tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute.

Good programs are well-typed (where the types are appropriate) while bad programs are ill-typed or not type-able.

Lambda Calculus

Motivation

  • Foundation for functional programming
  • Language theory

Syntax

An anonymous function is replaced by a lambda abstraction.

int f (int x) { return x; }

would be replaced by

Similarly, the function call would be replaced by a lambda application:

int f (int x) { return x; }
f (3);

would be represented as

Conventions

Body of \lambda extends as far to right as possible

Functions are left-associative

Higher order functions

Functions can be returned as return values, and passed in as arguments

Free and Bound Variables

\alpha- equivalence

Bound variable can be renamed, as it is a placeholder.

In this scenario, is the scope of the

However, note that the names of free variables matter, and cannot be renamed.

Variables can be both free and bound.

Let be the set of free variables in .

  • means is a free variable in M

Given that

The set of free variables are defined as:

These statements can be understood as:

  • the free variable in is just itself
  • the free variables in are the free variables in , except for
  • the free variables in are the free variables in M and the free variables in N

Semantics

reduction:

effectively means everything represents inside the function is substituted with .

Substitution

Given that

Substitution of free variables happen as such:

The statements can be interpreted as:

  • doing the substitution of replaces it with
  • replacing a variable that does not exist in the expression keeps it constant

Effectively, to substitute across a function application, substitution needs to be done within its terms.

Effectively, as substitution only affects the free variables, note that since here is a bounded variable, no substitution occurs.

Name Capture

For substitution, avoid name capture:

becomes

which is incorrect, as the free variable becomes a bounded variable . Thus, to avoid this, rename bound variables before substitution (rename bound variable to first).

Reduction Rules

Normal form

redex: a term of form

normal form: a term containing no redex

Confluence

Terms can be evaluated in any order.

Effectively, the final result is the same, regardless of which terms are evaluated, and in which order.

The theorem is as such:

Given that ,

Effectively, for all , if and , then there exists such that and . This can be interpreted as: If there are steps to go from the original form to any intermediary form, there are steps to go from the intermediary form to the final form.

Non-Terminating Reduction

Some terms have no normal forms, and thus, some terms may have both terminating and non-terminating reduction sequences.

Reduction Strategies

Normal-order

The left-most, outer-most redex is reduced first.

If normal form exists, normal-order reduction finds normal form.

Given the term,

normal order reduces in this order:

Applicative-order

Left-most, inner-most redex is reduced first.

Given the term

Applicative order may not be as efficient as normal order when argument is not used.

Programming

Encoding Boolean values/operator

Church numerals

Pairs/Tuples

Effectively, a pair is defined

f -> x -> y -> f(x, y)

Then to get the equivalent values back:

head = p -> p(x -> y -> x)
tail = p -> p(x -> y -> y)

This definition then can be extended to work for tuples.

Formal Type Systems

Formal type systems are precise specification of the type checker that allows formal proofs of type safety.

Type system

  • typing rules: assigns types to terms
  • type safety: well-typed terms cannot go wrong

Adding Types to Calculus

With these types, a term can be given a type:

The typing context refers to a set of typing assumptions:

If the context is empty, it is indicated with , and otherwise includes the types of all the free variables in .

STLC Typing Rules

Soundness and Completeness

Sound

Sound type systems never accept programs that can go wrong.

Complete

Complete type systems never rejects a program that can’t go wrong.

For any Turing-complete PL, the set of programs that may go wrong is undecidable, meaning that a type system cannot be both sound and complete.

Soundness is preferred.

STLC

Well-typed terms in STLC are sound.

Type safety in STLC

For any and , if and , then , and either or .

The reduction of a well-typed term either diverges, or terminates in a value of the expected type.

Lemma: Preservation M, M', \tau, if \cdot \vdash M: \tau and M \rightarrow M', then \cdot M': \tau

Well-typed terms reduce only to well-typed terms of the same type.

Lemma: Progress M, \tau, if \cdot \vdash M: \tau, then either M \in values or \exists M'.M\rightarrow M'.

A well-typed term is either a value or can be reduced.

STLC is not complete. The type system may reject terms that do not go wrong.

Well-typed terms in STLC always terminate

Strong normalisation theorem.