Motivation

  • Developers spend an estimated 35-50% of their time validating and debugging software
  • Testing, debugging and verification are estimated to account for 50-75% of total development cost

General Idea

Debugging can be done

  • during development and testing (triggered by failing tests or static analysis reports)
  • after commit, during integration and in production (triggered by user reports)

Terminology

Mistake

Human act/decision resulting in error

Defect/bug

Error in program code

Fault

Point in the execution at which the programming error is triggered and the program enters an unwanted state

Failure

The point at which the fault becomes externally visible

Types of Debugging

printf Debugging

Pros

  • Simple and intuitive
  • Language and tool agnostic

Cons

  • Can quickly become confusing and unclear
  • Print statements might be forgotten and left in the program code
  • Might require recompilation
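A minimal illustration (function and values made up) of printf debugging as ad-hoc trace output:

    def interest(balance, rate):
        # ad-hoc trace output: quick to add, but easy to forget to remove,
        # and changing it means editing (and possibly recompiling) the code
        print(f"interest: balance={balance}, rate={rate}")
        return balance * rate

    print(interest(100.0, 0.05))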

Logging

Logging is a more systematic alternative, allowing for multiple debugging levels:

  • DEBUG a general debugging event
  • INFO an event for informational purposes
  • WARN an event that might possibly lead to an error
  • ERROR an error in application that is possibly recoverable
  • FATAL a fatal event preventing continuation of program
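A minimal sketch using Python's standard logging module (which names the WARN and FATAL levels WARNING and CRITICAL):

    import logging

    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)s %(levelname)s %(message)s")
    log = logging.getLogger(__name__)

    log.debug("entering parser with 3 tokens")       # DEBUG
    log.info("configuration loaded")                 # INFO
    log.warning("cache miss, falling back to disk")  # WARN
    log.error("could not write output file")         # ERROR
    log.critical("out of memory, shutting down")     # FATAL

Since the level is configured in one place, the DEBUG output can be silenced in production without touching the code.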

Debugger

A debugger works by setting breakpoints

  • i.e. locations in the code at which execution pauses.

While paused, the user can

  • step into/over/out of methods
  • execute the program step by step
  • inspect the program state at that point in time

Pros

  • More systematic than printf debugging
  • Can display additional information
  • Can set breakpoints specific to values
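As an illustration (function and condition made up), Python's built-in breakpoint() acts as a value-specific breakpoint without any IDE support:

    def average(values):
        total = 0
        for v in values:
            if v < 0:         # value-specific condition
                breakpoint()  # drop into the pdb debugger when a suspicious value appears
            total += v
        return total / len(values)

Inside pdb, the commands step, next and return correspond to stepping into, over and out of calls, and p total inspects the program state.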

Mental Model

[Diagram: an execution as a sequence of program states: Input → State 1 (wanted, passed) → State 2 (unwanted) → State 3 (unwanted) → Output]

In order to debug the failure, we need to trace it back to the defect that caused the first fault.

Challenges

Program states are typically large, making it challenging or infeasible to search them all manually.

  • It is not always clear whether intermediate states are correct
  • Executions might consist of millions of steps

Scientific Method to Debugging

  1. Formulate a question
  2. Come up with a hypothesis, based on the knowledge obtained while formulating the question, that may explain the observed behaviour
  3. Determine the logical consequences of the hypothesis and formulate a prediction that can support or refute it. Ideally, the prediction distinguishes the hypothesis from likely alternatives.
  4. Test the prediction in an experiment. If the prediction holds, confidence in the hypothesis increases; otherwise it decreases.
  5. Repeat steps 2-4 until there are no discrepancies between the hypothesis and the predictions or observations.

Example

Given a function that strips away HTML markup, these are the test cases.

Input               | Expected Output | Output     | Outcome
<b>foo</b>          | foo             | foo        | Passed
<b>"foo"</b>        | "foo"           | foo        | Failed
"<b>foo</b>"        | "foo"           | <b>foo</b> | Failed
<b id="bar">foo</b> | foo             | foo        | Passed
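The function under test is not included in these notes; the following reconstruction (after Zeller's classic remove_html_markup example) reproduces exactly the outcomes above:

    def remove_html_markup(s):
        tag = False
        quote = False
        out = ""
        for c in s:
            if c == '<' and not quote:
                tag = True
            elif c == '>' and not quote:
                tag = False
            elif c == '"' or c == "'" and tag:
                # defect: precedence makes this (c == '"') or (c == "'" and tag),
                # so double quotes toggle `quote` even outside tags
                quote = not quote
            elif not tag:
                out = out + c
        return out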
Then, we can come up with some example hypotheses:

Hypothesis                               | Outcome | Evidence
Tag <> always stripped from input        | False   | Check test case 3
Double quotes always stripped from input | True    |
Tags in double quotes are not stripped   | True    | Check test cases 2, 3
  • Then, we can refine the hypothesis: the error is due to the variable tag being set.
  • However, this turns out to be wrong.
  • We refine it again: the error is due to the quote condition evaluating to true.
  • This one is correct.
  • Continue, repeat and iterate until the diagnosis is complete.

Diagnosis

Only when the diagnosis explains both causality and incorrectness is it time to actually fix the code accordingly.

Rubberducking

Revisit the observations and explain the problem to someone else:

  • a rubber duck or another inanimate object works just as well for explaining the issue.

Program Slicing

How can we automatically determine which earlier variables influenced the current, erroneous state?

Slicing

Extracts a subset of the program that potentially affects variables or values at a certain program location.

The main advantage is that only the slice needs to be inspected, rather than the whole program.

Pros

  • Rules out locations in the program that could not have had an effect on the failure
  • Brings together possible origins that may be scattered across the code

Tracking Origins

  • Start with the faulty state
  • Determine the state's origins: the parts of earlier states that could have caused the faulty state
  • For each origin, determine whether it is faulty or not

If we find a part of the state that is faulty yet has only correct origins, the defect has been found.

Dependencies

Data dependency

A variable x is data dependent on a variable y if x is assigned a value computed from y.

Control dependency

A statement is control dependent on a variable y if the statement is only executed because a condition involving y was evaluated, influencing the execution path.
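An illustrative snippet (all names made up) showing both kinds of dependency and the resulting backward slice of z at the last line:

    x = 7
    y = x * 2          # in the slice: z is data dependent on y, and y on x
    unused = "logged"  # not in the slice: never influences z
    if y > 10:         # in the slice: the assignment to z is control dependent on this condition
        z = y + 1
    else:
        z = 0
    print(z)           # slicing criterion: the value of z here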

Static vs Dynamic Slicing

Static slicing generally contains the whole method, while dynamic slicing, which considers one concrete execution, eliminates more of the code: for example, a branch that was not executed cannot be part of the dynamic slice.

Forward vs Backward Slicing

Backward slicing determines which statements influenced a value, while forward slicing determines which statements are influenced by a value.

Statistical Fault Localisation

How to automatically localise faults?

Statistical debugging

Statistical reasoning for debugging

Statistical fault localisation

Utilise multiple executions for localising faults

Observation: whether a program passes or fails likely correlates with the events that precede the outcome, although correlation does not imply causation.

Tarantula

Intuition

Program elements that are executed by a failing test are more likely to be faulty than those primarily executed by passing test cases.

If suspiciousness is low:

  • failing test cases do not exercise this line

If suspiciousness is high:

  • failing test cases exercise this line
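Tarantula's standard formula scores an element by the fraction of failing tests covering it, normalised by the corresponding fraction of passing tests. A minimal sketch:

    def tarantula(failed_cov, total_failed, passed_cov, total_passed):
        """Suspiciousness of one program element, where failed_cov and
        passed_cov count the failing/passing tests covering it."""
        fail_ratio = failed_cov / total_failed if total_failed else 0.0
        pass_ratio = passed_cov / total_passed if total_passed else 0.0
        denom = fail_ratio + pass_ratio
        return fail_ratio / denom if denom else 0.0

    # covered by all 3 failing tests but only 1 of 10 passing tests:
    print(tarantula(3, 3, 1, 10))  # ~0.91: highly suspicious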

Statistical Debugging Tools Adoption

There is limited tool support in the form of IDE integrations, but statistical fault localisation is useful as an ingredient in other approaches.

Hypotheses:

  1. Programmers who debug with the assistance of automated debugging tools will locate bugs faster (conclusion: yes, on easy debugging tasks)
  2. The effectiveness of automated tools increases with the level of difficulty (no support)
  3. The effectiveness of debugging with ranking-based automated tools is affected by the rank of the faulty statement (no support)

Test Case Reduction

Are large inputs (generated by automated testing techniques) necessary to trigger bugs?

Small scope hypothesis

A high proportion of errors can be found by testing a program for all test inputs within some small scope.

[Diagram: test input + test oracle → reduced test case]

Test-case reduction tools remove an element of the input, apply the user-supplied test oracle, and check whether the bug is still triggered. If so, they continue with another reduction; if not, they undo the change and attempt a different reduction.

Pros

  • Easier to debug/analyze
  • Identify duplicate bug-inducing test cases
  • Easier to communicate

Cons

  • Complex interestingness test
  • Delta debugging is inefficient for highly structured languages

Approaches

Binary search: split the input in half and test each half. But if both halves pass, then where is the error?

Delta Debugging

Systematically remove elements from bug-inducing input.

Generally

  • find a smaller test case that still fails
  • find a local minimum: removing any further subset no longer causes the test to fail
  • start reducing with a granularity of n = 2 (halves)
  • consider the n partitions of the input, as well as their complements
  • if any partition causes the test to fail, continue reducing it; otherwise reduce to a failing complement (see the sketch below)
    • if neither works, increase the granularity
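A minimal sketch of ddmin over string inputs, assuming a user-supplied fails() oracle that returns True while the reduced input still triggers the bug:

    def ddmin(inp, fails):
        assert fails(inp), "the initial input must trigger the bug"
        n = 2  # granularity: start with halves
        while len(inp) >= 2:
            chunk = len(inp) // n
            subsets = [inp[i:i + chunk] for i in range(0, len(inp), chunk)]
            reduced = False
            for s in subsets:                  # 1. does some partition alone fail?
                if fails(s):
                    inp, n, reduced = s, 2, True
                    break
            if not reduced:                    # 2. does some complement fail?
                for i in range(len(subsets)):
                    c = "".join(subsets[:i] + subsets[i + 1:])
                    if fails(c):
                        inp, n, reduced = c, max(n - 1, 2), True
                        break
            if not reduced:                    # 3. neither: refine granularity
                if n >= len(inp):
                    break                      # local minimum reached
                n = min(n * 2, len(inp))
        return inp

    # hypothetical oracle: the bug is triggered whenever "<b" appears
    print(ddmin('"<b>foo</b>"', lambda s: "<b" in s))  # -> '<b'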

Isolating Failure-Inducing Changes

Regression bug

Something that worked before stopped working

Automatic regression debugging using version control systems.

Approach

Naive approach: run every version to check whether the failure is triggered

  • takes a long time
  • many changes between two adjacent versions make pinpointing the cause hard
  • but fully automated

Binary search:

  • identifies the failure-inducing commit in a logarithmic number of tries
  • supported by git bisect
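git bisect automates exactly this search. A sketch of the underlying idea, with a hypothetical fails() oracle that would build and test a given revision:

    def bisect(revisions, fails):
        """First failing revision, assuming revisions[0] passes,
        revisions[-1] fails, and there is a single pass-to-fail transition."""
        lo, hi = 0, len(revisions) - 1
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if fails(revisions[mid]):
                hi = mid  # failure was introduced at mid or earlier
            else:
                lo = mid  # failure was introduced after mid
        return revisions[hi]

    # hypothetical history in which revision "r5" introduced the failure
    history = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
    print(bisect(history, lambda r: r >= "r5"))  # -> "r5"

With git itself, the same loop is driven by git bisect start, git bisect bad and git bisect good <rev>, and can be fully automated with git bisect run <test-script>.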

Delta debugging on changes:

  • apply delta debugging by considering individual patches
  • different granularities possible