General Context

Dimensionality reduction

Reduce the number of features (variables/dimensions) in a dataset while retaining as much of the relevant information as possible.

Possible use cases

  • Visualisation
  • Feature extraction/transformation

Curse of Dimensionality

Sample complexity

The number of samples needed to learn a hypothesis class increases exponentially with the number of features $D$.

Thus, if a machine learning model takes in high-dimensional input, it will suffer from the curse of dimensionality.
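As a rough illustration of this exponential growth (a back-of-the-envelope estimate, not from the notes): if each of the $D$ features is discretised into $k$ bins, then covering every combination of bins at least once already needs

$$k^{D} \text{ samples, e.g. } k = 10,\ D = 10 \text{ gives } 10^{10} \text{ samples.}$$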

Key Idea

A key idea would be to remove redundant features.

Redundant features are features that are not important in distinguishing between the data points.

[Figure: Original Data plotted on axes x and y; y is redundant, so the Reduced Dimensions keep only x.]

Given some other data, we can also change the basis to create redundant features.
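A toy sketch of such a basis change (the data and the basis below are made up for illustration): the raw features x and y both vary, but in the new basis the newY coordinate barely varies and is redundant.

```python
import numpy as np

# Made-up correlated 2-D data: points lie close to the line y = 2x.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.stack([t, 2.0 * t + 0.05 * rng.normal(size=200)], axis=1)   # N x 2 data

print(X.var(axis=0))        # both original features vary noticeably

# Change the basis: newX points along the line y = 2x, newY is perpendicular to it.
B = np.array([[1.0, -2.0],
              [2.0,  1.0]]) / np.sqrt(5.0)     # orthonormal basis vectors as columns
X_new = (X - X.mean(axis=0)) @ B               # coordinates in the new basis

print(X_new.var(axis=0))    # newY has ~zero variance, so it is redundant
```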

[Figure: Original Data on axes x and y; after a Basis Change to new axes newX and newY, only newX is kept as the Reduced Dimensions.]

Identifying the basis

Given the transposed data matrix $X^{T} \in \mathbb{R}^{D \times N}$ (one data point $x^{(i)} \in \mathbb{R}^{D}$ per column),

we can decompose it into 3 matrices, $U$, $\Sigma$ and $V^{T}$, with Singular Value Decomposition:

$$X^{T} = U \Sigma V^{T}$$

Basic concepts

Orthonormal

Two vectors $u, v \in \mathbb{R}^{D}$ are orthonormal if they are:

  • normalised: $\|u\| = \|v\| = 1$
  • orthogonal: $u^{T} v = 0$

Orthonormal basis

A set of orthonormal vectors $\{u^{(1)}, \dots, u^{(D)}\}$ such that every vector $x \in \mathbb{R}^{D}$ can be expressed as a linear combination of the set.
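As a quick illustration (not from the notes), a random orthogonal matrix gives an orthonormal basis of $\mathbb{R}^{D}$, and the coefficients of the linear combination are just the dot products $u^{(j)} \cdot x$:

```python
import numpy as np

# Build an orthonormal basis of R^D from the QR decomposition of a random matrix.
rng = np.random.default_rng(0)
D = 4
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))         # columns of Q are orthonormal

print(np.allclose(np.linalg.norm(Q, axis=0), 1.0))   # normalised columns
print(np.allclose(Q.T @ Q, np.eye(D)))               # mutually orthogonal

# Any vector x is a linear combination of the basis vectors,
# with coefficients u^(j) . x.
x = rng.normal(size=D)
coeffs = Q.T @ x
print(np.allclose(Q @ coeffs, x))                    # True
```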

Existence of SVD

WLOG, let $D \leq N$.

For any real-valued matrix $A \in \mathbb{R}^{D \times N}$, there exists a factorisation

$$A = U \Sigma V^{T}$$

called SVD such that,

  • $U$ is $D \times D$, having orthonormal columns (left singular vectors)
  • $\Sigma$ is $D \times N$ and is a diagonal matrix with $\sigma_{1} \geq \sigma_{2} \geq \dots \geq \sigma_{D} \geq 0$ (singular values)
  • $V$ is $N \times N$ with orthonormal columns (right singular vectors)

Thus, using SVD:

$$X^{T} = U \Sigma V^{T}$$
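A minimal sketch of this factorisation on a made-up transposed data matrix; `full_matrices=False` returns the thin SVD, which is enough here since $D \leq N$:

```python
import numpy as np

# Made-up transposed data matrix X^T of shape D x N with D <= N.
rng = np.random.default_rng(0)
D, N = 5, 100
Xt = rng.normal(size=(D, N))

U, sigma, Vt = np.linalg.svd(Xt, full_matrices=False)
print(U.shape, sigma.shape, Vt.shape)             # (5, 5) (5,) (5, 100)

# U and V have orthonormal columns, and the singular values are non-increasing.
print(np.allclose(U.T @ U, np.eye(D)))            # True
print(np.allclose(Vt @ Vt.T, np.eye(D)))          # True (first D columns of V)
print(np.all(np.diff(sigma) <= 0))                # True

# The factorisation reproduces X^T.
print(np.allclose(U @ np.diag(sigma) @ Vt, Xt))   # True
```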

A simplified interpretation, with a data matrix $X^{T} = \begin{bmatrix} x^{(1)} & x^{(2)} & \dots & x^{(N)} \end{bmatrix} \in \mathbb{R}^{D \times N}$.

Using SVD:

$$x^{(i)} = \sum_{j=1}^{D} \left( \sigma_{j} v^{(j)}_{i} \right) u^{(j)}$$

where $v^{(j)}_{i}$ is the $i$-th entry of the $j$-th column of $V$. Effectively, this means that every data point $x^{(i)}$ is a linear combination of the basis vectors $u^{(1)}, \dots, u^{(D)}$.
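A small numerical check of this interpretation on made-up data; the index convention ($v^{(j)}_{i}$ as the $i$-th entry of the $j$-th column of $V$) follows the formula above:

```python
import numpy as np

# Made-up data: N = 100 samples, D = 5 features; X^T has one data point per column.
rng = np.random.default_rng(0)
N, D = 100, 5
Xt = rng.normal(size=(D, N))

U, sigma, Vt = np.linalg.svd(Xt, full_matrices=False)

# Check: x^(i) = sum_j sigma_j * v_i^(j) * u^(j), a linear combination of the
# columns of U with coefficients sigma_j * v_i^(j).  Note V = Vt.T.
i = 0
x_i = sum(sigma[j] * Vt[j, i] * U[:, j] for j in range(D))
print(np.allclose(x_i, Xt[:, i]))   # True
```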

Dimensionality reduction via SVD

The singular values tell us about the importance of the new basis vectors; basis vectors with small singular values are less important.

From $\Sigma$, we can read off the singular values on its diagonal: $\sigma_{1}, \sigma_{2}, \dots, \sigma_{D}$.

Recall that singular values are ordered in size:

$$\sigma_{1} \geq \sigma_{2} \geq \dots \geq \sigma_{D} \geq 0$$

Thus, we can reduce the dimensions by setting the smallest singular values to $0$, where, effectively, we set all singular values after the new dimension $r$ to $0$:

$$\sigma_{r+1} = \sigma_{r+2} = \dots = \sigma_{D} = 0$$

We can then take this new basis $\tilde{U}$ (with the removed dimensions dropped), of size $D \times r$:

$$\tilde{U} = \begin{bmatrix} u^{(1)} & u^{(2)} & \dots & u^{(r)} \end{bmatrix}$$
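A sketch of this truncation with made-up data and an illustrative choice $r = 2$; keeping only the first $r$ singular values amounts to keeping only the first $r$ columns of $U$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # made-up N x D data
U, sigma, Vt = np.linalg.svd(X.T, full_matrices=False)

r = 2
sigma_trunc = sigma.copy()
sigma_trunc[r:] = 0.0                          # set sigma_{r+1}, ..., sigma_D to 0

# Zeroing the trailing singular values is the same as keeping only the first
# r basis vectors (columns of U).
U_tilde = U[:, :r]                             # new basis of size D x r
approx = U @ np.diag(sigma_trunc) @ Vt         # rank-r approximation of X^T
approx2 = U_tilde @ np.diag(sigma[:r]) @ Vt[:r]
print(np.allclose(approx, approx2))            # True
```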

Compressing vectors

Using the newly found basis $\tilde{U}$, we can compress the matrix $X^{T}$:

$$\tilde{U}^{T} X^{T}$$

getting an $r \times N$ matrix.

The matrix $\tilde{U}$ defines an $r$-dimensional subspace of the feature space, and for any feature vector $x \in \mathbb{R}^{D}$, multiplying with $\tilde{U}^{T}$ obtains the linear combination coefficients in the $r$-dimensional subspace.

For example, if the new dimension is $r$, then given a feature vector $x \in \mathbb{R}^{D}$,

$$\tilde{U}^{T} x = \begin{bmatrix} u^{(1)} \cdot x \\ \vdots \\ u^{(r)} \cdot x \end{bmatrix} \in \mathbb{R}^{r}$$
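A sketch of this compression on made-up data, with $r = 2$ as an illustrative choice (`U_tilde`, `Z`, `x`, `z` are names introduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))       # made-up N x D data
U, _, _ = np.linalg.svd(X.T, full_matrices=False)
U_tilde = U[:, :2]                  # D x r basis

Z = U_tilde.T @ X.T                 # r x N matrix of coefficients
print(Z.shape)                      # (2, 100)

x = X[0]                            # a single feature vector in R^D
z = U_tilde.T @ x                   # its coefficients (u^(1).x, u^(2).x)
print(np.allclose(z, [U_tilde[:, 0] @ x, U_tilde[:, 1] @ x]))   # True
```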

Reconstructing back to the original matrix

For a fixed $r$, it holds that $\tilde{U}\tilde{U}^{T} \approx \mathbb{I}$, with the approximation error dependent on $r$.

The data can be reconstructed in the original dimension $D$:

$$\tilde{x}^{(i)} = \tilde{U}\tilde{U}^{T} x^{(i)} \approx x^{(i)}$$
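A sketch of the reconstruction on made-up data, showing the error for an illustrative $r$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # made-up N x D data
U, sigma, Vt = np.linalg.svd(X.T, full_matrices=False)

r = 2
U_tilde = U[:, :r]
X_rec_T = U_tilde @ (U_tilde.T @ X.T)   # \tilde{U} \tilde{U}^T X^T, still D x N

# The relative reconstruction error shrinks as r grows (it is 0 when r = D).
err = np.linalg.norm(X.T - X_rec_T) / np.linalg.norm(X.T)
print(r, err)
```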

Mean-centred data

To get mean-centred data:

Step 1: Compute the mean feature vector over the $N$ samples:

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x^{(i)}$$

Step 2: Compute the mean-centred data points:

$$\hat{x}^{(i)} = x^{(i)} - \bar{x}$$

Step 3: Use the mean-centred data matrix $\hat{X}^{T} = \begin{bmatrix} \hat{x}^{(1)} & \dots & \hat{x}^{(N)} \end{bmatrix}$ in place of $X^{T}$.
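The three steps in a minimal sketch, assuming a made-up $N \times D$ data matrix `X`:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(100, 3))        # made-up N x D data

x_bar = X.mean(axis=0)        # Step 1: mean feature vector over the N samples
X_hat = X - x_bar             # Step 2: mean-centred data points
X_hat_T = X_hat.T             # Step 3: mean-centred data matrix (D x N) for the SVD

print(np.allclose(X_hat.mean(axis=0), 0.0))   # each feature now has zero mean
```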

Given the mean-centred data $\hat{X}^{T}$, the values $\frac{\sigma^{2}_{j}}{N-1}$ are the variances of the data in the basis defined by the vectors $u^{(j)}$.

Thus, given a threshold of variance $\rho$, choose the minimum $r$ such that

$$\frac{\sum_{j=1}^{r} \sigma^{2}_{j}}{\sum_{j=1}^{D} \sigma^{2}_{j}} \geq \rho$$
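A sketch of choosing $r$ this way; the data and the threshold $\rho = 0.9$ are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])  # made-up data
X_hat = X - X.mean(axis=0)

_, sigma, _ = np.linalg.svd(X_hat.T, full_matrices=False)
var = sigma**2 / (X_hat.shape[0] - 1)          # variances along each basis vector

rho = 0.9
explained = np.cumsum(var) / var.sum()         # cumulative fraction of variance
r = int(np.argmax(explained >= rho)) + 1       # smallest r meeting the threshold
print(r, explained)
```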

With the $r$ found, the reconstructed data points $\tilde{x}^{(i)}$ are close to the original mean-centred data points $\hat{x}^{(i)}$.

Principal Component Analysis

Capture the components that maximise the statistical variation of the data.
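A minimal end-to-end PCA sketch via SVD of the mean-centred data, following the steps above; the function name and data are illustrative:

```python
import numpy as np

def pca(X, r):
    """Project the N x D data matrix X onto its first r principal components."""
    x_bar = X.mean(axis=0)                       # mean feature vector
    X_hat = X - x_bar                            # mean-centred data
    U, sigma, Vt = np.linalg.svd(X_hat.T, full_matrices=False)
    U_tilde = U[:, :r]                           # principal directions (D x r)
    Z = U_tilde.T @ X_hat.T                      # r x N matrix of coefficients
    return Z.T, U_tilde, x_bar

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                   # made-up N x D data
Z, U_tilde, x_bar = pca(X, r=3)
print(Z.shape)                                   # (200, 3)

# Reconstruction back in the original D dimensions.
X_rec = (U_tilde @ Z.T).T + x_bar
print(np.linalg.norm(X - X_rec) / np.linalg.norm(X))
```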