Reduce the number of features (variables/dimensions) in a dataset while retaining as much of the relevant information as possible.
Possible use cases:
- Visualisation
- Feature extraction/transformation
## Curse of Dimensionality
### Sample complexity
The number of samples needed to learn a hypothesis class grows exponentially with the number of features $D$ (e.g. on the order of $c^{D}$ for some constant $c > 1$).
Thus, if a machine learning model takes a high-dimensional input, it will suffer from the curse of dimensionality.
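As a rough, hypothetical illustration of this exponential growth (the grid-covering argument and the constant $k = 10$ are assumptions, not from the notes): covering each of $d$ feature axes with $k$ distinct values already requires $k^{d}$ cells.

```python
# Hypothetical illustration of exponential growth with dimension d:
# covering each of d feature axes with k = 10 values needs k**d grid cells,
# and hence (loosely) a comparable number of samples to "fill" the space.
k = 10
for d in [1, 2, 3, 10, 100]:
    print(f"d = {d:>3}: {k**d:.3e} cells to cover the feature space")
```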
## Key Idea
A key idea is to remove redundant features.
Redundant features are features that are not important for distinguishing between data points.
Even when the given data has no obviously redundant features, we can change the basis so that some of the new dimensions become redundant and can be removed.
## Identifying the basis
Given the transposed data matrix $X^{T} \in \mathbb{R}^{D \times N}$ (each column is a data point $x^{(i)} \in \mathbb{R}^{D}$),
we can decompose it into 3 matrices, $U$, $\Sigma$ and $V^{T}$, with Singular Value Decomposition:
$$X^{T} = U \Sigma V^{T}$$
### Basic concepts
Orthonormal
Two vectors $u$ and $v$ are orthonormal if they are:
- normalised: $\|u\| = \|v\| = 1$
- orthogonal: $u^{T} v = 0$
Orthonormal basis
A set of orthonormal vectors $\{u^{(1)}, \dots, u^{(D)}\}$ such that every vector $x \in \mathbb{R}^{D}$ can be expressed as a linear combination of the set.
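A minimal numpy check of these definitions (the vectors $u$, $v$ and the test point $x$ below are illustrative choices, not from the notes):

```python
import numpy as np

# Two illustrative vectors in R^2.
u = np.array([1.0, 1.0]) / np.sqrt(2)
v = np.array([1.0, -1.0]) / np.sqrt(2)

# Normalised: each vector has unit length.
print(np.isclose(np.linalg.norm(u), 1.0), np.isclose(np.linalg.norm(v), 1.0))

# Orthogonal: their dot product is zero.
print(np.isclose(u @ v, 0.0))

# {u, v} is then an orthonormal basis of R^2: any x is a linear combination
# of u and v with coefficients (u . x) and (v . x).
x = np.array([3.0, -2.0])
print(np.allclose((u @ x) * u + (v @ x) * v, x))
```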
### Existence of SVD
WLOG, let $D \le N$.
For any real-valued matrix $A \in \mathbb{R}^{D \times N}$, there exists a factorisation
$$A = U \Sigma V^{T}$$
called the SVD, such that:
- $U$ is $D \times D$ with orthonormal columns (left singular vectors)
- $\Sigma$ is $D \times D$ and is a diagonal matrix with $\sigma_{1} \ge \sigma_{2} \ge \dots \ge \sigma_{D} \ge 0$ (singular values)
- $V$ is $N \times D$ with orthonormal columns (right singular vectors)
Thus, using SVD:
$$X^{T} = U \Sigma V^{T}$$
A simplified interpretation, with a data matrix $X \in \mathbb{R}^{N \times D}$ whose rows are the data points $x^{(i)}$:
Using SVD:
$$x^{(i)} = U \Sigma v^{(i)} = \sum_{j=1}^{D} \sigma_{j} V_{ij}\, u^{(j)}$$
where $v^{(i)} \in \mathbb{R}^{D}$ is the $i$-th row of $V$.
Effectively, this means that every data point $x^{(i)}$ is a linear combination of the basis vectors $u^{(1)}, \dots, u^{(D)}$ (the columns of $U$).
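A minimal numpy sketch of this decomposition (the random data, shapes, and variable names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 5                       # N samples, D features, D <= N
X = rng.normal(size=(N, D))         # data matrix X, rows are data points x^(i)

# Thin SVD of the transposed data matrix: X^T = U Sigma V^T.
U, S, Vt = np.linalg.svd(X.T, full_matrices=False)
print(U.shape, S.shape, Vt.shape)   # (D, D), (D,), (D, N)

# Columns of U are orthonormal (left singular vectors).
print(np.allclose(U.T @ U, np.eye(D)))

# Each data point x^(i) is a linear combination of the columns of U,
# with coefficients sigma_j * V_ij.
i = 0
print(np.allclose(U @ (S * Vt[:, i]), X[i]))
```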
## Dimensionality reduction via SVD
The singular values tell us about the importance of the new basis vectors: the smaller the singular value $\sigma_{j}$, the less important the basis vector $u^{(j)}$.
From $\Sigma$, we can read off the singular values $\sigma_{1}, \dots, \sigma_{D}$.
Recall that the singular values are ordered in size: $\sigma_{1} \ge \sigma_{2} \ge \dots \ge \sigma_{D} \ge 0$.
Thus, we can reduce the dimensions by setting the smallest singular values to $0$:
$$\sigma_{r+1} = \sigma_{r+2} = \dots = \sigma_{D} = 0$$
where, effectively, we set all singular values after the new dimension $r$ to $0$.
We can then take this new basis (with the removed dimensions dropped) of size $D \times r$:
$$\tilde{U} = \begin{bmatrix} u^{(1)} & \dots & u^{(r)} \end{bmatrix}$$
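A short sketch of this truncation step, continuing the same illustrative setup:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, r = 100, 5, 2                 # keep r of the D dimensions
X = rng.normal(size=(N, D))

U, S, Vt = np.linalg.svd(X.T, full_matrices=False)

# Setting sigma_{r+1}, ..., sigma_D to 0 amounts to keeping only the first r
# (most important) basis vectors.
S_trunc = np.concatenate([S[:r], np.zeros(D - r)])
U_tilde = U[:, :r]                  # the new D x r basis
print(S_trunc)
print(U_tilde.shape)                # (5, 2)
```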
## Compressing vectors
Using the newly found basis $\tilde{U} \in \mathbb{R}^{D \times r}$, we can compress the data matrix $X^{T} \in \mathbb{R}^{D \times N}$:
$$Z = \tilde{U}^{T} X^{T}$$
getting a matrix $Z \in \mathbb{R}^{r \times N}$.
The matrix $\tilde{U}$ defines an $r$-dimensional subspace of the feature space, and for any feature vector $x \in \mathbb{R}^{D}$, multiplying with $\tilde{U}^{T}$ obtains the linear combination coefficients $z = \tilde{U}^{T} x$ in the $r$-dimensional subspace.
For example, if the new dimension is $r = 2$, then given a feature vector $x \in \mathbb{R}^{D}$,
$$z = \tilde{U}^{T} x = \begin{bmatrix} u^{(1)\,T} x \\ u^{(2)\,T} x \end{bmatrix} \in \mathbb{R}^{2}$$
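A minimal sketch of the compression step (same illustrative data as above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, r = 100, 5, 2
X = rng.normal(size=(N, D))

U, S, Vt = np.linalg.svd(X.T, full_matrices=False)
U_tilde = U[:, :r]                  # basis of the r-dimensional subspace

# Compress the whole data matrix: Z = U_tilde^T X^T is r x N.
Z = U_tilde.T @ X.T
print(Z.shape)                      # (2, 100)

# For a single feature vector x, the coefficients in the subspace are
# z = U_tilde^T x, i.e. (u^(1) . x, u^(2) . x) when r = 2.
x = X[0]
z = U_tilde.T @ x
print(np.allclose(z, Z[:, 0]))
```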
## Reconstructing back to the original matrix
For a fixed $r$, it holds that $\tilde{U}\tilde{U}^{T} \approx \mathbb{I}$, with approximation error dependent on $r$.
The data can be reconstructed in the original dimension $D$:
$$\tilde{x} = \tilde{U} z = \tilde{U}\tilde{U}^{T} x \approx x$$
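A sketch of the reconstruction and its error (same illustrative setup; the Frobenius-norm check at the end is an added observation, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, r = 100, 5, 2
X = rng.normal(size=(N, D))

U, S, Vt = np.linalg.svd(X.T, full_matrices=False)
U_tilde = U[:, :r]

# Compress, then map back to the original D-dimensional space.
Z = U_tilde.T @ X.T                 # r x N
X_tilde = (U_tilde @ Z).T           # N x D reconstruction

# U_tilde U_tilde^T is only a projection (approximately the identity), so the
# reconstruction error equals the discarded singular values in Frobenius norm.
err = np.linalg.norm(X - X_tilde)
print(np.isclose(err, np.sqrt(np.sum(S[r:] ** 2))))   # True
```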
## Mean-centred data
To get mean-centred data:
Step 1: Compute the mean feature vector over the $N$ samples: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x^{(i)}$
Step 2: Compute the mean-centred data points: $\hat{x}^{(i)} = x^{(i)} - \bar{x}$
Step 3: Use the mean-centred data matrix $\hat{X}^{T} = \begin{bmatrix} \hat{x}^{(1)} & \dots & \hat{x}^{(N)} \end{bmatrix}$ in place of $X^{T}$
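The three steps as a numpy sketch (random data with a non-zero mean, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 5
X = rng.normal(loc=3.0, size=(N, D))    # data with a non-zero mean

# Step 1: mean feature vector over the N samples.
x_bar = X.mean(axis=0)                  # shape (D,)

# Step 2: mean-centred data points x_hat^(i) = x^(i) - x_bar.
X_hat = X - x_bar                       # shape (N, D)

# Step 3: use the mean-centred data matrix X_hat^T in place of X^T.
U, S, Vt = np.linalg.svd(X_hat.T, full_matrices=False)
print(np.allclose(X_hat.mean(axis=0), 0.0))   # features now have zero mean
```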
Given the mean-centred data $\hat{X}^{T}$, the values $\frac{\sigma^{2}_{j}}{N-1}$ are the variances of the data in the basis defined by the vectors $u^{(j)}$.
Thus, given a threshold of variance $\theta \in [0, 1]$, choose a minimum $r$ such that
$$\frac{\sum_{j=1}^{r} \sigma_{j}^{2}}{\sum_{j=1}^{D} \sigma_{j}^{2}} \ge \theta$$
With the $r$ found, the reconstructed data points $\tilde{x}^{(i)}$ are close to the original data points $\hat{x}^{(i)}$.
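A sketch of this selection rule, assuming the threshold is read as the fraction of total variance to retain (the symbol $\theta$ and that reading are assumptions made here):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 5
X = rng.normal(size=(N, D))
X_hat = X - X.mean(axis=0)              # mean-centred data

U, S, Vt = np.linalg.svd(X_hat.T, full_matrices=False)

# Variances of the data along each basis vector u^(j).
variances = S**2 / (N - 1)

# Choose the smallest r whose cumulative share of the total variance reaches
# the threshold theta (a hypothetical value here).
theta = 0.9
explained = np.cumsum(variances) / variances.sum()
r = int(np.argmax(explained >= theta)) + 1
print(explained, r)
```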
## Principal Component Analysis
Capture the components that maximise the statistical variation of the data.
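Putting the pieces together, a minimal PCA sketch via mean-centring plus SVD, with an optional cross-check against scikit-learn's `PCA` (the use of scikit-learn and all variable names here are assumptions, not from the notes):

```python
import numpy as np
from sklearn.decomposition import PCA   # optional cross-check only

rng = np.random.default_rng(0)
N, D, r = 100, 5, 2
X = rng.normal(size=(N, D))

# PCA via SVD of the mean-centred data: the columns of U are the principal
# components, i.e. the directions of maximal variance.
X_hat = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_hat.T, full_matrices=False)

U_tilde = U[:, :r]                      # top-r principal components
Z = U_tilde.T @ X_hat.T                 # r x N compressed representation
variances = S**2 / (N - 1)              # variance captured by each component
print(Z.shape, variances[:r])

# scikit-learn's PCA reports the same per-component variances (component signs
# may differ, but variances are sign-free).
pca = PCA(n_components=r).fit(X)
print(np.allclose(pca.explained_variance_, variances[:r]))
```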