# Unsupervised 1 - traditional

## Introduction

1. Supervised Learning = you have labels.
2. Unsupervised Learning = you don't have labels.

PCA / t-SNE = dimensionality reduction (speeds up training, reduces noise)

We can use Unsupervised Learning to **improve** Supervised Learning.
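A minimal sketch of this idea: run unsupervised PCA first, then feed the reduced features to a supervised classifier. The dataset (`load_digits`) and classifier (`LogisticRegression`) are illustrative choices, not from the notes.

```python
# Sketch: PCA (unsupervised) as preprocessing for a supervised classifier.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduce the 64 pixel features to 20 principal components, then classify.
model = make_pipeline(PCA(n_components=20), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Fewer, de-noised features can make the supervised step faster and sometimes more accurate.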

## PCA

Z = XQ, where Q is an orthogonal matrix: multiplying by Q rotates the data into new directions.

**Measure information -> variance**

A deterministic variable has 0 variance. We want to transform X into Z so that the 1st column of Z carries the most information (variance), the 2nd column the 2nd most, etc.

De-correlation: the columns of Z are uncorrelated with each other.

Z is the latent representation of X.

Av = λv

eigenvector (v): a direction that is not changed by the matrix A.

eigenvalue (λ): the scale factor along that direction.
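The steps above can be sketched with NumPy: eigendecompose the covariance matrix, sort eigenvectors by eigenvalue, and rotate the data. The random data here is just an illustrative assumption.

```python
# Sketch: PCA via eigendecomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
# Correlated 3-D data (arbitrary mixing matrix for illustration).
X = rng.normal(size=(500, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])
X = X - X.mean(axis=0)             # center the data

C = np.cov(X, rowvar=False)        # covariance matrix A
L, V = np.linalg.eigh(C)           # eigenvalues (ascending) and eigenvectors
idx = np.argsort(L)[::-1]          # sort descending: most variance first
L, Q = L[idx], V[:, idx]

Z = X @ Q                          # rotated (de-correlated) data
var_Z = Z.var(axis=0, ddof=1)
print(var_Z)                       # decreasing, and equal to the eigenvalues

# Check Av = λv for the top eigenvector.
v, lam = Q[:, 0], L[0]
print(np.allclose(C @ v, lam * v))
```

The 1st column of Z has the largest variance, the 2nd column the 2nd largest, and the columns are uncorrelated.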

## t-SNE

t-SNE = t-distributed + stochastic neighbor embedding

A nonlinear method, overcoming PCA's linear limitation.

No train & test split: t-SNE optimizes the output embedding directly in order to minimize its cost function.

It doesn't know the labels; it just tries to **preserve the distances** (neighbor relationships) between the input vectors.

Huge RAM requirement (pairwise distances scale as O(N²)).

t-SNE can fail on datasets like the donut (concentric circles) or the XOR problem.
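A minimal usage sketch with scikit-learn: note that `TSNE` only offers `fit_transform`, which matches the point above that there is no separate train/test data — the embedding itself is what gets optimized. The dataset and parameter values are illustrative assumptions.

```python
# Sketch: t-SNE embedding of a small dataset.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X = X[:500]   # keep it small: memory grows with the number of pairwise distances

# No .transform for new points -- the output embedding is optimized directly.
Z = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(Z.shape)
```

Plotting Z colored by y (labels used only for visualization, never by t-SNE itself) is the usual way to inspect the result.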
