Hands-On ML 9. Unsupervised Learning
1. Basics
Applications of clustering: semi-supervised learning, customer segmentation, data analysis, anomaly detection, and search engines.
2. K-means
Good: Guaranteed to converge
Good: Fast
Bad: May converge to a sub-optimal solution (a local optimum), depending on the initialization
Bad: The number of clusters K must be specified in advance
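2.1. Improving sub-optimal solutions
Run the algorithm multiple times with different random initializations and keep the best solution, i.e., the one with the lowest inertia (the sum of squared distances between each instance and its closest centroid).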
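A minimal pure-Python sketch of this restart strategy, assuming Euclidean distance and plain random initialization (scikit-learn's `KMeans` defaults to k-means++ and exposes the restarts through its `n_init` parameter). One run of Lloyd's algorithm alternates an assignment step and a centroid-update step until the assignments stop changing, which is why it always converges, though possibly to a local optimum; the wrapper repeats it and keeps the run with the lowest inertia. The function names and the `seed` parameter are illustrative assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random initialization."""
    # Initialize the centroids as k instances picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if the cluster is empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia

def kmeans(points, k, n_init=10, seed=42):
    """Run k-means n_init times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])

# Toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels, inertia = kmeans(points, k=2)
print(labels, round(inertia, 3))
```

A single unlucky run that starts with both centroids in the same group can end in a poor split; keeping the lowest-inertia run out of several makes that outcome unlikely.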
2.2. K-means as data preprocessing
[Figure: see book p. 251]
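One way K-means serves as preprocessing is to replace each instance by its distances to the k centroids, producing a k-dimensional feature vector that a downstream model (e.g. a classifier) is then trained on. The sketch below assumes the centroids have already been found; `centroid_distance_features` is a hypothetical helper. In scikit-learn, `KMeans.transform()` returns these distances, which is what lets `KMeans` sit in a `Pipeline` before a classifier.

```python
import math

def centroid_distance_features(points, centroids):
    """Map each point to the vector of its Euclidean distances to the centroids."""
    return [[math.dist(p, c) for c in centroids] for p in points]

# Toy example: 2 centroids, so each 2-D point becomes a 2-D distance vector.
centroids = [(0.0, 0.0), (3.0, 4.0)]
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
features = centroid_distance_features(points, centroids)
print(features)  # [[0.0, 5.0], [5.0, 0.0], [3.0, 4.0]]
```

With k chosen smaller than the original number of features, this also acts as a form of dimensionality reduction.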
2.3. K-means in semi-supervised learning
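Supervised baseline, trained on the first 50 labeled samples (effectively a random pick): 83.3% accuracy
K-means with 50 clusters, trained on the 50 representative images (the image closest to each centroid, labeled manually): 92.2% accuracy
Label propagation (the representatives' labels propagated to other instances in the same cluster): 94.0% accuracy
Full labeled training set: 96.9% accuracy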
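The two semi-supervised steps above (label only the instance closest to each centroid, then copy that label to the other members of its cluster) can be sketched on toy 2-D data. This is a minimal illustration, not the book's code: the book applies the idea with scikit-learn to image data, while here the cluster assignments and centroids are assumed to come from a previous K-means run, the hand-assigned labels are made up, and both function names are hypothetical. This version propagates to every member of a cluster.

```python
import math

def representative_indices(points, centroids):
    """For each centroid, return the index of the closest point."""
    return [min(range(len(points)), key=lambda i: math.dist(points[i], c))
            for c in centroids]

def propagate_labels(cluster_ids, rep_labels):
    """Give every point the label of its cluster's representative.

    rep_labels[j] is the label of cluster j's representative, so the
    centroids must be listed in cluster-id order.
    """
    return [rep_labels[j] for j in cluster_ids]

# Toy data; cluster ids and centroids are assumed k-means output.
points = [(0.0, 0.0), (0.2, 0.1), (0.6, 0.5), (5.0, 5.0), (5.2, 5.1), (5.9, 5.6)]
cluster_ids = [0, 0, 0, 1, 1, 1]
centroids = [(0.27, 0.2), (5.37, 5.23)]  # cluster means, rounded

reps = representative_indices(points, centroids)
print(reps)  # [1, 4]

# Only the representatives are labeled by hand (made-up labels here).
rep_labels = [7, 3]
y_propagated = propagate_labels(cluster_ids, rep_labels)
print(y_propagated)  # [7, 7, 7, 3, 3, 3]
```

Propagating to every member also spreads labels to instances near cluster boundaries, which are the most likely to be mislabeled; restricting propagation to the instances closest to each centroid is a natural variant.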
3. DBSCAN
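Two hyperparameters: ε (eps) and min_samples
Robust to outliers (instances in low-density regions are labeled as noise)
Roughly linear complexity: about O(m log m) in the number of instances m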
[Figure 9-14: see book p. 257]
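A minimal pure-Python sketch of DBSCAN, assuming Euclidean distance and following scikit-learn's conventions that a point counts toward its own `min_samples` and that noise is labeled -1. Every point with at least `min_samples` neighbors within `eps` is a core point; clusters grow outward from core points, border points join a cluster without expanding it, and points reachable from no core point are treated as outliers. The brute-force neighbor search here costs O(m²); the near-linear behavior noted above depends on fast neighborhood queries (e.g. a spatial index), which this sketch does not use. The function name `dbscan` is hypothetical.

```python
import math

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or -1 for noise (outliers)."""
    n = len(points)
    # Neighborhood of each point: all points within eps, including itself.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Start a new cluster from an unlabeled core point and grow it.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

# Two dense groups plus one isolated point.
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0),
          (10.0, 10.0), (10.0, 10.5), (10.5, 10.0), (5.0, 5.0)]
labels = dbscan(points, eps=1.0, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is never specified; it falls out of `eps` and `min_samples`, and the isolated point ends up as noise rather than distorting a cluster.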
4. GMM (Gaussian mixture model)