Here we have 300 artwork images from the collection of the Metropolitan Museum of Art (MET) to illustrate
the power of dimension reduction algorithms (e.g. Principal Component Analysis, spectral embeddings, ...). Each of the images
in this dataset lives in a very high dimension (128 width x 128 height x 3 color channels), and for visualization purposes
they are currently all positioned randomly on a 2D plane.
Are there any "principals" in these 49,152 length vectors (flattening image) that can give us a better understanding for these images
maybe for a later task such as classification? This question have always been a important one to ask because high dimensional data are
very noisy and simple classfiers trained on high dimensional data aren't very robust and accurate in their predictions.
Modern representation learning approaches lend us a way of looking at this problem. For this visualization project in particular, we
adapt a spectral embedding dimension reduction technique known as the Laplacian Eigenmap, which reduces the dimensionality of the dataset
while preserving the relationships between data points. We will explain more in the following sections.
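Before any of that, each image has to be turned into a vector. Here is a minimal sketch of that flattening step, using synthetic stand-in pixel values rather than the actual MET images:

```python
import numpy as np

# Synthetic stand-in for the 300 artwork images: each is a 128 x 128 RGB
# array (the real pixel values would come from the MET dataset).
rng = np.random.default_rng(0)
images = rng.random((300, 128, 128, 3))

# Flatten each image into one long vector: 128 * 128 * 3 = 49,152 numbers,
# giving a 300 x 49,152 data matrix X.
X = images.reshape(len(images), -1)
print(X.shape)  # (300, 49152)
```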
From Data Points to Graph - KNN
Intuitively, what would be a valid way of preserving the relationship between each pair of data points?
Perhaps we should first know what the relationship between data points actually is. Again, these data points
live in a very high-dimensional space, so mathematically we can use the norm, i.e. the Euclidean distance between these vectors,
as a very naive but practical way of measuring similarity in such a high dimension.
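As a rough sketch of that step (with `X` standing in for the 300 x 49,152 matrix of flattened images), the full table of pairwise Euclidean distances could be computed like this:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.random((300, 128 * 128 * 3))       # stand-in for the flattened images

# dists[i, j] = Euclidean distance (L2 norm) between image i and image j
dists = cdist(X, X, metric="euclidean")    # shape (300, 300)
```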
So we have a distance between every pair of vectors. How do we measure who is close? How do we decide what
is a good threshold for "close"? Again, we start with an intuitive approach called K Nearest Neighbors (KNN), which simply
deems the k closest vectors to the current vector as "close". Mathematically, this is equivalent to making a matrix that represents
all of the potential neighbors of a vector and then adding edges between each node (vector) i and its k closest neighbors.
We will extend this idea of a "matrix", or similarity matrix, in a later section. For now, just play around with performing K Nearest Neighbors
on a subset of the image dataset illustrated previously (a code sketch of this construction follows the interactive demo below).
K = 1
(Hover over the image to view its neighbor. Click the image to "lock" the view and use the slider to expand to \(K\) Neighbors. Click the image again to "unlock" the view.)
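A minimal sketch of the matrix construction described above; the helper name `knn_similarity` is ours, not part of any library:

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_similarity(X, k):
    """Build a 0/1 similarity matrix W with W[i, j] = 1 when j is one of
    the k closest vectors to i (by Euclidean distance)."""
    dists = cdist(X, X, metric="euclidean")
    n = len(X)
    W = np.zeros((n, n))
    # column 0 of the argsort is the point itself (distance 0), so skip it
    nearest = np.argsort(dists, axis=1)[:, 1:k + 1]
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1
    # symmetrize so the graph is undirected: i and j count as neighbors if
    # either one picked the other (before this step every row sums to k)
    return np.maximum(W, W.T)

rng = np.random.default_rng(0)
X = rng.random((300, 128 * 128 * 3))       # stand-in for the flattened images
W = knn_similarity(X, k=5)
```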
Measure of Closeness
We denote the similarity matrix as W and the degree matrix as D. In terms of KNN, the
similarity matrix encodes the information about the nodes and their respective neighbors.
The degree matrix, on the other hand, is a diagonal matrix that encodes the number of neighbors
of each node in its diagonal entries, which is calculated as the sum of each row of
the similarity matrix. In the KNN case, the degree matrix will be a diagonal matrix with diagonal entries k.
A similarity matrix 𝑊 (6x6) here:
A degree matrix 𝐷 (6x6) here:
Then we can compute the Graph Laplacian Matrix 𝐿 = 𝐷 − 𝑊
𝐷 is the degree matrix
𝑊 is the similarity matrix
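As a small worked example (the 6x6 matrix below is made up for illustration and is not the one shown in the interactive view), W, D, and L fit together like this:

```python
import numpy as np

# A made-up symmetric 6x6 similarity matrix: 1 means the two nodes are neighbors.
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

degrees = W.sum(axis=1)   # row sums = number of neighbors of each node
D = np.diag(degrees)      # degree matrix: neighbor counts on the diagonal
L = D - W                 # graph Laplacian

# Every row of L sums to zero, so the constant vector (1, 1, ..., 1) is
# always an eigenvector of L with eigenvalue 0.
print(L.sum(axis=1))      # [0. 0. 0. 0. 0. 0.]
```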
You may be wondering: what the heck are these numbers and matrices, and how are they even
related to solving the issue of extracting principals? Wait a little; we will explain in the following section.
(Hover over the images to view their neighbors and highlight them in the matrix.)
Framing the Optimization Problem
Don't be intimidated by the following scary-looking math equation; it will be very easy to interpret
once you have read through this section. For now, just forget everything we told you earlier about the linear algebra stuff (we
will use it later, and you will also see the connections later). Back to the problem of "preserving relationships from the high dimension
when the data is in the low dimension": this is equivalent to saying that "data points that are close together should remain close together when they are projected to
the lower dimension". With this in mind, let's look at this equation:
$$ \text{Cost}(f) = \frac{1}{2} \sum_{i,j} W_{ij} \, (f_i - f_j)^2 $$
This equation is essentially saying that we are looking at a weighted cost function, where data points \(f_i\) and \(f_j\) after projection impose a larger cost
when they are far away from each other. Wait, but that doesn't seem complete, does it? Notice that the similarity matrix 𝑊 multiplies this distance term, which says that
"the bigger the entry in 𝑊, the more the distance weighs in the overall cost" (in the case of KNN, the entries of 𝑊 are just 0 and 1). In this manner, when two data points are neighbors
of each other, represented by a 1 in their corresponding \((i, j)\) entry of the 𝑊 matrix, their distance after projecting to the lower-dimensional space counts towards
the cumulative cost of the projection.
This now becomes an optimization problem and, as data science majors, we love optimization problems, because we can use different ways of looking
at the problem to find the optimal solution; in this case that would be the optimal projected data points (each \(f_i, f_j\)) that minimize the cost function.
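To make the cost concrete, here is a minimal sketch of evaluating it for a candidate 1-D projection `f` (one coordinate per data point) and a similarity matrix `W`; the function name is just for illustration:

```python
import numpy as np

def projection_cost(f, W):
    """Weighted cost of a 1-D projection f: neighboring points (W[i, j] = 1)
    that land far apart contribute a large penalty; non-neighbors contribute
    nothing."""
    diff = f[:, None] - f[None, :]        # all pairwise differences f_i - f_j
    return 0.5 * np.sum(W * diff ** 2)    # (1/2) * sum_ij W_ij * (f_i - f_j)^2
```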
Laplacian Eigenmap in Eigenvector Space
In the previous section, we introduced the way we frame the issue of dimension reduction
as an optimization problem. Now we will add a little more detail to the picture. For the scope of this visualization,
we will not go into the mathematical details of the reasoning, but the optimization problem from the previous section can be reframed into a vector representation
of the following form:
$$ \text{Cost}(\hat{f}) = \hat{f}^T L \hat{f} $$
subject to \( \|\hat{f}\| = 1 \) and \( \hat{f} \perp (1,1,\dots,1)^T \)
Doesn't this 𝐿 look very familiar? Yep, this is exactly the Laplacian matrix that we introduced in the previous section, and it is the key to
solving this optimization problem. Now that we have framed the optimization problem in vector form, we still need to find a way to minimize it. How should we do that? Again, for
mathematical reasons that we will not go into in this visualization, the solution to this problem is exactly the eigenvector of the Laplacian matrix with the smallest non-zero eigenvalue, which we have
graphed on the right with each image placed at its data point (the location of each image is decided by the bottom eigenvectors). Observe the graph carefully and you may see some interesting trends (e.g. color intensity of the image,
object ratio with respect to the image).
(Click the button to switch between viewing the 1-dimensional projection and the 2-dimensional projection.)
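As a minimal sketch of that recipe (assuming `W` is a symmetric 0/1 similarity matrix like the one built by the `knn_similarity` sketch above, and that the graph is connected), the embedding can be read directly off the eigenvectors of L:

```python
import numpy as np

def laplacian_eigenmap(W, dim=2):
    """Embed the nodes of the similarity graph W into `dim` dimensions using
    the bottom non-zero eigenvectors of the graph Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    # eigh returns the eigenvalues of a symmetric matrix in ascending order;
    # for a connected graph the first eigenvector (eigenvalue 0) is the
    # constant vector, which the constraint f ⟂ (1, ..., 1)^T rules out,
    # so we skip it and keep the next `dim` eigenvectors.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    return eigenvectors[:, 1:dim + 1]

# coords = laplacian_eigenmap(W, dim=2)   # one (x, y) per image, ready to plot
```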
Further reading: for the mathematical aspects of the Laplacian Eigenmap, or for how the bottom eigenvectors give the minimizing solution,
we have linked relevant resources in the footer note section below. The original publication of the Laplacian Eigenmap algorithm is also linked.
A Different Number of \(K\) Neighbors?
A hyperparameter refers to a variable that we can set ourselves (it is not learned by the model). Remember that there is a hyperparameter
that you can tune for different performance of the algorithm? That's right, the number of neighbors can significantly affect the similarity matrix created,
which significantly changes the result of the algorithm.
Our default \(K\) previously was \(35\)
Select the number of \(K\) neighbors you want and click visualize to see the effect on the projection!
Maybe try to identify some differences in the principals captured by different \(K\) values. This also illustrates another use of
this algorithm: capturing images with specific characteristics for filtering purposes (e.g. the art piece images).
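Offline, that comparison could look like the following sketch, reusing the hypothetical `knn_similarity` and `laplacian_eigenmap` helpers from earlier:

```python
# Reuses X, knn_similarity, and laplacian_eigenmap from the sketches above.
embeddings = {}
for k in (5, 15, 35, 75):                  # 35 was the default used above
    W_k = knn_similarity(X, k=k)
    embeddings[k] = laplacian_eigenmap(W_k, dim=2)
# Each entry of `embeddings` can be scatter-plotted and compared to see how
# the choice of K changes the layout of the images.
```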
Preserving Locality: notice that the underlying principal extracted stays roughly the same,
illustrating the underlying mechanism of the Laplacian Eigenmap! Play around with different transformations and experiment on your own!