Here we have 300 artwork images from the collection of the Metropolitan Museum of Art (MET) to illustrate
the power of dimension reduction algorithms (e.g. Principal Component Analysis, spectral embeddings, ...). Each of the images
in this dataset lives in a very high dimension (128 width x 128 height x 3 color channels), and for visualization purposes
they are currently all positioned randomly on a 2D plane.
Are there any "principals" in these 49,152 length vectors (flattening image) that can give us a better understanding for these images
maybe for a later task such as classification? This question have always been a important one to ask because high dimensional data are
very noisy and simple classfiers trained on high dimensional data aren't very robust and accurate in their predictions.
Modern representation learning approaches lend us a way of looking at this problem. For this visualization project in particular, we
adapt a spectral embedding dimension reduction technique known as the Laplacian Eigenmap, which reduces the dimensionality of the dataset
while preserving the relationships between data points. We will explain more in the following sections.
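Before any of that, each image has to be turned into a vector. Here is a minimal sketch of that flattening step, using synthetic stand-in pixel values rather than the actual MET images:

```python
import numpy as np

# Synthetic stand-in for the 300 artwork images: each is a 128 x 128 RGB
# array (the real pixel values would come from the MET dataset).
rng = np.random.default_rng(0)
images = rng.random((300, 128, 128, 3))

# Flatten each image into one long vector: 128 * 128 * 3 = 49,152 numbers,
# giving a 300 x 49,152 data matrix X.
X = images.reshape(len(images), -1)
print(X.shape)  # (300, 49152)
```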
From Data Points to Graph - KNN
Intuitively, what would be a valid way of preserving the relationship between each pair of data points?
Perhaps we should first know what the relationship between data points actually is. Again, these data points
live in a very high-dimensional space, so mathematically we can use the norm, i.e. the Euclidean distance between these vectors,
as a very naive but practical way of measuring similarity in such a high dimension.
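As a rough sketch of that step (with `X` standing in for the 300 x 49,152 matrix of flattened images), the full table of pairwise Euclidean distances could be computed like this:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.random((300, 128 * 128 * 3))       # stand-in for the flattened images

# dists[i, j] = Euclidean distance (L2 norm) between image i and image j
dists = cdist(X, X, metric="euclidean")    # shape (300, 300)
```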
So we have a distance between every pair of vectors. How do we measure who is close? How do we decide what
is a good threshold for "close"? Again, we start with an intuitive approach called K Nearest Neighbors (KNN), which simply
deems the k closest vectors to the current vector as "close". Mathematically, this is equivalent to making a matrix that represents
all of the potential neighbors of a vector and then adding edges between each node (vector) i and its k closest neighbors.
We will extend this idea of a "matrix", or similarity matrix, in a later section. For now, just play around with performing K Nearest Neighbors
on a subset of the image dataset illustrated previously (a code sketch of this construction follows the interactive demo below).
K = 1
(Hover over the image to view its neighbor. Click the image to "lock" the view and use the slider to expand to \(K\) Neighbors. Click the image again to "unlock" the view.)
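A minimal sketch of the matrix construction described above; the helper name `knn_similarity` is ours, not part of any library:

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_similarity(X, k):
    """Build a 0/1 similarity matrix W with W[i, j] = 1 when j is one of
    the k closest vectors to i (by Euclidean distance)."""
    dists = cdist(X, X, metric="euclidean")
    n = len(X)
    W = np.zeros((n, n))
    # column 0 of the argsort is the point itself (distance 0), so skip it
    nearest = np.argsort(dists, axis=1)[:, 1:k + 1]
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1
    # symmetrize so the graph is undirected: i and j count as neighbors if
    # either one picked the other (before this step every row sums to k)
    return np.maximum(W, W.T)

rng = np.random.default_rng(0)
X = rng.random((300, 128 * 128 * 3))       # stand-in for the flattened images
W = knn_similarity(X, k=5)
```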
Measure of Closeness
We denote the similarity matrix as W and the degree matrix as D. In terms of KNN, the
similarity matrix encodes the information about the nodes and their respective neighbors.
The degree matrix, on the other hand, is a diagonal matrix that encodes the number of neighbors
of each node in its diagonal entries, which is calculated as the sum of each row of
the similarity matrix. In the KNN case, the degree matrix will be a diagonal matrix with diagonal entries k.
A similarity matrix 𝑊 (6x6) here:
A degree matrix 𝐷 (6x6) here:
Then we can compute the Graph Laplacian Matrix 𝐿 = 𝐷 − 𝑊
𝐷 is the degree matrix
𝑊 is the similarity matrix
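As a small worked example (the 6x6 matrix below is made up for illustration and is not the one shown in the interactive view), W, D, and L fit together like this:

```python
import numpy as np

# A made-up symmetric 6x6 similarity matrix: 1 means the two nodes are neighbors.
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

degrees = W.sum(axis=1)   # row sums = number of neighbors of each node
D = np.diag(degrees)      # degree matrix: neighbor counts on the diagonal
L = D - W                 # graph Laplacian

# Every row of L sums to zero, so the constant vector (1, 1, ..., 1) is
# always an eigenvector of L with eigenvalue 0.
print(L.sum(axis=1))      # [0. 0. 0. 0. 0. 0.]
```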
You may be wondering: what the heck are these numbers and matrices, and how are they even
related to solving the issue of extracting principals? Wait a little; we will explain in the following section.
(Hover over the images to view their neighbors and highlight them in the matrix.)
Framing the Optimization Problem
Don't be intimidated by the following scary-looking math equation; it will be very easy to interpret
once you have read through this section. For now, just forget everything we told you earlier about the linear algebra stuff (we
will use it later, and you will also see the connections later). Back to the problem of "preserving relationships from the high dimension
when the data is in the low dimension": this is equivalent to saying that "data points that are close together should remain close together when they are projected to
the lower dimension". With this in mind, let's look at this equation:
$$ \text{Cost}(f) = \frac{1}{2} \sum_{i,j} W_{ij} \, (f_i - f_j)^2 $$
This equation is essentially saying that we are looking at a weighted cost function, where data points \(f_i\) and \(f_j\) after projection impose a larger cost
when they are far away from each other. Wait, but that doesn't seem complete, does it? Notice that the similarity matrix 𝑊 multiplies this distance term, which says that
"the bigger the entry in 𝑊, the more the distance weighs in the overall cost" (in the case of KNN, the entries of 𝑊 are just 0 and 1). In this manner, when two data points are neighbors
of each other, represented by a 1 in their corresponding \((i, j)\) entry of the 𝑊 matrix, their distance after projecting to the lower-dimensional space counts towards
the cumulative cost of the projection.
This now becomes an optimization problem and, as data science majors, we love optimization problems, because we can use different ways of looking
at the problem to find the optimal solution; in this case that would be the optimal projected data points (each \(f_i, f_j\)) that minimize the cost function.
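To make the cost concrete, here is a minimal sketch of evaluating it for a candidate 1-D projection `f` (one coordinate per data point) and a similarity matrix `W`; the function name is just for illustration:

```python
import numpy as np

def projection_cost(f, W):
    """Weighted cost of a 1-D projection f: neighboring points (W[i, j] = 1)
    that land far apart contribute a large penalty; non-neighbors contribute
    nothing."""
    diff = f[:, None] - f[None, :]        # all pairwise differences f_i - f_j
    return 0.5 * np.sum(W * diff ** 2)    # (1/2) * sum_ij W_ij * (f_i - f_j)^2
```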
Laplacian Eigenmap in Eigenvector Space
In the previous section, we introduced the way we frame the issue of dimension reduction
as an optimization problem. Now we will add a little more detail to the picture. For the scope of this visualization,
we will not go into the mathematical details of the reasoning, but the optimization problem from the previous section can be reframed into a vector representation
of the following form:
$$ \text{Cost}(\hat{f}) = \hat{f}^T L \hat{f} $$
subject to \( \|\hat{f}\| = 1 \) and \( \hat{f} \perp (1,1,\dots,1)^T \)
Doesn't this 𝐿 look very familiar? Yep, this is exactly the Laplacian matrix that we introduced in the previous section, and it is the key to
solving this optimization problem. Now that we have framed the optimization problem in vector form, we still need to find a way to minimize it. How should we do that? Again, for
mathematical reasons that we will not go into in this visualization, the solution to this problem is exactly the eigenvector of the Laplacian matrix with the smallest non-zero eigenvalue, which we have
graphed on the right with each image placed at its data point (the location of each image is decided by the bottom eigenvectors). Observe the graph carefully and you may see some interesting trends (e.g. color intensity of the image,
object ratio with respect to the image).
(Click the button to switch between viewing the 1-dimensional projection and the 2-dimensional projection.)
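As a minimal sketch of that recipe (assuming `W` is a symmetric 0/1 similarity matrix like the one built by the `knn_similarity` sketch above, and that the graph is connected), the embedding can be read directly off the eigenvectors of L:

```python
import numpy as np

def laplacian_eigenmap(W, dim=2):
    """Embed the nodes of the similarity graph W into `dim` dimensions using
    the bottom non-zero eigenvectors of the graph Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    # eigh returns the eigenvalues of a symmetric matrix in ascending order;
    # for a connected graph the first eigenvector (eigenvalue 0) is the
    # constant vector, which the constraint f ⟂ (1, ..., 1)^T rules out,
    # so we skip it and keep the next `dim` eigenvectors.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    return eigenvectors[:, 1:dim + 1]

# coords = laplacian_eigenmap(W, dim=2)   # one (x, y) per image, ready to plot
```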
Further reading: for the mathematical aspects of the Laplacian Eigenmap, or for how the bottom eigenvectors give the minimizing solution,
we have linked relevant resources in the footer note section below. The original publication of the Laplacian Eigenmap algorithm is also linked.
A Different Number of \(K\) Neighbors?
A hyperparameter refers to a variable that we can set ourselves (it is not learned by the model). Remember that there is a hyperparameter
that you can tune for different performance of the algorithm? That's right, the number of neighbors can significantly affect the similarity matrix created,
which significantly changes the result of the algorithm.
Our default \(K\) previously was \(35\)
Select the number of \(K\) neighbors you want and click visualize to see the effect on the projection!
Maybe try to identify some differences in the principals captured by different \(K\) values. This also illustrates another use of
this algorithm: capturing images with specific characteristics for filtering purposes (e.g. the art piece images).
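Offline, that comparison could look like the following sketch, reusing the hypothetical `knn_similarity` and `laplacian_eigenmap` helpers from earlier:

```python
# Reuses X, knn_similarity, and laplacian_eigenmap from the sketches above.
embeddings = {}
for k in (5, 15, 35, 75):                  # 35 was the default used above
    W_k = knn_similarity(X, k=k)
    embeddings[k] = laplacian_eigenmap(W_k, dim=2)
# Each entry of `embeddings` can be scatter-plotted and compared to see how
# the choice of K changes the layout of the images.
```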
Preserving Locality: notice that the underlying principal extracted stays roughly the same,
illustrating the underlying mechanism of the Laplacian Eigenmap! Play around with different transformations and experiment on your own!