Cosine Similarity — Introduction and applications in NLP
How similar are Father and Mother? Ask anyone this question and you will get different answers. Some say mum can cook and dad can’t; mum has long hair and dad has short hair. So how do you quantify the similarity between Father and Mother? Luckily, we have cosine similarity.
Let’s start with the equation for cosine similarity: cos(θ) = (A · B) / (‖A‖ ‖B‖). It is just the dot product of 2 vectors, divided by the product of their lengths!
First, we construct the vector for Father. Each element of the vector represents one feature of Father. For example, if Father can cook, the element is “1”; otherwise it is “0”. As shown in figure 1, Father is represented as [1,0,1,1,1,0,0], while Mother is represented as [1,0,1,1,0,1,1]. Now we can calculate the cosine similarity of Father and Mother: the dot product is 3 and the vector lengths are 2 and √5, which gives a similarity of about 0.671.
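To make the calculation concrete, here is a minimal sketch in plain Python that reproduces the number for the two vectors above. The function name `cosine_similarity` and the variable names are chosen for illustration only; they are not taken from any particular library.

```python
import math

# Feature vectors from figure 1 (1 = has the feature, 0 = does not)
father = [1, 0, 1, 1, 1, 0, 0]
mother = [1, 0, 1, 1, 0, 1, 1]

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

similarity = cosine_similarity(father, mother)
print(round(similarity, 3))  # 0.671
```

Note that this sketch assumes neither vector is all zeros; a vector with no features set would have length 0 and cause a division by zero.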
The dot product may not be clear to you. Let’s visualize it with basic trigonometry.
As we can see from demo 1, with theta 1, A points further away from B. When we project A onto the direction of B, the projection is much shorter than the projection of A onto B in demo 2, where the angle is smaller. The length of the projection is the length of A times the cosine of the angle, so a smaller angle means a larger cosine. It is very clear in demo 2 that A is very similar to B. This shows how the cosine function can be used to measure similarity.
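The effect of the angle on the projection can also be checked numerically. The sketch below is only an illustration: the angles of 70 and 15 degrees are hypothetical stand-ins for the wide angle in demo 1 and the narrow angle in demo 2, and A is assumed to have length 1 in both cases.

```python
import math

def scalar_projection(length_a, theta_degrees):
    """Length of the projection of A onto B's direction: |A| * cos(theta)."""
    return length_a * math.cos(math.radians(theta_degrees))

# Hypothetical angles: a wide one (like demo 1) and a narrow one (like demo 2)
wide = scalar_projection(1.0, 70)
narrow = scalar_projection(1.0, 15)

print(f"wide angle projection:   {wide:.3f}")
print(f"narrow angle projection: {narrow:.3f}")
```

The narrow angle gives the longer projection, which matches the picture: the closer A points to B's direction, the closer the cosine gets to 1.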
Lastly, I will provide a summary of cosine similarity. I hope it helps you understand cosine similarity clearly.