Hacker News

Which vector similarity metric should I use?

by imaurer on 5/16/2023, 3:34:15 PM with 2 comments

by sharemywin on 5/16/2023, 4:10:39 PM
Does this seem right?
| Task | Distance Measure |
|-------------------------------|-----------------------|
| Document classification | Cosine Distance |
| Semantic search | Cosine Distance |
| Recommendation systems | Cosine Distance |
| Image recognition | Euclidean Distance (L2)|
| Speech recognition | Euclidean Distance (L2)|
| Handwriting analysis | Euclidean Distance (L2)|
| Recommendation systems | Inner Product (Dot Product)|
| Collaborative filtering | Inner Product (Dot Product)|
| Matrix factorization | Inner Product (Dot Product)|
| Image processing | L2-Squared Distance |
| Error detection and correction| Hamming Distance |
| DNA sequence comparison | Hamming Distance |
| Taxicab geometry | Manhattan Distance |
| Chessboard distance | Chebyshev Distance (L∞) |
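
For anyone who wants to compare the measures in the table on their own vectors, here is a minimal sketch in plain Python (standard library only). The function names are illustrative, not from any particular library, and the sample vectors are made up. Note that chessboard distance is usually defined as the largest per-coordinate difference (Chebyshev), which is different from Manhattan distance.

```python
import math

def dot(a, b):
    # Inner (dot) product: sensitive to both direction and magnitude.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cos(angle between a and b); ignores vector magnitude.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_squared(a, b):
    # Squared Euclidean distance; gives the same ranking as L2 without the sqrt.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(l2_squared(a, b))

def manhattan(a, b):
    # Sum of absolute coordinate differences (taxicab / L1).
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute coordinate difference (the usual chessboard distance).
    return max(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # Number of positions where two equal-length sequences differ.
    return sum(x != y for x, y in zip(a, b))

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

print("dot        ", dot(a, b))
print("cosine dist", cosine_distance(a, b))
print("L2         ", euclidean(a, b))
print("L2 squared ", l2_squared(a, b))
print("Manhattan  ", manhattan(a, b))
print("Chebyshev  ", chebyshev(a, b))
print("Hamming    ", hamming("karolin", "kathrin"))
```

In practice the choice often follows how the embeddings were trained rather than the task name alone, so treat the table as a starting point and check what the embedding model's authors recommend.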
by messe on 5/16/2023, 3:48:53 PM
Even ignoring vector magnitudes, wouldn't cosine distance only make sense as a measure of similarity if you're working with a convex set? That seems far from guaranteed when working in a high-dimensional space.
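
On the "ignoring vector magnitudes" part, a small sketch may help make concrete what cosine distance does and does not see. It does not settle the convexity question. It only checks two well-known facts: cosine distance is unchanged when a vector is rescaled, and for unit-length vectors squared Euclidean distance equals twice the cosine distance, so the two give the same ranking. The helper names and sample vectors are made up for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    # Scale a to unit length (assumes a is not the zero vector).
    n = norm(a)
    return [x / n for x in a]

def cosine_distance(a, b):
    return 1.0 - dot(a, b) / (norm(a) * norm(b))

def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = [3.0, 4.0]
b = [1.0, 0.0]
scaled = [10.0 * x for x in a]

# Rescaling a vector leaves cosine distance unchanged...
same_cosine = math.isclose(cosine_distance(a, b), cosine_distance(scaled, b))
# ...but changes Euclidean distance.
same_l2 = math.isclose(l2_squared(a, b), l2_squared(scaled, b))

# For unit vectors u, v: ||u - v||^2 = 2 * (1 - cos(u, v)).
u, v = normalize(a), normalize(b)
identity_holds = math.isclose(l2_squared(u, v), 2.0 * cosine_distance(u, v))

print(same_cosine, same_l2, identity_holds)
```

So if the embeddings are already normalized, the choice between cosine, dot product, and L2 does not change which neighbors come out on top.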