Text embeddings have limitations when it comes to handling long documents and out-of-domain data.
Today, we are talking to Nils Reimers. He is one of the researchers who kickstarted the field of dense embeddings, developed sentence transformers, started HuggingFace’s Neural Search team and now leads the development of search foundational models at Cohere. Tbh, he has too many accolades to count off here.
We talk about the main limitations of embeddings:
Are you still not sure whether to listen? Here are some teasers:
Nils Reimers:
Nicolay Gerold:
text embeddings, limitations, long documents, interpretation, fine-tuning, re-ranking, future research
00:00 Introduction and Guest Introduction 00:43 Early Work with BERT and Argument Mining 02:24 Evolution and Innovations in Embeddings 03:39 Constructive Learning and Hard Negatives 05:17 Training and Fine-Tuning Embedding Models 12:48 Challenges and Limitations of Embeddings 18:16 Adapting Embeddings to New Domains 22:41 Handling Long Documents and Re-Ranking 31:08 Combining Embeddings with Traditional ML 45:16 Conclusion and Upcoming Episodes