Chapter 11. Embeddings and Representation Learning

Representation Learning

In the previous chapter, we saw how to connect language models to external tools, including data stores. External data can take the form of text files, database tables, knowledge graphs, or almost any other format you can think of. Its content is just as varied, ranging from proprietary domain-specific knowledge bases to intermediate results and outputs generated by LLMs.

If the data is structured, for example stored in a relational database, the language model can issue a SQL query to retrieve what it needs. But what if the data is unstructured?

One way to retrieve data from unstructured text datasets is to search by keywords or use regular expressions. For the Apple CFO example in the previous chapter, we can retrieve text containing CFO mentions from a corpus containing financial disclosures, hoping ...
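The keyword and regular-expression approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-sentence `corpus`, the `CFO_PATTERN` expression, and the `keyword_search` helper are hypothetical and stand in for a real collection of financial disclosures.

```python
import re

# Hypothetical mini-corpus standing in for a set of financial disclosures.
corpus = [
    "The Chief Financial Officer presented the quarterly results.",
    "Our CFO will step down at the end of the fiscal year.",
    "Revenue grew in the services segment.",
]

# Match either the abbreviation or the spelled-out title, ignoring case.
CFO_PATTERN = re.compile(r"\b(CFO|chief financial officer)\b", re.IGNORECASE)


def keyword_search(documents, pattern):
    """Return the documents in which the pattern occurs at least once."""
    return [doc for doc in documents if pattern.search(doc)]


matches = keyword_search(corpus, CFO_PATTERN)
for doc in matches:
    print(doc)
```

Note that the pattern only finds the surface forms it lists. A passage that refers to the same role with different wording, such as "finance chief", would be missed unless that variant is added to the pattern by hand.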
