Do you know how to create embeddings using the LangChain and OpenAI APIs? With LangChain embeddings, you can do it in a few lines of code and understand what vector embeddings actually are along the way.
Ever wondered how Google understands our intent when we search for something, how models read and compare our faces, or how our favorite products get recommended?
Vector embeddings play a major role in all of these. Let’s quickly dive into what vector embeddings look like and how we can use them in LangChain. Since we will be working with LangChain, I will be calling vector embeddings LangChain embeddings most of the time.
LangChain Embeddings: What Are They?
Programming languages were invented so we could talk to machines and program them; now machines use embeddings to understand our intent, making our communication with them even better than before.
LangChain embeddings are simply numerical representations of text that a model uses to understand the user’s intent or to perform semantic search by capturing semantic meaning. There are many use cases, which we will discuss later in the blog.
The only reason we convert text into embeddings is that models are trained on numbers, so numbers are what we need to feed them.
Let’s learn this with the famous apple example. When we talk about an apple, there are two possibilities: it could be the fruit, or it could be the company name.
To figure out which one is meant, a search engine like Google looks at the semantic meaning, which is stored in the form of vector embeddings. These numbers represent the features of the two different things, and the vectors whose features match most closely are the most similar.
Both senses of the word might have features like: in stock, calories, electronic, company, can rot, etc.
| S. No | Feature | Apple Fruit | Apple Company | Mango |
|-------|---------|-------------|---------------|-------|
| 1 | In stock | 1 | 1 | 1 |
| 2 | Rots | 1 | 0 | 1 |
| 3 | Fruit | 1 | 0 | 1 |
| 4 | Electronic | 0 | 1 | 0 |
| 5 | Revenue | 0 | 45 | 0 |
| 6 | Employees | 0 | 65 | 0 |
The above table is a symbolic representation to show how embeddings capture what is similar and what is different: 0 means there is no possibility of having that feature, and 1 means the feature is possible. Don’t be confused by the 45 and 65; they are symbolic values too, standing in for revenue and employee count.
As you can see in this table, mango and the apple fruit share similar features, so there is a high chance that they will have a high similarity score, which is calculated using cosine similarity.
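To make this concrete, here is a minimal sketch in Python (using NumPy, an extra dependency not mentioned in the setup above) that scores the symbolic feature vectors from the table with cosine similarity. These are the toy values from the table, not real embeddings:

import numpy as np

# Symbolic feature vectors from the table above:
# [in stock, rots, fruit, electronic, revenue, employees]
apple_fruit = np.array([1, 1, 1, 0, 0, 0], dtype=float)
apple_company = np.array([1, 0, 0, 1, 45, 65], dtype=float)
mango = np.array([1, 1, 1, 0, 0, 0], dtype=float)

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the magnitudes
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(apple_fruit, mango))          # 1.0 -- identical toy features
print(cosine_similarity(apple_fruit, apple_company))  # close to 0 -- very different

As expected, the two fruit vectors score a perfect 1.0 against each other, while the fruit and the company barely overlap.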
This is the most interesting part; we can even do math with these vectors, as in the famous example King – Man + Woman = Queen. When we subtract Man from King, the manly features (son, father, prince, etc.) are removed while features like monarch and throne remain; adding Woman brings in the female features, and the resulting vector lands closest to Queen.
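Here is the same idea as a toy sketch (made-up three-dimensional vectors, not real embeddings), assuming the hypothetical features [royalty, male, female]:

import numpy as np

# Toy vectors over hypothetical features: [royalty, male, female]
king  = np.array([1.0, 1.0, 0.0])
man   = np.array([0.0, 1.0, 0.0])
woman = np.array([0.0, 0.0, 1.0])
queen = np.array([1.0, 0.0, 1.0])

# Subtracting man removes the male feature; adding woman adds the female one
result = king - man + woman
print(result)                      # [1. 0. 1.] -- exactly the toy vector for queen
print(np.allclose(result, queen))  # True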
Now, let’s fetch real vector embeddings with LangChain using the code below. You can replace the words with the ones you want to compare; I have chosen apple, banana, and electronic.
To run the code below, first run pip install openai langchain. Don’t forget to replace “paste your API key here” with your OpenAI API key, which you can generate from here.
LangChain Embeddings: With OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings

# Replace "paste your API key here" with your actual OpenAI API key
openai_api_key = "paste your API key here"

# Initialize the OpenAI embeddings model
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Define the words you want to convert into vector embeddings
words = ["apple", "banana", "electronic"]

# Convert each word into a vector embedding
embeddings_list = [embeddings.embed_query(word) for word in words]

# Print each word alongside its vector embedding
for word, embedding in zip(words, embeddings_list):
    print(f"Word: {word} - Embedding: {embedding}")
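Once you have embeddings_list from the code above, you can score the words against each other with the same cosine similarity formula from earlier. This sketch assumes the variables from the code above and that NumPy is installed; expect apple and banana to score closer to each other than either does to electronic:

import numpy as np

# Compare every pair of words using cosine similarity
for i in range(len(words)):
    for j in range(i + 1, len(words)):
        a = np.array(embeddings_list[i])
        b = np.array(embeddings_list[j])
        score = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        print(f"{words[i]} vs {words[j]}: {score:.4f}")

As a side note, embeddings.embed_documents(words) embeds the whole list in one batched call, which is usually preferable to looping with embed_query.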
The embeddings generated for the above words by this code can also be seen in this document, in which I saved them. The embeddings of just three words filled 28 pages of an MS Word document; that is how many features only three words have.
LangChain embeddings save you the hassle of calling the raw OpenAI API yourself just to get embeddings.
Conclusion:
So this might be your first piece of code using LangChain, you may just be exploring how embeddings work, or maybe you simply searched for LangChain embeddings. Either way, I hope you now have an idea of what LangChain embeddings, embeddings, and vectors are. For further queries or suggestions, feel free to contact us, and you can also read our other posts.