LangChain Embeddings: An Easy Way to Understand Vector Embeddings

Do you know how to create embeddings using LangChain and the OpenAI API? With LangChain Embeddings, you can easily figure it out and understand what vector embeddings are.

Ever wondered how Google understands our intent when we search for something, how models read and compare our faces, or how our favorite products get recommended?

Vector embeddings play a major role in that. Let’s quickly dive into what vector embeddings look like and how we can use them in LangChain. Since we will be using LangChain, I will be calling vector embeddings LangChain Embeddings most of the time.

LangChain Embeddings: What are They?

Well, programming languages were invented so we could talk to machines and program them; now machines use embeddings to understand our intent, making our communication with them even better than before.

LangChain embeddings are just numerical representations of our text that a model uses to understand the user’s intent or to perform a semantic search based on meaning. There are many use cases, which we will discuss later in the blog.

The only reason to convert text into embeddings is that models are trained on numbers, so the text has to become numbers first.

Let’s learn it with the famous example of an apple. When we talk about apples, there are two possibilities: this apple might be the fruit, or it might be the company name.

To figure out the intent, a search engine like Google looks at the semantic meaning, which is stored in the form of vector embeddings. These numbers represent features of the two different things, and the items whose features match most closely are the most similar.

Both words might have features like: In stock, calories, electronic, company, can rot, etc.

| S. No | Features   | Apple Fruit | Apple Company | Mango |
|-------|------------|-------------|---------------|-------|
| 1     | In stock   | 1           | 1             | 1     |
| 2     | Rots       | 1           | 0             | 1     |
| 3     | Fruit      | 1           | 0             | 1     |
| 4     | Electronic | 0           | 1             | 0     |
| 5     | Revenue    | 0           | 45            | 0     |
| 6     | Employees  | 0           | 65            | 0     |

The above table is a symbolic representation, an example of how embeddings know what is similar and what is different: 0 means there is no possibility of having that feature, and 1 signals a possibility. Don’t be confused by 65 and 45; they are symbolic representations too.

As you can see in this table, mango and the apple fruit have similar features, so there is a high chance that they will have a high similarity score, which is calculated using cosine similarity.
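
To make that concrete, here is a minimal sketch of cosine similarity using the symbolic feature vectors from the table above (the numbers are illustrative stand-ins, not real embeddings):

import numpy as np

# Symbolic feature vectors from the table:
# [in stock, rots, fruit, electronic, revenue, employees]
apple_fruit   = np.array([1, 1, 1, 0, 0, 0])
apple_company = np.array([1, 0, 0, 1, 45, 65])
mango         = np.array([1, 1, 1, 0, 0, 0])

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their magnitudes
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(apple_fruit, mango))          # 1.0, identical features
print(cosine_similarity(apple_fruit, apple_company))  # much lower score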

This is the most interesting part; we can even apply math to embeddings, as in the famous example King – Man + Woman = Queen. When we subtract Man from King, the manly features such as son, father, and prince are removed, while features like monarch and throne remain; adding Woman then gives us a vector that is closest to Queen.
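
Here is a toy sketch of that arithmetic; the four vectors below are made-up three-feature illustrations, not real embeddings:

import numpy as np

# Made-up feature vectors: [royalty, male, female]
king  = np.array([1.0, 1.0, 0.0])
man   = np.array([0.0, 1.0, 0.0])
woman = np.array([0.0, 0.0, 1.0])
queen = np.array([1.0, 0.0, 1.0])

result = king - man + woman  # drops "male", adds "female", keeps "royalty"
print(result)                # [1. 0. 1.] - matches the queen vector exactly here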

Now, let’s fetch real vector embeddings using LangChain with the code below. You can replace the words with the ones you want to compare; I have chosen apple, banana, and electronic.

To run the code below, run pip install openai langchain first. Don’t forget to replace “paste your API key here” with your OpenAI API key, which you can generate from here.

Screenshot: API key generation page (click on “Create New Secret Key”)

LangChain Embeddings: With OpenAI

from langchain.embeddings.openai import OpenAIEmbeddings

# Replace "paste your API key here" with your actual OpenAI API key
openai_api_key = "paste your API key here"

# Initialize the OpenAI embeddings model
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Define the words you want to convert into vector embeddings
words = ["apple", "banana", "electronic"]

# Convert each word into a vector embedding
embeddings_list = [embeddings.embed_query(word) for word in words]

# Print the vector embeddings
for word, embedding in zip(words, embeddings_list):
    print(f"Word: {word} - Embedding: {embedding}")
Screenshot: a small portion of the embedding of a single word

The embeddings of the above words generated by this code can also be seen in this document, where I saved them. The embeddings of only three words occupied 28 pages of an MS Word document; that is how many features just three words have.
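
Once you have the real embeddings, you can score them against each other. A minimal follow-up sketch (it continues from the snippet above and assumes numpy is installed):

import numpy as np

# embeddings_list comes from the earlier snippet: apple, banana, electronic
apple, banana, electronic = (np.array(e) for e in embeddings_list)

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print("apple vs banana:    ", cosine_similarity(apple, banana))
print("apple vs electronic:", cosine_similarity(apple, electronic))

You should see the two fruits score noticeably higher than apple versus electronic.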

LangChain saves you the hassle of fetching embeddings through the raw OpenAI API alone.

Conclusion:

So this might be your first piece of code using LangChain, or you may just be exploring how embeddings work, or maybe you simply searched for LangChain embeddings. Either way, I hope you now have an idea of what LangChain embeddings, embeddings, or vectors are. For further queries or suggestions, feel free to contact us, and you can also read our other posts.
