LangChain YouTube Summarizer: In a few simple steps, you are going to build your own YouTube video summarizer using LangChain. Just keep following the instructions I share here.
What’s special about this tutorial is that it works for a video of any duration, whether 1, 2, or 3 hours, and it generates an AI summary according to your own requirements.
Prerequisites of Tutorial
This tutorial on the LangChain YouTube summarizer requires no prerequisites at all. I am going to simplify it so much that even if you have no programming experience, you can build it yourself by following the instructions.
Knowledge of Python or any other programming language is a plus, not a requirement, so this tutorial will benefit readers with or without a programming background.
You just need to sign in to Google Colab and create a new notebook so that you can run the code cells I share here.
Note: For readers with no programming background: while reading this blog post, don’t ignore the comments in the code that start with the hash symbol #. After the entire code has run successfully, you just need to modify the text, which I will point out later, to get the summary the way you want it.
Installing Required Packages
Let’s start with installing some packages that we require.
We are going to install 4 packages.
- LangChain, around which everything revolves.
- LangChain-OpenAI, as we are going to use OpenAI’s LLM.
- Pytube, a dependency that helps load YouTube videos.
- YouTube Transcript API, to communicate with YouTube and fetch the transcript.
%pip install langchain --quiet #installs LangChain
%pip install langchain-openai --quiet #Installs LangChain OpenAI
%pip install pytube --quiet # this installs Pytube
%pip install --upgrade --quiet youtube-transcript-api # for the YouTube Transcript API
#Remove --quiet if you want to see the outputs generated while installing the packages.
Importing Required Modules
After installing the packages, we are going to import some useful modules from the packages that we just installed so that we can use them in our code.
from langchain_community.document_loaders import YoutubeLoader # To load YouTube
from langchain_openai import OpenAI #This is to interact with OpenAI
import os # To make an OpenAI environment so that we can interact with its API
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import LLMChain # To chain the prompt template with the LLM
from langchain import PromptTemplate # To craft a Prompt Template
import time #To introduce Delays within the code
Everything needed within the code is imported; now we need to set up an OpenAI API environment. If you don’t have an OpenAI API key, just go here and come back with your key so that we can move on with our tutorial on the LangChain YouTube Summarizer and begin our work on this LangChain use case.
Setting Up an OpenAI Environment
There is not much to do here now that you have your API key. Just replace “My API” in the code below with your key that starts with sk-, and that’s it! Your environment is set.
os.environ["OPENAI_API_KEY"] = "My API" #Replace My API with your API
The Show Begins: Creating the LangChain YouTube Summarizer
Here we start building the LangChain YouTube Summarizer. This LangChain use case is part of our LangChain series, and we have already completed the initial setup.
We’ll now load our video into the code, fetch its transcript, break it into chunks, and summarize each chunk using OpenAI’s LLM (the AI model used to generate text responses).
At the end, we send the collected chunk summaries back to the LLM and summarize that combined list of summaries. This is our strategy for cracking this LangChain use case and creating a LangChain YouTube summarizer.
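The map-then-reduce strategy above can be sketched in plain Python. Here `summarize` is only a stand-in for the LLM call (it just keeps the first sentence of whatever it receives); in the real pipeline the LLM does that job:

```python
def summarize(text: str) -> str:
    # Stand-in for the LLM: keep only the first sentence.
    return text.split(". ")[0].rstrip(".") + "."

def summarize_transcript(chunks: list[str]) -> str:
    # Map step: summarize each chunk independently.
    chunk_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: combine the per-chunk summaries and summarize again.
    return summarize(" ".join(chunk_summaries))

chunks = [
    "AI is advancing fast. Robots now paint and talk.",
    "Humans once held a monopoly on art. That era is ending.",
]
print(summarize_transcript(chunks))  # -> AI is advancing fast.
```

The real summarizer follows exactly this shape, just with `chain.invoke` in place of the toy `summarize` function.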
Loading YouTube Video
The first step is to obtain the link of the video you want to test. I have tested this with a two-hour documentary and with a 25-minute short documentary. The longer the video, the more time it takes to process and the more tokens it costs.
Since I am writing this blog with longer videos in mind, I will be using this 25-minute video link. You can replace it with a 3-minute video to save money.
Here is the link that I am using.
Link: [ https://www.youtube.com/watch?v=VCCgdRF0AIA&list=PPSV ]
Now, let’s load it using the below code.
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=VCCgdRF0AIA", add_video_info=True)
transcript = loader.load()
The above script should have successfully loaded your YouTube video transcript into the variable transcript. To see what is saved in it, run the variable in a separate cell.
transcript
The output it gave was more than 2500 words. To see it and compare it with the video, you can see this document.
Here is a little glance at the output.
Output:
[Document(page_content="[Music] if you're watching this chances are you're human but what does it actually mean to be human for Millennia the answer was simple we were the Pinnacle of creation we were the only ones who could make art talk to each other play chess drop bombs or vacuum our homes but bit by bit artificial intelligence and robots are catching up it....
But that’s more than 2500 words long! How can we send such a long text without hitting the token limit? Yes, you guessed it: by splitting it into chunks. So let’s do that.
Splitting of YouTube Transcript into Chunks
The code below splits the YouTube transcript text into chunks; it is up to us how we define a single chunk.
The chunk size tells how many characters a chunk will have, and the chunk overlap tells how many characters of each chunk are allowed to overlap with the adjacent chunk, i.e., how many characters can be shared. If we set the chunk overlap to 0, then in my personal experience there is a high chance of information loss at the chunk boundaries.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=30)
docs = text_splitter.split_documents(transcript)
The variable docs contains the split transcript text. Go here to see the output of the split text, as I can’t show a complete 2500+ word output here.
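To see what chunking with overlap actually does, here is a minimal pure-Python imitation of the idea (LangChain’s RecursiveCharacterTextSplitter is smarter: it tries to break on paragraph and sentence boundaries first, so treat this only as an illustration):

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Naive character-based chunking: each new chunk starts
    # `chunk_size - chunk_overlap` characters after the previous one,
    # so adjacent chunks share `chunk_overlap` characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghij" * 3  # 30 characters of dummy "transcript"
chunks = chunk_text(text, chunk_size=10, chunk_overlap=3)
print(chunks[0])  # abcdefghij
print(chunks[1])  # hijabcdefg  -> shares 'hij' with the end of chunk 0
```

That shared tail is exactly why a non-zero overlap helps: a sentence cut in half at a chunk boundary still appears whole in one of the two chunks.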
Now let’s check how many chunks it got divided into.
len(docs)
20
So it has 20 chunks. Let’s create a loop that sends chunks to the LLM one at a time (we’ll start with the first 5) to generate a summary of each, then send those summaries combined to the LLM to generate a summary of the summaries.
Defining a Loop to Get a Summary for Each Chunk
We begin by defining how many chunks we want to process (set this to len(docs), i.e., 20 here, to cover the whole transcript).
num_chunks_to_process = 5
Next, we initialize our AI model with a temperature. The higher the temperature, the more random and creative the answer will be. So if your summary is fact-heavy, don’t set the temperature too high. To determine what suits best, I recommend experimenting and comparing.
llm = OpenAI(temperature=0.7) # Initialized with randomness of 0.7
Now that we have initialized our LLM, we’ll make a prompt template where we give instructions to the LLM on how we want our summary to be.
You can customize it in your own way or copy my template. This template will be used for the per-chunk summaries. Later, we’ll make another template that summarizes the list of summaries the LLM created, marking the completion of the LangChain YouTube summarizer, today’s LangChain use case.
# Define the prompt template for the LLM
prompt_template = """
Please provide a summarized but comprehensive response based on the following transcript of a youtube video:
{input}
"""
prompt = PromptTemplate(template=prompt_template, input_variables=["input"])
# Initialize the chain
chain = LLMChain(llm=llm, prompt=prompt)
Here we have made a prompt template. The variable input is a placeholder that will be replaced in each iteration with a chunk of our transcript saved in docs.
Let’s write the loop that is responsible for requesting a response from the LLM.
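Under the hood, that substitution behaves much like Python’s built-in str.format. A minimal sketch of what happens to each chunk (the chunk text here is shortened for illustration):

```python
# Roughly what PromptTemplate + the chain do with each chunk:
prompt_template = """
Please provide a summarized but comprehensive response based on the following transcript of a youtube video:
{input}
"""

chunk_text = "if you're watching this chances are you're human..."  # one chunk from docs
filled_prompt = prompt_template.format(input=chunk_text)
print(filled_prompt)  # the template with {input} replaced by the chunk
```

The filled-in prompt, not the raw chunk, is what actually gets sent to the LLM.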
# Store the responses
responses = []
#Loop Begins here to fill above list of responses
for i in range(min(num_chunks_to_process, len(docs))):
    # Pass the input as a dictionary
    response = chain.invoke(input={"input": docs[i].page_content})
    # Ensure the response is a string
    if isinstance(response, dict):
        response = response.get("text", "")
    responses.append(response)
    # Sleep for a short duration to avoid hitting the rate limit
    time.sleep(0.5)  # Sleep for 0.5 second
The input key in the chain.invoke call is the same input as in the prompt template. In each iteration, the loop sends one chunk to the LLM with the prompt that asks for a summary, and if the response is a dictionary, it keeps only the value of the text key and appends it to the list of responses.
We make sure the received responses are strings before saving them, because we have to join them together and send them back to the LLM to get an overall summary.
The loop repeats until it has processed num_chunks_to_process chunks (5 here); the min() guard means it would stop at len(docs), in our case 20, if you asked for more chunks than exist.
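The time.sleep(0.5) between calls is a blunt way to stay under the rate limit. A slightly more robust pattern is to retry with exponential backoff when a call fails. Here is a hedged, pure-Python sketch (the flaky function is a hypothetical stand-in for chain.invoke, not part of LangChain):

```python
import time

def invoke_with_retry(call, max_retries=3, base_delay=0.5):
    # Retry `call` with exponential backoff: base_delay, 2x, 4x, ...
    # `call` is any zero-argument function, e.g. a wrapped chain.invoke.
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage with a flaky stand-in for the LLM call:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "summary"

print(invoke_with_retry(flaky, base_delay=0.05))  # prints "summary" after two retries
```

In the loop above you could wrap the invocation as `invoke_with_retry(lambda: chain.invoke(input={"input": docs[i].page_content}))` instead of sleeping unconditionally.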
Now that we have gathered a summary for each of the chunks we processed, we just need to send them back to the LLM, so let’s complete our LangChain YouTube Summarizer and finish this LangChain use case.
Getting Overall Summary: Summary of Summaries
So again, let’s start by making another prompt template, but this time for the summary of summaries.
summary_prompt_template = """
Please provide a short 500-word summary of the following responses:
{responses}
"""
summary_prompt = PromptTemplate(template=summary_prompt_template, input_variables=["responses"])
# This summary chain will be invoked to get the overall summary
summary_chain = LLMChain(llm=llm, prompt=summary_prompt)
Now let’s join our responses into one string and get the overall summary.
# Ensure that the responses are strings before joining
responses = [str(r) for r in responses]
summary = summary_chain.invoke(input={"responses": "\n\n".join(responses)}) # Join with double newline
print(summary)
Output:
{'responses': 'The video explores the concept of what it means to be human in a world where artificial intelligence and robots are becoming more advanced. For centuries, humans were seen as the ultimate creation, capable of creating art, communicating, playing games, and completing tasks such as cleaning. However,........
...... The speaker also raises thought-provoking questions about the role of technology in our lives and the unpredictability of its development. The video also highlights the ongoing debate around the definition of art and the impact of technology on traditional concepts of creativity. Overall, it encourages viewers to consider the implications of a world where machines continue to advance and blur the lines between what it means to be human and what it means to be a machine.'}
The above output is a trimmed version of the entire summary. Although it is much shorter than the original transcript, to improve your reading experience I have saved the full version in a separate document.
Conclusion
Hurray! You have built your own personal LangChain YouTube Summarizer that summarizes any video provided as a link. I hope you learned a lot by completing this LangChain use case.
This blog was part of our LangChain learning series. To read more on LangChain, visit our ML guide page.
If you have any feedback or suggestions, feel free to reach out.