Streaming for LangChain Agents + FastAPI
Streaming is a valuable tool for LangChain agents when used in conjunction with FastAPI. FastAPI is a modern, high-performance web framework for building APIs with Python 3.7+ based on standard Python type hints, and it is easy to use.
With FastAPI, LangChain agents can easily set up streaming endpoints to handle real-time data. Streaming allows the continuous transmission of data over a network, which is crucial for agents that need to process large volumes of information in real time.
Setting up Streaming Endpoints with FastAPI
FastAPI makes it easy to set up streaming endpoints with Python. LangChain agents can use FastAPI to create WebSocket endpoints for real-time communication, as well as endpoints that serve HTTP Live Streaming (HLS) content for video. This allows for the seamless transmission of data between agents and clients.
To set up a streaming endpoint with FastAPI, LangChain agents can define a route handler that returns an instance of StreamingResponse. This allows data to be transmitted to the client continuously as it becomes available, without buffering the entire response before sending it.
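As a rough illustration, here is a minimal sketch of such a route handler. The /chat path, the token_generator helper, and the hard-coded tokens are placeholders standing in for a real agent's streamed output.

```python
# Minimal sketch of a FastAPI endpoint that streams its response.
# The /chat route and token_generator are illustrative placeholders;
# a real agent would yield tokens from a LangChain streaming callback or an async queue.
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def token_generator(query: str):
    # Yield chunks one at a time; a real implementation would yield LLM tokens.
    for token in ["Hello", ", ", "world", "!"]:
        yield token
        await asyncio.sleep(0.05)  # simulate per-token latency

@app.post("/chat")
async def chat(query: str):
    # StreamingResponse sends each yielded chunk as soon as it is available,
    # rather than buffering the whole body before responding.
    return StreamingResponse(token_generator(query), media_type="text/plain")
```

A client can then read the body incrementally, for example with requests' iter_content or httpx's aiter_text, instead of waiting for the full response.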
Real-Time Communication
FastAPI’s support for WebSocket endpoints enables LangChain agents to establish real-time communication channels with clients. This allows for efficient and immediate data transmission, making it possible for agents to receive and process data in real time.
By utilizing WebSocket endpoints, LangChain agents can keep clients updated on the latest information, enabling real-time collaboration and communication between all parties involved. This is particularly useful in scenarios where up-to-date data is essential, such as financial markets, real-time analytics, and live events.
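To make this concrete, the sketch below shows a bare WebSocket endpoint. The /ws path and the word-splitting loop are illustrative stand-ins for an agent streaming tokens back to the client.

```python
# Minimal sketch of a FastAPI WebSocket endpoint that pushes chunks to the client.
# The /ws path and the word-splitting loop are placeholders for streamed agent output.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            # Wait for a user message, then stream chunks back over the same socket.
            message = await websocket.receive_text()
            for chunk in message.split():  # placeholder for per-token agent output
                await websocket.send_text(chunk)
    except WebSocketDisconnect:
        # Client closed the connection; end the handler cleanly.
        pass
```

A browser client can consume this with the standard JavaScript WebSocket API, appending each received chunk to the page as it arrives.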
Video Streaming with HLS
In addition to real-time communication, FastAPI can be used to set up HTTP Live Streaming (HLS) endpoints for video. HLS is a standard for adaptive streaming over HTTP, and a FastAPI application can serve the playlists and media segments that make up an HLS stream.
By serving HLS from FastAPI, LangChain agents can deliver high-quality video streams to clients. This is especially useful for applications that require seamless video playback, such as video conferencing, live streaming events, and on-demand video services.
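One common pattern, sketched below under the assumption that the playlist and segments have already been encoded (for example with ffmpeg) into a local hls/ directory, is simply to serve those files with the correct media types. The route and directory names are illustrative.

```python
# Minimal sketch of serving pre-encoded HLS files (.m3u8 playlist and .ts segments)
# from FastAPI. The hls/ directory and /stream route are illustrative, and no
# path sanitisation is done here.
from pathlib import Path

from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse

app = FastAPI()
HLS_DIR = Path("hls")

@app.get("/stream/{filename}")
async def serve_hls(filename: str):
    file_path = HLS_DIR / filename
    if not file_path.is_file():
        raise HTTPException(status_code=404, detail="File not found")
    # Playlists and segments need their own media types for players to behave.
    media_type = (
        "application/vnd.apple.mpegurl" if filename.endswith(".m3u8")
        else "video/mp2t"
    )
    return FileResponse(file_path, media_type=media_type)
```

A player such as Safari (which supports HLS natively) or hls.js in other browsers can then play the stream by requesting the .m3u8 playlist from this endpoint.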
Conclusion
Streaming is a crucial tool for LangChain agents, and when combined with FastAPI, it becomes a powerful asset for real-time data processing, communication, and video streaming. With its support for WebSocket endpoints and HLS, FastAPI enables agents to create efficient and effective streaming solutions for a wide range of applications.
For LangChain agents looking to harness the power of streaming with FastAPI, the possibilities for innovation are wide open.
How can I extract only the LangChain LLM thoughts and stream those?
Good example, but the result is still very summarised, especially when using a search engine like Serper. So the question is: how can we control the size of the response text? In some cases the response needs to contain more detail.
Great content. It works with Agent; in my case I need to use AgentExecutor instead of Agent: agent_executor = AgentExecutor(agent=agent_chain, memory=memory, verbose=True, tools=tool_list, return_intermediate_steps=False)
Looks like AgentExecutor is not streaming with LCEL. Any ideas?
I am using Flask and HTML. I added this callback and get the streaming response in the terminal but not in the frontend. I also tried WebSocket and an SSE client, but didn't succeed.
It seems there's some kind of issue when trying to call get_stream("Hi there") a second time. The first time, I correctly receive:
{
"action": "Final Answer",
"action_input": "Hello! How can I assist you today?"
}
The second time, I just receive:
Hello! How can I assist you today?
and that generates an exception.
LangChain really needs to get it together with the streaming structure. This is terrible.
I need that, but with Pinecone.
I have a question -> What encoding should I use to encode Polish characters? line.decode("utf-8") returns errors for Polish letters. I tried other encodings, but it's not working. Any ideas @jamesbriggs?
How can I set a timeout for the agent? The agent sometimes forgets the prompt and gets stuck there while streaming.
Great video! Do you know why this doesn't work with a GET request? If I send a GET request instead of a POST (of course I adapted the API), it loads the whole message first and then sends it to the client instead of doing it asynchronously.
These recommendations were really helpful! I'm excited to watch all the series you shared. Thanks a lot for sharing them with us!
Can anyone stream with an open-source model?
Interesting. Do you know what other LLMs or platforms support streaming? Like Replicate or Clarifai?
Thank you very much James for this tutorial!
The FastAPI template works perfectly without adding tools. When I add the LLMMathChain tool to the agent, the application starts normally, but it just gets stuck at the step "Entering new LLMMathChain chain…"
I only have access to AzureOpenAI. I wonder if it is an AzureOpenAI problem or a general problem. Have you tried adding tools for the agent in the FastAPI code, and did it work? I also tested with a Zero-shot agent, and the result is the same.
> Entering new AgentExecutor chain…
```json
{
  "action": "Calculator",
  "action_input": "50 * 48"
}
```
> Entering new LLMMathChain chain…
Many thanks in advance!
Over the weekend I was working on this exact problem. I couldn’t say which part I was missing, probably different parts at different times. What I can say is how epic it was to start the day with this video, thank you 🙏 legend!
… I don’t suppose you feel like taking a crack at autogen next 😅
In any case thanks again!
This is an amazing video where we can get a lot of information. Can you make a video on how to connect streaming with LLMChain and memory, and show the streaming in a webpage, HTML, or Streamlit?
Thanks for the awesome walkthrough! Have you had a chance to test this flow with OpenAI function calls? I have a similar implementation for streaming, but once the LLM triggers an OpenAI function call, it fails to stream; it seems like the iterator just doesn't return any tokens.
The streaming is printed in the terminal; how can we show the streaming in the webpage?
Question – I followed along and ensured the code matches; however, I am unable to emulate this streaming behavior when calling the "get_stream" function. It seems to still wait for the chain to finish gathering the text and then prints it all at once. Any pointers on what might have gone wrong?
Excellent video as always! Do you have any idea how to use the "Professor Synapse" prompt with LangChain? 🙂