Ollama language model

Creating a AI Assistant in Langchain | Part 1

Sunday, May 5, 2024

The more I work in Langchain, the more I discover its strength for creating powerful, custom Large Language Model interactive AI powered chatbots. For example, right now I am using it to create a custom chatbot that is trained on our proprietary internal documentation, reports, and log files. This uses something called RAG (Retrieval Augmented Generation) to access up to date information. Our developers, technicians, and managers, can then all access the same information in a simple to use chat-like interface.

Let’s dig in to what I did to create this. For now, I’m going to focus on Langchain’s base case, and in future tutorials, I’ll discuss building a more robust “personality” to the AI assistant, and also how to encode your custom data in what’s called a ‘vector database.’

LangChain is a powerful framework designed to simplify and enhance the development of language-based applications, leveraging Large Language Models (LLMs). By breaking down each section of our example code, we’ll uncover how each component contributes to the chatbot’s functionality. We will also discuss the immense capabilities of LangChain and hint at exciting future developments in subsequent parts of this series, including training it on our own data.

Here’s the first half of our code before we do a full breakdown. Also, you can access the code on my github repository. Keep in mind that the github is under active development, so that code will likely differ a bit from what you are going to read below.

Building the Large Language Model (LLM)

llm = Ollama(

This first section is where we create the large language model llm itself. Let’s go step by step. I’m using Ollama on my local machine (an M3 Macbook Pro) in order to keep the LLM local and avoid using any cloud service APIs. I’ve been playing around with Mistral’s model, “mixtral:8x7b” but this code works equally as well with llama3.

LLM Initialization: We initiate an instance of Ollama, LangChain’s module for interacting with large language models. This setup uses the mixtral:8x7b model, a powerful AI capable of understanding and generating human-like text.

You can read more about Mistral’s model at their Hugging Face page for the mixtral:8x7b transformer. I chose this model due to its high performace and accuracy, combined with the fact that it Mistral published it under an Apache license.

Callback Manager: It is equipped with a CallbackManager that includes a StreamingStdOutCallbackHandler. This setup is crucial for real-time output streaming, enabling live feedback during model interaction which is essential for debugging and user experience enhancements. Without this, the transformer will wait until the entire response is generated, then display it. With Streaming, you see each token being displayed as it is generated. This also gives it a more real time feel.

Stopping Conditions: The stop parameter defines a list of tokens that signal the model to cease generating further text. This control mechanism prevents the model from producing unwanted or excessive output, ensuring concise and relevant responses. These specific tokens are required for llama3’s model since Mistral’s model handles stop tokens very well. But I’m finding it good practice to keep in.

Building the Prompt Template

template = """
    You are a helpful assistant. Answer the following questions
    considering the history of the conversation:
    Chat history: {chat_history}
    User question: {user_question}
prompt = ChatPromptTemplate.from_template(template)

The “prompt” in langchain is our way to describe the interactions with the language model. For example, we can explain the chatbot’s personality, we can explain context, feed it a history of previous messages, and so on.

Template Creation: This block defines a template string that outlines the role of the AI within the interaction – that of a helpful assistant. The template incorporates placeholders ({chat_history} and {user_question}) that dynamically insert the conversation’s context and the current user query into the prompt. This method ensures that each response is informed by the previous dialogue, maintaining continuity and relevance.

Prompt Initialization: Using ChatPromptTemplate.from_template, the string template is converted into a structured format that LangChain can utilize to generate responses effectively. This encapsulation ties the narrative setup directly to the LLM’s operational logic. This way we can feed it right into our chain later on.

Building Our LLM Response

def get_response(user_query, chat_history):
    chain = prompt | llm | StrOutputParser()
    return chain.stream({
        "chat_history": chat_history,
        "user_question": user_query,

Here we have our python function that will feed the user’s question into the transformer and it returns the generated response.

Function Definition: The get_response function is where the chatbot’s functionality materializes. It accepts a user query and the current chat history as inputs.

Chain Configuration: Inside the function, a chain of operations is defined. The prompt feeds into the llm, which in turn passes its output through a StrOutputParser. This chaining ensures a flow of data through the model and back to the user, with each component processing and passing data seamlessly.

Streaming the Response: The chain.stream method activates the chain with specific inputs, streaming back the generated response. This method is critical for live interaction, enabling the chatbot to operate in a real-time conversational context.

Giving Our Chatbot a Memory

Our Chatbot’s History (aka. Memory)

# Session State
if "chat_history" not in st.session_state:
    st.session_state.chat_history = [
        AIMessage(content="Hello, I'm Nikki. How can I help you?"),

# Conversation
for message in st.session_state.chat_history:
    if isinstance(message, AIMessage):
        with st.chat_message("AI"):
    elif isinstance(message, HumanMessage):
        with st.chat_message("Human"):

Session State: Streamlit is our UI that we will be using. I’m not going to go into that right now, but I’ve written about it before and will be writing more about Streamlit itself in the future as it is a great framework for building the UI for our chatbot. Streamlit uses session_state to maintain a stateful interaction across user sessions. This state is essential for a chatbot to remember the conversation history as the user interacts with the page.

Initialization: This block checks if chat_history is already defined in the session. If not, it initializes it with a greeting from the AI assistant, “Nikki”. This ensures that every new session starts with a welcome message, setting the tone for the interaction.

Iterating Over Messages: The code iterates through each message stored in chat_history. This will then be added to our history.

Message Type Checking: It uses isinstance to determine the type of each message (AIMessage or HumanMessage), which dictates how the message is displayed.

Display Context: Depending on the message type, it uses st.chat_message with either “AI” or “Human” as the sender. This contextualizes who is speaking in the chat interface, enhancing the chatbot’s usability and realism.

user_query = st.chat_input("Type your message here…")

if user_query is not None and user_query != "":
    with st.chat_message("Human"):

User Query: Captures text input from the user through st.chat_input.

Input Validation: Checks if the input is not null or empty, ensuring only meaningful interactions proceed.

Update Chat History: Adds the user’s message to chat_history using HumanMessage, keeping a record for future interactions and response generation.

Display User Message: Immediately shows the user’s message in the chat interface under “Human” to reflect the conversation flow.

Display the Chat in the UI

with st.chat_message("AI"):
    response = st.write_stream(get_response(user_query,     


Finally, we are ready to display our generated text.

Response Generation: Calls get_response, a function presumably linked to LangChain’s model, passing the latest user query and the entire chat history. This function processes the input to generate a contextually aware response.

Display AI Response: The response from the AI is streamed and displayed in real-time under the “AI” sender.

Update Chat History: Adds the AI’s response to the chat_history, ensuring that each part of the conversation is recorded for continuity.

The Power of Langchain

As I said above, I’m really enjoying the power of LangChain and how it facilitates the integration of advanced AI models into applications like chatbots by providing tools that streamline the handling of natural language processing tasks. It supports developers in creating more engaging and intelligent conversational agents.

Future Expansion

This approach not only demonstrates the implementation of an interactive AI chatbot but also highlights how developers can leverage LangChain for sophisticated language-based applications. The upcoming posts will delve deeper into customizing and augmenting this foundational work.

Ciao! I'm Scott Sullivan, an software developer and machine learning nerd. I divide my time between the tranquil countryside of Lancaster, Pennsylvania, and northern Italy, visiting family, close to Cinque Terre and La Spezia. Professionally, I'm using my Master's in Data Analytics and my Bachelor's degree in Computer Science, to create compelling software products that user AI, run lighting, robots, and automation effects for a large Christian theatrical productions to spread the message of Christ's salvation.