Langchain LLM Update

Saturday, April 20, 2024

As many of you know, I have been building a personal AI assistant using large language model transformers. The goal is to have it access documents and internet feeds to take care of different tasks.

This week I have been experimenting with Meta’s newest iteration of their open-source transformer, Llama 3. I downloaded their 8B model and started some experiments. Initial results show that their changes to the EOT (end-of-turn) or stop token cause the model to continuously answer itself. While this leads to some amusing replies, it’s a little less than ideal. I need to go back to their documentation page and update the ‘assistant’ end-of-turn token handling (Llama 3 uses <|eot_id|>). The Quant GGUF file for Llama works properly, but I’m not as happy with the decoded results it seems to be giving. I’ll have to continue exploring solutions in LangChain.
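For reference, Llama 3’s instruct models expect a specific chat template with special header and end-of-turn tokens, and generation has to stop on <|eot_id|>. Here’s a minimal sketch of that template; the function name and structure are my own, but the token layout follows Meta’s published format:

```python
# Minimal sketch of the Llama 3 instruct chat template. If a wrapper
# does not apply this template and stop generation on <|eot_id|>,
# the model tends to keep answering itself.

def format_llama3_prompt(system: str, user: str) -> str:
    """Build a single-turn Llama 3 instruct prompt string."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Whatever backend runs the model should be configured to halt here:
STOP_TOKENS = ["<|eot_id|>"]
```

In LangChain, the stop sequence can usually be passed to the model wrapper (e.g. a `stop` parameter), which is exactly the setting that was biting me above.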

I’ve also been comparing Mixtral (8x7B and 8x22B) and Qwen (32B), and both have been giving amazing and snappy results on my MacBook M3 Max with 36 GB of RAM. My results are consistent with the benchmarks and prototyping results others are getting.

I’ll end by saying it’s an exciting time to be experimenting with LLMs. There are so many open-source models hitting very high marks, and fine-tuning base models and chaining RAG pipelines onto them is becoming significantly more powerful.
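To make the RAG idea concrete, here’s a toy sketch of the retrieve-then-generate pattern: a trivial keyword-overlap scorer stands in for a real vector store, and the “generate” step is just prompt assembly handed to whatever base model you’re chaining. All names here are illustrative, not a real API:

```python
# Toy retrieve-then-generate (RAG) sketch. A real pipeline would use
# an embedding model and vector store; keyword overlap stands in here
# so the shape of the pattern is visible end to end.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query, return top k."""
    q = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query: str, docs: list[str]) -> str:
    """Stuff the retrieved context into a prompt for the base model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Swapping the scorer for real embeddings and the prompt for a chat template is all that separates this toy from the chains I’m building on top of the open models above.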
#llm #qwen #llama3 #langchain #mixtral #ai




Ciao! I'm Scott Sullivan, a software engineer with a specialty in machine learning. I split my time between the tranquil countryside of Lancaster, Pennsylvania, and northern Italy, close to Cinque Terre and La Spezia, visiting family. Professionally, I use my Master's in Data Analytics and my Bachelor's in Computer Science to turn code into insights with Python, PyTorch, and DFE superpowers, on a quest to create AI that's smarter than your average bear.