Using a fully local semantic router for agentic AI, with a llama.cpp LLM and HuggingFace embedding models. There are many reasons we might decide to use local LLMs rather than a third-party service like OpenAI. It could be cost, privacy, compliance, or fear of the OpenAI apocalypse. To help you out, we made Semantic Router fully local, with local LLMs like Mistral 7B available via llama.cpp. Using llama.cpp also enables the use of quantized GGUF models, reducing the memory footprint of deployed models and allowing even 13-billion-parameter models to run with hardware acceleration on an Apple M1 Pro chip. We also use LLM grammars to enable high output reliability, even from the smallest of models. In this video, we'll use HuggingFace's MiniLM encoder and llama.cpp's Mistral-7B-Instruct GGUF quantized model.

GitHub Repo: https://github.com/aurelio-labs/semantic-router/
Code: https://github.com/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb
Semantic Router Course: https://www.aurelio.ai/course/semantic-router
AI Consulting: https://aurelio.ai
Discord: https://discord.gg/c5QtDB9RAP
Twitter: https://twitter.com/jamescalam
LinkedIn: https://www.linkedin.com/in/jamescalam/
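As a rough sketch of what the fully local setup looks like (assuming the semantic-router API from around the time of this video; class names like RouteLayer, HuggingFaceEncoder, and LlamaCppLLM, plus the GGUF file path, are assumptions here, so check the linked notebook for the exact version):

```python
from llama_cpp import Llama
from semantic_router import Route
from semantic_router.encoders import HuggingFaceEncoder
from semantic_router.layer import RouteLayer
from semantic_router.llms.llamacpp import LlamaCppLLM

# Local embedding model; the default is a MiniLM sentence-transformer,
# so no API keys or external services are needed.
encoder = HuggingFaceEncoder()

# A route is defined by example utterances; the encoder matches
# incoming queries against them semantically.
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "lovely day isn't it?",
    ],
)

# Local LLM via llama.cpp: n_gpu_layers=-1 offloads all layers to the
# GPU (e.g. Metal on an Apple M1 Pro); n_ctx sets the context window.
# The model path is a placeholder -- download a quantized
# Mistral-7B-Instruct GGUF file first.
llama = Llama(
    model_path="./mistral-7b-instruct-v0.2.Q4_0.gguf",
    n_gpu_layers=-1,
    n_ctx=2048,
)
llm = LlamaCppLLM(name="Mistral-7B-Instruct", llm=llama, max_tokens=200)

# The route layer ties encoder, routes, and LLM together, fully locally.
rl = RouteLayer(encoder=encoder, routes=[chitchat], llm=llm)
print(rl("lovely weather we're having").name)  # -> "chitchat"
```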
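And on the grammar point: a minimal, hypothetical illustration of how llama.cpp grammars constrain generation, here via llama-cpp-python's LlamaGrammar built from a JSON schema (the schema, prompt, and model path are invented for illustration; semantic-router applies this idea internally for dynamic routes):

```python
import json
from llama_cpp import Llama, LlamaGrammar

llama = Llama(model_path="./mistral-7b-instruct-v0.2.Q4_0.gguf", n_gpu_layers=-1)

# Compile a JSON schema into a llama.cpp grammar, so decoding can only
# ever produce schema-valid JSON -- this is what makes small quantized
# models reliable enough for structured outputs.
schema = {
    "type": "object",
    "properties": {"timezone": {"type": "string"}},
    "required": ["timezone"],
}
grammar = LlamaGrammar.from_json_schema(json.dumps(schema))

out = llama(
    "Extract the timezone from: what's the time in New York?\nJSON:",
    grammar=grammar,  # constrain token sampling with the grammar
    max_tokens=100,
)
print(out["choices"][0]["text"])
```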
The Generative AI and Large Language Models (LLMs) course covers everything you need to know about:
- Generative AI
- Large Language Models (LLMs)
- OpenAI, Cohere, Hugging Face
- Managed vs. open source
- LLM libraries like LangChain and GPT Index
- Long-term memory and retrieval augmentation
And more to come...