Llama.cpp for FULL LOCAL Semantic Router
Generative AI and LLMs
Using a fully local semantic router for agentic AI with a llama.cpp LLM and HuggingFace embedding models.

There are many reasons we might decide to use local LLMs rather than a third-party service like OpenAI. It could be cost, privacy, compliance, or fear of the OpenAI apocalypse. To help you out, we made Semantic Router fully local, with local LLMs like Mistral 7B available via llama.cpp.

Using llama.cpp also enables the use of quantized GGUF models, reducing the memory footprint of deployed models and allowing even 13-billion parameter models to run with hardware acceleration on an Apple M1 Pro chip. We also use LLM grammars to enable high output reliability even from the smallest of models.

In this video, we'll use HuggingFace's MiniLM encoder and a quantized GGUF build of Mistral-7B-instruct running on llama.cpp.

- GitHub Repo: https://github.com/aurelio-labs/semantic-router/
- Code: https://github.com/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb
- Semantic Router Course: https://www.aurelio.ai/course/semantic-router
- AI Consulting: https://aurelio.ai
- Discord: https://discord.gg/c5QtDB9RAP
- Twitter: https://twitter.com/jamescalam
- LinkedIn: https://www.linkedin.com/in/jamescalam/
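To make the memory-footprint point concrete, here is a rough back-of-the-envelope sketch. It is not taken from the video or the notebook: the helper name `weight_memory_gb` is hypothetical, the parameter counts are nominal round figures, and 4.5 bits per weight is only an assumed average for a 4-bit GGUF quantization (the stored per-block scales add a little on top of the 4-bit values). It counts weights only and ignores the KV cache and runtime overhead.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts (assumptions, not exact model sizes).
models = {"7B": 7e9, "13B": 13e9}

for name, n_params in models.items():
    fp16 = weight_memory_gb(n_params, 16)    # unquantized half precision
    quant = weight_memory_gb(n_params, 4.5)  # assumed average for a 4-bit quantization
    print(f"{name}: fp16 ~{fp16:.1f} GB, 4-bit quantized ~{quant:.1f} GB")
```

Under these assumptions, a 13B model drops from roughly 26 GB of weights at 16-bit precision to about 7 GB when quantized, which is what brings it within reach of a laptop's unified memory.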

About the course
The Generative AI and Large Language Models (LLMs) course covers everything you need to know about:
- Generative AI
- Large Language Models (LLMs)
- OpenAI, Cohere, Hugging Face
- Managed vs. Open Source
- LLM Libraries like LangChain and GPT Index
- Long-term memory and retrieval-augmentation

And more to come...
Lessons
- Prompt Engineering with OpenAI's GPT-3 and other LLMs
- Getting Started with GPT-3 vs. Open Source LLMs - LangChain #1
- Prompt Templates for GPT 3.5 and other LLMs - LangChain #2
- Generative AI and Long-Term Memory for LLMs (OpenAI, Cohere, OS, Pinecone)
- OpenAI's New GPT 3.5 Embedding Model for Semantic Search
- Cohere AI's LLM for Semantic Search in Python
- Generative Question-Answering with OpenAI's GPT-3.5 and Davinci
- Open Source Generative AI in Question-Answering (NLP) using Python
- GPT 4: Hands on with the API
- GPT 4: Superpower results with search
- ChatGPT Plugins: Build Your Own in Python!
- NEW Hugging Face Agents - First Look
- Using NEW MPT-7B in Hugging Face and LangChain
- Hugging Face Agents - Building Custom Tools
- Llama Index 101 with Vector DBs and GPT 3.5
- Open LLaMa in LangChain and Hugging Face
- NEW GPT-4 Function Calling Model!
- Building Chatbot Agents from Scratch with OpenAI Functions!
- MPT-30B Chatbot with LangChain!
- BEST Open Source LLM - Falcon 40B Chatbot in LangChain
- Llama 2 in LangChain - FIRST Open Source Conversational Agent!
- Hugging Face LLMs with SageMaker + RAG with Pinecone
- How to Make RAG Chatbots FAST
- NEW AI Framework - Steerable Chatbots with Semantic Router
- Llama.cpp for FULL LOCAL Semantic Router
- OpenAI's NEW 256-d Embeddings vs. Ada 002
- OpenAI's Sora: Incredible AI Generated Video