With the rapid advancements in artificial intelligence (AI), running sophisticated models like Meta’s Llama 3.1 locally on personal computers is becoming increasingly popular. Running an LLM on your local PC or Mac provides a sandbox for experimentation and development without compromising data privacy and allows for more flexibility in model usage.
Also: Why the future must be BYO AI: Model lock-in deters users and stifles innovation
Here is a quick guide to help you set up and run Llama 3.1 — as well as many other models such as Google Gemma 2 — on Mac, Linux, and Windows. I’ll also discuss the benefits of privately hosted models.
Why develop and test against different open-source models?
Developing and testing against various open-source models you privately host and run offers several advantages over relying solely on publicly hosted large language models (LLMs) from providers like OpenAI, Microsoft Copilot, Meta AI, and Google Gemini.
Data privacy: Publicly hosted LLMs require sending data over the internet, which can raise privacy and security concerns. Running models locally ensures that sensitive data remains on your own hardware.
Customization: Open-source models allow for greater customization. Developers can fine-tune models, adjust hyperparameters, and modify the architecture to suit specific use cases better.
Cost control: Cloud-based AI services can be costly, especially for large-scale applications. Hosting models locally can significantly reduce ongoing API usage and data transfer expenses.
Offline capability: Local models can be used without an internet connection, which is essential for applications requiring high availability or in areas with unreliable internet access.
Flexibility and experimentation: Hosting your own models enables you to experiment with different algorithms and configurations, leading to innovative solutions and a deeper understanding of AI technologies.
Freedom from usage policies: Running LLMs locally means the usage policies of companies like OpenAI, Microsoft, Meta, and Google do not restrict you. You can use whatever prompts you want and employ modified LLMs with lifted restrictions, trained on data that these services might restrict.
Also: The best AI chatbots: ChatGPT, Copilot, and worthy alternatives
Introduction to Ollama
Ollama is a versatile and MIT-licensed open-source platform designed to help developers and researchers easily run and manage machine learning models locally on their own hardware. It was developed by a team of AI enthusiasts and engineers who aim to provide tools that ensure data privacy, flexibility, and control over AI applications. Ollama supports various AI models, making it a valuable resource for those looking to explore and utilize AI technologies without relying on third-party cloud services.
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download |
|---|---|---|---|
| Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
| Llama 3.1 | 70B | 40GB | ollama run llama3.1:70b |
| Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b |
| Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3 |
| Phi 3 Medium | 14B | 7.9GB | ollama run phi3:medium |
| Gemma 2 | 2B | 1.6GB | ollama run gemma2:2b |
| Gemma 2 | 9B | 5.5GB | ollama run gemma2 |
| Gemma 2 | 27B | 16GB | ollama run gemma2:27b |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Moondream 2 | 1.4B | 829MB | ollama run moondream |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| LLaVA | 7B | 4.5GB | ollama run llava |
| Solar | 10.7B | 6.1GB | ollama run solar |
Per Ollama’s GitHub page, you should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
Our test systems
I tested Ollama using M1 Pro and M1 Ultra Macs with 32GB and 64GB of RAM, which are a few generations behind current MacBook Pro models. Even so, with CPU-only inference, I successfully ran 8B-10B parameter models of Meta’s Llama 3.1 and Google’s Gemma 2, as well as various specially trained variants from Ollama’s website, with better-than-acceptable performance.
Also: I broke Meta’s Llama 3.1 405B with one question (which GPT-4o gets right)
However, I experienced significant performance issues with the 70B parameter variant on these systems. I’m confident that more recent hardware can handle these models more efficiently, especially Linux PCs equipped with Nvidia or AMD GPUs.
Step-by-step setup
Download and install Ollama
- Go to Ollama’s download page and download the installer suitable for your operating system (macOS, Linux, or Windows).
- Follow the provided installation instructions for your specific operating system; a quick way to verify the install is shown below.
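To confirm the install worked, open a terminal and check the version. This is a minimal sanity check, assuming the installer added the ollama binary to your PATH (its default behavior):

ollama --version

If the command prints a version number, Ollama is ready to pull models.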
Load the 8B parameter Llama 3.1 Model
- Go to the Llama 3.1 library page on Ollama and copy the command for loading the 8B Llama 3.1 model: ollama run llama3.1:8b
- Open a terminal (macOS, Linux) or Command Prompt/PowerShell (Windows), paste the above command, and hit Enter.
- This command will start running Llama 3.1. In the terminal, you can then issue chat queries to the model to test its functionality. If you’d rather call the model from code than from the interactive prompt, see the sketch after this list.
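Behind the scenes, Ollama also serves a local REST API, which is useful for scripting against the model during development. A minimal sketch with curl, assuming the default port (11434) and the 8B model loaded in the previous step:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Explain in one sentence why local LLMs help with data privacy.",
  "stream": false
}'

The reply comes back as JSON, with the generated text in the "response" field.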
Manage installed models
- List models: Use the command ollama list to see all models installed on your system.
- Remove models: To remove a model, use the command ollama rm <model-name>. For example, to remove the 8B parameter Llama 3.1, you would use ollama rm llama3.1:8b.
- Add new models: To add a new model, browse the Ollama library and then use the appropriate ollama run command to load it into your system. A few example commands follow this list.
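For reference, a typical housekeeping session might look like this (the model tags are just examples; substitute whichever models you have installed):

ollama list
ollama pull gemma2:2b
ollama rm llama3.1:8b

ollama pull downloads a model without starting a chat session, which is handy for fetching models ahead of time.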
Also: 3 ways Meta’s Llama 3.1 is an advance for Gen AI
Adding a WebUI
Install Docker Desktop
- Visit Docker’s Get Started page and download Docker Desktop for your operating system (macOS, Linux, Windows).
- Follow the installation instructions for your specific operating system, and start Docker after installation (you can verify it is running with the commands below).
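Before moving on, it’s worth confirming that the Docker engine is actually running; from a terminal:

docker --version
docker info

If docker info reports server details rather than a connection error, Docker Desktop is up and you can proceed to Open WebUI.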
Install Open WebUI
Open a terminal (macOS, Linux) or Command Prompt/PowerShell (Windows) and run the following command to install Open WebUI:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
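Once the container starts, Open WebUI should be reachable in your browser at http://localhost:3000 (the -p 3000:8080 flag maps that port to the container). To check that the container is running:

docker ps
docker logs open-webui

On first visit, Open WebUI prompts you to create a local admin account; it then connects to your Ollama instance via host.docker.internal and should list any models you have already pulled.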