With the rapid advancements in artificial intelligence (AI), running sophisticated models like Meta’s Llama 3.1 locally on personal computers is becoming increasingly popular. Running an LLM on your local PC or Mac provides a sandbox for experimentation and development without compromising data privacy and allows for more flexibility in model usage.
Also: Why the future must be BYO AI: Model lock-in deters users and stifles innovation
Here is a quick guide to help you set up and run Llama 3.1 — as well as many other models such as Google Gemma 2 — on Mac, Linux, and Windows. I’ll also discuss the benefits of privately hosted models.
Why develop and test against different open-source models?
Developing and testing against various open-source models you privately host and run offers several advantages over relying solely on publicly hosted large language models (LLMs) from providers like OpenAI, Microsoft Copilot, Meta AI, and Google Gemini.
Data privacy: Publicly hosted LLMs require sending data over the internet, which can raise privacy and security concerns. Running models locally ensures that sensitive data remains on your own hardware.
Customization: Open-source models allow for greater customization. Developers can fine-tune models, adjust hyperparameters, and modify the architecture to suit specific use cases better.
Cost control: Cloud-based AI services can be costly, especially for large-scale applications. Hosting models locally can significantly reduce ongoing API usage and data transfer expenses.
Offline capability: Local models can be used without an internet connection, which is essential for applications requiring high availability or in areas with unreliable internet access.
Flexibility and experimentation: Hosting your own models enables you to experiment with different algorithms and configurations, leading to innovative solutions and a deeper understanding of AI technologies.
Freedom from usage policies: Running LLMs locally means the usage policies of companies like OpenAI, Microsoft, Meta, and Google do not restrict you. You can use whatever prompts you want and employ modified LLMs with lifted restrictions, trained on data that these services might restrict.
Also: The best AI chatbots: ChatGPT, Copilot, and worthy alternatives
Introduction to Ollama
Ollama is a versatile and MIT-licensed open-source platform designed to help developers and researchers easily run and manage machine learning models locally on their own hardware. It was developed by a team of AI enthusiasts and engineers who aim to provide tools that ensure data privacy, flexibility, and control over AI applications. Ollama supports various AI models, making it a valuable resource for those looking to explore and utilize AI technologies without relying on third-party cloud services.
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download |
|---|---|---|---|
| Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
| Llama 3.1 | 70B | 40GB | ollama run llama3.1:70b |
| Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b |
| Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3 |
| Phi 3 Medium | 14B | 7.9GB | ollama run phi3:medium |
| Gemma 2 | 2B | 1.6GB | ollama run gemma2:2b |
| Gemma 2 | 9B | 5.5GB | ollama run gemma2 |
| Gemma 2 | 27B | 16GB | ollama run gemma2:27b |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Moondream 2 | 1.4B | 829MB | ollama run moondream |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| LLaVA | 7B | 4.5GB | ollama run llava |
| Solar | 10.7B | 6.1GB | ollama run solar |
Per Ollama’s GitHub page, you should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
Our test systems
I tested Ollama using M1 Pro and M1 Ultra Macs with 32GB and 64GB of RAM, which are a few generations behind current MacBook Pro models. Even so, with CPU-only inference, I successfully ran 8B-10B parameter models of Meta’s Llama 3.1 and Google’s Gemma 2, as well as various specially trained variants from Ollama’s website, with better-than-acceptable performance.
Also: I broke Meta’s Llama 3.1 405B with one question (which GPT-4o gets right)
However, I experienced significant performance issues with the 70B parameter variant on these systems. I’m confident that more recent hardware can handle these models more efficiently, especially Linux PCs equipped with Nvidia or AMD GPUs.
Step-by-step setup
Download and install Ollama
- Go to Ollama’s download page and download the installer suitable for your operating system (macOS, Linux, or Windows).
- Follow the provided installation instructions for your specific operating system; a quick way to verify the install is shown below.
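To confirm the install worked, open a terminal and check the version. This is a minimal sanity check, assuming the installer added the ollama binary to your PATH (its default behavior):

ollama --version

If the command prints a version number, Ollama is ready to pull models.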
Load the 8B parameter Llama 3.1 Model
- Go to the Llama 3.1 library page on Ollama and copy the command for loading the 8B Llama 3.1 model: ollama run llama3.1:8b
- Open a terminal (macOS, Linux) or Command Prompt/PowerShell (Windows), paste the above command, and hit Enter.
- This command will start running Llama 3.1. In the terminal, you can then issue chat queries to the model to test its functionality. If you’d rather call the model from code than from the interactive prompt, see the sketch after this list.
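Behind the scenes, Ollama also serves a local REST API, which is useful for scripting against the model during development. A minimal sketch with curl, assuming the default port (11434) and the 8B model loaded in the previous step:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Explain in one sentence why local LLMs help with data privacy.",
  "stream": false
}'

The reply comes back as JSON, with the generated text in the "response" field.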
Manage installed models
- List models: Use the command ollama list to see all models installed on your system.
- Remove models: To remove a model, use the command ollama rm <model-name>. For example, to remove the 8B parameter Llama 3.1, you would use ollama rm llama3.1:8b.
- Add new models: To add a new model, browse the Ollama library and then use the appropriate ollama run command to load it into your system. A few example commands follow this list.
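For reference, a typical housekeeping session might look like this (the model tags are just examples; substitute whichever models you have installed):

ollama list
ollama pull gemma2:2b
ollama rm llama3.1:8b

ollama pull downloads a model without starting a chat session, which is handy for fetching models ahead of time.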
Also: 3 ways Meta’s Llama 3.1 is an advance for Gen AI
Adding a WebUI
Install Docker Desktop
- Visit Docker’s Get Started page and download Docker Desktop for your operating system (macOS, Linux, Windows).
- Follow the installation instructions for your specific operating system, and start Docker after installation (you can verify it is running with the commands below).
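Before moving on, it’s worth confirming that the Docker engine is actually running; from a terminal:

docker --version
docker info

If docker info reports server details rather than a connection error, Docker Desktop is up and you can proceed to Open WebUI.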
Install Open WebUI
Open a terminal (macOS, Linux) or Command Prompt/PowerShell (Windows) and run the following command to install Open WebUI:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
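Once the container starts, Open WebUI should be reachable in your browser at http://localhost:3000 (the -p 3000:8080 flag maps that port to the container). To check that the container is running:

docker ps
docker logs open-webui

On first visit, Open WebUI prompts you to create a local admin account; it then connects to your Ollama instance via host.docker.internal and should list any models you have already pulled.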