Set up Ollama for running self-hosted LLMs on macOS
I needed a self-hosted LLM to run on my M4 Mac. This is how I did it.
Table of Contents
- Purpose
- Stack
- Why that stack?
- Installation
- Running Ollama as Background Service
- My Hardware Specifications
- Glossary and Links
Purpose
My new project requires a self-hosted LLM server to run in an offline or air-gapped environment.
I have set up a development environment on my M4 Mac. This is how I did it, and why.
Stack
- Ollama - A tool for running open-source LLMs locally via a simple API server.
- GPT-OSS 20B - A 20B parameter open-source LLM.
Why that stack?
I chose Ollama because:
- It has good M4 GPU support
- It was easy to install via brew
- Simple CLI/API interface
I chose GPT-OSS 20B because:
- At 20B parameters, it is small enough to run locally on my 32GB machine
- It strikes a good balance between performance and resource cost
- It is designed for reasoning, agentic tasks, and developer use cases
- It comes from a US developer (OpenAI)
- Open source (Apache 2.0 license)
- Relatively new (August 2025)
Installation
1. Install Ollama
# Install via Homebrew
brew install ollama
# Verify installation
ollama --version
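Homebrew on Apple silicon normally installs binaries under /opt/homebrew; it is worth confirming the path now, since the manual LaunchAgent in Option 2 below references it directly:
# Confirm where the ollama binary lives (typically /opt/homebrew/bin/ollama on Apple silicon)
which ollama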
2. Pull GPT-OSS Model
# Pull the 20B parameter model
ollama pull gpt-oss:20b
# This downloads roughly 12GB and takes 5-15 minutes depending on your connection
# Optional: Pull the larger 120B model if you have more RAM
# ollama pull gpt-oss:120b
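Once the download finishes, you can confirm the model is available locally and check its on-disk size and metadata:
# List models that have been pulled, with their sizes
ollama list
# Show details for the model (architecture, parameters, quantization, context length)
ollama show gpt-oss:20b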
3. Configure Ollama to Listen on All Interfaces (Optional)
I need to access Ollama from Kubernetes pods, so it has to listen on all interfaces rather than only on localhost (the default).
# Set environment variable
export OLLAMA_HOST=0.0.0.0:11434
# Add to shell profile for persistence
echo 'export OLLAMA_HOST=0.0.0.0:11434' >> ~/.zshrc
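Note that this export only affects an ollama serve started from your own shell; the brew-managed service gets its environment from its plist instead (see Option 1 below). Once the server is up, a quick reachability check from another device on the network looks like this, where en0 and 192.168.1.50 are placeholders for your actual interface and IP:
# Find this Mac's LAN IP (en0 is typically Wi-Fi or the built-in Ethernet; adjust if needed)
ipconfig getifaddr en0
# From another machine on the same network (192.168.1.50 is a placeholder for the IP above)
curl http://192.168.1.50:11434/api/tags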
4. Start Ollama Service
# Start Ollama server
ollama serve
This runs in the foreground. For background service, see LaunchAgent setup below.
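Before testing the model itself, a quick health check confirms the server is up; the root endpoint should answer with a short "Ollama is running" message:
# Basic health check against the running server
curl http://localhost:11434/
# List models known to the API (should include gpt-oss:20b after step 2)
curl http://localhost:11434/api/tags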
5. Test the Model
# In a new terminal, test the model
ollama run gpt-oss:20b "Write a Python function to parse GitLab webhook payloads"
# Test chain-of-thought reasoning
ollama run gpt-oss:20b "Explain step-by-step how to implement a GitLab webhook parser"
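Since I need to reach Ollama from Kubernetes pods later, it is worth testing the HTTP API directly as well. This is a minimal, non-streaming call to the generate endpoint (the prompt is just an example):
# One-shot completion via the HTTP API; "stream": false returns a single JSON response
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Write a Python function to parse GitLab webhook payloads",
  "stream": false
}'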
Running Ollama as Background Service
Option 1: Using Brew Services (Recommended)
Easiest method - Homebrew manages the LaunchAgent for you:
# Start Ollama service (starts now and on boot)
brew services start ollama
# Verify it's running
brew services list | grep ollama
curl http://localhost:11434/api/tags
Configure to listen on all interfaces (optional):
# Set environment variable for brew service
# Homebrew writes the plist the first time the service starts, and we need to add OLLAMA_HOST inside it
# Stop the service before editing
brew services stop ollama
# Add the environment variable inside the plist's top-level <dict>
# Location: ~/Library/LaunchAgents/homebrew.mxcl.ollama.plist
/usr/libexec/PlistBuddy -c 'Add :EnvironmentVariables dict' ~/Library/LaunchAgents/homebrew.mxcl.ollama.plist
/usr/libexec/PlistBuddy -c 'Add :EnvironmentVariables:OLLAMA_HOST string 0.0.0.0:11434' ~/Library/LaunchAgents/homebrew.mxcl.ollama.plist
# Note: Homebrew may regenerate this plist when the formula is updated, so re-apply the change
# (or use the manual LaunchAgent in Option 2) if the setting stops working
# Restart the service
brew services restart ollama
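To confirm the service picked up the new setting, check which address the server is bound to: 127.0.0.1:11434 means it is still localhost-only, while *:11434 means it is listening on all interfaces:
# Show the process listening on port 11434 and the address it is bound to
lsof -nP -iTCP:11434 -sTCP:LISTEN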
Manage the service:
# Start service
brew services start ollama
# Stop service
brew services stop ollama
# Restart service
brew services restart ollama
# Check status
brew services list
# View logs
tail -f ~/Library/Logs/homebrew.mxcl.ollama.log
tail -f ~/Library/Logs/homebrew.mxcl.ollama.err.log
Option 2: Manual LaunchAgent (Alternative)
If you need more control over the LaunchAgent configuration:
# Create custom LaunchAgent plist
cat > ~/Library/LaunchAgents/com.ollama.server.plist <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.ollama.server</string>
<key>ProgramArguments</key>
<array>
<string>/opt/homebrew/bin/ollama</string>
<string>serve</string>
</array>
<key>EnvironmentVariables</key>
<dict>
<key>OLLAMA_HOST</key>
<string>0.0.0.0:11434</string>
</dict>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/ollama.log</string>
<key>StandardErrorPath</key>
<string>/tmp/ollama.error.log</string>
</dict>
</plist>
EOF
# Load the LaunchAgent
launchctl load ~/Library/LaunchAgents/com.ollama.server.plist
# Start the service
launchctl start com.ollama.server
# Verify it's running
curl http://localhost:11434/api/tags
Manage manual LaunchAgent:
# Stop service (note: with KeepAlive set, launchd will relaunch it; unload the plist to stop it for good)
launchctl stop com.ollama.server
# Restart service
launchctl stop com.ollama.server
launchctl start com.ollama.server
# Unload service
launchctl unload ~/Library/LaunchAgents/com.ollama.server.plist
# View logs
tail -f /tmp/ollama.log
tail -f /tmp/ollama.error.log
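As a side note, launchctl load and unload still work but are treated as legacy subcommands on current macOS; if you prefer the newer interface, these should be the equivalent per-user commands:
# Modern equivalents of load/unload for a user-level LaunchAgent
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.ollama.server.plist
launchctl bootout gui/$(id -u)/com.ollama.server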
My Hardware Specifications
- Machine: Apple M4 Mac
- RAM: 32GB unified memory
- GPU: M4 Metal GPU
- Cores: 10
- Architecture: ARM64
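If you want to compare against your own machine, the same details can be read from the command line:
# Chip, core count, and memory
system_profiler SPHardwareDataType
# Architecture (arm64 on Apple silicon)
uname -m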
Glossary and Links
- Ollama: https://ollama.com/ - A tool for running open-source LLMs locally via a simple API server. It provides a simple interface to download, manage, and interact with various open-source models like GPT-OSS, Llama, Mistral, and others.
- GPT-OSS 20B: https://github.com/openai/gpt-oss - A 20B parameter open-source LLM.
- OpenAI announcement for GPT-OSS: https://openai.com/index/introducing-gpt-oss/