Set up Ollama for running self-hosted LLMs on macOS
I needed a self-hosted LLM to run on my M4 Mac. This is how I did it.
Table of Contents
- Purpose
- Stack
- Why that stack?
- Installation
- Running Ollama as Background Service
- My Hardware Specifications
- Glossary and Links
Purpose
My new project requires a self-hosted LLM server to run in an offline or air-gapped environment.
I have set up a development environment on my M4 Mac. This is how I did it, and why.
Stack
- Ollama - A tool for running open-source LLMs locally via a simple API server.
- GPT-OSS 20B - A 20B parameter open-source LLM.
Why that stack?
I chose Ollama because:
- It has good M4 GPU support
- It was easy to install via brew
- Simple CLI/API interface
I chose GPT-OSS 20B because:
- At 20B parameters, it is small enough to run locally on my 32GB machine
- It strikes a good balance between performance and resource cost
- It is designed for reasoning, agentic tasks, and developer use cases
- It comes from a US developer (OpenAI)
- Open source (Apache 2.0 license)
- Relatively new (August 2025)
Installation
1. Install Ollama
# Install via Homebrew
brew install ollama
# Verify installation
ollama --version
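Homebrew on Apple silicon normally installs binaries under /opt/homebrew; it is worth confirming the path now, since the manual LaunchAgent in Option 2 below references it directly:
# Confirm where the ollama binary lives (typically /opt/homebrew/bin/ollama on Apple silicon)
which ollama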
2. Pull GPT-OSS Model
# Pull the 20B parameter model
ollama pull gpt-oss:20b
# This downloads roughly 12GB and takes 5-15 minutes depending on your connection
# Optional: Pull the larger 120B model if you have more RAM
# ollama pull gpt-oss:120b
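Once the download finishes, you can confirm the model is available locally and check its on-disk size and metadata:
# List models that have been pulled, with their sizes
ollama list
# Show details for the model (architecture, parameters, quantization, context length)
ollama show gpt-oss:20b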
3. Configure Ollama to Listen on All Interfaces (Optional)
I need to access Ollama from Kubernetes pods, so it has to listen on all interfaces rather than only on localhost (the default).
# Set environment variable
export OLLAMA_HOST=0.0.0.0:11434
# Add to shell profile for persistence
echo 'export OLLAMA_HOST=0.0.0.0:11434' >> ~/.zshrc
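Note that this export only affects an ollama serve started from your own shell; the brew-managed service gets its environment from its plist instead (see Option 1 below). Once the server is up, a quick reachability check from another device on the network looks like this, where en0 and 192.168.1.50 are placeholders for your actual interface and IP:
# Find this Mac's LAN IP (en0 is typically Wi-Fi or the built-in Ethernet; adjust if needed)
ipconfig getifaddr en0
# From another machine on the same network (192.168.1.50 is a placeholder for the IP above)
curl http://192.168.1.50:11434/api/tags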
4. Start Ollama Service
# Start Ollama server
ollama serve
This runs in the foreground. For background service, see LaunchAgent setup below.
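Before testing the model itself, a quick health check confirms the server is up; the root endpoint should answer with a short "Ollama is running" message:
# Basic health check against the running server
curl http://localhost:11434/
# List models known to the API (should include gpt-oss:20b after step 2)
curl http://localhost:11434/api/tags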
5. Test the Model
# In a new terminal, test the model
ollama run gpt-oss:20b "Write a Python function to parse GitLab webhook payloads"
# Test chain-of-thought reasoning
ollama run gpt-oss:20b "Explain step-by-step how to implement a GitLab webhook parser"
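Since I need to reach Ollama from Kubernetes pods later, it is worth testing the HTTP API directly as well. This is a minimal, non-streaming call to the generate endpoint (the prompt is just an example):
# One-shot completion via the HTTP API; "stream": false returns a single JSON response
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Write a Python function to parse GitLab webhook payloads",
  "stream": false
}'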
Running Ollama as Background Service
Option 1: Using Brew Services (Recommended)
Easiest method - Homebrew manages the LaunchAgent for you:
# Start Ollama service (starts now and on boot)
brew services start ollama
# Verify it's running
brew services list | grep ollama
curl http://localhost:11434/api/tags
Configure to listen on all interfaces (optional):
# Set environment variable for brew service
# Homebrew writes the plist the first time the service starts, and we need to add OLLAMA_HOST inside it
# Stop the service before editing
brew services stop ollama
# Add the environment variable inside the plist's top-level <dict>
# Location: ~/Library/LaunchAgents/homebrew.mxcl.ollama.plist
/usr/libexec/PlistBuddy -c 'Add :EnvironmentVariables dict' ~/Library/LaunchAgents/homebrew.mxcl.ollama.plist
/usr/libexec/PlistBuddy -c 'Add :EnvironmentVariables:OLLAMA_HOST string 0.0.0.0:11434' ~/Library/LaunchAgents/homebrew.mxcl.ollama.plist
# Note: Homebrew may regenerate this plist when the formula is updated, so re-apply the change
# (or use the manual LaunchAgent in Option 2) if the setting stops working
# Restart the service
brew services restart ollama
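To confirm the service picked up the new setting, check which address the server is bound to: 127.0.0.1:11434 means it is still localhost-only, while *:11434 means it is listening on all interfaces:
# Show the process listening on port 11434 and the address it is bound to
lsof -nP -iTCP:11434 -sTCP:LISTEN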
Manage the service:
# Start service
brew services start ollama
# Stop service
brew services stop ollama
# Restart service
brew services restart ollama
# Check status
brew services list
# View logs
tail -f ~/Library/Logs/homebrew.mxcl.ollama.log
tail -f ~/Library/Logs/homebrew.mxcl.ollama.err.log
Option 2: Manual LaunchAgent (Alternative)
If you need more control over the LaunchAgent configuration:
# Create custom LaunchAgent plist
cat > ~/Library/LaunchAgents/com.ollama.server.plist <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.ollama.server</string>
<key>ProgramArguments</key>
<array>
<string>/opt/homebrew/bin/ollama</string>
<string>serve</string>
</array>
<key>EnvironmentVariables</key>
<dict>
<key>OLLAMA_HOST</key>
<string>0.0.0.0:11434</string>
</dict>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/ollama.log</string>
<key>StandardErrorPath</key>
<string>/tmp/ollama.error.log</string>
</dict>
</plist>
EOF
# Load the LaunchAgent
launchctl load ~/Library/LaunchAgents/com.ollama.server.plist
# Start the service
launchctl start com.ollama.server
# Verify it's running
curl http://localhost:11434/api/tags
Manage manual LaunchAgent:
# Stop service (note: with KeepAlive set, launchd will relaunch it; unload the plist to stop it for good)
launchctl stop com.ollama.server
# Restart service
launchctl stop com.ollama.server
launchctl start com.ollama.server
# Unload service
launchctl unload ~/Library/LaunchAgents/com.ollama.server.plist
# View logs
tail -f /tmp/ollama.log
tail -f /tmp/ollama.error.log
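As a side note, launchctl load and unload still work but are treated as legacy subcommands on current macOS; if you prefer the newer interface, these should be the equivalent per-user commands:
# Modern equivalents of load/unload for a user-level LaunchAgent
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.ollama.server.plist
launchctl bootout gui/$(id -u)/com.ollama.server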
My Hardware Specifications
- Machine: Apple M4 Mac
- RAM: 32GB unified memory
- GPU: M4 Metal GPU
- Cores: 10
- Architecture: ARM64
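If you want to compare against your own machine, the same details can be read from the command line:
# Chip, core count, and memory
system_profiler SPHardwareDataType
# Architecture (arm64 on Apple silicon)
uname -m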
Glossary and Links
- Ollama: https://ollama.com/ - A tool for running open-source LLMs locally via a simple API server. It provides a simple interface to download, manage, and interact with various open-source models like GPT-OSS, Llama, Mistral, and others.
- GPT-OSS 20B: https://github.com/openai/gpt-oss - A 20B parameter open-source LLM.
- OpenAI announcement for GPT-OSS: https://openai.com/index/introducing-gpt-oss/