Ollama Local LLM Guide
11 posts in this series
Getting Started with Ollama: Your First Step to Running LLMs Locally
Want to run large language models on your own machine? This guide walks you through installing and configuring Ollama from scratch, covering multi-platform setup, model management, GPU acceleration, and API integration.
Complete Guide to Ollama Model Management: Download, Switch, Delete & Version Control
Master Ollama model management with the pull, run, list, and rm commands. Learn version selection, batch-deletion scripts, and disk-space optimization. Perfect for AI developers and OpenClaw deployers managing local LLM libraries.
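As a taste of the batch-deletion scripts covered in that post, here is a minimal sketch. The `ollama list` output is hardcoded as a sample so the parsing logic can run without a live Ollama server; the names and IDs are illustrative.

```shell
# Hypothetical batch-deletion sketch: filter `ollama list` output for a
# model family, then remove each match. Sample output is hardcoded so the
# parsing can be demonstrated without a running Ollama server.
sample_list='NAME          ID            SIZE     MODIFIED
llama3:8b     365c0bd3c000  4.7 GB   2 days ago
qwen2:7b      e0d4e1163c58  4.4 GB   5 days ago
llama3:70b    786f3184aec0  39 GB    1 week ago'

# Keep only model names in the llama3 family (NR>1 skips the header row)
targets=$(printf '%s\n' "$sample_list" | awk 'NR>1 && $1 ~ /^llama3/ {print $1}')
echo "$targets"

# Against a live server, the real loop would be:
# for m in $targets; do ollama rm "$m"; done
```

The `ollama rm` loop is left commented out so the sketch stays side-effect free.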
Ollama Modelfile Parameters Explained: A Complete Guide to Creating Custom Models
A detailed guide to the 10 core Ollama Modelfile parameters, with optimization tips for temperature, num_ctx, and more. Includes 4 ready-to-use templates to help you create your own custom models.
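For a taste of what the post covers, a minimal Modelfile sketch (the FROM, PARAMETER, and SYSTEM directives are Ollama's own; the values here are illustrative, not recommendations):

```
# Build a custom model on top of llama3 (illustrative values)
FROM llama3

# Lower temperature for more deterministic answers
PARAMETER temperature 0.3
# Widen the context window to 4096 tokens
PARAMETER num_ctx 4096

# System prompt baked into the custom model
SYSTEM "You are a concise technical assistant."
```

Saved as `Modelfile`, this is built with `ollama create my-assistant -f Modelfile` and then run like any other model.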
Ollama API Calls: From curl to OpenAI SDK Compatible Interface
Learn the two ways to call the Ollama API: the native REST API (via curl) and the OpenAI-SDK-compatible interface. Includes complete code examples, streaming response handling, and a best-practices guide.
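To preview the two request shapes that post compares, here is a sketch that only builds the JSON bodies (nothing is sent; the endpoints assume Ollama's default address, http://localhost:11434):

```python
import json

# 1) Native REST API: POST http://localhost:11434/api/generate
native_body = {
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": False,  # True streams incremental JSON lines instead
}

# 2) OpenAI-compatible: POST http://localhost:11434/v1/chat/completions
openai_body = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}

print(json.dumps(native_body, indent=2))
```

With the official `openai` Python SDK, pointing the client at Ollama is a one-liner: `OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")`. The api_key is required by the SDK but ignored by Ollama.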
Ollama + Open WebUI: Build Your Own Local ChatGPT Interface (Complete Guide)
Step-by-step guide to setting up a ChatGPT-style AI interface locally with Ollama and Open WebUI. Covers installation, model selection, RAG knowledge base, API integration, and performance tuning. Get your local AI assistant running in 30 minutes.
Ollama Performance Optimization: Complete Guide to Quantization, Batch Processing, and Memory Tuning
A deep dive into Ollama quantization (Q4/Q5/Q8 selection strategies), num_batch configuration for 50-150% throughput improvements, GPU memory management, and OOM solutions. Includes performance benchmarks across different hardware.
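The core trade-off in Q4/Q5/Q8 selection is weight size versus quality. A back-of-envelope sketch (the bits-per-weight figures are rough approximations for common GGUF quant types, and real VRAM usage adds KV cache and runtime overhead, so treat these as lower bounds):

```python
# Rough lower-bound estimate of quantized weight size in GB
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e9

# Approximate effective bits per weight (illustrative values)
quants = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0}

for name, bits in quants.items():
    print(f"8B model @ {name}: ~{weight_size_gb(8e9, bits):.1f} GB")
```

This is why an 8B model at Q4 fits comfortably on an 8 GB GPU while the F16 version does not.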
Ollama GPU Scheduling and Resource Management: VRAM Optimization, Multi-GPU Load Balancing
A deep dive into Ollama GPU scheduling and resource management, covering VRAM optimization parameters, multi-GPU load-balancing architecture, and the underlying llama.cpp mechanics. Includes 3 real-world scenarios to help you run large models stably and get the most out of multi-GPU hardware.
Ollama Multi-Model Deployment: Running Qwen, Llama, and DeepSeek in Parallel
A detailed guide on configuring Ollama for multi-model parallel execution, comparing Qwen, Llama, and DeepSeek models, and sharing GPU memory management tips for building an intelligent model switching system.
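The knobs for parallel execution are environment variables read by the Ollama server at startup. A sketch with illustrative values (tune them to your VRAM budget):

```shell
export OLLAMA_MAX_LOADED_MODELS=3   # e.g. keep Qwen, Llama, and DeepSeek resident
export OLLAMA_NUM_PARALLEL=2        # concurrent requests per loaded model
export OLLAMA_KEEP_ALIVE=10m        # how long an idle model stays in memory

echo "$OLLAMA_MAX_LOADED_MODELS"
```

Set these in the environment of `ollama serve` (or the systemd unit) before starting the server; changing them requires a restart.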
LangChain + Ollama Integration Guide: Complete Local LLM App Development
Complete guide to integrating LangChain with Ollama, with code examples for Chat, RAG, and Agent scenarios, plus OpenAI vs Ollama switching strategies for building enterprise LLM apps with local models.
Ollama Embedding in Practice: Local Vector Search and RAG Setup
Build a local RAG system with Ollama: a comparison of the mxbai-embed-large and nomic-embed-text embedding models, vector database selection (ChromaDB, FAISS, Milvus), and a complete Python code tutorial.
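At the heart of any of those setups is cosine-similarity ranking over embedding vectors. A minimal sketch of that retrieval step; in practice the vectors would come from an embedding model served by Ollama (e.g. nomic-embed-text), but tiny hand-written vectors keep the example self-contained:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "document embeddings" (real ones have hundreds of dimensions)
docs = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_gpus": [0.1, 0.8, 0.3],
}
query = [0.2, 0.9, 0.2]  # pretend embedding of "How do I use my GPU?"

best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # → doc_gpus
```

A vector database like ChromaDB or FAISS does exactly this ranking, just at scale and with approximate-nearest-neighbor indexes instead of a brute-force `max`.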
Ollama Production Monitoring: Logging Configuration and Prometheus Alerting in Practice
A complete Ollama production monitoring solution: logging configuration, Prometheus metrics collection, AlertManager rules, and Grafana dashboards for multi-GPU monitoring and automated fault recovery.
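A sketch of the Prometheus side of such a setup. Ollama does not expose Prometheus metrics natively, so this assumes a metrics exporter runs alongside it; the job name and port 9101 here are hypothetical placeholders:

```yaml
# prometheus.yml fragment (hypothetical exporter target)
scrape_configs:
  - job_name: "ollama"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9101"]
```

Once the exporter's metrics are being scraped, AlertManager rules and Grafana dashboards are layered on top of this scrape job.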