Running Phi-4 with Ollama and OpenWebUI - A Complete Guide
Looking to run Microsoft's powerful Phi-4 language model locally? This guide walks you through setting up and using Phi-4 with Ollama and OpenWebUI, covering both a Thunder Compute setup and a local installation.
Why Choose Phi-4?
- 14B parameter state-of-the-art open model
- Excellent performance in memory-constrained environments
- Strong reasoning and logic capabilities
- 16k token context length
- Optimized for latency-bound scenarios
Installation Options
Option 1: Using Thunder Compute (Recommended for Beginners)
The easiest way to get started with Phi-4 on Ollama:
- Visit Thunder Compute
- Create an account
- Select the “ollama” template
- Launch your instance
- Run `start-ollama` in the terminal
You’ll get instant access to:
- Pre-configured Ollama installation
- OpenWebUI interface
- Optimized environment for running Phi-4
- Web-based access from any device
Option 2: Local Installation
If you prefer running locally, you’ll need to:
1. Install Ollama:
```bash
# For Linux
curl -L https://ollama.com/install.sh | sh

# For Windows (via WSL2): first ensure WSL2 is installed and
# running, then run the same script inside the WSL2 shell
curl -L https://ollama.com/install.sh | sh
```
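Once the script finishes, a quick sanity check confirms the server is up (Ollama listens on port 11434 by default):
```bash
# Verify the CLI is installed and the server is responding
ollama --version
curl http://localhost:11434/api/version
```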
2. Install OpenWebUI:
```bash
docker run -d --name openwebui \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```
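Two standard Docker commands confirm the container came up cleanly:
```bash
# Confirm the container is running, then watch its startup log
docker ps --filter name=openwebui
docker logs -f openwebui
```
If OpenWebUI starts but cannot find your local Ollama instance, adding `--add-host=host.docker.internal:host-gateway` to the `docker run` command (a flag suggested in OpenWebUI's own install docs) usually restores the container-to-host connection.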
Setting Up Phi-4
1. Pull the Phi-4 model (note the model name is `phi4`, not `phi:4`):
```bash
ollama pull phi4
```
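Before opening the web UI, you can smoke-test the model straight from the terminal:
```bash
# Run a one-off prompt against the freshly pulled model
ollama run phi4 "Explain what a hash table is in two sentences."
```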
2. Access OpenWebUI:
   - Thunder Compute: use the provided URL
   - Local: visit `http://localhost:3000`
Using Phi-4
Model Capabilities
Phi-4 excels at:
- General text generation
- Code completion
- Logical reasoning
- Academic question answering
- Task-specific instructions
Example Prompts
- Code Generation: "Write a Python function that implements binary search."
- Academic Questions: "Explain the concept of quantum entanglement in simple terms."
- Logical Reasoning: "If all A are B, and all B are C, what can we conclude about A and C?"
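You can type these directly into the OpenWebUI chat box. If you would rather script them, the same prompts work against Ollama's REST API; a minimal sketch using the standard /api/generate endpoint:
```bash
# Send the code-generation example prompt to the local API
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Write a Python function that implements binary search.",
  "stream": false
}'
```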
Optimizing Performance
Memory Management
- The Q8_0 quantization requires roughly 16 GB of RAM
- For lower-memory systems, use the smaller Q4_K_M quantization
- Weigh context length against memory: a longer context enlarges the KV cache
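To act on these trade-offs, you can pull an explicit quantization and cap the context window. The tag name below follows Ollama's usual quantization naming and should be verified against the phi4 page on ollama.com:
```bash
# Pull a specific quantization instead of the default tag
# (assumed tag name; check the phi4 library page for exact tags)
ollama pull phi4:14b-q4_K_M

# Shrink the context window to reduce KV-cache memory use;
# num_ctx is a standard Ollama request option
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Summarize the benefits of quantization.",
  "options": { "num_ctx": 8192 },
  "stream": false
}'
```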
Speed Optimization
- Keep batch sizes appropriate to your hardware
- Enable response streaming so tokens appear as they are generated
- Match expectations to your hardware: GPU offload is far faster than CPU-only inference
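Streaming costs nothing to enable; it is the default for Ollama's API, so perceived latency drops to the time-to-first-token:
```bash
# Without "stream": false, the API returns one JSON object
# per generated token as the response streams back
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "List three practical uses of binary search."
}'
```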
Advanced Features
Custom Model Settings
Adjust model parameters in OpenWebUI:
- Temperature
- Top-p sampling
- Max tokens
- Stop sequences
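If you script against the API rather than using the UI sliders, the same knobs are exposed as request options (num_predict is Ollama's name for the maximum number of generated tokens):
```bash
# OpenWebUI's parameter sliders map onto these Ollama options
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Write a haiku about compilers.",
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "num_predict": 128,
    "stop": ["\n\n"]
  },
  "stream": false
}'
```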
System Prompts
Create custom system prompts for specific use cases:
You are a Python programming expert. Provide clear, efficient, and well-documented code examples.
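You can paste this into OpenWebUI's system-prompt field per chat, or bake it into a reusable model with an Ollama Modelfile. A minimal sketch (the name `phi4-python` is illustrative):
```bash
# Create a phi4 variant with the system prompt built in
cat > Modelfile <<'EOF'
FROM phi4
SYSTEM """You are a Python programming expert. Provide clear, efficient, and well-documented code examples."""
EOF
ollama create phi4-python -f Modelfile
ollama run phi4-python "Show me an idiomatic way to read a CSV file."
```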
Limitations and Considerations
- Resource requirements:
  - Minimum 16 GB RAM recommended
  - SSD storage for model files
  - A reasonably fast CPU (or a GPU) for inference
- Use-case restrictions:
  - Primarily designed for English-language text
  - Not designed or evaluated for all downstream purposes
  - Consider legal and regulatory compliance for your application
Troubleshooting
Common Issues
- Out of memory:
  - Switch to a smaller quantization
  - Reduce the context length
  - Close unnecessary applications
- Slow responses:
  - Check hardware utilization
  - Shorten prompts where possible
  - Consider Thunder Compute for better performance
- Model loading errors:
  - Verify available disk space
  - Check your network connection
  - Ensure the model name is correct (`phi4`, not `phi:4`)
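A few commands that help when diagnosing the issues above (the journalctl line assumes the Linux install, which registers Ollama as a systemd service):
```bash
# List installed models and their on-disk sizes
ollama list
# Check free disk space
df -h
# Follow the Ollama server log (Linux systemd install)
journalctl -u ollama -f
# Inspect the OpenWebUI container if the UI misbehaves
docker logs openwebui
```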
Next Steps
After setting up Phi-4, explore:
- Custom prompt engineering
- Model fine-tuning options
- Integration with other tools
- Advanced parameter optimization
Stay tuned for our upcoming guides on running other powerful models like DeepSeek and advanced Ollama configurations!