Running Phi-4 with Ollama and OpenWebUI - A Complete Guide
Looking to run Microsoft's powerful Phi-4 language model locally? This guide walks you through setting up and using Phi-4 with Ollama and OpenWebUI, covering both a Thunder Compute setup and a local installation.
Why Choose Phi-4?
- 14B parameter state-of-the-art open model
- Excellent performance in memory-constrained environments
- Strong reasoning and logic capabilities
- 16k token context length
- Optimized for latency-bound scenarios
Installation Options
Option 1: Using Thunder Compute (Recommended for Beginners)
The easiest way to get started with Phi-4 on Ollama:
- Visit Thunder Compute
- Create an account
- Select the “ollama” template
- Launch your instance
- Run `start-ollama` in the terminal
You’ll get instant access to:
- Pre-configured Ollama installation
- OpenWebUI interface
- Optimized environment for running Phi-4
- Web-based access from any device
Option 2: Local Installation
If you prefer running locally, you’ll need to:
1. Install Ollama:
```bash
# For Linux
curl -L https://ollama.com/install.sh | sh

# For Windows (via WSL2): first ensure WSL2 is installed and
# running, then run the same script inside the WSL2 shell
curl -L https://ollama.com/install.sh | sh
```
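Once the script finishes, a quick sanity check confirms the server is up (Ollama listens on port 11434 by default):
```bash
# Verify the CLI is installed and the server is responding
ollama --version
curl http://localhost:11434/api/version
```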
2. Install OpenWebUI:
```bash
docker run -d --name openwebui \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```
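Two standard Docker commands confirm the container came up cleanly:
```bash
# Confirm the container is running, then watch its startup log
docker ps --filter name=openwebui
docker logs -f openwebui
```
If OpenWebUI starts but cannot find your local Ollama instance, adding `--add-host=host.docker.internal:host-gateway` to the `docker run` command (a flag suggested in OpenWebUI's own install docs) usually restores the container-to-host connection.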
Setting Up Phi-4
1. Pull the Phi-4 model (note the model name is `phi4`, not `phi:4`):
```bash
ollama pull phi4
```
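Before opening the web UI, you can smoke-test the model straight from the terminal:
```bash
# Run a one-off prompt against the freshly pulled model
ollama run phi4 "Explain what a hash table is in two sentences."
```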
2. Access OpenWebUI:
   - Thunder Compute: use the provided URL
   - Local: visit `http://localhost:3000`
Using Phi-4
Model Capabilities
Phi-4 excels at:
- General text generation
- Code completion
- Logical reasoning
- Academic question answering
- Task-specific instructions
Example Prompts
- Code Generation: "Write a Python function that implements binary search."
- Academic Questions: "Explain the concept of quantum entanglement in simple terms."
- Logical Reasoning: "If all A are B, and all B are C, what can we conclude about A and C?"
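You can type these directly into the OpenWebUI chat box. If you would rather script them, the same prompts work against Ollama's REST API; a minimal sketch using the standard /api/generate endpoint:
```bash
# Send the code-generation example prompt to the local API
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Write a Python function that implements binary search.",
  "stream": false
}'
```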
Optimizing Performance
Memory Management
- The Q8_0 quantization requires roughly 16 GB of RAM
- For lower-memory systems, use the smaller Q4_K_M quantization
- Weigh context length against memory: a longer context enlarges the KV cache
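To act on these trade-offs, you can pull an explicit quantization and cap the context window. The tag name below follows Ollama's usual quantization naming and should be verified against the phi4 page on ollama.com:
```bash
# Pull a specific quantization instead of the default tag
# (assumed tag name; check the phi4 library page for exact tags)
ollama pull phi4:14b-q4_K_M

# Shrink the context window to reduce KV-cache memory use;
# num_ctx is a standard Ollama request option
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Summarize the benefits of quantization.",
  "options": { "num_ctx": 8192 },
  "stream": false
}'
```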
Speed Optimization
- Keep batch sizes appropriate to your hardware
- Enable response streaming so tokens appear as they are generated
- Match expectations to your hardware: GPU offload is far faster than CPU-only inference
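Streaming costs nothing to enable; it is the default for Ollama's API, so perceived latency drops to the time-to-first-token:
```bash
# Without "stream": false, the API returns one JSON object
# per generated token as the response streams back
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "List three practical uses of binary search."
}'
```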
Advanced Features
Custom Model Settings
Adjust model parameters in OpenWebUI:
- Temperature
- Top-p sampling
- Max tokens
- Stop sequences
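If you script against the API rather than using the UI sliders, the same knobs are exposed as request options (num_predict is Ollama's name for the maximum number of generated tokens):
```bash
# OpenWebUI's parameter sliders map onto these Ollama options
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Write a haiku about compilers.",
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "num_predict": 128,
    "stop": ["\n\n"]
  },
  "stream": false
}'
```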
System Prompts
Create custom system prompts for specific use cases:
You are a Python programming expert. Provide clear, efficient, and well-documented code examples.
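You can paste this into OpenWebUI's system-prompt field per chat, or bake it into a reusable model with an Ollama Modelfile. A minimal sketch (the name `phi4-python` is illustrative):
```bash
# Create a phi4 variant with the system prompt built in
cat > Modelfile <<'EOF'
FROM phi4
SYSTEM """You are a Python programming expert. Provide clear, efficient, and well-documented code examples."""
EOF
ollama create phi4-python -f Modelfile
ollama run phi4-python "Show me an idiomatic way to read a CSV file."
```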
Limitations and Considerations
- Resource requirements:
  - Minimum 16 GB RAM recommended
  - SSD storage for model files
  - A reasonably fast CPU (or a GPU) for inference
- Use-case restrictions:
  - Primarily designed for English-language text
  - Not designed or evaluated for all downstream purposes
  - Consider legal and regulatory compliance for your application
Troubleshooting
Common Issues
- Out of memory:
  - Switch to a smaller quantization
  - Reduce the context length
  - Close unnecessary applications
- Slow responses:
  - Check hardware utilization
  - Shorten prompts where possible
  - Consider Thunder Compute for better performance
- Model loading errors:
  - Verify available disk space
  - Check your network connection
  - Ensure the model name is correct (`phi4`, not `phi:4`)
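A few commands that help when diagnosing the issues above (the journalctl line assumes the Linux install, which registers Ollama as a systemd service):
```bash
# List installed models and their on-disk sizes
ollama list
# Check free disk space
df -h
# Follow the Ollama server log (Linux systemd install)
journalctl -u ollama -f
# Inspect the OpenWebUI container if the UI misbehaves
docker logs openwebui
```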
Next Steps
After setting up Phi-4, explore:
- Custom prompt engineering
- Model fine-tuning options
- Integration with other tools
- Advanced parameter optimization
Stay tuned for our upcoming guides on running other powerful models like DeepSeek and advanced Ollama configurations!