Looking to run Microsoft’s powerful Phi-4 language model locally? This guide will walk you through setting up and using Phi-4 with Ollama and OpenWebUI, providing both Thunder Compute and local installation options.

Why Choose Phi-4?

  • 14B parameter state-of-the-art open model
  • Excellent performance in memory-constrained environments
  • Strong reasoning and logic capabilities
  • 16k token context length
  • Optimized for latency-bound scenarios

Installation Options

Option 1: Thunder Compute (Recommended)

The easiest way to get started with Phi-4 on Ollama:

  1. Visit Thunder Compute
  2. Create an account
  3. Select the “ollama” template
  4. Launch your instance
  5. Run start-ollama in the terminal

You’ll get instant access to:

  • Pre-configured Ollama installation
  • OpenWebUI interface
  • Optimized environment for running Phi-4
  • Web-based access from any device
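
To sanity-check the instance, you can run a couple of commands in its terminal (start-ollama is the template's startup script from step 5; the other two are standard Ollama CLI commands):

```bash
# Start the Ollama service provided by the template
start-ollama

# Confirm the CLI responds and list installed models
ollama --version
ollama list
```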

Option 2: Local Installation

If you prefer running locally, you’ll need to:

  1. Install Ollama:

```bash
# For Linux
curl -fsSL https://ollama.com/install.sh | sh

# For Windows (WSL2)
# First make sure WSL2 is installed and running, then run the same script:
curl -fsSL https://ollama.com/install.sh | sh
```

  2. Install OpenWebUI:

```bash
docker run -d --name openwebui \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```
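
If the web UI doesn't come up, these standard Docker commands check the container's status and logs (the name openwebui matches the run command above):

```bash
# Verify the container is running and port 3000 is mapped
docker ps --filter name=openwebui

# Follow the startup logs until the server reports it is ready
docker logs -f openwebui
```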

Setting Up Phi-4

  1. Pull the Phi-4 model (a quick smoke test follows this list):
    ollama pull phi4
    
  2. Access OpenWebUI:
    • Thunder Compute: Use the provided URL
    • Local: Visit http://localhost:3000
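
Before switching to the web UI, a one-off prompt from the terminal is a quick way to confirm the model downloaded and loads correctly:

```bash
# Download the weights (size depends on the quantization tag)
ollama pull phi4

# Run a single prompt; the model loads into memory on first use
ollama run phi4 "Summarize binary search in one sentence."
```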

Using Phi-4

Model Capabilities

Phi-4 excels at:

  • General text generation
  • Code completion
  • Logical reasoning
  • Academic question answering
  • Task-specific instructions
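
All of these tasks can also be scripted against Ollama's local REST API, which listens on port 11434 by default. A minimal non-streaming generation request looks like this:

```bash
# Single-shot generation via the /api/generate endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Write a Python function that implements binary search.",
  "stream": false
}'
```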

Example Prompts

  1. Code Generation:
    Write a Python function that implements binary search.
    
  2. Academic Questions:
    Explain the concept of quantum entanglement in simple terms.
    
  3. Logical Reasoning:
    If all A are B, and all B are C, what can we conclude about A and C?
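
For multi-turn use, the same prompts can go through Ollama's chat endpoint, which accepts a message history. A sketch pairing a system role with the reasoning prompt above:

```bash
# Multi-turn chat via the /api/chat endpoint
curl http://localhost:11434/api/chat -d '{
  "model": "phi4",
  "messages": [
    {"role": "system", "content": "Answer concisely and show your reasoning."},
    {"role": "user", "content": "If all A are B, and all B are C, what can we conclude about A and C?"}
  ],
  "stream": false
}'
```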
    

Optimizing Performance

Memory Management

  • The Q8_0 quantization requires roughly 16GB of RAM
  • For lower-memory systems, use the Q4_K_M version (roughly 9GB)
  • Consider context length vs. memory trade-offs (see the sketch below)
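
One concrete way to trade context for memory is Ollama's num_ctx option, which caps the context window per request. A sketch (8192 is an arbitrary example; Phi-4 supports up to 16k):

```bash
# A smaller context window shrinks the KV cache's memory footprint
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Explain the concept of quantum entanglement in simple terms.",
  "stream": false,
  "options": {"num_ctx": 8192}
}'
```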

Speed Optimization

  • Use appropriate batch sizes
  • Leverage response streaming so tokens appear as they are generated (see the example below)
  • Match expectations to your hardware: GPU offload is much faster than CPU-only inference
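
Streaming is the API's default: leave stream unset (or true) and tokens arrive as newline-delimited JSON chunks, so output starts appearing before the full response finishes. A sketch, assuming jq is installed:

```bash
# -N disables curl's buffering; jq -j joins the token chunks as they arrive
curl -N http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "List three practical uses of binary search."
}' | jq -j '.response'; echo
```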

Advanced Features

Custom Model Settings

Adjust model parameters in OpenWebUI:

  • Temperature
  • Top-p sampling
  • Max tokens
  • Stop sequences
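
The same parameters can be set per request via the API's options field; the keys below are Ollama's option names (num_predict is its max-tokens setting):

```bash
# Set sampling parameters per request instead of in the UI
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Write a haiku about debugging.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9,
    "num_predict": 128,
    "stop": ["\n\n"]
  }
}'
```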

System Prompts

Create custom system prompts for specific use cases:

You are a Python programming expert. Provide clear, efficient, and well-documented code examples.
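
To bake a system prompt into a reusable model, Ollama supports Modelfiles. A minimal sketch (the name python-expert is arbitrary):

```bash
# Create a derived model that always uses the system prompt above
cat > Modelfile <<'EOF'
FROM phi4
SYSTEM "You are a Python programming expert. Provide clear, efficient, and well-documented code examples."
EOF

ollama create python-expert -f Modelfile
ollama run python-expert "Show an idiomatic use of dataclasses."
```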

Limitations and Considerations

  1. Resource Requirements
    • Minimum 16GB RAM recommended
    • SSD storage for model files
    • Decent CPU for inference
  2. Use Case Restrictions
    • Primary focus on English language
    • Not designed for all downstream purposes
    • Consider legal and regulatory compliance

Troubleshooting

Common Issues

  1. Out of Memory
    • Use quantized version
    • Reduce context length
    • Close unnecessary applications
  2. Slow Responses
    • Check hardware utilization
    • Optimize prompt length
    • Consider Thunder Compute for better performance
  3. Model Loading Errors
    • Verify disk space
    • Check network connection
    • Ensure correct model name
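
A few quick diagnostics for the issues above (the quantization tag is an example; check the phi4 page in the Ollama library for the tags actually published):

```bash
# Confirm the exact model name and its size on disk
ollama list

# Check free space where Ollama stores models (path varies by install,
# e.g. /usr/share/ollama/.ollama when running as a Linux service)
df -h ~/.ollama

# If memory is tight, pull a smaller quantization explicitly
ollama pull phi4:14b-q4_K_M
```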

Next Steps

After setting up Phi-4, explore:

  • Custom prompt engineering
  • Model fine-tuning options
  • Integration with other tools
  • Advanced parameter optimization

Stay tuned for our upcoming guides on running other powerful models like DeepSeek and advanced Ollama configurations!