Having worked with Anthropic's Claude API across numerous projects over the past year, I've learned that establishing a solid project structure from the outset can make the difference between a maintainable, scalable application and a tangled mess of code. Whether you're building a simple chatbot or a complex AI-powered analysis tool, the way you organise your Claude-based project will determine how easily you can extend, debug, and deploy your solution.
In this tutorial, I'll walk you through my recommended project structure for Claude-based applications, drawing from patterns I've refined across several commercial deployments. This approach has served me well in everything from maritime compliance systems to financial analysis tools.
Claude applications often involve complex workflows: prompt engineering, response parsing, error handling, and integration with external systems. Without proper organisation, these components quickly become unwieldy. A well-structured project provides clear separation of concerns, makes testing straightforward, and ensures your code remains maintainable as requirements evolve.
From my experience building 19 different SaaS platforms, I've found that AI-powered applications require even more disciplined structure than traditional software, primarily because the interaction with large language models introduces additional layers of complexity and unpredictability.
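One concrete instance of that unpredictability is transient API failure: timeouts, overloaded responses, dropped connections. Before getting into the structure itself, here's a minimal sketch of the kind of retry-with-backoff wrapper these projects invariably end up housing. The function names and the simulated failure are purely illustrative, not from any real SDK:

```python
import random
import time


def call_with_backoff(fn, max_retries=3, base_delay=1.0):
    """Retry a flaky callable with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_retries:
                raise  # out of retries; surface the error to the caller
            # Exponential backoff (1s, 2s, 4s, ...) plus jitter so that
            # many clients retrying at once don't stampede the API
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


# Hypothetical flaky call: fails twice, then succeeds
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

result = call_with_backoff(flaky, base_delay=0.01)  # -> "ok" after two retries
```

In a real project this wrapper would live in `src/utils/` and be driven by configuration rather than hard-coded defaults.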
Here's the foundation structure I recommend for any Claude project:
```
claude-project/
├── README.md
├── requirements.txt
├── .env.example
├── .gitignore
├── config/
│   ├── __init__.py
│   ├── settings.py
│   └── prompts/
│       ├── __init__.py
│       ├── base_prompts.py
│       └── templates/
├── src/
│   ├── __init__.py
│   ├── claude_client/
│   │   ├── __init__.py
│   │   ├── client.py
│   │   ├── models.py
│   │   └── exceptions.py
│   ├── processors/
│   │   ├── __init__.py
│   │   ├── text_processor.py
│   │   └── response_parser.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── helpers.py
│   │   └── validators.py
│   └── main.py
├── tests/
│   ├── __init__.py
│   ├── test_claude_client.py
│   ├── test_processors.py
│   └── fixtures/
└── docs/
    ├── api.md
    └── deployment.md
```

Proper configuration management is crucial for Claude applications. You'll need to handle API keys, model parameters, and prompt templates securely. Here's how I structure the configuration layer:
In config/settings.py, I centralise all configuration:
```python
# BaseSettings moved to the pydantic-settings package in Pydantic v2;
# on Pydantic v1 you can still use `from pydantic import BaseSettings`.
from pydantic_settings import BaseSettings


class ClaudeConfig(BaseSettings):
    api_key: str
    model: str = "claude-3-sonnet-20240229"
    max_tokens: int = 4000
    temperature: float = 0.7
    timeout: int = 30
    max_retries: int = 3

    class Config:
        env_file = ".env"
        env_prefix = "CLAUDE_"


class AppConfig(BaseSettings):
    debug: bool = False
    log_level: str = "INFO"

    class Config:
        env_file = ".env"


claude_config = ClaudeConfig()
app_config = AppConfig()
```

I've found that separating prompts from code significantly improves maintainability. In config/prompts/base_prompts.py:
```python
from enum import Enum
from typing import Dict


class PromptType(Enum):
    ANALYSIS = "analysis"
    SUMMARY = "summary"
    CLASSIFICATION = "classification"


SYSTEM_PROMPTS: Dict[PromptType, str] = {
    PromptType.ANALYSIS: """
You are an expert analyst. Analyse the provided content and return
structured insights focusing on key patterns, anomalies, and recommendations.
Always provide specific evidence for your conclusions.
""",
    PromptType.SUMMARY: """
You are a skilled summariser. Create concise, accurate summaries that
capture the essential information whilst maintaining context and nuance.
""",
    PromptType.CLASSIFICATION: """
You are a classification expert. Categorise the input according to the
provided criteria. Be consistent and explain your reasoning.
""",
}

USER_PROMPT_TEMPLATES: Dict[PromptType, str] = {
    PromptType.ANALYSIS: "Please analyse the following content:\n\n{content}",
    PromptType.SUMMARY: "Please summarise the following content:\n\n{content}",
    PromptType.CLASSIFICATION: "Please classify the following content according to {criteria}:\n\n{content}",
}
```

The heart of your application is the Claude client. I structure this as a separate module with clear interfaces:
In src/claude_client/client.py:
```python
import logging

import anthropic

# config/ sits alongside src/, so import it from the project root
from config.settings import claude_config
from .models import ClaudeRequest, ClaudeResponse
from .exceptions import ClaudeAPIError, ClaudeTimeoutError


class ClaudeClient:
    def __init__(self):
        # AsyncAnthropic so send_message can genuinely await the API call
        self.client = anthropic.AsyncAnthropic(api_key=claude_config.api_key)
        self.logger = logging.getLogger(__name__)

    async def send_message(
        self,
        request: ClaudeRequest,
        **kwargs,
    ) -> ClaudeResponse:
        """Send a message to Claude and return a structured response."""
        try:
            message = await self.client.messages.create(
                model=request.model or claude_config.model,
                max_tokens=request.max_tokens or claude_config.max_tokens,
                temperature=request.temperature or claude_config.temperature,
                system=request.system_prompt,
                messages=[{
                    "role": "user",
                    "content": request.user_prompt,
                }],
                **kwargs,
            )
            return ClaudeResponse(
                # Convert the SDK's Usage model into a plain dict
                content=message.content[0].text,
                usage=dict(message.usage),
                model=message.model,
                stop_reason=message.stop_reason,
            )
        except anthropic.APITimeoutError as e:
            self.logger.error(f"Claude API timeout: {e}")
            raise ClaudeTimeoutError(f"Request timed out: {e}") from e
        except anthropic.APIError as e:
            self.logger.error(f"Claude API error: {e}")
            raise ClaudeAPIError(f"API error: {e}") from e

    async def health_check(self) -> bool:
        """Simple health check for the Claude API."""
        try:
            test_request = ClaudeRequest(
                system_prompt="You are a helpful assistant.",
                user_prompt="Hello",
            )
            response = await self.send_message(test_request)
            return response.content is not None
        except Exception:
            return False
```

In src/claude_client/models.py, I define clear data structures:
```python
from typing import Optional, Dict, Any

from pydantic import BaseModel


class ClaudeRequest(BaseModel):
    system_prompt: str
    user_prompt: str
    model: Optional[str] = None
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None
    metadata: Dict[str, Any] = {}


class ClaudeResponse(BaseModel):
    content: str
    usage: Dict[str, Any]
    model: str
    stop_reason: str
    timestamp: Optional[str] = None

    class Config:
        arbitrary_types_allowed = True
```

The processing layer handles the business logic of your application. Here's an example text processor:
```python
from typing import List, Dict, Any

from ..claude_client.client import ClaudeClient
from ..claude_client.models import ClaudeRequest
from config.prompts.base_prompts import SYSTEM_PROMPTS, USER_PROMPT_TEMPLATES, PromptType


class TextProcessor:
    def __init__(self):
        self.claude_client = ClaudeClient()

    async def analyse_text(
        self,
        content: str,
        analysis_type: PromptType = PromptType.ANALYSIS,
    ) -> Dict[str, Any]:
        """Analyse text using Claude with the specified analysis type."""
        system_prompt = SYSTEM_PROMPTS[analysis_type]
        user_prompt = USER_PROMPT_TEMPLATES[analysis_type].format(content=content)

        request = ClaudeRequest(
            system_prompt=system_prompt,
            user_prompt=user_prompt,
        )
        response = await self.claude_client.send_message(request)

        return {
            "analysis": response.content,
            "type": analysis_type.value,
            "usage": response.usage,
            "model": response.model,
        }

    async def batch_analyse(
        self,
        contents: List[str],
        analysis_type: PromptType = PromptType.ANALYSIS,
    ) -> List[Dict[str, Any]]:
        """Analyse multiple texts in batch (sequentially)."""
        results = []
        for content in contents:
            result = await self.analyse_text(content, analysis_type)
            results.append(result)
        return results
```

Testing Claude applications requires special consideration due to the API dependency. I use a combination of unit tests with mocked responses and integration tests with actual API calls:
```python
import pytest
from unittest.mock import AsyncMock, Mock, patch

from src.claude_client.client import ClaudeClient
from src.claude_client.models import ClaudeRequest, ClaudeResponse


@pytest.fixture
def sample_request():
    return ClaudeRequest(
        system_prompt="You are a helpful assistant.",
        user_prompt="Hello, how are you?",
    )


# Patch the SDK where the client module looks it up, and construct the
# ClaudeClient inside the test so the mock is actually in place.
@pytest.mark.asyncio
@patch("src.claude_client.client.anthropic.AsyncAnthropic")
async def test_send_message_success(mock_anthropic, sample_request):
    # Mock the API response
    mock_message = Mock()
    mock_message.content = [Mock(text="Hello! I'm doing well, thank you.")]
    mock_message.usage = {"input_tokens": 10, "output_tokens": 8}
    mock_message.model = "claude-3-sonnet-20240229"
    mock_message.stop_reason = "end_turn"
    mock_anthropic.return_value.messages.create = AsyncMock(return_value=mock_message)

    claude_client = ClaudeClient()
    response = await claude_client.send_message(sample_request)

    assert isinstance(response, ClaudeResponse)
    assert response.content == "Hello! I'm doing well, thank you."
    assert response.usage["input_tokens"] == 10
```

This project structure provides a solid foundation for Claude-based applications of any complexity. The key principles are clear separation of concerns, comprehensive configuration management, and robust error handling. The modular approach makes it easy to extend functionality, add new prompt types, or integrate additional AI models.
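As one example of extending the processing layer: batch_analyse runs its items sequentially, and a concurrent variant is usually the first extension you'll want. Here's a sketch using asyncio.gather with a semaphore to cap in-flight requests. The stub stands in for the real Claude call, and the concurrency limit is an assumption you'd tune against your actual rate limits:

```python
import asyncio
from typing import Any, Dict, List


async def analyse_stub(content: str) -> Dict[str, Any]:
    """Stand-in for TextProcessor.analyse_text (no real API call)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"analysis": f"analysed: {content}", "type": "analysis"}


async def batch_analyse_concurrent(
    contents: List[str],
    max_concurrency: int = 5,  # assumed limit; tune to your rate limits
) -> List[Dict[str, Any]]:
    """Run analyses concurrently, capping the number in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(content: str) -> Dict[str, Any]:
        async with semaphore:
            return await analyse_stub(content)

    # gather preserves input order even though tasks finish out of order
    return await asyncio.gather(*(bounded(c) for c in contents))


results = asyncio.run(batch_analyse_concurrent(["doc one", "doc two", "doc three"]))
```

Dropping this into TextProcessor is a matter of swapping the stub for self.analyse_text; the semaphore is what keeps a large batch from blowing through your rate limit.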
Your next steps should be to adapt this structure to your specific use case, implement comprehensive logging, and consider adding monitoring for API usage and costs. For production deployments, you'll also want to implement rate limiting and caching strategies to optimise both performance and costs.
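On the caching point: even a small in-memory layer keyed on a hash of the prompts pays for itself during development, where the same inputs get replayed constantly. A minimal sketch — a production system would want Redis or similar, the TTL here is an arbitrary choice, and caching only makes sense where identical prompts should yield reusable answers:

```python
import hashlib
import time
from typing import Any, Dict, Optional, Tuple


class PromptCache:
    """Tiny in-memory TTL cache keyed on a hash of (system, user) prompts."""

    def __init__(self, ttl_seconds: float = 300.0):  # arbitrary 5-minute TTL
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def _key(self, system_prompt: str, user_prompt: str) -> str:
        raw = f"{system_prompt}\x00{user_prompt}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, system_prompt: str, user_prompt: str) -> Optional[Any]:
        entry = self._store.get(self._key(system_prompt, user_prompt))
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            return None  # expired
        return value

    def put(self, system_prompt: str, user_prompt: str, value: Any) -> None:
        self._store[self._key(system_prompt, user_prompt)] = (time.monotonic(), value)


cache = PromptCache()
cache.put("You are a helpful assistant.", "Hello", "cached response")
hit = cache.get("You are a helpful assistant.", "Hello")     # -> "cached response"
miss = cache.get("You are a helpful assistant.", "Goodbye")  # -> None
```

The natural home for this is a check at the top of ClaudeClient.send_message, with a put after each successful response.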
In my next article, I'll explore advanced patterns for prompt engineering and response validation within this structure. Until then, I encourage you to experiment with this foundation and adapt it to your particular domain requirements.