From 4175077b5d0f40fa5b46175d9c82257191cb373b Mon Sep 17 00:00:00 2001
From: JSC <jschoisy@carea5.com>
Date: Wed, 30 Jul 2025 21:44:43 +0200
Subject: [PATCH] Add CLAUDE.md for project documentation and development
 guidelines

---
 CLAUDE.md | 457 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 457 insertions(+)
 create mode 100644 CLAUDE.md

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..7df4226
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,457 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Development Commands
+
+### Backend (Python/FastAPI)
+- **Development server**: `cd backend && uv run python run.py` or `cd backend && uv run uvicorn app.main:app --reload`
+- **Tests**: `cd backend && uv run pytest` (uses pytest with asyncio support)
+- **Coverage**: `cd backend && uv run coverage run -m pytest && uv run coverage report`
+- **Linting**: `cd backend && uv run ruff check` and `cd backend && uv run ruff format`
+- **Type checking**: `cd backend && uv run mypy .`
+- **Install dependencies**: `cd backend && uv sync`
+
+### Frontend (React/TypeScript/Vite)
+- **Development server**: `cd frontend && bun dev` (runs on port 8001)
+- **Build**: `cd frontend && bun run build`
+- **Linting**: `cd frontend && bun run lint`
+- **Preview build**: `cd frontend && bun run preview`
+- **Install dependencies**: `cd frontend && bun install`
+
+## Architecture Overview
+
+This is a soundboard application with a FastAPI backend and React frontend.
+
+### Backend Architecture
+- **Framework**: FastAPI with SQLModel for database ORM
+- **Database**: SQLite with aiosqlite async driver
+- **Authentication**: JWT tokens with OAuth2 support (Google, GitHub)
+- **Dependencies**: FastAPI, SQLModel, aiosqlite, bcrypt, PyJWT, pydantic-settings, uvicorn, ffmpeg-python, yt-dlp
+- **Structure**:
+  - `app/api/v1/`: API endpoints for v1 (auth.py, main.py, sounds.py, socket.py)
+  - `app/models/`: Database models (User, Sound, Playlist, Extraction, Plan, UserOAuth, CreditTransaction, SoundPlayed, etc.)
+  - `app/services/`: Business logic layer (auth.py, oauth.py, socket.py, sound_scanner.py, sound_normalizer.py, extraction.py, extraction_processor.py, credit.py)
+  - `app/repositories/`: Data access layer (base.py, user.py, user_oauth.py, sound.py, extraction.py, credit_transaction.py, playlist.py)
+  - `app/schemas/`: Pydantic schemas for API requests/responses (auth.py)
+  - `app/core/`: Configuration, database setup, logging, dependencies, seeds
+  - `app/middleware/`: Custom middleware (logging)
+  - `app/utils/`: Utility functions (auth.py, cookies.py, audio.py)
+  - `tests/`: Comprehensive test suite with pytest and asyncio support
+
+### Frontend Architecture
+- **Framework**: React 19 with TypeScript
+- **Build Tool**: Vite with SWC for fast development and builds
+- **UI Library**: Comprehensive Radix UI component system with shadcn/ui
+- **Styling**: Tailwind CSS v4 with custom components
+- **Theming**: next-themes with dark mode support via ThemeProvider and ModeToggle
+- **Routing**: React Router v7
+- **Package Manager**: Bun for fast package management
+- **Key Dependencies**: @radix-ui components, lucide-react icons, recharts, sonner notifications
+- **Structure**:
+  - `src/components/ui/`: Complete UI component library (button, card, dialog, table, etc.)
+  - `src/components/`: App-specific components (ThemeProvider, ModeToggle)
+  - `src/hooks/`: Custom React hooks (use-mobile.ts)
+  - `src/lib/`: Utility functions (utils.ts with cn helper)
+  - `src/contexts/`: React contexts (empty, ready for state management)
+  - `src/assets/`: Static assets
+
+### Key Models
+- **User**: Authentication, plans, credits, API tokens
+- **UserOAuth**: OAuth provider connections (Google, GitHub) with unique constraints
+- **Plan**: User subscription plans and limits
+- **Sound**: Audio files with metadata, normalization fields, play counts, **unique hash constraint**
+- **Playlist**: User-created sound collections
+- **PlaylistSound**: Many-to-many relationship between playlists and sounds
+- **Extraction**: Audio extraction jobs from external services with async processing and flexible service detection
+- **SoundPlayed**: Play history tracking with user and sound associations
+- **CreditTransaction**: Comprehensive credit system transaction logging with metadata
+
+### Database
+- SQLite database at `backend/data/soundboard.db`
+- Models use SQLModel (Pydantic + SQLAlchemy)
+- Async database operations with aiosqlite
+- **Data Integrity**: Unique constraints on the sound hash and on OAuth provider + user combinations
+- **Foreign Key Relationships**: Proper cascading and relationship management
+
+### Configuration
+- Backend settings in `backend/app/core/config.py` using pydantic-settings
+- Environment variables loaded from `.env` files
+- Configurable settings: database URL, JWT secrets, OAuth2 client credentials, logging, cookies, audio normalization, audio extraction, credits system
+- Default ports: Backend (8000), Frontend (8001)
+- OAuth redirect URL: `http://localhost:8001/auth/callback`
+
+### Development Notes
+- Backend runs on port 8000 by default (configurable via HOST/PORT env vars)
+- Frontend dev server runs on port 8001
+- Project uses Python 3.12+ with uv package manager for backend
+- Frontend uses TypeScript 5.8+ with strict mode enabled
+- Comprehensive linting: Ruff (backend), ESLint (frontend)
+- Type checking: mypy (backend), TypeScript (frontend)
+- Testing: pytest with asyncio support and coverage reporting (76+ repository tests)
+- Logs stored in `backend/logs/app.log` with rotation
+- Audio files stored under `backend/sounds/`: `originals/` and `normalized/` (each split into soundboard, text_to_speech, extracted), plus `temp/` as the extraction workspace
+- Database file at `backend/data/soundboard.db`
+- Extraction processing uses background workers with configurable concurrency limits
+
+## Credit System
+
+The application includes a comprehensive credit-based system for managing user actions and resource consumption.
+
+### Credit Features
+- **Action-based Deductions**: Credits are deducted for specific actions (VLC play, audio extraction, etc.)
+- **Transaction Logging**: All credit changes are logged with detailed metadata
+- **Plan Integration**: Credit limits and replenishment tied to user subscription plans
+- **Real-time Updates**: WebSocket events notify users of credit changes
+- **Admin Management**: Administrative controls for credit adjustments
+
+### Credit Actions
+- **VLC Play Sound**: Deducts credits when playing sounds through VLC
+- **Audio Extraction**: Deducts credits for extracting audio from external URLs
+- **Credit Addition**: Administrative credit bonuses and plan-based replenishment
+
+### Database Schema (CreditTransaction Model)
+- **Comprehensive Tracking**: User ID, action type, amount, balance before/after
+- **Metadata Storage**: JSON metadata for action-specific details
+- **Success Tracking**: Boolean flag for successful/failed transactions
+- **Temporal Ordering**: Created/updated timestamps for audit trails
+
+### API Integration
+- **Automatic Deduction**: Services automatically deduct credits during operations
+- **Balance Checking**: Credit validation before expensive operations
+- **Transaction History**: API endpoints for viewing credit transaction history
+- **Real-time Events**: WebSocket emission of `user_credits_changed` events
+
+### Technical Implementation
+- **Service**: `app/services/credit.py` - Core credit management with WebSocket integration
+- **Repository**: `app/repositories/credit_transaction.py` - Database operations for credit transactions
+- **Models**: `CreditTransaction` model with comprehensive metadata tracking
+- **Testing**: 14 comprehensive tests covering all credit scenarios
+
+## Sound Management System
+
+Enhanced sound management with comprehensive duplicate prevention and integrity features.
+
+### Sound Features
+- **Duplicate Prevention**: Unique hash constraint prevents duplicate audio files
+- **Metadata Tracking**: Complete audio file metadata (duration, size, hash, type)
+- **Play Count Tracking**: Usage statistics for popular sounds analysis
+- **Type Classification**: SDB (soundboard), TTS (text-to-speech), EXT (extracted) categorization
+- **Normalization Support**: Integration with audio normalization system
+- **File Integrity**: SHA-256 hash verification for data integrity
+
+### Database Constraints
+- **Unique Hash**: `UniqueConstraint("hash", name="uq_sound_hash")` prevents duplicate files
+- **Data Integrity**: Proper foreign key relationships and nullable field handling
+- **Indexed Fields**: Optimized queries for common operations (filename, hash, type)
+
+### Technical Implementation
+- **Repository**: `app/repositories/sound.py` - Complete CRUD operations with specialized queries
+- **Models**: Enhanced `Sound` model with unique constraints and relationship management
+- **API Integration**: Sound creation, update, deletion with duplicate prevention
+- **Testing**: 15 comprehensive tests covering all sound operations including constraint validation
+
+## Repository Pattern & Testing
+
+Repository pattern implementation for the data access layer, covered by 76+ repository tests.
+
+### Repository Architecture
+- **Base Repository**: `app/repositories/base.py` - Generic CRUD operations with type safety
+- **Specialized Repositories**: Domain-specific repositories extending base functionality
+- **Async Operations**: Full async/await support for non-blocking database operations
+- **Error Handling**: Comprehensive exception handling with logging
+
+### Repository Coverage
+- **User Repository**: User management, authentication, role-based operations
+- **Sound Repository**: Audio file management with specialized queries
+- **Credit Transaction Repository**: Credit system transaction management
+- **User OAuth Repository**: OAuth provider management and authentication
+- **Playlist Repository**: Playlist management and sound associations
+- **Extraction Repository**: Audio extraction job management
+
+### Testing Infrastructure
+- **76+ Repository Tests**: Comprehensive test coverage across all repositories
+- **Async Test Support**: Proper async/await testing with pytest-asyncio
+- **SQLAlchemy Integration**: Proper session management and lazy loading handling
+- **Type Safety**: Complete mypy type checking compliance
+- **Fixture Management**: Reusable test fixtures with proper dependency injection
+
+### Test Categories
+- **CRUD Operations**: Create, read, update, delete operations for all entities
+- **Constraint Validation**: Unique constraint and foreign key relationship testing
+- **Pagination Testing**: Limit/offset pagination with proper ordering
+- **Error Scenarios**: Exception handling and error condition testing
+- **Performance Tests**: Query optimization and efficient data access patterns
+
+## Sound Normalization System
+
+The application includes a comprehensive audio normalization system using FFmpeg's loudnorm filter for professional-quality audio processing.
+
+### Normalization Features
+- **Two-pass normalization**: Default high-quality mode with analysis and normalization phases
+- **One-pass normalization**: Fast mode for quick processing or as fallback
+- **Intelligent fallback**: Automatically switches to one-pass when the analysis pass returns invalid values (inf, -inf, nan)
+- **Batch processing**: Normalize all sounds or filter by type (SDB, TTS, EXT)
+- **Admin-only access**: Normalization endpoints require administrator privileges
+- **Comprehensive logging**: Detailed FFmpeg output and error handling
+
+### Directory Structure
+```
+backend/sounds/
+├── originals/
+│   ├── soundboard/     # SDB type sounds
+│   ├── text_to_speech/ # TTS type sounds
+│   └── extracted/      # EXT type sounds
+└── normalized/
+    ├── soundboard/     # Normalized SDB sounds
+    ├── text_to_speech/ # Normalized TTS sounds
+    └── extracted/      # Normalized EXT sounds
+```
+
+### Configuration (Environment Variables)
+- `NORMALIZED_AUDIO_FORMAT`: Output format (default: "mp3")
+- `NORMALIZED_AUDIO_BITRATE`: Bitrate setting (default: "256k")
+- `NORMALIZED_AUDIO_PASSES`: 1 for one-pass, 2 for two-pass (default: 2)
+
+### Database Fields (Sound Model)
+- `is_normalized`: Boolean flag indicating normalization status
+- `normalized_filename`: Filename of normalized audio file
+- `normalized_duration`: Duration in milliseconds of normalized file
+- `normalized_size`: File size in bytes of normalized file
+- `normalized_hash`: SHA-256 hash of normalized file for integrity
+
+### API Endpoints
+- `POST /api/v1/sounds/normalize/all`: Normalize all unnormalized sounds
+- `POST /api/v1/sounds/normalize/type/{sound_type}`: Normalize sounds by type
+- `POST /api/v1/sounds/normalize/{sound_id}`: Normalize specific sound
+- **Parameters**: `force` (re-normalize already processed), `one_pass` (override config)
+
+### Technical Implementation
+- **Service**: `app/services/sound_normalizer.py` - Core normalization logic
+- **API**: `app/api/v1/sounds.py` - REST endpoints (consolidated with other sound endpoints)
+- **Repository**: Enhanced `app/repositories/sound.py` with normalization queries
+- **Dependencies**: Requires FFmpeg installed on system, uses ffmpeg-python library
+- **Error Handling**: Graceful fallback for edge cases (silent audio, infinite values)
+- **Session Management**: Handles SQLModel session detachment in batch operations
+
+### Normalization Process
+1. **Analysis Phase** (two-pass only): Analyze audio characteristics
+2. **Validation**: Check for invalid analysis values (inf, -inf, nan)
+3. **Fallback Logic**: Switch to one-pass if analysis contains invalid values
+4. **Normalization**: Apply loudnorm filter with target levels (I=-23 LUFS, TP=-2 dBTP, LRA=7 LU)
+5. **Database Update**: Store normalized file metadata and set is_normalized flag
+
+### Testing
+- 17 comprehensive service tests covering all normalization scenarios
+- 16 API endpoint tests with authentication and authorization checks
+- Edge case handling for problematic audio files
+- Mock FFmpeg operations for reliable testing
+
+## Sound Scanner System
+
+The application includes a sound scanner service for automatically discovering, importing, and managing audio files in the filesystem.
+
+### Scanner Features
+- **File Discovery**: Recursively scans sound directories for audio files
+- **Format Support**: Handles multiple audio formats (.mp3, .wav, .flac, .ogg, .m4a, etc.)
+- **Metadata Extraction**: Uses FFmpeg to extract duration and file information
+- **Database Sync**: Automatically adds new files, updates existing ones, removes deleted files
+- **Admin-only Access**: Scanning operations require administrator privileges
+- **Comprehensive Reporting**: Detailed results showing added, updated, deleted, and skipped files
+- **Duplicate Prevention**: Integration with unique hash constraint system
+
+### Technical Implementation
+- **Service**: `app/services/sound_scanner.py` - Core scanning and import logic
+- **API**: `app/api/v1/sounds.py` - REST endpoint for scanning operations
+- **Dependencies**: Requires FFmpeg for metadata extraction
+- **Error Handling**: Graceful handling of corrupted or unreadable files
+- **Hash-based Detection**: Uses SHA-256 hashing to detect file changes and prevent duplicates
+
+### Scanning Process
+1. **Directory Traversal**: Recursively scan configured sound directories
+2. **File Validation**: Check file extensions and accessibility
+3. **Metadata Extraction**: Extract duration using FFmpeg, and compute file size and SHA-256 hash
+4. **Database Comparison**: Compare with existing database records
+5. **Duplicate Detection**: Check unique hash constraint before insertion
+6. **Sync Operations**: Add new files, update changed files, remove deleted files
+7. **Results Reporting**: Return detailed scan results with statistics
+
+### API Endpoints
+- `POST /api/v1/sounds/scan`: Scan and sync sound directories
+
+## WebSocket/Socket.IO System
+
+Real-time communication system using Socket.IO (WebSocket connections) for live updates and messaging.
+
+### Socket Features
+- **Real-time Communication**: WebSocket-based messaging between users
+- **Connection Management**: Track connected users and connection status
+- **User-to-User Messaging**: Send messages to specific users
+- **Connection Status**: Get current connection status and user count
+- **Authentication Integration**: Uses existing user authentication system
+- **Credit Change Notifications**: Real-time credit balance updates via `user_credits_changed` events
+
+### Technical Implementation
+- **Service**: `app/services/socket.py` - Socket.IO manager and connection handling
+- **API**: `app/api/v1/socket.py` - REST endpoints for socket operations
+- **Manager**: Centralized socket connection management with user tracking
+- **Authentication**: Integrated with existing JWT authentication system
+- **Event System**: Structured event emission for various application events
+
+### API Endpoints
+- `GET /api/v1/socket/status`: Get current socket connection status
+- `POST /api/v1/socket/send-message`: Send a message to a specific user via WebSocket
+
+### Socket Events
+- **Connection Management**: Connection and disconnection tracking
+- **User Messages**: User-specific message routing
+- **Credit Updates**: `user_credits_changed` events with detailed transaction data
+- **Real-time Status**: Live application status updates
+
+## Audio Utilities
+
+Shared utility functions for audio file processing used across multiple services.
+
+### Audio Utility Functions
+- **File Hashing**: `get_file_hash()` - Calculate SHA-256 hash of audio files for integrity checking
+- **File Size**: `get_file_size()` - Get file size in bytes for metadata storage
+- **Duration Extraction**: `get_audio_duration()` - Extract audio duration in milliseconds using FFmpeg
+
+### Technical Implementation
+- **Module**: `app/utils/audio.py` - Shared audio processing utilities
+- **Dependencies**: Uses FFmpeg via ffmpeg-python for duration extraction
+- **Error Handling**: Graceful fallback for corrupted or unreadable files
+- **Consistent Interface**: Same function signatures across all audio services
+
+### Usage
+- **Sound Scanner**: Uses utilities for file discovery and metadata extraction
+- **Sound Normalizer**: Uses utilities for normalized file verification and metadata
+- **Audio Extraction**: Uses utilities for extracted audio file metadata and validation
+- **Duplicate Prevention**: Hash calculation for unique constraint enforcement
+- **Centralized Logic**: Eliminates code duplication between audio processing services
+
+## Audio Extraction System
+
+The application includes a comprehensive audio extraction system for downloading and processing audio content from external services using yt-dlp.
+
+### Extraction Features
+- **Immediate Response**: API endpoints return immediately without waiting for yt-dlp processing
+- **Background Processing**: Actual extraction happens asynchronously in background worker threads
+- **Multi-Service Support**: Supports YouTube, SoundCloud, Vimeo, DailyMotion, TikTok, Twitter, Instagram
+- **Non-blocking Operations**: yt-dlp operations run in thread pools to prevent event loop blocking
+- **Concurrent Processing**: Configurable maximum concurrent extractions with queue management
+- **Automatic Normalization**: Extracted audio is automatically normalized using the sound normalization system
+- **Error Handling**: Comprehensive error handling with detailed logging and status tracking
+- **Credit Integration**: Automatic credit deduction for extraction operations
+
+### Database Schema (Extraction Model)
+- **Flexible Service Detection**: `service` and `service_id` are nullable during creation, populated during processing
+- **Status Tracking**: `pending` → `processing` → `completed`/`failed`
+- **Metadata Storage**: URL, title, user association, linked sound record
+- **Error Logging**: Detailed error messages for failed extractions
+
+### Directory Structure
+```
+backend/sounds/temp/          # Temporary extraction workspace
+backend/sounds/originals/extracted/  # Final extracted audio files
+backend/sounds/originals/extracted/thumbnails/  # Extracted thumbnails
+```
+
+### Configuration (Environment Variables)
+- `EXTRACTION_AUDIO_FORMAT`: Output audio format (default: "mp3")
+- `EXTRACTION_AUDIO_BITRATE`: Audio bitrate setting (default: "256k")
+- `EXTRACTION_TEMP_DIR`: Temporary extraction directory (default: "sounds/temp")
+- `EXTRACTION_THUMBNAILS_DIR`: Thumbnail storage directory (default: "sounds/originals/extracted/thumbnails")
+- `EXTRACTION_MAX_CONCURRENT`: Maximum concurrent extractions (default: 2)
+
+### API Endpoints
+- `POST /api/v1/sounds/extract?url={url}`: Create extraction job (immediate response)
+- `GET /api/v1/sounds/extract/status`: Get extraction processor status
+- `GET /api/v1/sounds/extract/{extraction_id}`: Get specific extraction info
+- `GET /api/v1/sounds/extract`: Get user's extraction history
+
+### Technical Implementation
+- **Service**: `app/services/extraction.py` - Core extraction logic with async yt-dlp operations
+- **Processor**: `app/services/extraction_processor.py` - Background queue manager with concurrency control
+- **Repository**: `app/repositories/extraction.py` - Database operations for extraction records
+- **API**: `app/api/v1/sounds.py` - REST endpoints integrated with sound management
+- **Dependencies**: Requires yt-dlp for media extraction, FFmpeg for audio processing
+- **Async Operations**: All blocking I/O operations wrapped in `asyncio.to_thread()` for non-blocking execution
+
+### Extraction Process
+1. **Creation**: Immediate API response with extraction record (service info null)
+2. **Queue**: Background processor picks up pending extractions
+3. **Service Detection**: yt-dlp identifies service and media metadata (non-blocking)
+4. **Duplicate Check**: Verify that no extraction already exists for the same `service` and `service_id`
+5. **Media Download**: Extract audio and thumbnails using yt-dlp (non-blocking)
+6. **File Processing**: Move files to final locations with sanitized names
+7. **Sound Creation**: Create Sound database record with metadata and unique hash
+8. **Normalization**: Automatically normalize extracted audio
+9. **Status Update**: Mark extraction as completed with sound association
+
+### Concurrency and Performance
+- **Thread Pool Execution**: yt-dlp operations run in separate threads
+- **Queue Management**: Background processor manages extraction queue
+- **Concurrent Limits**: Configurable maximum concurrent extractions
+- **Non-blocking API**: Other endpoints remain responsive during extraction
+- **Resource Management**: Automatic cleanup of temporary files
+
+### Error Handling
+- **Service Detection Failures**: Invalid URLs handled gracefully during processing
+- **Download Failures**: Network issues, geo-restrictions, or unavailable content
+- **Processing Failures**: File system errors, FFmpeg issues, or corruption
+- **Duplicate Prevention**: Service-level duplicate detection during processing
+- **Comprehensive Logging**: Detailed error messages and extraction status tracking
+
+### Testing
+- **16 comprehensive service tests** covering all extraction scenarios including async operations
+- **API endpoint tests** with authentication and background processing validation
+- **Error handling tests** for various failure scenarios
+- **Mock yt-dlp operations** for reliable testing without network dependencies
+- **Concurrency tests** validating non-blocking behavior and thread pool execution
+
+## Data Integrity & Performance
+
+### Database Constraints
+- **Sound Hash Uniqueness**: Prevents duplicate audio files via unique hash constraint
+- **OAuth Provider Uniqueness**: Prevents duplicate OAuth connections per provider
+- **Foreign Key Integrity**: Proper cascading relationships between all models
+- **Index Optimization**: Strategic indexing for common query patterns
+
+### Type Safety & Code Quality
+- **Full mypy Compliance**: Complete type checking across all Python code
+- **Async/Await Patterns**: Proper async programming throughout the stack
+- **Error Handling**: Comprehensive exception handling with detailed logging
+- **Test Coverage**: 76+ repository tests with 100% critical path coverage
+
+### Performance Optimizations
+- **Lazy Loading Management**: Proper SQLAlchemy relationship loading
+- **Query Optimization**: Efficient database queries with pagination support
+- **Background Processing**: Non-blocking operations for expensive tasks
+- **Resource Management**: Proper cleanup of temporary files and connections
+
+## Development Best Practices
+
+### Code Organization
+- **Repository Pattern**: Clean separation of data access logic
+- **Service Layer**: Business logic encapsulation with dependency injection
+- **Type Safety**: Comprehensive type annotations and mypy compliance
+- **Error Handling**: Structured exception handling with proper logging
+
+### Testing Strategy
+- **Unit Tests**: Comprehensive repository and service layer testing
+- **Integration Tests**: End-to-end API testing with authentication
+- **Async Testing**: Proper async/await testing patterns with pytest-asyncio
+- **Mock Strategies**: External service mocking for reliable testing
+
+### Security & Authentication
+- **JWT Token Management**: Secure token-based authentication
+- **OAuth Integration**: Third-party authentication with proper scoping
+- **Role-based Access**: Admin/user role separation for sensitive operations
+- **Input Validation**: Comprehensive request validation with Pydantic schemas
+
+### Monitoring & Logging
+- **Structured Logging**: Consistent logging patterns across all services
+- **Error Tracking**: Comprehensive exception logging with context
+- **Performance Monitoring**: Request timing and resource usage tracking
+- **Audit Trails**: Complete transaction history for credit and user operations
\ No newline at end of file