docs: add comprehensive refactoring task list

- Phase 1: Critical fixes (memory leaks, performance killers) - Phase 2: Architecture refactoring (break up god objects) - Phase 3: Performance optimization (image pipeline, NMS) - Phase 4: Code quality & testing (DI, unit tests, error handling) Total: 16 prioritized tasks with dependencies mapped Ready for branch-per-task development workflow 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
6 months ago · 067f42ed84
1 changed files with 583 additions and 0 deletions
--- a/REFACTORING_TASKS.md
+++ b/REFACTORING_TASKS.md
@ -0,0 +1,583 @@
+# Pokemon Detection App Refactoring Task List
+
+This document outlines a systematic approach to addressing critical issues identified in the codebase analysis. Tasks are organized by priority and include clear descriptions, attack plans, and dependencies.
+
+## 🚨 **Phase 1: Critical Fixes (Immediate - Week 1)**
+
+### CRITICAL-001: Fix OpenCV Mat Memory Leaks
+**Priority**: 🔴 Critical  
+**Effort**: 2-3 days  
+**Dependencies**: None
+
+**Issue Description**:
+OpenCV Mat objects are not properly released in exception paths, causing native memory leaks that lead to app crashes during extended use.
+
+**Files Affected**:
+- `YOLOOnnxDetector.kt:400-450` (preprocessImageAtScale method)
+- `YOLOOnnxDetector.kt:680-758` (coordinate transformation methods)
+- `ScreenCaptureService.kt:450-550` (image processing pipeline)
+
+**Attack Plan**:
+1. **Audit all Mat usage** - Search for `Mat()` instantiations
+2. **Implement try-finally pattern**:
+   ```kotlin
+   val mat = Mat()
+   try {
+       // processing
+   } finally {
+       mat.release()
+   }
+   ```
+3. **Create Mat resource management utility**:
+   ```kotlin
+   inline fun <T> Mat.use(block: (Mat) -> T): T {
+       try { return block(this) }
+       finally { this.release() }
+   }
+   ```
+4. **Add memory monitoring** in debug builds
+5. **Test with 1-hour continuous detection** to verify no leaks
+
+**Acceptance Criteria**:
+- [ ] All Mat objects have guaranteed release paths
+- [ ] Memory usage remains stable during extended testing
+- [ ] No native memory warnings in logcat
+
+---
+
+### CRITICAL-002: Fix Bitmap Memory Management
+**Priority**: 🔴 Critical  
+**Effort**: 1-2 days  
+**Dependencies**: None
+
+**Issue Description**:
+Large bitmaps (screen-sized ARGB_8888) are created but not always recycled, especially in exception paths.
+
+**Files Affected**:
+- `ScreenCaptureService.kt:380-420` (bitmap creation from ImageReader)
+- `YOLOOnnxDetector.kt:200-250` (bitmap preprocessing)
+
+**Attack Plan**:
+1. **Audit bitmap lifecycle** - Find all `createBitmap()` calls
+2. **Implement bitmap recycling**:
+   ```kotlin
+   bitmap?.let { 
+       if (!it.isRecycled) it.recycle() 
+   }
+   ```
+3. **Use bitmap pools** for frequently allocated sizes
+4. **Monitor bitmap memory** with Debug.getNativeHeapAllocatedSize()
+
+**Acceptance Criteria**:
+- [ ] All bitmaps properly recycled in all code paths
+- [ ] Bitmap memory usage stays under 50MB
+- [ ] No OutOfMemoryError exceptions
+
+---
+
+### CRITICAL-003: Disable Parallel Processing Performance Killer
+**Priority**: 🔴 Critical  
+**Effort**: 1 day  
+**Dependencies**: None
+
+**Issue Description**:
+`ENABLE_MULTI_PREPROCESSING = true` creates more CPU overhead than performance gain due to context switching on mobile CPUs.
+
+**Files Affected**:
+- `YOLOOnnxDetector.kt:300-335` (parallel preprocessing)
+- `YOLOOnnxDetector.kt:25` (configuration constant)
+
+**Attack Plan**:
+1. **Benchmark current performance** - measure detection times
+2. **Set ENABLE_MULTI_PREPROCESSING = false**
+3. **Remove ThreadPoolExecutor creation** in preprocessing
+4. **Benchmark improvement** - expect 20-30% faster detection
+5. **Remove dead code** - parallel processing methods
+
+**Acceptance Criteria**:
+- [ ] Single-threaded preprocessing only
+- [ ] Detection time improved by 20%+
+- [ ] CPU usage reduced during detection
+
+---
+
+### CRITICAL-004: Fix Thread Pool Resource Leaks
+**Priority**: 🔴 Critical  
+**Effort**: 1-2 days  
+**Dependencies**: None
+
+**Issue Description**:
+Thread pools and executors are created but not properly shut down, causing resource leaks.
+
+**Files Affected**:
+- `ScreenCaptureService.kt:150-200` (ocrExecutor management)
+- `YOLOOnnxDetector.kt:300-335` (temporary executors)
+
+**Attack Plan**:
+1. **Audit all Executor usage** - find creation without shutdown
+2. **Implement proper lifecycle management**:
+   ```kotlin
+   override fun onDestroy() {
+       ocrExecutor.shutdown()
+       if (!ocrExecutor.awaitTermination(5, TimeUnit.SECONDS)) {
+           ocrExecutor.shutdownNow()
+       }
+   }
+   ```
+3. **Use single shared executor** instead of creating multiple
+4. **Add executor monitoring** in debug builds
+
+**Acceptance Criteria**:
+- [ ] All executors have proper shutdown lifecycle
+- [ ] Thread count remains stable during app lifecycle
+- [ ] No executor-related memory leaks
+
+---
+
+## 🏗️ **Phase 2: Architecture Refactoring (Weeks 2-4)**
+
+### ARCH-001: Extract Screen Capture Manager
+**Priority**: 🟠 High  
+**Effort**: 3-4 days  
+**Dependencies**: CRITICAL-001, CRITICAL-002
+
+**Issue Description**:
+ScreenCaptureService has 8+ responsibilities. Screen capture logic should be separated into dedicated manager.
+
+**Files Affected**:
+- `ScreenCaptureService.kt:240-350` (MediaProjection + ImageReader setup)
+- Create new: `ScreenCaptureManager.kt`
+
+**Attack Plan**:
+1. **Create ScreenCaptureManager interface**:
+   ```kotlin
+   interface ScreenCaptureManager {
+       fun startCapture(resultData: Intent): Boolean
+       fun stopCapture()
+       fun setImageCallback(callback: (Image) -> Unit)
+   }
+   ```
+2. **Extract capture logic** from service
+3. **Implement dependency injection** for manager
+4. **Create unit tests** for isolated capture logic
+5. **Update service** to use manager
+
+**Acceptance Criteria**:
+- [ ] ScreenCaptureManager handles only capture logic
+- [ ] Service delegates capture operations to manager
+- [ ] Capture logic is unit testable
+- [ ] No functionality regression
+
+---
+
+### ARCH-002: Extract ML Inference Engine
+**Priority**: 🟠 High  
+**Effort**: 4-5 days  
+**Dependencies**: CRITICAL-003, ARCH-001
+
+**Issue Description**:
+ML model management, inference, and preprocessing should be separated from service into dedicated engine.
+
+**Files Affected**:
+- `YOLOOnnxDetector.kt` (consolidate as primary)
+- `ScreenCaptureService.kt:500-800` (ML integration logic)
+- Create new: `MLInferenceEngine.kt`
+
+**Attack Plan**:
+1. **Create MLInferenceEngine interface**:
+   ```kotlin
+   interface MLInferenceEngine {
+       suspend fun initialize(): Boolean
+       suspend fun detect(image: Bitmap): List<Detection>
+       fun setConfidenceThreshold(threshold: Float)
+       fun cleanup()
+   }
+   ```
+2. **Consolidate YOLO implementations** - keep only OnnxDetector
+3. **Remove YOLOTFLiteDetector** and YOLODetector classes
+4. **Extract preprocessing** into separate utility
+5. **Implement async interface** with coroutines
+6. **Add comprehensive error handling**
+
+**Acceptance Criteria**:
+- [ ] Single YOLO implementation (ONNX)
+- [ ] Clean async interface with coroutines
+- [ ] Preprocessing utilities separated
+- [ ] Engine is unit testable
+- [ ] All ML logic removed from service
+
+---
+
+### ARCH-003: Extract Pokemon Data Extractor
+**Priority**: 🟠 High  
+**Effort**: 3-4 days  
+**Dependencies**: ARCH-002
+
+**Issue Description**:
+OCR processing and Pokemon data extraction logic should be separated from service.
+
+**Files Affected**:
+- `ScreenCaptureService.kt:900-1100` (OCR and text processing)
+- Create new: `PokemonDataExtractor.kt`
+
+**Attack Plan**:
+1. **Create PokemonDataExtractor interface**:
+   ```kotlin
+   interface PokemonDataExtractor {
+       suspend fun extractPokemonInfo(detections: List<Detection>, image: Bitmap): PokemonInfo?
+       fun configureRegions(screenSize: Size)
+   }
+   ```
+2. **Extract OCR logic** from service
+3. **Create region-based processing** utility
+4. **Implement async OCR pipeline** (remove blocking latch)
+5. **Add text parsing utilities**
+6. **Create data validation** logic
+
+**Acceptance Criteria**:
+- [ ] OCR logic separated from detection logic
+- [ ] Async OCR pipeline (no blocking)
+- [ ] Region-based text extraction
+- [ ] Data validation and error handling
+- [ ] Unit testable extraction logic
+
+---
+
+### ARCH-004: Create Detection Coordinator
+**Priority**: 🟡 Medium  
+**Effort**: 3-4 days  
+**Dependencies**: ARCH-001, ARCH-002, ARCH-003
+
+**Issue Description**:
+Need central coordinator to orchestrate screen capture → ML detection → data extraction pipeline.
+
+**Files Affected**:
+- Create new: `DetectionCoordinator.kt`
+- Update: `ScreenCaptureService.kt` (reduce to service lifecycle only)
+
+**Attack Plan**:
+1. **Create DetectionCoordinator**:
+   ```kotlin
+   class DetectionCoordinator(
+       private val captureManager: ScreenCaptureManager,
+       private val inferenceEngine: MLInferenceEngine,
+       private val dataExtractor: PokemonDataExtractor
+   ) {
+       suspend fun performDetection(): DetectionResult
+   }
+   ```
+2. **Implement detection pipeline** orchestration
+3. **Add error handling** and retry logic
+4. **Implement result caching** for performance
+5. **Add detection state management**
+6. **Create comprehensive tests**
+
+**Acceptance Criteria**:
+- [ ] Clean pipeline orchestration
+- [ ] Proper error handling and recovery
+- [ ] Result caching implemented
+- [ ] Detection state properly managed
+- [ ] Service only handles Android service lifecycle
+
+---
+
+## ⚡ **Phase 3: Performance Optimization (Weeks 3-5)**
+
+### PERF-001: Optimize Image Conversion Pipeline
+**Priority**: 🟠 High  
+**Effort**: 2-3 days  
+**Dependencies**: ARCH-002
+
+**Issue Description**:
+Multiple redundant image conversions (Bitmap→Mat→Tensor) with unnecessary memory copies.
+
+**Files Affected**:
+- `YOLOOnnxDetector.kt:400-500` (preprocessing pipeline)
+
+**Attack Plan**:
+1. **Audit conversion pipeline** - map all transformations
+2. **Eliminate redundant conversions**:
+   - Direct Bitmap→Tensor when possible
+   - Cache intermediate formats
+   - Reuse allocated tensors
+3. **Implement zero-copy optimizations** where possible
+4. **Add conversion performance monitoring**
+5. **Benchmark improvements**
+
+**Acceptance Criteria**:
+- [ ] Minimum number of image conversions
+- [ ] 30%+ reduction in preprocessing time
+- [ ] Memory allocations reduced
+- [ ] Zero-copy optimizations implemented
+
+---
+
+### PERF-002: Implement Efficient NMS Algorithm
+**Priority**: 🟡 Medium  
+**Effort**: 2 days  
+**Dependencies**: ARCH-002
+
+**Issue Description**:
+Current NMS implementation is O(n²) - inefficient for large detection sets.
+
+**Files Affected**:
+- `YOLOTFLiteDetector.kt:200-250` (if still used)
+- `YOLOOnnxDetector.kt` (NMS implementation)
+
+**Attack Plan**:
+1. **Research efficient NMS algorithms** (sorted box approach)
+2. **Implement O(n log n) NMS**:
+   ```kotlin
+   fun efficientNMS(detections: List<Detection>): List<Detection> {
+       return detections
+           .sortedByDescending { it.confidence }
+           .fold(mutableListOf<Detection>()) { kept, current ->
+               if (kept.none { calculateIoU(it.box, current.box) > threshold }) {
+                   kept.add(current)
+               }
+               kept
+           }
+   }
+   ```
+3. **Add IoU calculation optimizations**
+4. **Benchmark against current implementation**
+
+**Acceptance Criteria**:
+- [ ] O(n log n) NMS implementation
+- [ ] 50%+ faster NMS processing
+- [ ] IoU calculations optimized
+- [ ] Maintains same accuracy
+
+---
+
+### PERF-003: Remove Synchronous OCR Blocking
+**Priority**: 🟡 Medium  
+**Effort**: 2-3 days  
+**Dependencies**: ARCH-003
+
+**Issue Description**:
+OCR uses CountDownLatch.await() which blocks the detection pipeline unnecessarily.
+
+**Files Affected**:
+- `ScreenCaptureService.kt:950-1000` (OCR pipeline)
+
+**Attack Plan**:
+1. **Replace CountDownLatch** with coroutines:
+   ```kotlin
+   suspend fun extractTextAsync(image: Bitmap): String {
+       return withContext(Dispatchers.IO) {
+           // OCR processing
+       }
+   }
+   ```
+2. **Implement timeout handling** with coroutines
+3. **Add OCR result caching** for repeated regions
+4. **Pipeline parallelization** - OCR while next detection runs
+
+**Acceptance Criteria**:
+- [ ] No blocking OCR operations
+- [ ] Coroutine-based async OCR
+- [ ] OCR result caching implemented
+- [ ] Pipeline parallelization working
+
+---
+
+## 🧹 **Phase 4: Code Quality & Testing (Weeks 4-6)**
+
+### QUALITY-001: Remove Magic Numbers and Hard-coded Values
+**Priority**: 🟡 Medium  
+**Effort**: 2-3 days  
+**Dependencies**: None (can run parallel)
+
+**Issue Description**:
+Numerous magic numbers and hard-coded values throughout codebase make it difficult to configure and maintain.
+
+**Files Affected**:
+- All major classes (50+ magic numbers identified)
+
+**Attack Plan**:
+1. **Create configuration data classes**:
+   ```kotlin
+   data class DetectionConfig(
+       val confidenceThreshold: Float = 0.55f,
+       val inputSize: Int = 640,
+       val maxInferenceTimeMs: Long = 4500L,
+       val captureIntervalMs: Long = 2000L
+   )
+   ```
+2. **Extract UI configuration** (FAB sizes, animation durations)
+3. **Create build-time configuration** for model paths
+4. **Add runtime configuration** validation
+5. **Update all usage sites**
+
+**Acceptance Criteria**:
+- [ ] All magic numbers extracted to configuration
+- [ ] Configuration validation implemented
+- [ ] Easy to modify behavior without code changes
+- [ ] Build-time and runtime configuration separated
+
+---
+
+### QUALITY-002: Implement Comprehensive Error Handling
+**Priority**: 🟠 High  
+**Effort**: 3-4 days  
+**Dependencies**: ARCH-001, ARCH-002, ARCH-003
+
+**Issue Description**:
+"Silent failure" anti-pattern appears 50+ times - exceptions are swallowed without proper handling.
+
+**Files Affected**:
+- All major classes (widespread issue)
+
+**Attack Plan**:
+1. **Define error handling strategy**:
+   ```kotlin
+   sealed class DetectionError {
+       object ModelNotLoaded : DetectionError()
+       data class InferenceTimeout(val timeoutMs: Long) : DetectionError()
+       data class ImageProcessingFailed(val cause: Throwable) : DetectionError()
+   }
+   ```
+2. **Replace try-catch-ignore** with proper error handling
+3. **Implement error reporting** mechanism
+4. **Add user-facing error messages**
+5. **Create error recovery strategies**
+
+**Acceptance Criteria**:
+- [ ] No silent exception swallowing
+- [ ] Comprehensive error type definitions
+- [ ] User-facing error messages
+- [ ] Error recovery mechanisms
+- [ ] Error logging and reporting
+
+---
+
+### QUALITY-003: Add Unit Test Coverage
+**Priority**: 🟡 Medium  
+**Effort**: 5-7 days  
+**Dependencies**: ARCH-001, ARCH-002, ARCH-003, ARCH-004
+
+**Issue Description**:
+Codebase has no unit tests, making refactoring risky and regression detection impossible.
+
+**Files Affected**:
+- Create: `test/` directory structure
+- All refactored classes need tests
+
+**Attack Plan**:
+1. **Set up testing infrastructure**:
+   - JUnit 5, Mockito, Robolectric
+   - Test data and mock image generation
+2. **Create test strategy**:
+   ```kotlin
+   // Example test structure
+   class MLInferenceEngineTest {
+       @Test fun `detect should return valid detections for test image`()
+       @Test fun `detect should handle empty image gracefully`()
+       @Test fun `detect should respect confidence threshold`()
+   }
+   ```
+3. **Achieve 80%+ coverage** on core logic classes
+4. **Add integration tests** for pipeline
+5. **Set up CI test execution**
+
+**Acceptance Criteria**:
+- [ ] 80%+ unit test coverage on core classes
+- [ ] Integration tests for detection pipeline
+- [ ] Mock-based testing for dependencies
+- [ ] CI pipeline runs tests automatically
+- [ ] Test data generation utilities
+
+---
+
+### QUALITY-004: Implement Dependency Injection
+**Priority**: 🟡 Medium  
+**Effort**: 3-4 days  
+**Dependencies**: ARCH-001, ARCH-002, ARCH-003, ARCH-004
+
+**Issue Description**:
+Hard dependencies throughout codebase make testing difficult and coupling tight.
+
+**Files Affected**:
+- All refactored classes
+- Add: Hilt/Dagger dependencies
+
+**Attack Plan**:
+1. **Add Hilt dependency injection**:
+   ```kotlin
+   @Module
+   @InstallIn(SingletonComponent::class)
+   object DetectionModule {
+       @Provides
+       fun provideMLInferenceEngine(): MLInferenceEngine = YOLOOnnxEngine()
+   }
+   ```
+2. **Create injection interfaces** for all major components
+3. **Update constructors** to accept dependencies
+4. **Create test modules** for mocking
+5. **Add scoping annotations** for lifecycle management
+
+**Acceptance Criteria**:
+- [ ] Hilt DI implemented throughout
+- [ ] No direct dependency instantiation
+- [ ] Test modules for easy mocking
+- [ ] Proper dependency scoping
+- [ ] Clean separation of concerns
+
+---
+
+## 📋 **Task Dependencies Visualization**
+
+```
+Phase 1 (Critical Fixes):
+CRITICAL-001 ────┐
+CRITICAL-002 ────┼─→ ARCH-001 ──┐
+CRITICAL-003 ────┘              │
+CRITICAL-004 ────────────────────┼─→ ARCH-002 ──┐
+                                 │              │
+Phase 2 (Architecture):          │              ├─→ ARCH-004
+                    CRITICAL-003 ─┘              │
+                                                 │
+                                   ARCH-002 ────┴─→ ARCH-003
+                                                     │
+Phase 3 (Performance):                              │
+PERF-001 ←─────────────────── ARCH-002              │
+PERF-002 ←─────────────────── ARCH-002              │
+PERF-003 ←─────────────────── ARCH-003 ←────────────┘
+
+Phase 4 (Quality):
+QUALITY-001 (can run parallel)
+QUALITY-002 ←─ All ARCH tasks
+QUALITY-003 ←─ All ARCH tasks  
+QUALITY-004 ←─ All ARCH tasks
+```
+
+## 🎯 **Success Metrics Per Phase**
+
+### Phase 1 Metrics:
+- [ ] Memory usage stable < 100MB during 1-hour test
+- [ ] Detection time improved by 20%+
+- [ ] Zero native memory leaks
+- [ ] No executor resource leaks
+
+### Phase 2 Metrics:
+- [ ] No classes > 300 lines
+- [ ] Clear separation of concerns
+- [ ] Service class < 200 lines
+- [ ] All components unit testable
+
+### Phase 3 Metrics:
+- [ ] Detection pipeline < 2 seconds
+- [ ] Image preprocessing 30% faster
+- [ ] NMS processing 50% faster
+- [ ] No blocking operations in pipeline
+
+### Phase 4 Metrics:
+- [ ] 80%+ test coverage on core logic
+- [ ] Zero magic numbers in business logic
+- [ ] Comprehensive error handling
+- [ ] Full dependency injection
+
+---
+
+**Next Steps**: Choose which task to start with. I recommend beginning with **CRITICAL-001 (OpenCV Mat Memory Leaks)** as it has no dependencies and addresses the most serious stability issue.