Browse Source

docs: add comprehensive refactoring task list

- Phase 1: Critical fixes (memory leaks, performance killers)
- Phase 2: Architecture refactoring (break up god objects)
- Phase 3: Performance optimization (image pipeline, NMS)
- Phase 4: Code quality & testing (DI, unit tests, error handling)

Total: 16 prioritized tasks with dependencies mapped
Ready for branch-per-task development workflow

๐Ÿค– Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
fix/critical-001-opencv-mat-memory-leaks
Quildra 5 months ago
parent
commit
067f42ed84
  1. 583
      REFACTORING_TASKS.md

583
REFACTORING_TASKS.md

@ -0,0 +1,583 @@
# Pokemon Detection App Refactoring Task List
This document outlines a systematic approach to addressing critical issues identified in the codebase analysis. Tasks are organized by priority and include clear descriptions, attack plans, and dependencies.
## ๐Ÿšจ **Phase 1: Critical Fixes (Immediate - Week 1)**
### CRITICAL-001: Fix OpenCV Mat Memory Leaks
**Priority**: ๐Ÿ”ด Critical
**Effort**: 2-3 days
**Dependencies**: None
**Issue Description**:
OpenCV Mat objects are not properly released in exception paths, causing native memory leaks that lead to app crashes during extended use.
**Files Affected**:
- `YOLOOnnxDetector.kt:400-450` (preprocessImageAtScale method)
- `YOLOOnnxDetector.kt:680-758` (coordinate transformation methods)
- `ScreenCaptureService.kt:450-550` (image processing pipeline)
**Attack Plan**:
1. **Audit all Mat usage** - Search for `Mat()` instantiations
2. **Implement try-finally pattern**:
```kotlin
val mat = Mat()
try {
// processing
} finally {
mat.release()
}
```
3. **Create Mat resource management utility**:
```kotlin
inline fun <T> Mat.use(block: (Mat) -> T): T {
try { return block(this) }
finally { this.release() }
}
```
4. **Add memory monitoring** in debug builds
5. **Test with 1-hour continuous detection** to verify no leaks
**Acceptance Criteria**:
- [ ] All Mat objects have guaranteed release paths
- [ ] Memory usage remains stable during extended testing
- [ ] No native memory warnings in logcat
---
### CRITICAL-002: Fix Bitmap Memory Management
**Priority**: ๐Ÿ”ด Critical
**Effort**: 1-2 days
**Dependencies**: None
**Issue Description**:
Large bitmaps (screen-sized ARGB_8888) are created but not always recycled, especially in exception paths.
**Files Affected**:
- `ScreenCaptureService.kt:380-420` (bitmap creation from ImageReader)
- `YOLOOnnxDetector.kt:200-250` (bitmap preprocessing)
**Attack Plan**:
1. **Audit bitmap lifecycle** - Find all `createBitmap()` calls
2. **Implement bitmap recycling**:
```kotlin
bitmap?.let {
if (!it.isRecycled) it.recycle()
}
```
3. **Use bitmap pools** for frequently allocated sizes
4. **Monitor bitmap memory** with Debug.getNativeHeapAllocatedSize()
**Acceptance Criteria**:
- [ ] All bitmaps properly recycled in all code paths
- [ ] Bitmap memory usage stays under 50MB
- [ ] No OutOfMemoryError exceptions
---
### CRITICAL-003: Disable Parallel Processing Performance Killer
**Priority**: ๐Ÿ”ด Critical
**Effort**: 1 day
**Dependencies**: None
**Issue Description**:
`ENABLE_MULTI_PREPROCESSING = true` creates more CPU overhead than performance gain due to context switching on mobile CPUs.
**Files Affected**:
- `YOLOOnnxDetector.kt:300-335` (parallel preprocessing)
- `YOLOOnnxDetector.kt:25` (configuration constant)
**Attack Plan**:
1. **Benchmark current performance** - measure detection times
2. **Set ENABLE_MULTI_PREPROCESSING = false**
3. **Remove ThreadPoolExecutor creation** in preprocessing
4. **Benchmark improvement** - expect 20-30% faster detection
5. **Remove dead code** - parallel processing methods
**Acceptance Criteria**:
- [ ] Single-threaded preprocessing only
- [ ] Detection time improved by 20%+
- [ ] CPU usage reduced during detection
---
### CRITICAL-004: Fix Thread Pool Resource Leaks
**Priority**: ๐Ÿ”ด Critical
**Effort**: 1-2 days
**Dependencies**: None
**Issue Description**:
Thread pools and executors are created but not properly shut down, causing resource leaks.
**Files Affected**:
- `ScreenCaptureService.kt:150-200` (ocrExecutor management)
- `YOLOOnnxDetector.kt:300-335` (temporary executors)
**Attack Plan**:
1. **Audit all Executor usage** - find creation without shutdown
2. **Implement proper lifecycle management**:
```kotlin
override fun onDestroy() {
ocrExecutor.shutdown()
if (!ocrExecutor.awaitTermination(5, TimeUnit.SECONDS)) {
ocrExecutor.shutdownNow()
}
}
```
3. **Use single shared executor** instead of creating multiple
4. **Add executor monitoring** in debug builds
**Acceptance Criteria**:
- [ ] All executors have proper shutdown lifecycle
- [ ] Thread count remains stable during app lifecycle
- [ ] No executor-related memory leaks
---
## ๐Ÿ—๏ธ **Phase 2: Architecture Refactoring (Weeks 2-4)**
### ARCH-001: Extract Screen Capture Manager
**Priority**: ๐ŸŸ  High
**Effort**: 3-4 days
**Dependencies**: CRITICAL-001, CRITICAL-002
**Issue Description**:
ScreenCaptureService has 8+ responsibilities. Screen capture logic should be separated into dedicated manager.
**Files Affected**:
- `ScreenCaptureService.kt:240-350` (MediaProjection + ImageReader setup)
- Create new: `ScreenCaptureManager.kt`
**Attack Plan**:
1. **Create ScreenCaptureManager interface**:
```kotlin
interface ScreenCaptureManager {
fun startCapture(resultData: Intent): Boolean
fun stopCapture()
fun setImageCallback(callback: (Image) -> Unit)
}
```
2. **Extract capture logic** from service
3. **Implement dependency injection** for manager
4. **Create unit tests** for isolated capture logic
5. **Update service** to use manager
**Acceptance Criteria**:
- [ ] ScreenCaptureManager handles only capture logic
- [ ] Service delegates capture operations to manager
- [ ] Capture logic is unit testable
- [ ] No functionality regression
---
### ARCH-002: Extract ML Inference Engine
**Priority**: ๐ŸŸ  High
**Effort**: 4-5 days
**Dependencies**: CRITICAL-003, ARCH-001
**Issue Description**:
ML model management, inference, and preprocessing should be separated from service into dedicated engine.
**Files Affected**:
- `YOLOOnnxDetector.kt` (consolidate as primary)
- `ScreenCaptureService.kt:500-800` (ML integration logic)
- Create new: `MLInferenceEngine.kt`
**Attack Plan**:
1. **Create MLInferenceEngine interface**:
```kotlin
interface MLInferenceEngine {
suspend fun initialize(): Boolean
suspend fun detect(image: Bitmap): List<Detection>
fun setConfidenceThreshold(threshold: Float)
fun cleanup()
}
```
2. **Consolidate YOLO implementations** - keep only OnnxDetector
3. **Remove YOLOTFLiteDetector** and YOLODetector classes
4. **Extract preprocessing** into separate utility
5. **Implement async interface** with coroutines
6. **Add comprehensive error handling**
**Acceptance Criteria**:
- [ ] Single YOLO implementation (ONNX)
- [ ] Clean async interface with coroutines
- [ ] Preprocessing utilities separated
- [ ] Engine is unit testable
- [ ] All ML logic removed from service
---
### ARCH-003: Extract Pokemon Data Extractor
**Priority**: ๐ŸŸ  High
**Effort**: 3-4 days
**Dependencies**: ARCH-002
**Issue Description**:
OCR processing and Pokemon data extraction logic should be separated from service.
**Files Affected**:
- `ScreenCaptureService.kt:900-1100` (OCR and text processing)
- Create new: `PokemonDataExtractor.kt`
**Attack Plan**:
1. **Create PokemonDataExtractor interface**:
```kotlin
interface PokemonDataExtractor {
suspend fun extractPokemonInfo(detections: List<Detection>, image: Bitmap): PokemonInfo?
fun configureRegions(screenSize: Size)
}
```
2. **Extract OCR logic** from service
3. **Create region-based processing** utility
4. **Implement async OCR pipeline** (remove blocking latch)
5. **Add text parsing utilities**
6. **Create data validation** logic
**Acceptance Criteria**:
- [ ] OCR logic separated from detection logic
- [ ] Async OCR pipeline (no blocking)
- [ ] Region-based text extraction
- [ ] Data validation and error handling
- [ ] Unit testable extraction logic
---
### ARCH-004: Create Detection Coordinator
**Priority**: ๐ŸŸก Medium
**Effort**: 3-4 days
**Dependencies**: ARCH-001, ARCH-002, ARCH-003
**Issue Description**:
Need central coordinator to orchestrate screen capture โ†’ ML detection โ†’ data extraction pipeline.
**Files Affected**:
- Create new: `DetectionCoordinator.kt`
- Update: `ScreenCaptureService.kt` (reduce to service lifecycle only)
**Attack Plan**:
1. **Create DetectionCoordinator**:
```kotlin
class DetectionCoordinator(
private val captureManager: ScreenCaptureManager,
private val inferenceEngine: MLInferenceEngine,
private val dataExtractor: PokemonDataExtractor
) {
suspend fun performDetection(): DetectionResult
}
```
2. **Implement detection pipeline** orchestration
3. **Add error handling** and retry logic
4. **Implement result caching** for performance
5. **Add detection state management**
6. **Create comprehensive tests**
**Acceptance Criteria**:
- [ ] Clean pipeline orchestration
- [ ] Proper error handling and recovery
- [ ] Result caching implemented
- [ ] Detection state properly managed
- [ ] Service only handles Android service lifecycle
---
## โšก **Phase 3: Performance Optimization (Weeks 3-5)**
### PERF-001: Optimize Image Conversion Pipeline
**Priority**: ๐ŸŸ  High
**Effort**: 2-3 days
**Dependencies**: ARCH-002
**Issue Description**:
Multiple redundant image conversions (Bitmapโ†’Matโ†’Tensor) with unnecessary memory copies.
**Files Affected**:
- `YOLOOnnxDetector.kt:400-500` (preprocessing pipeline)
**Attack Plan**:
1. **Audit conversion pipeline** - map all transformations
2. **Eliminate redundant conversions**:
- Direct Bitmapโ†’Tensor when possible
- Cache intermediate formats
- Reuse allocated tensors
3. **Implement zero-copy optimizations** where possible
4. **Add conversion performance monitoring**
5. **Benchmark improvements**
**Acceptance Criteria**:
- [ ] Minimum number of image conversions
- [ ] 30%+ reduction in preprocessing time
- [ ] Memory allocations reduced
- [ ] Zero-copy optimizations implemented
---
### PERF-002: Implement Efficient NMS Algorithm
**Priority**: ๐ŸŸก Medium
**Effort**: 2 days
**Dependencies**: ARCH-002
**Issue Description**:
Current NMS implementation is O(nยฒ) - inefficient for large detection sets.
**Files Affected**:
- `YOLOTFLiteDetector.kt:200-250` (if still used)
- `YOLOOnnxDetector.kt` (NMS implementation)
**Attack Plan**:
1. **Research efficient NMS algorithms** (sorted box approach)
2. **Implement O(n log n) NMS**:
```kotlin
fun efficientNMS(detections: List<Detection>): List<Detection> {
return detections
.sortedByDescending { it.confidence }
.fold(mutableListOf<Detection>()) { kept, current ->
if (kept.none { calculateIoU(it.box, current.box) > threshold }) {
kept.add(current)
}
kept
}
}
```
3. **Add IoU calculation optimizations**
4. **Benchmark against current implementation**
**Acceptance Criteria**:
- [ ] O(n log n) NMS implementation
- [ ] 50%+ faster NMS processing
- [ ] IoU calculations optimized
- [ ] Maintains same accuracy
---
### PERF-003: Remove Synchronous OCR Blocking
**Priority**: ๐ŸŸก Medium
**Effort**: 2-3 days
**Dependencies**: ARCH-003
**Issue Description**:
OCR uses CountDownLatch.await() which blocks the detection pipeline unnecessarily.
**Files Affected**:
- `ScreenCaptureService.kt:950-1000` (OCR pipeline)
**Attack Plan**:
1. **Replace CountDownLatch** with coroutines:
```kotlin
suspend fun extractTextAsync(image: Bitmap): String {
return withContext(Dispatchers.IO) {
// OCR processing
}
}
```
2. **Implement timeout handling** with coroutines
3. **Add OCR result caching** for repeated regions
4. **Pipeline parallelization** - OCR while next detection runs
**Acceptance Criteria**:
- [ ] No blocking OCR operations
- [ ] Coroutine-based async OCR
- [ ] OCR result caching implemented
- [ ] Pipeline parallelization working
---
## ๐Ÿงน **Phase 4: Code Quality & Testing (Weeks 4-6)**
### QUALITY-001: Remove Magic Numbers and Hard-coded Values
**Priority**: ๐ŸŸก Medium
**Effort**: 2-3 days
**Dependencies**: None (can run parallel)
**Issue Description**:
Numerous magic numbers and hard-coded values throughout codebase make it difficult to configure and maintain.
**Files Affected**:
- All major classes (50+ magic numbers identified)
**Attack Plan**:
1. **Create configuration data classes**:
```kotlin
data class DetectionConfig(
val confidenceThreshold: Float = 0.55f,
val inputSize: Int = 640,
val maxInferenceTimeMs: Long = 4500L,
val captureIntervalMs: Long = 2000L
)
```
2. **Extract UI configuration** (FAB sizes, animation durations)
3. **Create build-time configuration** for model paths
4. **Add runtime configuration** validation
5. **Update all usage sites**
**Acceptance Criteria**:
- [ ] All magic numbers extracted to configuration
- [ ] Configuration validation implemented
- [ ] Easy to modify behavior without code changes
- [ ] Build-time and runtime configuration separated
---
### QUALITY-002: Implement Comprehensive Error Handling
**Priority**: ๐ŸŸ  High
**Effort**: 3-4 days
**Dependencies**: ARCH-001, ARCH-002, ARCH-003
**Issue Description**:
"Silent failure" anti-pattern appears 50+ times - exceptions are swallowed without proper handling.
**Files Affected**:
- All major classes (widespread issue)
**Attack Plan**:
1. **Define error handling strategy**:
```kotlin
sealed class DetectionError {
object ModelNotLoaded : DetectionError()
data class InferenceTimeout(val timeoutMs: Long) : DetectionError()
data class ImageProcessingFailed(val cause: Throwable) : DetectionError()
}
```
2. **Replace try-catch-ignore** with proper error handling
3. **Implement error reporting** mechanism
4. **Add user-facing error messages**
5. **Create error recovery strategies**
**Acceptance Criteria**:
- [ ] No silent exception swallowing
- [ ] Comprehensive error type definitions
- [ ] User-facing error messages
- [ ] Error recovery mechanisms
- [ ] Error logging and reporting
---
### QUALITY-003: Add Unit Test Coverage
**Priority**: ๐ŸŸก Medium
**Effort**: 5-7 days
**Dependencies**: ARCH-001, ARCH-002, ARCH-003, ARCH-004
**Issue Description**:
Codebase has no unit tests, making refactoring risky and regression detection impossible.
**Files Affected**:
- Create: `test/` directory structure
- All refactored classes need tests
**Attack Plan**:
1. **Set up testing infrastructure**:
- JUnit 5, Mockito, Robolectric
- Test data and mock image generation
2. **Create test strategy**:
```kotlin
// Example test structure
class MLInferenceEngineTest {
@Test fun `detect should return valid detections for test image`()
@Test fun `detect should handle empty image gracefully`()
@Test fun `detect should respect confidence threshold`()
}
```
3. **Achieve 80%+ coverage** on core logic classes
4. **Add integration tests** for pipeline
5. **Set up CI test execution**
**Acceptance Criteria**:
- [ ] 80%+ unit test coverage on core classes
- [ ] Integration tests for detection pipeline
- [ ] Mock-based testing for dependencies
- [ ] CI pipeline runs tests automatically
- [ ] Test data generation utilities
---
### QUALITY-004: Implement Dependency Injection
**Priority**: ๐ŸŸก Medium
**Effort**: 3-4 days
**Dependencies**: ARCH-001, ARCH-002, ARCH-003, ARCH-004
**Issue Description**:
Hard dependencies throughout codebase make testing difficult and coupling tight.
**Files Affected**:
- All refactored classes
- Add: Hilt/Dagger dependencies
**Attack Plan**:
1. **Add Hilt dependency injection**:
```kotlin
@Module
@InstallIn(SingletonComponent::class)
object DetectionModule {
@Provides
fun provideMLInferenceEngine(): MLInferenceEngine = YOLOOnnxEngine()
}
```
2. **Create injection interfaces** for all major components
3. **Update constructors** to accept dependencies
4. **Create test modules** for mocking
5. **Add scoping annotations** for lifecycle management
**Acceptance Criteria**:
- [ ] Hilt DI implemented throughout
- [ ] No direct dependency instantiation
- [ ] Test modules for easy mocking
- [ ] Proper dependency scoping
- [ ] Clean separation of concerns
---
## ๐Ÿ“‹ **Task Dependencies Visualization**
```
Phase 1 (Critical Fixes):
CRITICAL-001 โ”€โ”€โ”€โ”€โ”
CRITICAL-002 โ”€โ”€โ”€โ”€โ”ผโ”€โ†’ ARCH-001 โ”€โ”€โ”
CRITICAL-003 โ”€โ”€โ”€โ”€โ”˜ โ”‚
CRITICAL-004 โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ†’ ARCH-002 โ”€โ”€โ”
โ”‚ โ”‚
Phase 2 (Architecture): โ”‚ โ”œโ”€โ†’ ARCH-004
CRITICAL-003 โ”€โ”˜ โ”‚
โ”‚
ARCH-002 โ”€โ”€โ”€โ”€โ”ดโ”€โ†’ ARCH-003
โ”‚
Phase 3 (Performance): โ”‚
PERF-001 โ†โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ ARCH-002 โ”‚
PERF-002 โ†โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ ARCH-002 โ”‚
PERF-003 โ†โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ ARCH-003 โ†โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Phase 4 (Quality):
QUALITY-001 (can run parallel)
QUALITY-002 โ†โ”€ All ARCH tasks
QUALITY-003 โ†โ”€ All ARCH tasks
QUALITY-004 โ†โ”€ All ARCH tasks
```
## ๐ŸŽฏ **Success Metrics Per Phase**
### Phase 1 Metrics:
- [ ] Memory usage stable < 100MB during 1-hour test
- [ ] Detection time improved by 20%+
- [ ] Zero native memory leaks
- [ ] No executor resource leaks
### Phase 2 Metrics:
- [ ] No classes > 300 lines
- [ ] Clear separation of concerns
- [ ] Service class < 200 lines
- [ ] All components unit testable
### Phase 3 Metrics:
- [ ] Detection pipeline < 2 seconds
- [ ] Image preprocessing 30% faster
- [ ] NMS processing 50% faster
- [ ] No blocking operations in pipeline
### Phase 4 Metrics:
- [ ] 80%+ test coverage on core logic
- [ ] Zero magic numbers in business logic
- [ ] Comprehensive error handling
- [ ] Full dependency injection
---
**Next Steps**: Choose which task to start with. I recommend beginning with **CRITICAL-001 (OpenCV Mat Memory Leaks)** as it has no dependencies and addresses the most serious stability issue.
Loadingโ€ฆ
Cancel
Save