# BGtopoVJ Blue Rectangle/Square Detection PoC

This is a practical first-pass pipeline for finding blue/light-blue square and rectangle symbols in BGtopoVJ raster maps, then using those detections to score a coordinate dataset and bootstrap a YOLO detector.

The PoC is intentionally hybrid:

1. Download original BGtopoVJ `*.tif` + `*.map` sheet pairs.
2. Open the raster through GDAL/Rasterio, preferring the OziExplorer `.map` sidecar when available.
3. Mine weak candidates using OpenCV HSV thresholding + contour/rectangle filters.
4. Generate QA overlays and HTML report.
5. Score your known coordinates against nearby candidates.
6. Export weak labels into YOLO format.
7. Train a first YOLO model on your RTX 3080 FE after you review/clean the weak labels.

This is not meant to be a final truth engine on day one. It is meant to rapidly produce reviewable candidates, hard negatives, and a training set.

---

## Hardware fit

Your RTX 3080 FE is enough for the first detector. Start with:

- `yolov8s.pt`
- `imgsz=1024`
- `batch=2` or `batch=4`
- `epochs=80`

If you hit CUDA OOM, lower batch first. Do not lower image size below 896 too early, because the target symbols are small.

16 GB system RAM is tight for country-scale processing, but fine for per-sheet scanning. Avoid loading the whole corpus at once. This PoC scans by windows.

---

## Install locally

### Linux / WSL / Manjaro-like

```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

GDAL/Rasterio can be the annoying part. If `rasterio.open("*.map")` fails, install GDAL from your OS package manager, or use the Docker option below.
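Before running the full pipeline, it can help to confirm that Rasterio can open a `.map` sidecar at all. The snippet below is a minimal sketch, not part of the PoC: the helper name `check_map_opens` is hypothetical, and it assumes the GDAL build behind Rasterio includes the OziExplorer MAP driver.

```python
from pathlib import Path


def check_map_opens(map_path: str) -> tuple[bool, str]:
    """Return (ok, message) describing whether Rasterio can open a .map sidecar.

    Hypothetical helper for local diagnostics; not part of bgtopo_poc.
    """
    path = Path(map_path)
    if not path.is_file():
        return False, f"file not found: {path}"
    try:
        # Imported lazily so a missing install is reported instead of raised.
        import rasterio
    except ImportError as exc:
        return False, f"rasterio is not importable: {exc}"
    try:
        with rasterio.open(str(path)) as src:
            # A usable sheet should report a CRS and a non-zero size.
            return True, (
                f"driver={src.driver} crs={src.crs} "
                f"size={src.width}x{src.height}"
            )
    except Exception as exc:  # typically RasterioIOError when GDAL cannot read it
        return False, f"rasterio could not open the file: {exc}"


if __name__ == "__main__":
    ok, message = check_map_opens("data/raw/K-34-009-2/K-34-009-2.map")
    print("OK" if ok else "FAILED", "-", message)
```

If the check fails with an open error rather than a missing file, that usually points at the GDAL installation rather than at the sheet itself.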
### GPU Docker option

```bash
docker compose -f docker-compose.gpu.yml build
docker compose -f docker-compose.gpu.yml run --rm bgtopo-bluebox bash
```

Inside the container:

```bash
./scripts/run_pilot.sh
```

---

## Run the pilot

```bash
./scripts/run_pilot.sh
```

This downloads only two sheets, scans them, writes candidate CSV files, draws overlays, and builds:

```text
reports/poc_report.html
reports/overlays/*.png
data/interim/candidates/*_candidates.csv
```

Inspect the overlays. If too many rivers/text labels are detected, tighten `configs/blue_detector.yaml`. If real blue rectangles are missed, loosen the HSV ranges and size filters.

---

## Manual one-sheet run

```bash
python -m bgtopo_poc.cli inventory \
  --config configs/blue_detector.yaml \
  --out data/manifest.csv \
  --limit 1

python -m bgtopo_poc.cli download \
  --manifest data/manifest.csv \
  --out-dir data/raw \
  --out-manifest data/manifest_downloaded.csv \
  --limit 1

python -m bgtopo_poc.cli detect \
  --config configs/blue_detector.yaml \
  --sheet-id K-34-009-2 \
  --map data/raw/K-34-009-2/K-34-009-2.map \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --out-dir data/interim/candidates

python -m bgtopo_poc.cli overlay \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --candidates data/interim/candidates/K-34-009-2_candidates.csv \
  --out reports/overlays/K-34-009-2_overlay.png
```

---

## Score your 60k coordinates

Expected coordinate CSV columns:

```csv
id,lat,lon,expected
pt001,42.58837223,23.19638729,unknown
```

Then run:

```bash
python -m bgtopo_poc.cli score-coords \
  --config configs/blue_detector.yaml \
  --sheet-id K-34-009-2 \
  --coordinates data/coordinates/your_60k_points.csv \
  --candidates data/interim/candidates/K-34-009-2_candidates.csv \
  --map data/raw/K-34-009-2/K-34-009-2.map \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --out-dir data/interim/coordinate_scores \
  --coord-crs EPSG:4326
```

Extract review crops for predicted positives/review cases:

```bash
python -m bgtopo_poc.cli crops \
  --scores data/interim/coordinate_scores/K-34-009-2_coordinate_scores.csv \
  --map data/raw/K-34-009-2/K-34-009-2.map \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --out-dir data/interim/crops/K-34-009-2 \
  --crop-size 256
```

Important: The PoC currently scores coordinates sheet-by-sheet. The next production step is assigning every point to the right sheet footprint automatically. This requires confirming that `.map` georeferencing opens correctly on your system.

---

## Export YOLO dataset

After reviewing/correcting candidates, export YOLO tiles:

```bash
python -m bgtopo_poc.cli export-yolo \
  --config configs/blue_detector.yaml \
  --sheet-id K-34-009-2 \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --candidates data/interim/candidates/K-34-009-2_candidates.csv \
  --out-dir data/yolo/K-34-009-2 \
  --tile-size 1024 \
  --overlap 128
```

Then train:

```bash
python -m bgtopo_poc.cli train-yolo \
  --data-yaml data/yolo/K-34-009-2/data.yaml \
  --model yolov8s.pt \
  --imgsz 1024 \
  --epochs 80 \
  --batch 4 \
  --device 0
```

---

## What to improve after this PoC works

1. Add automatic sheet-footprint discovery and coordinate-to-sheet assignment.
2. Add CVAT export/import so weak labels can be corrected by hand.
3. Add hard-negative mining for rivers, lakes, blue text and blue linework.
4. Add calibrated coordinate scoring using a small sklearn model trained on reviewed points.
5. Add active learning: prioritize review crops where the model and rule detector disagree.
6. Add full-map batch inference with overlap-aware de-duplication.
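As a rough illustration of the overlap-aware de-duplication in item 6, here is a minimal sketch using greedy IoU suppression. It is one common approach, not what the PoC implements. The `Box` type, the `deduplicate` function and the `0.5` threshold are assumptions for illustration, and boxes are assumed to be in full-sheet pixel coordinates (tile offsets already added).

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned detection in full-sheet pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    inter_w = min(a.x2, b.x2) - max(a.x1, b.x1)
    inter_h = min(a.y2, b.y2) - max(a.y1, b.y1)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    return inter / (area_a + area_b - inter)


def deduplicate(boxes: list[Box], iou_threshold: float = 0.5) -> list[Box]:
    """Greedy suppression: keep the highest-scoring box of each overlapping group."""
    kept: list[Box] = []
    for box in sorted(boxes, key=lambda b: b.score, reverse=True):
        # Keep the box only if it does not overlap strongly with one already kept.
        if all(iou(box, other) < iou_threshold for other in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    # The first two boxes describe the same symbol seen from two overlapping tiles.
    detections = [
        Box(100, 100, 120, 120, 0.9),
        Box(101, 101, 121, 121, 0.7),
        Box(500, 500, 520, 520, 0.8),
    ]
    for box in deduplicate(detections):
        print(box)
```

Because the target symbols are small, a one-pixel shift between tiles can change IoU noticeably, so the threshold is worth tuning against reviewed overlays.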
---

## Output files

```text
data/manifest.csv                          # discovered remote assets
data/manifest_downloaded.csv               # local paths after download
data/interim/candidates/*_candidates.csv   # weak detections
data/interim/coordinate_scores/*.csv       # coordinate-level predictions
data/interim/crops/*/*.png                 # review crops
reports/overlays/*.png                     # visual QA overlays
reports/poc_report.html                    # summary report
data/yolo/*/data.yaml                      # YOLO training dataset
runs/bgtopo_bluebox/*                      # YOLO training runs
```
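For orientation when inspecting the `data/yolo/*` output: YOLO label files hold one line per object, in the form `class x_center y_center width height`, with all four values normalized to the tile size. The sketch below shows how a full-sheet pixel box could map to such a line for one tile. The function name `to_yolo_label`, the single class id `0` and the clipping behaviour are assumptions for illustration, not a description of what `export-yolo` does internally.

```python
from typing import Optional, Tuple


def to_yolo_label(
    box: Tuple[float, float, float, float],
    tile_origin: Tuple[float, float],
    tile_size: int = 1024,
    class_id: int = 0,
) -> Optional[str]:
    """Convert a full-sheet pixel box (x1, y1, x2, y2) into a YOLO label line.

    Returns None when the box does not intersect the tile.
    Hypothetical helper; class_id=0 assumes a single "blue box" class.
    """
    tile_x, tile_y = tile_origin
    # Shift into tile-local pixels and clip to the tile bounds.
    x1 = max(box[0] - tile_x, 0.0)
    y1 = max(box[1] - tile_y, 0.0)
    x2 = min(box[2] - tile_x, float(tile_size))
    y2 = min(box[3] - tile_y, float(tile_size))
    if x2 <= x1 or y2 <= y1:
        return None
    # YOLO expects the box centre and size, normalized to [0, 1].
    x_center = (x1 + x2) / 2 / tile_size
    y_center = (y1 + y2) / 2 / tile_size
    width = (x2 - x1) / tile_size
    height = (y2 - y1) / tile_size
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # A 256x256 px box inside the tile whose top-left corner is at (1024, 0).
    print(to_yolo_label((1280, 512, 1536, 768), tile_origin=(1024, 0)))
```

Clipping keeps symbols that straddle a tile edge as partial boxes. Whether to keep or drop those is a design choice that is worth checking against the overlays.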