garmin-img-format-parsing/README.md
2026-05-03 22:00:06 +03:00


# BGtopoVJ Blue Rectangle/Square Detection PoC
This is a practical first-pass pipeline for finding blue/light-blue square and rectangle symbols in BGtopoVJ raster maps, then using those detections to score a coordinate dataset and bootstrap a YOLO detector.
The PoC is intentionally hybrid:
1. Download original BGtopoVJ `*.tif` + `*.map` sheet pairs.
2. Open the raster through GDAL/Rasterio, preferring the OziExplorer `.map` sidecar when available.
3. Mine weak candidates using OpenCV HSV thresholding + contour/rectangle filters.
4. Generate QA overlays and an HTML report.
5. Score your known coordinates against nearby candidates.
6. Export weak labels into YOLO format.
7. Train a first YOLO model on your RTX 3080 FE after you review/clean the weak labels.
This is not meant to be a final truth engine on day one. It is meant to rapidly produce reviewable candidates, hard negatives, and a training set.
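To make the "contour/rectangle filters" step concrete, here is a minimal sketch of a shape filter applied to one blue blob after thresholding. The function name and every threshold are hypothetical, not the values used by the PoC; it assumes each blob has already been reduced to a bounding-box width, height, and filled pixel count.
```python
def looks_like_rectangle(w, h, filled_pixels,
                         min_side=6, max_side=60,
                         max_aspect=4.0, min_extent=0.75):
    """Return True if a blob's bounding box is plausibly a filled square/rectangle.

    All thresholds are illustrative assumptions and would need tuning per map scale.
    """
    # Reject blobs that are too small (noise) or too large (lakes, legends).
    if min(w, h) < min_side or max(w, h) > max_side:
        return False
    # Reject long thin shapes such as river fragments or linework.
    if max(w, h) / min(w, h) > max_aspect:
        return False
    # "Extent" = filled area / bounding-box area; text glyphs score low here.
    return filled_pixels / (w * h) >= min_extent


print(looks_like_rectangle(20, 12, 230))  # compact, well-filled blob
print(looks_like_rectangle(58, 7, 150))   # long thin blob
print(looks_like_rectangle(15, 15, 90))   # sparse blob, e.g. a text glyph
```
Blobs rejected by a filter like this are still useful: they are natural hard negatives for the later YOLO training set.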
---
## Hardware fit
Your RTX 3080 FE is enough for the first detector. Start with:
- `yolov8s.pt`
- `imgsz=1024`
- `batch=2` or `batch=4`
- `epochs=80`
If you hit CUDA OOM, lower batch first. Do not lower image size below 896 too early, because the target symbols are small.
16 GB system RAM is tight for country-scale processing, but fine for per-sheet scanning. Avoid loading the whole corpus at once. This PoC scans by windows.
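As a rough illustration of window-based scanning, the sketch below yields overlapping windows that cover a raster without ever holding the full sheet in memory. The function name `iter_windows` and its defaults are assumptions for this example, not the PoC's actual API.
```python
from typing import Iterator, Tuple


def iter_windows(width: int, height: int,
                 tile: int = 1024, overlap: int = 128
                 ) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (col_off, row_off, win_w, win_h) windows covering the raster.

    Neighbouring windows share `overlap` pixels so a small symbol that
    straddles a window edge is fully visible in at least one window.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than tile")
    stride = tile - overlap
    for row_off in range(0, max(height - overlap, 1), stride):
        for col_off in range(0, max(width - overlap, 1), stride):
            # Edge windows are clipped to the raster bounds.
            yield (col_off, row_off,
                   min(tile, width - col_off),
                   min(tile, height - row_off))


for window in iter_windows(2000, 1024):
    print(window)
```
Each tuple can be passed to a windowed raster read so that only one tile is decoded at a time.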
---
## Install locally
### Linux / WSL / Manjaro-like
```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
GDAL/Rasterio can be the annoying part. If opening a sheet's `.map` file with `rasterio.open(...)` fails, install GDAL from your OS package manager, or use the Docker option below.
### GPU Docker option
```bash
docker compose -f docker-compose.gpu.yml build
docker compose -f docker-compose.gpu.yml run --rm bgtopo-bluebox bash
```
Inside the container:
```bash
./scripts/run_pilot.sh
```
---
## Run the pilot
```bash
./scripts/run_pilot.sh
```
This downloads only two sheets, scans them, and produces an HTML report, overlay images, and candidate CSV files:
```text
reports/poc_report.html
reports/overlays/*.png
data/interim/candidates/*_candidates.csv
```
Inspect the overlays. If too many rivers/text labels are detected, tighten `configs/blue_detector.yaml`. If real blue rectangles are missed, loosen the HSV ranges and size filters.
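When tuning HSV ranges, it helps to check how individual map colours fall inside or outside a range. The sketch below does that with the standard-library `colorsys` module as a stand-in for OpenCV, using OpenCV's 8-bit convention (H in 0..179, S and V in 0..255). The range values are hypothetical and are not taken from `configs/blue_detector.yaml`; rounding may differ slightly from OpenCV's own conversion.
```python
import colorsys

# Hypothetical "blue" range in OpenCV-style 8-bit HSV units.
BLUE_LOWER = (90, 60, 80)
BLUE_UPPER = (130, 255, 255)


def rgb_to_opencv_hsv(r, g, b):
    """Convert 8-bit RGB to approximate OpenCV-style HSV (H halved to fit 0..179)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255)))


def in_range(hsv, lower=BLUE_LOWER, upper=BLUE_UPPER):
    """True if every HSV channel lies inside the inclusive [lower, upper] range."""
    return all(lo <= x <= hi for x, lo, hi in zip(hsv, lower, upper))


for name, rgb in [("light blue", (120, 180, 255)),
                  ("dark blue ink", (0, 0, 60)),
                  ("red", (255, 0, 0))]:
    hsv = rgb_to_opencv_hsv(*rgb)
    print(name, hsv, in_range(hsv))
```
Raising the lower V bound is one way to drop dark blue ink; lowering the S bound admits paler, washed-out fills.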
---
## Manual one-sheet run
```bash
python -m bgtopo_poc.cli inventory \
--config configs/blue_detector.yaml \
--out data/manifest.csv \
--limit 1
python -m bgtopo_poc.cli download \
--manifest data/manifest.csv \
--out-dir data/raw \
--out-manifest data/manifest_downloaded.csv \
--limit 1
python -m bgtopo_poc.cli detect \
--config configs/blue_detector.yaml \
--sheet-id K-34-009-2 \
--map data/raw/K-34-009-2/K-34-009-2.map \
--tif data/raw/K-34-009-2/K-34-009-2.tif \
--out-dir data/interim/candidates
python -m bgtopo_poc.cli overlay \
--tif data/raw/K-34-009-2/K-34-009-2.tif \
--candidates data/interim/candidates/K-34-009-2_candidates.csv \
--out reports/overlays/K-34-009-2_overlay.png
```
---
## Score your 60k coordinates
Expected coordinate CSV columns:
```csv
id,lat,lon,expected
pt001,42.58837223,23.19638729,unknown
```
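Before scoring tens of thousands of points, it is worth validating the CSV up front. This is a minimal sketch using only the standard library; `load_points` is a hypothetical helper, and the range check assumes the coordinates are EPSG:4326 degrees.
```python
import csv
import io

REQUIRED = ("id", "lat", "lon", "expected")


def load_points(handle):
    """Read coordinate rows, failing early on missing columns or bad lat/lon."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    points = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        lat, lon = float(row["lat"]), float(row["lon"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"line {line_no}: lat/lon out of range")
        points.append({"id": row["id"], "lat": lat, "lon": lon,
                       "expected": row["expected"]})
    return points


sample = io.StringIO("id,lat,lon,expected\n"
                     "pt001,42.58837223,23.19638729,unknown\n")
points = load_points(sample)
print(points)
```
A swapped `lat`/`lon` pair will often still pass this range check for Bulgaria, so a quick visual spot check of a few points remains useful.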
Then run:
```bash
python -m bgtopo_poc.cli score-coords \
--config configs/blue_detector.yaml \
--sheet-id K-34-009-2 \
--coordinates data/coordinates/your_60k_points.csv \
--candidates data/interim/candidates/K-34-009-2_candidates.csv \
--map data/raw/K-34-009-2/K-34-009-2.map \
--tif data/raw/K-34-009-2/K-34-009-2.tif \
--out-dir data/interim/coordinate_scores \
--coord-crs EPSG:4326
```
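For intuition about what "scoring against nearby candidates" can mean, here is one plausible scheme, assuming the point has already been projected into the sheet's pixel coordinates. The search radius and the linear falloff are assumptions for illustration; the PoC's actual scoring may differ.
```python
import math


def score_point(px, py, candidates, radius=40.0):
    """Return (score, nearest_index) for a point in pixel coordinates.

    candidates: list of (cx, cy) candidate centres in the same pixel space.
    Score is 1.0 at zero distance and falls linearly to 0.0 at `radius`.
    """
    best_i, best_d = None, math.inf
    for i, (cx, cy) in enumerate(candidates):
        d = math.hypot(cx - px, cy - py)
        if d < best_d:
            best_i, best_d = i, d
    # No candidate at all, or the nearest one is outside the search radius.
    if best_i is None or best_d >= radius:
        return 0.0, None
    return 1.0 - best_d / radius, best_i


print(score_point(100, 100, [(100, 110), (300, 300)]))
print(score_point(100, 100, [(300, 300)]))
```
The radius should reflect both georeferencing error and the uncertainty of your source coordinates, so it is better derived from reviewed points than guessed.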
Extract review crops for predicted positives/review cases:
```bash
python -m bgtopo_poc.cli crops \
--scores data/interim/coordinate_scores/K-34-009-2_coordinate_scores.csv \
--map data/raw/K-34-009-2/K-34-009-2.map \
--tif data/raw/K-34-009-2/K-34-009-2.tif \
--out-dir data/interim/crops/K-34-009-2 \
--crop-size 256
```
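Points near a sheet edge need care when cropping. The sketch below shows one way to keep a fixed-size crop inside the image by shifting the window rather than padding it; whether the PoC shifts, pads, or skips such points is not stated here, so treat this as an assumption.
```python
def crop_window(cx, cy, width, height, crop_size=256):
    """Return (left, top, right, bottom) for a crop centred near (cx, cy).

    The window is shifted to stay inside a width x height image, so the
    point of interest may be off-centre near the borders.
    """
    size_w = min(crop_size, width)
    size_h = min(crop_size, height)
    left = min(max(cx - size_w // 2, 0), width - size_w)
    top = min(max(cy - size_h // 2, 0), height - size_h)
    return left, top, left + size_w, top + size_h


print(crop_window(500, 500, 2000, 2000))   # interior point
print(crop_window(10, 10, 2000, 2000))     # near the top-left corner
print(crop_window(1995, 1000, 2000, 2000)) # near the right edge
```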
Important: The PoC currently scores coordinates sheet-by-sheet. The next production step is assigning every point to the right sheet footprint automatically. This requires confirming that `.map` georeferencing opens correctly on your system.
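A first approximation of point-to-sheet assignment is a bounding-box lookup, sketched below. The sheet IDs and footprint bounds are placeholders invented for the example; real footprints would come from the `.map` georeferencing, and a sheet's true outline may not be an axis-aligned rectangle in lat/lon.
```python
# Placeholder footprints: sheet_id -> (min_lon, min_lat, max_lon, max_lat), EPSG:4326.
FOOTPRINTS = {
    "SHEET-A": (23.0, 42.5, 23.25, 42.6667),
    "SHEET-B": (23.25, 42.5, 23.5, 42.6667),
}


def sheets_for_point(lon, lat, footprints):
    """Return every sheet whose bounding box contains the point (edges inclusive)."""
    return [sheet_id
            for sheet_id, (x0, y0, x1, y1) in footprints.items()
            if x0 <= lon <= x1 and y0 <= lat <= y1]


print(sheets_for_point(23.19638729, 42.58837223, FOOTPRINTS))
print(sheets_for_point(23.25, 42.55, FOOTPRINTS))  # on a shared edge
print(sheets_for_point(10.0, 10.0, FOOTPRINTS))    # outside all sheets
```
Returning a list rather than a single ID keeps points on shared edges scoreable against both neighbouring sheets.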
---
## Export YOLO dataset
After reviewing/correcting candidates, export YOLO tiles:
```bash
python -m bgtopo_poc.cli export-yolo \
--config configs/blue_detector.yaml \
--sheet-id K-34-009-2 \
--tif data/raw/K-34-009-2/K-34-009-2.tif \
--candidates data/interim/candidates/K-34-009-2_candidates.csv \
--out-dir data/yolo/K-34-009-2 \
--tile-size 1024 \
--overlap 128
```
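YOLO label files hold one line per object: a class index followed by the box centre and size, all normalised to the tile dimensions. The sketch below shows that conversion for a pixel-space box; `to_yolo_line` is a hypothetical helper, and the single class index `0` is an assumption.
```python
def to_yolo_line(box, tile_w, tile_h, class_id=0):
    """Convert a pixel box to a YOLO label line.

    box: (x_min, y_min, x_max, y_max) in tile pixel coordinates.
    Output: "class cx cy w h" with all four values normalised to 0..1.
    """
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2 / tile_w
    cy = (y0 + y1) / 2 / tile_h
    w = (x1 - x0) / tile_w
    h = (y1 - y0) / tile_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 32x16 px symbol centred at (512, 256) in a 1024x1024 tile.
print(to_yolo_line((496, 248, 528, 264), 1024, 1024))
```
At this scale a symbol occupies only a few percent of the tile width, which is why reducing `imgsz` too aggressively hurts recall.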
Then train:
```bash
python -m bgtopo_poc.cli train-yolo \
--data-yaml data/yolo/K-34-009-2/data.yaml \
--model yolov8s.pt \
--imgsz 1024 \
--epochs 80 \
--batch 4 \
--device 0
```
---
## What to improve after this PoC works
1. Add automatic sheet-footprint discovery and coordinate-to-sheet assignment.
2. Add CVAT export/import so weak labels can be corrected by hand.
3. Add hard-negative mining for rivers, lakes, blue text and blue linework.
4. Add calibrated coordinate scoring using a small sklearn model trained on reviewed points.
5. Add active learning: prioritize review crops where the model and rule detector disagree.
6. Add full-map batch inference with overlap-aware de-duplication.
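For item 6, one common approach is greedy non-maximum suppression on intersection-over-union (IoU), applied after tile detections are shifted back into full-sheet pixel coordinates. This is a sketch of that general technique, not code from the PoC; the 0.5 threshold is an assumption.
```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1, ...)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def deduplicate(dets, iou_thr=0.5):
    """Keep the highest-scoring detection among heavily overlapping ones.

    dets: list of (x0, y0, x1, y1, score) in full-sheet pixel coordinates.
    """
    kept = []
    for det in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(det, k) < iou_thr for k in kept):
            kept.append(det)
    return kept


dets = [
    (100, 100, 140, 140, 0.9),  # symbol seen in tile A
    (102, 101, 142, 141, 0.8),  # same symbol seen again in overlapping tile B
    (500, 500, 520, 520, 0.7),  # a different symbol
]
print(deduplicate(dets))
```
This loop is quadratic in the number of detections, which is fine per sheet; a spatial index would be worth adding only at country scale.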
---
## Output files
```text
data/manifest.csv # discovered remote assets
data/manifest_downloaded.csv # local paths after download
data/interim/candidates/*_candidates.csv # weak detections
data/interim/coordinate_scores/*.csv # coordinate-level predictions
data/interim/crops/*/*.png # review crops
reports/overlays/*.png # visual QA overlays
reports/poc_report.html # summary report
data/yolo/*/data.yaml # YOLO dataset config (one per exported sheet)
runs/bgtopo_bluebox/* # YOLO training runs
```