# BGtopoVJ Blue Rectangle/Square Detection PoC

This is a practical first-pass pipeline for finding blue/light-blue square and rectangle symbols in BGtopoVJ raster maps, then using those detections to score a coordinate dataset and bootstrap a YOLO detector.

The PoC is intentionally hybrid:

1. Download original BGtopoVJ `*.tif` + `*.map` sheet pairs.
2. Open the raster through GDAL/Rasterio, preferring the OziExplorer `.map` sidecar when available.
3. Mine weak candidates using OpenCV HSV thresholding + contour/rectangle filters.
4. Generate QA overlays and HTML report.
5. Score your known coordinates against nearby candidates.
6. Export weak labels into YOLO format.
7. Train a first YOLO model on your RTX 3080 FE after you review/clean the weak labels.

This is not meant to be a final truth engine on day one. It is meant to rapidly produce reviewable candidates, hard negatives, and a training set.

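To make step 3 concrete, here is a plain-Python stand-in for the idea. It is not the PoC's OpenCV detector: it thresholds pixels in HSV with `colorsys`, groups them into 4-connected components, and keeps components that fill their bounding box like a rectangle. The hue/saturation/value bounds, `min_side`, `min_fill`, and both function names are illustrative assumptions.

```python
import colorsys
from collections import deque

# Assumed bounds for "blue / light blue"; colorsys returns hue in [0, 1).
HUE_RANGE = (0.50, 0.72)
MIN_SAT = 0.25
MIN_VAL = 0.40


def is_blue(rgb):
    """Return True if an (r, g, b) pixel with 0-255 channels falls in the assumed blue range."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return HUE_RANGE[0] <= h <= HUE_RANGE[1] and s >= MIN_SAT and v >= MIN_VAL


def find_blue_boxes(image, min_side=3, min_fill=0.8):
    """Return (x, y, w, h) boxes of blue components that mostly fill their bounding box.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    height, width = len(image), len(image[0])
    mask = [[is_blue(px) for px in row] for row in image]
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one 4-connected component, tracking its extent.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            area, xmin, xmax, ymin, ymax = 0, x0, x0, y0, y0
            while queue:
                x, y = queue.popleft()
                area += 1
                xmin, xmax = min(xmin, x), max(xmax, x)
                ymin, ymax = min(ymin, y), max(ymax, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            w, h = xmax - xmin + 1, ymax - ymin + 1
            # Rectangle filter: large enough, and mostly solid inside its bounding box.
            if min(w, h) >= min_side and area / (w * h) >= min_fill:
                boxes.append((xmin, ymin, w, h))
    return boxes
```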
---

## Hardware fit

Your RTX 3080 FE is enough for the first detector. Start with:

- `yolov8s.pt`
- `imgsz=1024`
- `batch=2` or `batch=4`
- `epochs=80`

If you hit CUDA OOM, lower batch first. Do not lower image size below 896 too early, because the target symbols are small.

16 GB system RAM is tight for country-scale processing, but fine for per-sheet scanning. Avoid loading the whole corpus at once. This PoC scans by windows.

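Window-by-window scanning can be sketched as a generator of tile offsets, so only one tile is in memory at a time. The function name, the 2048-pixel window, and the 128-pixel overlap are assumptions rather than the PoC's actual settings. The overlap is there so that a symbol smaller than the overlap that straddles a tile edge still appears whole in at least one window. With Rasterio, each tuple maps onto `rasterio.windows.Window(col_off, row_off, width, height)`.

```python
def iter_windows(width, height, size=2048, overlap=128):
    """Yield (col_off, row_off, win_w, win_h) tiles that cover a width x height raster."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    # Stopping at (dimension - overlap) avoids a trailing sliver that the
    # previous window already covers completely.
    for row_off in range(0, max(height - overlap, 1), step):
        for col_off in range(0, max(width - overlap, 1), step):
            # Clip windows at the right and bottom edges of the raster.
            yield (col_off, row_off, min(size, width - col_off), min(size, height - row_off))


for window in iter_windows(5000, 3000):
    print(window)
```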
---

## Install locally

### Linux / WSL / Manjaro-like

```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

GDAL/Rasterio can be the annoying part. If `rasterio.open()` fails on a sheet's `.map` file, install GDAL from your OS package manager, or use the Docker option below.

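One way to check this on your system is a small helper that prefers the `.map` sidecar and falls back to the bare `.tif`. This is a sketch, not the PoC's loader: `open_sheet` is a hypothetical name, and the opener is passed in so the logic can be tried without GDAL installed (in practice you would pass `rasterio.open`). Note that a fallback to the `.tif` may leave you without georeferencing if the TIFF carries no geotags of its own.

```python
from pathlib import Path


def open_sheet(map_path, tif_path, opener):
    """Open the .map sidecar if it exists and opens cleanly, else fall back to the .tif.

    `opener` is any callable that takes a path string, e.g. rasterio.open.
    Returns (dataset, source) where source is "map" or "tif".
    """
    if Path(map_path).exists():
        try:
            return opener(str(map_path)), "map"
        except Exception as exc:  # broad on purpose: driver errors vary by GDAL build
            print(f"could not open {map_path}: {exc}; falling back to {tif_path}")
    return opener(str(tif_path)), "tif"
```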
### GPU Docker option

```bash
docker compose -f docker-compose.gpu.yml build
docker compose -f docker-compose.gpu.yml run --rm bgtopo-bluebox bash
```

Inside the container:

```bash
./scripts/run_pilot.sh
```

---

## Run the pilot

```bash
./scripts/run_pilot.sh
```

This downloads only two sheets, scans them, writes candidate CSV files, draws overlays, and builds an HTML report. The outputs are:

```text
reports/poc_report.html
reports/overlays/*.png
data/interim/candidates/*_candidates.csv
```

Inspect the overlays. If too many rivers/text labels are detected, tighten `configs/blue_detector.yaml`. If real blue rectangles are missed, loosen the HSV ranges and size filters.

---

## Manual one-sheet run

```bash
python -m bgtopo_poc.cli inventory \
  --config configs/blue_detector.yaml \
  --out data/manifest.csv \
  --limit 1

python -m bgtopo_poc.cli download \
  --manifest data/manifest.csv \
  --out-dir data/raw \
  --out-manifest data/manifest_downloaded.csv \
  --limit 1

python -m bgtopo_poc.cli detect \
  --config configs/blue_detector.yaml \
  --sheet-id K-34-009-2 \
  --map data/raw/K-34-009-2/K-34-009-2.map \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --out-dir data/interim/candidates

python -m bgtopo_poc.cli overlay \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --candidates data/interim/candidates/K-34-009-2_candidates.csv \
  --out reports/overlays/K-34-009-2_overlay.png
```

---

## Score your 60k coordinates

Expected coordinate CSV columns:

```csv
id,lat,lon,expected
pt001,42.58837223,23.19638729,unknown
```

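A quick sanity check of the file before scoring can save a long run. The loader below is a sketch that assumes exactly the four columns shown above and latitude/longitude in degrees; the function name and the validation rules are illustrative and not part of `bgtopo_poc`.

```python
import csv
import io

REQUIRED = ("id", "lat", "lon", "expected")


def load_points(handle):
    """Read coordinate rows from an open CSV file and validate lat/lon ranges."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    points = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        lat, lon = float(row["lat"]), float(row["lon"])
        # Catches the most common mistake: swapped or projected coordinates.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"line {line_no}: coordinate out of range")
        points.append({"id": row["id"], "lat": lat, "lon": lon, "expected": row["expected"]})
    return points


sample = io.StringIO("id,lat,lon,expected\npt001,42.58837223,23.19638729,unknown\n")
print(load_points(sample))
```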
Then run:

```bash
python -m bgtopo_poc.cli score-coords \
  --config configs/blue_detector.yaml \
  --sheet-id K-34-009-2 \
  --coordinates data/coordinates/your_60k_points.csv \
  --candidates data/interim/candidates/K-34-009-2_candidates.csv \
  --map data/raw/K-34-009-2/K-34-009-2.map \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --out-dir data/interim/coordinate_scores \
  --coord-crs EPSG:4326
```

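Conceptually, scoring asks how close each known point lies to the nearest candidate once both are in pixel space. The sketch below assumes points have already been projected to pixel coordinates and that candidates are reduced to box centres. The 25-pixel radius, the linear score, and the function name are assumptions, not the actual `score-coords` logic.

```python
import math


def score_point(point_xy, candidate_centres, radius_px=25.0):
    """Return (score, distance): score is 1.0 on a candidate and 0.0 at or beyond radius_px."""
    if not candidate_centres:
        return 0.0, None
    px, py = point_xy
    nearest = min(math.hypot(cx - px, cy - py) for cx, cy in candidate_centres)
    score = max(0.0, 1.0 - nearest / radius_px)
    return score, nearest


candidates = [(100.0, 100.0), (400.0, 250.0)]
print(score_point((103.0, 104.0), candidates))  # 5 px away from the first candidate
print(score_point((900.0, 900.0), candidates))  # far from every candidate
```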
Extract review crops for predicted positives/review cases:

```bash
python -m bgtopo_poc.cli crops \
  --scores data/interim/coordinate_scores/K-34-009-2_coordinate_scores.csv \
  --map data/raw/K-34-009-2/K-34-009-2.map \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --out-dir data/interim/crops/K-34-009-2 \
  --crop-size 256
```

Important: The PoC currently scores coordinates sheet-by-sheet. The next production step is assigning every point to the right sheet footprint automatically. This requires confirming that `.map` georeferencing opens correctly on your system.

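Coordinate-to-sheet assignment can start as a simple bounding-box lookup. The sketch assumes each sheet footprint is known as a lon/lat rectangle in the same CRS as the points. The sheet names and footprint numbers below are made up for illustration, and a point on a shared edge is assigned to every sheet that touches it.

```python
def assign_sheets(points, footprints):
    """Map each point id to the list of sheet ids whose bounding box contains it.

    points: iterable of (point_id, lat, lon)
    footprints: dict of sheet_id -> (lon_min, lat_min, lon_max, lat_max)
    """
    assignment = {}
    for point_id, lat, lon in points:
        assignment[point_id] = [
            sheet_id
            for sheet_id, (lon_min, lat_min, lon_max, lat_max) in footprints.items()
            if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
        ]
    return assignment


# Hypothetical footprints; real ones would come from the georeferenced sheets.
footprints = {
    "sheet-A": (23.00, 42.50, 23.25, 42.6667),
    "sheet-B": (23.25, 42.50, 23.50, 42.6667),
}
points = [("pt001", 42.58837223, 23.19638729), ("pt002", 41.0, 23.1)]
print(assign_sheets(points, footprints))
```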
---

## Export YOLO dataset

After reviewing/correcting candidates, export YOLO tiles:

```bash
python -m bgtopo_poc.cli export-yolo \
  --config configs/blue_detector.yaml \
  --sheet-id K-34-009-2 \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --candidates data/interim/candidates/K-34-009-2_candidates.csv \
  --out-dir data/yolo/K-34-009-2 \
  --tile-size 1024 \
  --overlap 128
```

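For reference, a YOLO label line is `class x_center y_center width height`, with the four box values normalized by the tile size. The conversion below is a sketch: it assumes candidates are pixel boxes `(x, y, w, h)` in full-sheet coordinates, a single class `0`, and that boxes crossing a tile edge are clipped. The function name and the clipping policy are assumptions and not necessarily what `export-yolo` does.

```python
def to_yolo_line(box, tile_x, tile_y, tile_size=1024, class_id=0):
    """Convert a sheet-space pixel box to a YOLO label line for one tile, or None if outside it."""
    x, y, w, h = box
    # Clip the box to the tile, working in tile-local pixel coordinates.
    x0 = max(x - tile_x, 0)
    y0 = max(y - tile_y, 0)
    x1 = min(x + w - tile_x, tile_size)
    y1 = min(y + h - tile_y, tile_size)
    if x1 <= x0 or y1 <= y0:
        return None
    cx = (x0 + x1) / 2 / tile_size
    cy = (y0 + y1) / 2 / tile_size
    bw = (x1 - x0) / tile_size
    bh = (y1 - y0) / tile_size
    return f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"


# With --tile-size 1024 and --overlap 128 the tile stride is 896 pixels,
# so (896, 896) is the origin of the second tile in each direction.
print(to_yolo_line((1000, 1200, 32, 24), tile_x=896, tile_y=896))
```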
Then train:

```bash
python -m bgtopo_poc.cli train-yolo \
  --data-yaml data/yolo/K-34-009-2/data.yaml \
  --model yolov8s.pt \
  --imgsz 1024 \
  --epochs 80 \
  --batch 4 \
  --device 0
```

---

## What to improve after this PoC works

1. Add automatic sheet-footprint discovery and coordinate-to-sheet assignment.
2. Add CVAT export/import so weak labels can be corrected by hand.
3. Add hard-negative mining for rivers, lakes, blue text and blue linework.
4. Add calibrated coordinate scoring using a small sklearn model trained on reviewed points.
5. Add active learning: prioritize review crops where the model and rule detector disagree.
6. Add full-map batch inference with overlap-aware de-duplication.

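Item 6 is commonly handled with greedy non-maximum suppression once tile detections are shifted back into sheet coordinates. The sketch below is a generic IoU-based version under that assumption; the 0.5 threshold and the `(x, y, w, h, score)` tuple layout are illustrative choices.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def deduplicate(detections, iou_threshold=0.5):
    """Greedy NMS over (x, y, w, h, score) tuples: keep the best-scoring box of each overlapping group."""
    kept = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(iou(det[:4], k[:4]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# The first two boxes are the same symbol seen from two overlapping tiles.
dets = [(100, 100, 20, 20, 0.90), (102, 101, 20, 20, 0.75), (300, 50, 18, 18, 0.60)]
print(deduplicate(dets))
```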
---

## Output files

```text
data/manifest.csv                          # discovered remote assets
data/manifest_downloaded.csv               # local paths after download
data/interim/candidates/*_candidates.csv   # weak detections
data/interim/coordinate_scores/*.csv       # coordinate-level predictions
data/interim/crops/*/*.png                 # review crops
reports/overlays/*.png                     # visual QA overlays
reports/poc_report.html                    # summary report
data/yolo/*/data.yaml                      # YOLO training dataset
runs/bgtopo_bluebox/*                      # YOLO training runs
```