BGtopoVJ Blue Rectangle/Square Detection PoC
This is a practical first-pass pipeline for finding blue/light-blue square and rectangle symbols in BGtopoVJ raster maps, then using those detections to score a coordinate dataset and bootstrap a YOLO detector.
The PoC is intentionally hybrid:
- Download original BGtopoVJ *.tif + *.map sheet pairs.
- Open the raster through GDAL/Rasterio, preferring the OziExplorer .map sidecar when available.
- Mine weak candidates using OpenCV HSV thresholding + contour/rectangle filters.
- Generate QA overlays and an HTML report.
- Score your known coordinates against nearby candidates.
- Export weak labels into YOLO format.
- Train a first YOLO model on your RTX 3080 FE after you review/clean the weak labels.
This is not meant to be a final truth engine on day one. It is meant to rapidly produce reviewable candidates, hard negatives, and a training set.
Hardware fit
Your RTX 3080 FE is enough for the first detector. Start with:
- yolov8s.pt
- imgsz=1024
- batch=2 or batch=4
- epochs=80
If you hit CUDA OOM, lower batch first. Do not lower image size below 896 too early, because the target symbols are small.
16 GB of system RAM is tight for country-scale processing, but fine for per-sheet scanning. Avoid loading the whole corpus at once. This PoC scans each sheet window by window.
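The window-by-window scanning mentioned above can be sketched roughly as follows. This is a minimal illustration rather than the PoC's actual implementation: the function name `iter_windows` and its default window size and overlap are assumptions, chosen to match the tile size and overlap used in the export step.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (col_off, row_off, width, height)


def _offsets(length: int, size: int, step: int) -> List[int]:
    """Start offsets along one axis so that windows of `size` cover `length`."""
    offsets = [0]
    while offsets[-1] + size < length:
        offsets.append(offsets[-1] + step)
    return offsets


def iter_windows(width: int, height: int,
                 size: int = 1024, overlap: int = 128) -> Iterator[Window]:
    """Yield overlapping windows that together cover a width x height raster.

    `size` and `overlap` are illustrative defaults, not values read from
    the PoC configuration.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    for row_off in _offsets(height, size, step):
        for col_off in _offsets(width, size, step):
            # Windows at the right and bottom edges are clipped to the raster.
            yield (col_off, row_off,
                   min(size, width - col_off),
                   min(size, height - row_off))
```

Each tuple can be turned into a `rasterio.windows.Window(col_off, row_off, width, height)` and passed to `dataset.read(window=...)`, so only one window needs to be in memory at a time. The overlap keeps a symbol that straddles a window edge fully visible in at least one window.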
Install locally
Linux / WSL / Manjaro-like
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
GDAL/Rasterio can be the annoying part. If rasterio.open() fails on a .map file, install GDAL from your OS package manager, or use the Docker option below.
GPU Docker option
docker compose -f docker-compose.gpu.yml build
docker compose -f docker-compose.gpu.yml run --rm bgtopo-bluebox bash
Inside the container:
./scripts/run_pilot.sh
Run the pilot
./scripts/run_pilot.sh
This downloads only two sheets, scans them, writes candidate CSV files, draws overlays, and builds:
reports/poc_report.html
reports/overlays/*.png
data/interim/candidates/*_candidates.csv
Inspect the overlays. If too many rivers/text labels are detected, tighten configs/blue_detector.yaml. If real blue rectangles are missed, loosen the HSV ranges and size filters.
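To make the tuning knobs concrete, here is a dependency-free sketch of what an HSV range and a rectangle filter do. It is not the PoC's detector, which uses OpenCV; the threshold values and function names below are assumptions and are not taken from configs/blue_detector.yaml. Hue is expressed in degrees (0 to 360) here, whereas OpenCV's 8-bit HSV images store hue as 0 to 179.

```python
import colorsys
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)

# Hypothetical thresholds for illustration only.
HUE_RANGE_DEG = (180.0, 250.0)  # cyan through blue
MIN_SATURATION = 0.25           # rejects white paper and grey/black ink
MIN_VALUE = 0.40                # rejects very dark pixels


def is_blue(pixel: Pixel) -> bool:
    """Return True if the pixel falls inside the assumed blue HSV range."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_deg = h * 360.0
    return (HUE_RANGE_DEG[0] <= hue_deg <= HUE_RANGE_DEG[1]
            and s >= MIN_SATURATION
            and v >= MIN_VALUE)


def blue_mask(image: List[List[Pixel]]) -> List[List[bool]]:
    """Threshold a row-major image into a boolean mask."""
    return [[is_blue(p) for p in row] for row in image]


def bbox_and_fill(mask: List[List[bool]]
                  ) -> Optional[Tuple[Tuple[int, int, int, int], float]]:
    """Return ((x, y, w, h), fill_ratio) over all True pixels, or None.

    fill_ratio is the share of the bounding box covered by mask pixels.
    It is close to 1.0 for a solid axis-aligned rectangle and much lower
    for thin linework such as rivers or text strokes.
    """
    coords = [(x, y) for y, row in enumerate(mask)
              for x, hit in enumerate(row) if hit]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x0, y0 = min(xs), min(ys)
    w = max(xs) - x0 + 1
    h = max(ys) - y0 + 1
    return (x0, y0, w, h), len(coords) / (w * h)
```

Widening `HUE_RANGE_DEG` or lowering `MIN_SATURATION` corresponds to loosening the detector, while a minimum fill ratio and minimum/maximum box size correspond to tightening it. A real pass would first split the mask into connected components and evaluate each one separately, which this sketch omits.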
Manual one-sheet run
python -m bgtopo_poc.cli inventory \
--config configs/blue_detector.yaml \
--out data/manifest.csv \
--limit 1
python -m bgtopo_poc.cli download \
--manifest data/manifest.csv \
--out-dir data/raw \
--out-manifest data/manifest_downloaded.csv \
--limit 1
python -m bgtopo_poc.cli detect \
--config configs/blue_detector.yaml \
--sheet-id K-34-009-2 \
--map data/raw/K-34-009-2/K-34-009-2.map \
--tif data/raw/K-34-009-2/K-34-009-2.tif \
--out-dir data/interim/candidates
python -m bgtopo_poc.cli overlay \
--tif data/raw/K-34-009-2/K-34-009-2.tif \
--candidates data/interim/candidates/K-34-009-2_candidates.csv \
--out reports/overlays/K-34-009-2_overlay.png
Score your 60k coordinates
Expected coordinate CSV columns:
id,lat,lon,expected
pt001,42.58837223,23.19638729,unknown
Then run:
python -m bgtopo_poc.cli score-coords \
--config configs/blue_detector.yaml \
--sheet-id K-34-009-2 \
--coordinates data/coordinates/your_60k_points.csv \
--candidates data/interim/candidates/K-34-009-2_candidates.csv \
--map data/raw/K-34-009-2/K-34-009-2.map \
--tif data/raw/K-34-009-2/K-34-009-2.tif \
--out-dir data/interim/coordinate_scores \
--coord-crs EPSG:4326
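Conceptually, scoring a coordinate against nearby candidates comes down to a nearest-neighbour check in pixel space. The sketch below illustrates that idea only; the function names, the distance thresholds, and the three labels are assumptions and do not describe the actual logic or output columns of score-coords. It also assumes each point has already been converted from lon/lat to pixel coordinates on the sheet.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in sheet pixel coordinates


def nearest_candidate(point: Point,
                      centers: Sequence[Point]) -> Optional[Tuple[int, float]]:
    """Return (index, distance_px) of the closest candidate centre, or None."""
    best: Optional[Tuple[int, float]] = None
    for idx, (cx, cy) in enumerate(centers):
        dist = math.hypot(cx - point[0], cy - point[1])
        if best is None or dist < best[1]:
            best = (idx, dist)
    return best


def classify(distance_px: Optional[float],
             match_px: float = 15.0,
             review_px: float = 40.0) -> str:
    """Map a nearest-candidate distance to a coarse label.

    The thresholds are hypothetical; sensible values depend on the scan
    resolution and on how accurate the input coordinates are.
    """
    if distance_px is None:
        return "negative"
    if distance_px <= match_px:
        return "positive"
    if distance_px <= review_px:
        return "review"
    return "negative"
```

For example, a point at pixel (103, 104) with a candidate centred at (100, 100) is 5 px away and would be labelled "positive" under these assumed thresholds. A linear scan is fine for one sheet; for many points and candidates a spatial index would be the usual next step.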
Extract review crops for predicted positives/review cases:
python -m bgtopo_poc.cli crops \
--scores data/interim/coordinate_scores/K-34-009-2_coordinate_scores.csv \
--map data/raw/K-34-009-2/K-34-009-2.map \
--tif data/raw/K-34-009-2/K-34-009-2.tif \
--out-dir data/interim/crops/K-34-009-2 \
--crop-size 256
Important: The PoC currently scores coordinates sheet-by-sheet. The next production step is assigning every point to the right sheet footprint automatically. This requires confirming that .map georeferencing opens correctly on your system.
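As a rough idea of what coordinate-to-sheet assignment could look like once georeferencing is confirmed, the sketch below tests a point against per-sheet bounding boxes. Everything in it is an assumption: the function name, the sheet IDs, and the footprint values are placeholders for illustration, not real BGtopoVJ sheet extents.

```python
from typing import Dict, List, Tuple

# (min_lon, min_lat, max_lon, max_lat) in the same CRS as the points.
BBox = Tuple[float, float, float, float]


def sheets_for_point(lon: float, lat: float,
                     footprints: Dict[str, BBox]) -> List[str]:
    """Return the IDs of all sheets whose bounding box contains the point.

    A list is returned because a point on a shared edge, or inside an
    overlapping margin, can legitimately fall into more than one sheet.
    """
    return sorted(
        sheet_id
        for sheet_id, (min_lon, min_lat, max_lon, max_lat) in footprints.items()
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    )


# Placeholder footprints, NOT real sheet extents.
example_footprints: Dict[str, BBox] = {
    "SHEET-A": (23.00, 42.50, 23.25, 42.6667),
    "SHEET-B": (23.25, 42.50, 23.50, 42.6667),
}
```

In practice the footprints would be derived from the opened rasters (for example from each dataset's bounds, reprojected to the coordinate CRS), and a sheet's true outline may not be an axis-aligned box in lon/lat, so a bounding-box hit should be treated as a candidate assignment to verify.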
Export YOLO dataset
After reviewing/correcting candidates, export YOLO tiles:
python -m bgtopo_poc.cli export-yolo \
--config configs/blue_detector.yaml \
--sheet-id K-34-009-2 \
--tif data/raw/K-34-009-2/K-34-009-2.tif \
--candidates data/interim/candidates/K-34-009-2_candidates.csv \
--out-dir data/yolo/K-34-009-2 \
--tile-size 1024 \
--overlap 128
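YOLO label files hold one line per object in the form `class cx cy w h`, with the box centre and size normalized to the tile dimensions. The sketch below shows that conversion for a candidate box given in sheet pixel coordinates. The function name, the single class ID 0, and the `min_visible` rule for boxes cut by a tile edge are assumptions, not the behaviour of export-yolo.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), top-left corner + size


def to_yolo_line(box: Box,
                 tile_origin: Tuple[float, float],
                 tile_size: int = 1024,
                 class_id: int = 0,
                 min_visible: float = 0.5) -> Optional[str]:
    """Convert a sheet-space box into a YOLO label line for one tile.

    Returns None when the box lies outside the tile or when less than
    `min_visible` of its area falls inside (a hypothetical rule that
    avoids labelling slivers of symbols cut by the tile edge).
    """
    x, y, w, h = box
    if w <= 0 or h <= 0:
        return None
    tx, ty = tile_origin
    # Clip the box to the tile extent.
    x0, y0 = max(x, tx), max(y, ty)
    x1, y1 = min(x + w, tx + tile_size), min(y + h, ty + tile_size)
    if x1 <= x0 or y1 <= y0:
        return None
    if (x1 - x0) * (y1 - y0) / (w * h) < min_visible:
        return None
    # Normalize centre and size to the 0..1 range expected by YOLO.
    cx = ((x0 + x1) / 2 - tx) / tile_size
    cy = ((y0 + y1) / 2 - ty) / tile_size
    bw = (x1 - x0) / tile_size
    bh = (y1 - y0) / tile_size
    return f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"
```

With 1024 px tiles, a 40 x 20 px symbol has a normalized width of about 0.039, which is one reason to keep the training image size high for these small targets.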
Then train:
python -m bgtopo_poc.cli train-yolo \
--data-yaml data/yolo/K-34-009-2/data.yaml \
--model yolov8s.pt \
--imgsz 1024 \
--epochs 80 \
--batch 4 \
--device 0
What to improve after this PoC works
- Add automatic sheet-footprint discovery and coordinate-to-sheet assignment.
- Add CVAT export/import so weak labels can be corrected by hand.
- Add hard-negative mining for rivers, lakes, blue text and blue linework.
- Add calibrated coordinate scoring using a small sklearn model trained on reviewed points.
- Add active learning: prioritize review crops where the model and rule detector disagree.
- Add full-map batch inference with overlap-aware de-duplication.
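For the last item, overlap-aware de-duplication is commonly done with greedy non-maximum suppression on intersection-over-union (IoU). The sketch below shows that technique on boxes already mapped back to sheet pixel coordinates. It is one possible approach, not something the PoC currently implements; the function names and the 0.5 threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in sheet pixels
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def deduplicate(detections: List[Detection],
                iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS: keep the highest-scoring box of each overlapping group."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], other[0]) < iou_threshold for other in kept):
            kept.append(det)
    return kept
```

Because adjacent tiles overlap, the same symbol is often detected twice with slightly different boxes; running this once per sheet after shifting tile-local boxes by their tile origin collapses those duplicates.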
Output files
data/manifest.csv # discovered remote assets
data/manifest_downloaded.csv # local paths after download
data/interim/candidates/*_candidates.csv # weak detections
data/interim/coordinate_scores/*.csv # coordinate-level predictions
data/interim/crops/*/*.png # review crops
reports/overlays/*.png # visual QA overlays
reports/poc_report.html # summary report
data/yolo/*/data.yaml # YOLO training dataset
runs/bgtopo_bluebox/* # YOLO training runs