# BGtopoVJ Blue Rectangle/Square Detection PoC

This is a practical first-pass pipeline for finding blue/light-blue square and rectangle symbols in BGtopoVJ raster maps, then using those detections to score a coordinate dataset and bootstrap a YOLO detector.

The PoC is intentionally hybrid:

1. Download original BGtopoVJ `*.tif` + `*.map` sheet pairs.
2. Open the raster through GDAL/Rasterio, preferring the OziExplorer `.map` sidecar when available.
3. Mine weak candidates using OpenCV HSV thresholding + contour/rectangle filters.
4. Generate QA overlays and an HTML report.
5. Score your known coordinates against nearby candidates.
6. Export weak labels into YOLO format.
7. Train a first YOLO model on your RTX 3080 FE after you review/clean the weak labels.

This is not meant to be a final truth engine on day one. It is meant to rapidly produce reviewable candidates, hard negatives, and a training set.
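
As a rough illustration of the contour/rectangle filters in step 3, the sketch below keeps only blobs whose bounding box is small, not too elongated, and mostly filled. The function name `looks_like_box` and every threshold are assumptions made for this example, not values from the PoC; with OpenCV, the inputs would typically come from `cv2.boundingRect` and `cv2.contourArea`.

```python
def looks_like_box(bbox_w, bbox_h, contour_area,
                   min_side=6, max_side=60, max_aspect=3.0, min_extent=0.75):
    """Return True if a blob plausibly is a small filled square/rectangle.

    All thresholds are hypothetical and would need tuning on real sheets.
    """
    if bbox_w <= 0 or bbox_h <= 0:
        return False
    short, long_ = min(bbox_w, bbox_h), max(bbox_w, bbox_h)
    # Size filter: drop specks and large water bodies.
    if short < min_side or long_ > max_side:
        return False
    # Aspect filter: rivers and linework tend to be long and thin.
    aspect = long_ / short
    # Extent filter: text glyphs fill little of their bounding box.
    extent = contour_area / (bbox_w * bbox_h)
    return aspect <= max_aspect and extent >= min_extent


symbol = looks_like_box(20, 12, 220)        # compact, well-filled blob
river_piece = looks_like_box(55, 8, 150)    # long and thin
text_glyph = looks_like_box(14, 14, 90)     # sparse fill
```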

---

## Hardware fit

Your RTX 3080 FE is enough for the first detector. Start with:

- `yolov8s.pt`
- `imgsz=1024`
- `batch=2` or `batch=4`
- `epochs=80`

If you hit CUDA OOM, lower the batch size first. Do not lower the image size below 896 too early, because the target symbols are small.

16 GB system RAM is tight for country-scale processing, but fine for per-sheet scanning. Avoid loading the whole corpus at once. This PoC scans each raster in windows.
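
Window-based scanning can be sketched as below. This is a minimal example, not the PoC's actual implementation: `iter_windows`, the 2048-pixel window size, and the 128-pixel overlap are assumptions. With Rasterio, each tuple could be passed to `rasterio.windows.Window(col_off, row_off, width, height)` and read with `dataset.read(window=...)`, so only one window is in memory at a time.

```python
from typing import Iterator, List, Tuple


def _offsets(length: int, size: int, step: int) -> List[int]:
    """Start offsets so that windows of `size` cover `length` without a redundant tail."""
    offsets = [0]
    while offsets[-1] + size < length:
        offsets.append(offsets[-1] + step)
    return offsets


def iter_windows(width: int, height: int, size: int = 2048,
                 overlap: int = 128) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (col_off, row_off, win_width, win_height) windows covering the raster.

    Neighbouring windows share `overlap` pixels so that a symbol lying on a
    window border is fully visible in at least one window.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    for row_off in _offsets(height, size, step):
        for col_off in _offsets(width, size, step):
            yield (col_off, row_off,
                   min(size, width - col_off), min(size, height - row_off))


windows = list(iter_windows(5000, 3000))
```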

---

## Install locally

### Linux / WSL / Manjaro-like

```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

GDAL/Rasterio can be the annoying part. If opening a sheet's `.map` file with `rasterio.open()` fails, install GDAL from your OS package manager, or use the Docker option below.

### GPU Docker option

```bash
docker compose -f docker-compose.gpu.yml build
docker compose -f docker-compose.gpu.yml run --rm bgtopo-bluebox bash
```

Inside the container:

```bash
./scripts/run_pilot.sh
```

---

## Run the pilot

```bash
./scripts/run_pilot.sh
```

This downloads only two sheets, scans them, writes candidate CSV files, draws overlays, and builds the HTML report. The outputs are:

```text
reports/poc_report.html
reports/overlays/*.png
data/interim/candidates/*_candidates.csv
```

Inspect the overlays. If too many rivers/text labels are detected, tighten the thresholds in `configs/blue_detector.yaml`. If real blue rectangles are missed, loosen the HSV ranges and size filters.
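
To make "tighten" and "loosen" concrete, here is a per-pixel blue test using only the standard library. The ranges are illustrative assumptions and are not taken from `configs/blue_detector.yaml`; narrowing `HUE_RANGE` or raising `MIN_SATURATION` rejects more pixels, widening them accepts more. Note that OpenCV's 8-bit HSV uses H in 0..179 and S/V in 0..255, so the helper `to_opencv_hsv` (also hypothetical) converts from degrees and fractions.

```python
import colorsys

# Hypothetical thresholds: hue in degrees, saturation/value as fractions.
HUE_RANGE = (180.0, 250.0)
MIN_SATURATION = 0.25
MIN_VALUE = 0.40


def is_blue(r, g, b, hue_range=HUE_RANGE, min_s=MIN_SATURATION, min_v=MIN_VALUE):
    """Classify one 8-bit RGB pixel as blue / light blue under the given ranges."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0
    return hue_range[0] <= hue_deg <= hue_range[1] and s >= min_s and v >= min_v


def to_opencv_hsv(hue_deg, s, v):
    """Convert (degrees, fraction, fraction) to OpenCV's 8-bit HSV scale."""
    return (round(hue_deg / 2) % 180, round(s * 255), round(v * 255))


pure_blue = is_blue(0, 0, 255)
light_blue = is_blue(120, 180, 230)
grey_text = is_blue(128, 128, 128)   # zero saturation
dark_navy = is_blue(0, 0, 60)        # blue hue, but too dark
```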

---

## Manual one-sheet run

```bash
python -m bgtopo_poc.cli inventory \
  --config configs/blue_detector.yaml \
  --out data/manifest.csv \
  --limit 1

python -m bgtopo_poc.cli download \
  --manifest data/manifest.csv \
  --out-dir data/raw \
  --out-manifest data/manifest_downloaded.csv \
  --limit 1

python -m bgtopo_poc.cli detect \
  --config configs/blue_detector.yaml \
  --sheet-id K-34-009-2 \
  --map data/raw/K-34-009-2/K-34-009-2.map \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --out-dir data/interim/candidates

python -m bgtopo_poc.cli overlay \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --candidates data/interim/candidates/K-34-009-2_candidates.csv \
  --out reports/overlays/K-34-009-2_overlay.png
```

---

## Score your 60k coordinates

Expected coordinate CSV columns:

```csv
id,lat,lon,expected
pt001,42.58837223,23.19638729,unknown
```
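
Before scoring a large file, it may be worth checking that it matches this layout. The following is a minimal pre-flight sketch using only the standard library; `load_points` is a hypothetical helper, not part of `bgtopo_poc`, and the range check only assumes that `lat`/`lon` are EPSG:4326 degrees.

```python
import csv
import io

REQUIRED_COLUMNS = ("id", "lat", "lon", "expected")


def load_points(handle):
    """Return (valid_rows, problems) for a coordinate CSV opened in text mode."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    valid, problems = [], []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
        except (TypeError, ValueError):
            problems.append((line_no, "lat/lon is not a number"))
            continue
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            problems.append((line_no, "lat/lon outside the EPSG:4326 range"))
            continue
        valid.append({"id": row["id"], "lat": lat, "lon": lon,
                      "expected": row["expected"]})
    return valid, problems


sample = io.StringIO(
    "id,lat,lon,expected\n"
    "pt001,42.58837223,23.19638729,unknown\n"
    "pt002,142.58837223,23.19638729,unknown\n"   # latitude out of range
    "pt003,not-a-number,23.2,unknown\n"
)
valid, problems = load_points(sample)
```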

Then run:

```bash
python -m bgtopo_poc.cli score-coords \
  --config configs/blue_detector.yaml \
  --sheet-id K-34-009-2 \
  --coordinates data/coordinates/your_60k_points.csv \
  --candidates data/interim/candidates/K-34-009-2_candidates.csv \
  --map data/raw/K-34-009-2/K-34-009-2.map \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --out-dir data/interim/coordinate_scores \
  --coord-crs EPSG:4326
```
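
The scoring rule used by `score-coords` is not described here, so the following is only one plausible way to score a point against nearby candidates: take the distance to the nearest candidate centre in pixel space and let the score fall linearly to zero at a cut-off radius. The function `score_point` and the 40-pixel radius are assumptions for illustration.

```python
import math


def score_point(point_xy, candidates_xy, radius_px=40.0):
    """Return (score, nearest_distance) for one point in raster pixel coordinates.

    The score is 1.0 when a candidate centre coincides with the point and
    drops linearly to 0.0 at `radius_px` (hypothetical cut-off).
    """
    if not candidates_xy:
        return 0.0, None
    nearest = min(math.dist(point_xy, c) for c in candidates_xy)
    score = max(0.0, 1.0 - nearest / radius_px)
    return score, nearest


near = score_point((100, 100), [(103, 104), (500, 500)])
far = score_point((100, 100), [(500, 500)])
none = score_point((100, 100), [])
```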

Extract review crops for predicted positives/review cases:

```bash
python -m bgtopo_poc.cli crops \
  --scores data/interim/coordinate_scores/K-34-009-2_coordinate_scores.csv \
  --map data/raw/K-34-009-2/K-34-009-2.map \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --out-dir data/interim/crops/K-34-009-2 \
  --crop-size 256
```

Important: The PoC currently scores coordinates sheet-by-sheet. The next production step is to assign every point to the right sheet footprint automatically. This first requires confirming that the `.map` georeferencing opens correctly on your system.
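
A minimal sketch of coordinate-to-sheet assignment, assuming each sheet footprint can be approximated by an axis-aligned lon/lat bounding box. The sheet IDs and bounds below are made up for illustration; real footprints should be derived from the georeferenced sheets and may not be axis-aligned in EPSG:4326.

```python
from typing import Dict, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)

# Made-up footprints; not real BGtopoVJ sheet bounds.
FOOTPRINTS: Dict[str, BBox] = {
    "sheet-A": (23.00, 42.50, 23.25, 42.6667),
    "sheet-B": (23.25, 42.50, 23.50, 42.6667),
}


def assign_sheet(lon: float, lat: float,
                 footprints: Dict[str, BBox]) -> Optional[str]:
    """Return the ID of the first footprint containing the point, else None.

    Bounds are half-open so a point on a shared edge maps to exactly one sheet.
    """
    for sheet_id, (min_lon, min_lat, max_lon, max_lat) in footprints.items():
        if min_lon <= lon < max_lon and min_lat <= lat < max_lat:
            return sheet_id
    return None


inside = assign_sheet(23.19638729, 42.58837223, FOOTPRINTS)
on_edge = assign_sheet(23.25, 42.60, FOOTPRINTS)
outside = assign_sheet(25.00, 42.60, FOOTPRINTS)
```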

---

## Export YOLO dataset

After reviewing/correcting candidates, export YOLO tiles:

```bash
python -m bgtopo_poc.cli export-yolo \
  --config configs/blue_detector.yaml \
  --sheet-id K-34-009-2 \
  --tif data/raw/K-34-009-2/K-34-009-2.tif \
  --candidates data/interim/candidates/K-34-009-2_candidates.csv \
  --out-dir data/yolo/K-34-009-2 \
  --tile-size 1024 \
  --overlap 128
```
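
For orientation, a YOLO label line has the form `class x_center y_center width height`, with all four box values normalized to the tile size. The sketch below converts a candidate box from full-raster pixels into a label for one tile, clipping it at the tile border. `to_yolo_line` is a hypothetical helper, and the tile origin of 896 assumes the tile stride is `tile-size - overlap` (1024 - 128), which this README does not confirm.

```python
def to_yolo_line(box, tile_origin, tile_size=1024, class_id=0):
    """Convert a raster-pixel box to a YOLO label line for one tile.

    box = (x_min, y_min, x_max, y_max) in full-raster pixels.
    tile_origin = (x0, y0), the tile's top-left corner in raster pixels.
    Returns None when the box does not intersect the tile.
    """
    x0, y0 = tile_origin
    # Shift into tile coordinates and clip to the tile extent.
    x_min = max(box[0] - x0, 0)
    y_min = max(box[1] - y0, 0)
    x_max = min(box[2] - x0, tile_size)
    y_max = min(box[3] - y0, tile_size)
    if x_max <= x_min or y_max <= y_min:
        return None
    cx = (x_min + x_max) / 2 / tile_size
    cy = (y_min + y_max) / 2 / tile_size
    w = (x_max - x_min) / tile_size
    h = (y_max - y_min) / tile_size
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


label = to_yolo_line((1152, 512, 1216, 576), tile_origin=(896, 0))
off_tile = to_yolo_line((0, 0, 10, 10), tile_origin=(896, 0))
```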

Then train:

```bash
python -m bgtopo_poc.cli train-yolo \
  --data-yaml data/yolo/K-34-009-2/data.yaml \
  --model yolov8s.pt \
  --imgsz 1024 \
  --epochs 80 \
  --batch 4 \
  --device 0
```

---

## What to improve after this PoC works

1. Add automatic sheet-footprint discovery and coordinate-to-sheet assignment.
2. Add CVAT export/import so weak labels can be corrected by hand.
3. Add hard-negative mining for rivers, lakes, blue text and blue linework.
4. Add calibrated coordinate scoring using a small sklearn model trained on reviewed points.
5. Add active learning: prioritize review crops where the model and rule detector disagree.
6. Add full-map batch inference with overlap-aware de-duplication.
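
Item 6 could be approached with greedy IoU-based suppression once tile detections are mapped back to full-raster pixel coordinates: the same symbol seen in two overlapping tiles yields near-identical boxes, and only the highest-scoring one is kept. This is a sketch of that general technique, not a design decided for this project; `deduplicate` and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def deduplicate(detections, iou_threshold=0.5):
    """Keep the best-scoring detection from each group of overlapping boxes.

    detections: list of (box, score), boxes in full-raster pixel coordinates.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((100, 100, 140, 140), 0.9),   # symbol as seen from tile 1
    ((102, 101, 141, 142), 0.8),   # same symbol as seen from tile 2
    ((500, 500, 540, 540), 0.7),   # a different symbol
]
unique = deduplicate(detections)
```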

---

## Output files

```text
data/manifest.csv                          # discovered remote assets
data/manifest_downloaded.csv               # local paths after download
data/interim/candidates/*_candidates.csv   # weak detections
data/interim/coordinate_scores/*.csv       # coordinate-level predictions
data/interim/crops/*/*.png                 # review crops
reports/overlays/*.png                     # visual QA overlays
reports/poc_report.html                    # summary report
data/yolo/*/data.yaml                      # YOLO training dataset
runs/bgtopo_bluebox/*                      # YOLO training runs
```