opthq/garmin-img-format-parsing

Fork 0

Go to file

nq 50185a8050 rnd-v1.1

2026-05-03 21:59:22 +03:00

bgtopo_poc

rnd-v1

2026-05-03 21:58:47 +03:00

configs

rnd-v1

2026-05-03 21:58:47 +03:00

data/coordinates

rnd-v1

2026-05-03 21:58:47 +03:00

docker

rnd-v1

2026-05-03 21:58:47 +03:00

scripts

rnd-v1

2026-05-03 21:58:47 +03:00

deep-research-v1.md

rnd-v1.1

2026-05-03 21:59:22 +03:00

docker-compose.gpu.yml

rnd-v1

2026-05-03 21:58:47 +03:00

pyproject.toml

rnd-v1

2026-05-03 21:58:47 +03:00

readme.md

init-research

2026-05-03 21:49:36 +03:00

requirements.txt

rnd-v1

2026-05-03 21:58:47 +03:00

readme.md

Pipeline design for detecting blue rectangles in BGtopoVJ and scoring 60k coordinates

Executive summary

The most efficient pipeline is hybrid and source-aware, not “pure deep learning from day one.” The prioritized source pass suggests a strong split of responsibilities: use the per-sheet original raster files as the canonical training source; use the Garmin/vector products as a possible weak-label source if they can be decompiled or converted; use the processed and merged raster products for fast visual QA and large-area scanning; and train the final coordinate-level predictor on georeferenced centered crops plus detector-derived features. That recommendation comes directly from the BGtopoVJ format inventory, the site’s own build/toolchain references, and the GitHub tooling that already exists for Polish .mp, GeoJSON conversion, ECW access, raster tiling, and georeferencing. citeturn22view0turn28view0turn31search1turn19view0turn19view1turn19view2turn19view3turn19view4turn39search0

The best canonical source for training is the original per-sheet TIFF plus its .map sidecar, not the mobile/tiled variants. The university mirror states that the underlying data are 1:50,000 Soviet army raster TIF images, scanned at 250 dpi, 8-bit, with an effective ~5.5 m/pixel resolution, over 544 sheets and 3240 tiles. Per-sheet pages expose four downloadable items for each sheet: original raster image, processed raster image, Garmin vector map, and processed raster for MapNav; the original raster directory shows that each sheet is paired with a .tif image and a .map calibration file. citeturn22view0turn24search1turn28view0turn31search1

The best first production model is not a single end-to-end detector. It is a staged stack: OpenCV rules for candidate mining and hard-negative discovery, YOLOv8s or RetinaNet for scalable symbol detection, Mask R-CNN or a small U-Net only if fill-style discrimination remains weak, and then a calibrated coordinate scorer that converts local detections into the final “should this coordinate have a blue square/rectangle?” probability. The detector literature, official training docs, and calibration literature all support this structure: one-stage models are efficient, RetinaNet explicitly addresses foreground/background imbalance with focal loss, Mask R-CNN adds instance masks with modest overhead, U-Net remains strong for small-mask segmentation with limited data, and temperature scaling is still one of the simplest reliable ways to calibrate probabilities. citeturn40search0turn40search5turn40search6turn41search0turn41search4turn41search6turn42search8turn43search6turn43search8turn43search2turn45search12turn46search4

A separate pass over adspem.com did not surface actionable BGtopoVJ-specific technical material during this research run, and a direct access attempt timed out. In practical terms, that means the design below should be grounded in the much more concrete evidence from entity["company","GitHub","code hosting platform"], the mirrored BGtopoVJ site hosted by the CART Lab at entity["organization","University of Plovdiv Paisii Hilendarski","plovdiv bulgaria"], and the official documentation/original papers that describe the GIS and ML stack. citeturn7view0turn22view0

Source triage and acquisition

A pass over GitHub did not reveal a turnkey “detect blue BGtopoVJ squares” repository, but it did reveal several building blocks that materially change the project design. The highest-value repositories are: a QGIS Polish-format plugin that can import and export .mp files, an MP-to-GeoJSON converter, GDAL-based raster tiling/merging scripts, a rio-tiler wrapper around Rasterio/GDAL for point/part/tile reads, an ECW GDAL plugin helper, and MapWarper for map rectification/export to GeoTIFF/WMS/tiles. Those tools are exactly the pieces you need if you want a vector-assisted weak-label branch, georeferenced tile extraction, full-area scanning, and human QA over dedicated validation areas. citeturn19view0turn19view1turn19view2turn19view3turn19view4turn39search0turn39search1

The BGtopoVJ mirror is the authoritative source for map formats and technical metadata. It lists support for Garmin GPSr, Garmin MapSource/BaseCamp, ECW, JNX, MapNav, OsmAnd / Locus Map, OruxMaps, NaviComputer, WWW, and WMS. It also states that the project goal was to make popular Soviet military raster maps free to use on Garmin-compatible devices, and that the workflow used tools such as Mapwel, GMapTool, Global Mapper, and cGPSmapper. citeturn22view0turn17search0

The site’s download section is unusually informative and is directly useful for planning source ingestion. The merged ECW product is georeferenced in UTM zone 35 / WGS84; the merged JNX product is Geographic / WGS84; the MapNav version is Mercator and loads *.mno or *mnm raster maps; the OsmAnd/Locus and OruxMaps products expose pre-tiled zoom pyramids; and the NaviComputer product is a .nmap archive. The page also exposes full archive sizes, which matter for storage planning, and notes that the mobile/offline variants use zoom levels roughly in the 6–14 range. citeturn22view0turn24search1

The per-sheet pages matter even more than the merged downloads because they expose the data model you want to automate against. Each sheet page lists original raster image, processed raster image, vector map suitable for Garmin GPSr devices, and processed raster image for MapNav. The page icons and link behavior indicate that the original and processed raster products are TIFF-based, the Garmin map is IMG-based, and the MapNav product is packaged separately. The original raster directory listing further shows the crucial pairing of .tif image + .map calibration file, and it also exposes a 100k/ subdirectory, which means your ingestion manifest must explicitly separate 1:50k and 1:100k materials to avoid symbol-scale leakage. citeturn28view0turn29view0turn29view1turn29view2turn29view3turn31search1

That .map sidecar is not just a nuisance file; it is an advantage. Official OziExplorer documentation says a .map file stores the image path, datum, projection, and calibration/georeferencing information, and GDAL has a built-in MAP / OziExplorer raster driver with georeferencing support. In other words, a per-sheet BGtopoVJ “original raster” is a georeferenceable pair you can parse directly into a robust coordinate-to-pixel pipeline. citeturn37search0turn37search1turn37search2

The final source-triage conclusion is important: start with original TIFF+MAP, not with ECW and not with mobile tile packages, and treat the Garmin/vector branch as a high-upside auxiliary path. The reason is that the BGtopoVJ site explicitly says the separate vector archives are based on BGtopoVJ-v2.00 data, while the merged raster products and the current public release are v3.00. That makes vector-derived features or pseudo-labels very useful, but not safe as the sole source of truth unless you version-track them carefully. citeturn22view0

All format, projection, and packaging details in the following table come from the BGtopoVJ mirror and its per-sheet pages. citeturn22view0turn28view0turn31search1

Source product	What you actually get	Why it matters	Recommended use
Original per-sheet raster	`sheet.tif` + `sheet.map`	Canonical scan, per-sheet georeferencing, closest to source symbology	Primary training source
Processed per-sheet raster	TIFF-based processed raster	Cleaner colors and less noise than original	Weak-label mining, annotator QA
Garmin vector sheet	IMG-based vector map	Potential vector-assisted pseudo-labels or semantic cross-checks	Auxiliary branch only
Merged ECW	Country-scale georeferenced raster, UTM 35 / WGS84	Fast large-area scanning and QA mosaic	Secondary inference source
Merged JNX	Country-scale georeferenced raster, Geographic / WGS84	Easier coordinate lookup for some GPS workflows	Secondary/QA only
MapNav	Mercator raster package	Existing mobile-ready pyramid	Optional QA source
OsmAnd / Locus	Tiled zoom-pyramid package	Easy casual browsing, but resampled	Human review only
OruxMaps	Tiled zoom-pyramid package	Useful for spot-checking by area	Human review only
NaviComputer	`.nmap` archive	Offline browsing	Human review only
WWW / WMS	Web-served rendering	Useful for dedicated-area validation dashboards	QA and visualization

Data engineering and labeling design

The central engineering decision is to make the per-sheet original TIFF+MAP pair your canonical object space, and to preserve that provenance all the way through the pipeline. The BGtopoVJ changelog says later releases added despeckling, improved contrast, improved georeferencing precision, improved and harmonized resolution, reduced tile counts, and a harmonized 32-color palette. Those improvements are useful operationally, but they are exactly why you should not silently mix “original,” “processed,” “merged ECW,” and “mobile tile” imagery in one training pool without a source_format and source_version field. citeturn22view0

For georeferencing and coordinate alignment, use the .map sidecar through GDAL’s MAP driver or by parsing the OziExplorer file directly, then expose the transform through Rasterio. Rasterio’s transform utilities provide xy() and rowcol() conversions between pixel and projected/geographic coordinates, while its windowed-reading APIs and window transforms let you extract georeferenced crop windows without loading whole sheets into memory. That is the right primitive for both map-wide scanning and centered-crop extraction around the 60k coordinates. citeturn37search0turn37search1turn47search1turn47search3

A robust ingestion pipeline should therefore do five things before any ML starts. First, crawl the original-raster directory and per-sheet pages into a manifest keyed by sheet_id, scale, image_path, map_path, bounds, projection, edition_year, and source_format. Second, reject or quarantine the 100k/ materials until the 1:50k pipeline is stable. Third, convert all 60k coordinates into a single normalized CRS and then into each candidate sheet’s CRS. Fourth, assign each coordinate to one or more overlapping sheet footprints. Fifth, extract multi-scale crop windows around each candidate coordinate from the canonical source, not from screenshots or WMS renderings. citeturn31search1turn37search0turn47search1turn47search3

flowchart LR
    A[Inventory BGtopoVJ files] --> B[Parse TIFF plus MAP georeferencing]
    B --> C[Build sheet footprints and manifest]
    C --> D[Align 60k coordinates to sheets]
    D --> E[Extract centered multi-scale crops]
    E --> F[Weak-label candidates from color, shape, vectors]
    F --> G[Human review in CVAT or Label Studio]
    G --> H[Train detector and optional segmenter]
    H --> I[Train calibrated coordinate scorer]
    I --> J[Dedicated-area validation]
    J --> K[Expand dataset and rescan corpus]

For annotation, use a hierarchical schema, not a flat explosion of combined classes. Your task description naturally decomposes into geometry, fill style, and color shade; training will be much easier if you keep those separate at the data level even if the first-generation model collapses them into a smaller set. The site itself says BGtopoVJ uses standard Soviet military map symbols, and it points to the English-language Soviet Topographic Map Symbols manual as a reference. That is a strong reason to keep room for semantic expansion later, even if you begin with purely visual classes. citeturn22view0turn21search1turn21search10

Annotation field	Suggested values	Why it helps
`presence`	`positive`, `negative`, `uncertain`	Needed for the final 60k-coordinate decision task
`shape`	`square`, `rectangle`, `unknown_quad`	Preserves geometry without overfitting early
`fill_style`	`filled`, `hollow`, `border_emphasis`, `unknown_fill`	Handles your filled/empty/bordered requirement cleanly
`shade_bin`	`dark_blue`, `blue`, `light_blue`, `cyan_other`	Supports later color-robustness studies
`annotation_type`	`bbox`, `polygon`, `mask`, `weak_label`	Tracks label reliability
`source_format`	`orig_tif_map`, `proc_tif_map`, `ecw`, `img_vector`	Prevents silent domain mixing
`review_status`	`weak`, `verified`, `rejected`, `border_case`	Critical for auditing weak labels
`uncertain_reason`	`tile_edge`, `sheet_edge`, `low_contrast`, `blue_non_target`, `vector_mismatch`, `other`	Makes the error-analysis loop efficient

A good review overlay should make class distinctions obvious to annotators and reviewers. An illustrative overlay convention is below.

Example review overlay on a centered crop

┌─────────────────────────────────────────────┐
│ map background                              │
│                                             │
│   ┌──────┐                                  │
│   │██████│   cyan solid box  = filled       │
│   └──────┘                                  │
│                                             │
│                ┌──────┐                     │
│       ○        │      │   blue outline = hollow
│    coordinate  └──────┘                     │
│                                             │
│   ┏━━━━━━━━┓                                 │
│   ┃        ┃   navy thick outline = border  │
│   ┗━━━━━━━━┛                                 │
│                                             │
│   - - - dashed yellow box = uncertain       │
└─────────────────────────────────────────────┘

For label generation, do not start with manual boxing of all 60k points. Start with weak labels. The cheapest first pass is a color-and-shape detector in OpenCV: convert to HSV or Lab, threshold the blue/light-blue ranges, clean masks with morphology, recover contours, approximate them to quadrilaterals, and classify them by border occupancy and fill ratio. OpenCV’s inRange, contour extraction, polygon approximation, and template matching are all directly relevant here. This weak-label stage will not be your final model, but it will dramatically cut annotation burden and produce the hard negatives you need. citeturn40search6turn40search5turn40search11turn40search0turn40search7

The GitHub/vector branch can reduce annotation cost even further. Because the university workflow references Mapwel, GMapTool, and Garmin IMG outputs, and because GitHub tooling already exists for Polish .mp import/export and MP-to-GeoJSON conversion, you should explicitly test a side pipeline of IMG → MP/GeoJSON → candidate features. If a subset of the target blue symbols survives that conversion as stable vector features or type/style codes, those outputs can become high-quality pseudo-labels or at least a powerful review prior. If they do not, the test still pays for itself by proving that the raster detector must carry the load. citeturn17search0turn19view0turn19view1turn22view0

For the final coordinate labels, I recommend a strict three-state policy. A coordinate is positive if a reviewed target symbol is present in the search neighborhood and the coordinate aligns to that symbol under the project’s chosen tolerance. It is negative only if the local crop quality is acceptable and no target symbol is present after review or after a sufficiently reliable detector pass. It is uncertain if the coordinate is near a sheet edge, a tile seam, a clipped symbol, a color-ambiguous mark, a candidate produced only by one source domain, or a case where the cartographic placement is visually ambiguous. In practice, uncertain labels should be excluded from first-pass supervised loss and reserved for active-learning review and calibration.

CVAT is the cleanest annotation front end for this project because its COCO export format natively supports bounding boxes, polygons, masks, and tracks, and it emits a predictable zip layout that is easy to feed into PyTorch and Detectron2 pipelines. Label Studio is also viable, especially if your team prefers its export APIs and raw JSON model, but CVAT aligns more naturally with a box-first, mask-optional computer-vision workflow. citeturn46search0turn47search4turn47search5

Model options and recommended stack

Because the target objects are tiny, blue, and shape-constrained, different model families solve different parts of the task well. Classical CV and template matching are excellent for bootstrapping. One-stage detectors are best for scaling to map-wide scans. Two-stage detectors are strong when accuracy on small objects matters more than speed. Segmentation becomes worthwhile only when distinguishing filled versus hollow versus border-emphasis proves difficult from boxes alone. The final coordinate question is best answered by a hybrid coordinate scorer that consumes detector outputs rather than by a detector alone. citeturn40search0turn40search5turn40search6turn41search0turn41search4turn42search8turn43search6turn43search8turn45search12turn43search2

The comparison below combines published model characteristics with practical recommendations for this specific cartographic-symbol task. Published YOLOv8 sizes/speeds come from the Ultralytics model card, and Faster/Mask/Retina benchmark speed-memory figures come from the official MMDetection benchmark documentation. Practical budgets for 1024px map tiles are my estimates, because your tile sizes and augmentations will differ from those benchmark settings. citeturn41search4turn41search6turn42search8

Option	Best role	Main advantages	Main limitations	Likely data need	Practical starting point
Classical CV with color + contours	Weak-label miner, baseline	Explainable, fast, CPU-only, no training	Brittle to palette drift, blue hydrography, noise	50–100 reviewed examples to tune thresholds	HSV/Lab thresholding, morphology, contour quad filters
Template matching	High-precision candidate miner	Very good when symbol shapes are stable	Sensitive to scale, scan noise, and style variation	20–100 templates/class	Masked normalized cross-correlation across a small scale bank
YOLOv8n / YOLOv8s	Main production detector	Easy training, strong ecosystem, fast inference	Tiny hollow symbols may need large images and careful tiling	~1k–3k positive instances	`imgsz=1024–1280`, 100–150 epochs, low color jitter
Faster R-CNN	Accuracy-first detector	Strong small-object baseline, robust with FPN	Slower inference and heavier training	~2k–5k positives	ResNet-50-FPN, smaller anchors than COCO defaults
RetinaNet	Imbalance-aware detector	Focal loss helps with hard negatives	Can be slower than YOLO, still needs tuning	~2k–5k positives	ResNet-50-FPN, small anchors, hard-negative mining
Mask R-CNN	Box + mask + fill-style modeling	Distinguishes interior fill and border shape better	More compute, more label structure	~1k+ verified masks or auto-masks	Use if fill-style errors dominate after box detection
U-Net on centered crops	Coordinate-centered segmentation	Very strong on local pixel masks with little data	Not a full-scene detector by itself	~1k–3k crop masks	256–512 px crops, Dice + BCE or Dice + focal
Hybrid detector + coordinate scorer	Recommended final system	Optimizes the actual end task and supports QA expansion	More moving parts	Moderate	Detector outputs + local crop features + calibration

A practical compute view is below. Again, the published figures are not your exact training cost, but they are useful anchors for budget planning. citeturn41search4turn42search8turn41search0

Model family	Published reference point	Practical budget for this task
YOLOv8n	3.2M params; 0.99 ms A100 TensorRT at 640px	Good pilot model on a 12–16 GB GPU with 1024px tiles
YOLOv8s	11.2M params; 1.20 ms A100 TensorRT at 640px	Best first production detector on a 16–24 GB GPU
YOLOv8m	25.9M params; 1.83 ms A100 TensorRT at 640px	Use only if YOLOv8s misses too many small symbols
Faster R-CNN	Benchmark training memory about 3.0 GB in published setting	Budget 16 GB+ for larger map tiles and stable training
Mask R-CNN	Benchmark training memory about 3.4 GB in published setting	Budget 16–24 GB if masks are enabled on 1024px tiles
RetinaNet	Benchmark training memory about 3.9 GB in published setting	Budget 16 GB+ if using hard-negative-heavy training

For hyperparameters, I would start conservatively and preserve the map’s discriminative color information. For YOLOv8, use 1024 or 1280 px tiles, 100–150 epochs, light augmentations, and minimal hue shift because the blue-vs-non-blue distinction is central. Keep flips if symbol semantics allow them, but avoid strong perspective transforms and aggressive mosaics because they can distort cartographic context. For Faster R-CNN/RetinaNet/Mask R-CNN, use a ResNet-50-FPN backbone, shrink anchor sizes to cover very small quadrilateral symbols, and start from COCO-pretrained weights via TorchVision or Detectron2. The official TorchVision tutorial is a good reference for custom detection/instance-segmentation datasets, and Detectron2 remains a strong option if you want richer experiment control. citeturn41search0turn41search5turn42search9turn42search6

My recommended build order is straightforward. First, ship a rules-only candidate miner and use it to seed annotation. Second, train a YOLOv8s detector on boxes only. Third, if filled vs hollow vs border-emphasis remains an operational error source, add either Mask R-CNN or a small U-Net crop segmenter. Fourth, train the final coordinate scorer on features such as nearest-detection class, confidence, distance, local blue-mask statistics, crop classifier outputs, and source-format flags. That last stage is the part that answers the user’s actual question and should be treated as the system of record.

If the vector-assisted branch works, it can reduce annotation effort substantially. If it fails, the project still remains viable because the raster branch is already complete enough to support weak labels, human review, and robust detector training.

Evaluation and uncertainty

You need to evaluate two tasks, not one. The first task is global symbol detection on tiles or sheets. The second task is the real business target: coordinate-level presence prediction for the 60k points. For detection, the right core metrics are precision, recall, F1, IoU, AP50, AP75, and mAP@[0.5:0.95], ideally reported both overall and by fill-style subclass if you keep them separate. For coordinate scoring, the most important metrics are precision, recall, F1, and PR-AUC; if you need trustworthy probabilities, also report Brier score, expected calibration error, and reliability diagrams after calibration. citeturn41search0turn45search12turn46search4

Use spatial cross-validation, not random crop splits. BGtopoVJ has hundreds of sheets at a common source scale, and its later releases harmonized palette and resolution. If you randomly split crops, you will leak near-duplicate local styles, neighboring map content, and source-specific palette statistics across train and validation sets. The correct split primitive is the sheet ID or a larger contiguous geographic block. A good default is GroupKFold by sheet, plus one frozen dedicated validation area that is never touched during model development. The “dedicated area” should be a contiguous region rather than a random point sample, otherwise deployment risk will be underestimated. citeturn22view0turn31search1

Class imbalance should be treated as a first-class design issue because there are many blue non-target structures on topographic maps. Your hard negatives will include rivers, lakes, canals, blue labels, blue linework, and blue outlines that are not the target rectangles/squares. That is exactly the sort of dense foreground/background skew that RetinaNet’s focal loss was designed to address. Even if you choose YOLOv8 as the main detector, the lesson from RetinaNet still applies: populate the training set with hard negatives, not just random empty crops. citeturn43search6turn43search11

For uncertainty, use calibration, not just raw detector scores. The calibration literature remains clear that modern neural nets can be overconfident, and that temperature scaling is often a simple, strong post-processing fix. In practice, keep a clean validation split, fit a calibrator there, and then expose three operational zones: an auto-positive zone, an auto-negative zone, and a human-review zone in between. If your initial prior belief is that positives among candidate coordinates are around 80–90%, use that belief to prioritize active-learning review and to set starting thresholds, but let the actual held-out dedicated area determine the final deployed prevalence estimate. citeturn46search4turn46search1

A simple operating policy is:

Score band	Action
`p >= 0.90`	auto-positive
`0.40 <= p < 0.90`	review unless detector evidence is exceptionally strong
`0.10 < p < 0.40`	review if region is mission-critical; else uncertain
`p <= 0.10`	auto-negative

Those thresholds are not sacred; they are an operational starting point. Tune them to the precision target you actually care about.

The dedicated-area validation workflow should be explicit and repeatable. Pick one region that is symbol-dense and one region that is symbol-sparse. Freeze both. Run full tiling and full 60k-coordinate scoring there. Manually audit all auto-positives, all uncertains, and a random sample of auto-negatives. Only after that audit should you expand to neighboring areas and refresh the weak-label rules. That loop is where most real-world performance gains will come from.

Deployment, deliverables, and timeline

For deployment, there are really two modes. The first is batch scanning over whole sheets or a merged raster. The second is coordinate scoring for the 60k points. Batch scanning should use georeferenced windowed reads aligned to raster block structure, with tile overlap so edge symbols are not missed; Rasterio’s window APIs and rio-tiler’s point/part/tile readers are well-suited to that. Coordinate scoring is lighter: once the sheet assignment is built, you can extract centered windows for the 60k coordinates and score them directly. citeturn47search3turn47search1turn19view4

For scalability, keep runtime format complexity low. Although BGtopoVJ provides a merged ECW and GDAL supports ECW through the Hexagon SDK, the official GDAL ECW documentation is clear that support depends on the ECW SDK and that licensing differs between desktop decode, compression, server deployment, and mobile contexts. That makes ECW fine for ingestion or QA, but a poor long-term training/runtime dependency unless you truly need it. The cleanest path is to ingest once and then standardize your internal processing on TIFF / VRT / GeoTIFF / COG-like workflows with Rasterio/GDAL. citeturn47search0turn19view3

The recommended software stack is therefore:

Layer	Recommended tools
Ingestion and reprojection	GDAL, Rasterio, pyproj
Large-raster windows and tile serving	Rasterio windows, rio-tiler
Classical CV and weak labeling	OpenCV
Detection and segmentation	PyTorch, TorchVision, Ultralytics, Detectron2
Annotation	CVAT, optionally Label Studio
GIS QA and map review	QGIS, MapWarper-style overlay workflows
Vector-assisted parsing	Polish-format/QGIS tooling, MP2GeoJSON, GMapTool-style utilities

The concrete deliverables should be treated as first-class outputs, not side effects.

Deliverable	Suggested format
Source manifest	`manifest.parquet` or `manifest.csv` with sheet bounds, CRS, source version, paths
Reviewed symbol dataset	COCO zip for boxes and masks; optional YOLO text export
Coordinate label table	`coordinates_labels.parquet` with `x`, `y`, `crs`, `sheet_id`, `presence`, `class_attrs`, `review_status`
Train/val/test splits	`splits.yaml` or `folds.json` grouped by sheet/block
Weak-label artifacts	`weak_candidates.geojson` or GeoParquet
Trained detector weights	`model.pt` or `model.pth`
Calibrator and coordinate scorer	serialized sklearn or PyTorch artifact
Inference outputs	GeoParquet/GeoJSON/CSV with box geometry, class, confidence, calibrated score
Reproducible environment	`Dockerfile`, `environment.yml`, `requirements.txt`, `Makefile`
Validation package	overlay PNGs, HTML report, error slices, dedicated-area audit workbook

A good repository skeleton would look like this:

data/
  raw/
  interim/
  tiles/
  annotations/
  models/
src/
  01_inventory_bgtopovj.py
  02_parse_mapfiles.py
  03_assign_coordinates.py
  04_extract_tiles.py
  05_seed_candidates_cv.py
  06_vector_assisted_labels.py
  07_export_coco.py
  08_train_detector.py
  09_train_segmenter.py
  10_train_coordinate_scorer.py
  11_calibrate_eval.py
  12_infer_batch.py
  13_build_validation_report.py
configs/
docker/
reports/

The timeline below assumes one ML engineer and part-time annotation support.

Stage	Practical duration	Main outputs
Corpus inventory and georeferencing	3–5 days	Manifest, sheet footprints, coordinate-to-sheet assignment
Weak-label miner and pilot review	1–2 weeks	First reviewed boxes, hard-negative catalog
First detector baseline	1 week	YOLOv8s or RetinaNet baseline, initial metrics
Optional segmentation branch	1 week	Mask R-CNN or U-Net if fill-style errors remain
Coordinate scorer and calibration	3–5 days	Calibrated probabilities and uncertainty policy
Dedicated-area validation	3–5 days	Acceptance report, error taxonomy
Packaging and reproducibility	2–4 days	Docker/conda environment, scripts, docs

Resource planning should be modest but not tiny. The BGtopoVJ archive sizes imply that raw inputs alone are already multi-gigabyte, and derived tiles, masks, overlays, and experiment outputs can easily expand that by an order of magnitude. A practical workstation target is 16+ CPU cores, 64–128 GB RAM, 1 GPU with 16–24 GB VRAM, and 2–4 TB SSD. If you stay with original per-sheet TIFF+MAP instead of relying on ECW during training, operational complexity drops sharply. citeturn22view0turn31search1turn47search0

The strongest step-by-step implementation plan is this:

Inventory canonical sources from the original per-sheet TIFF+MAP corpus, and freeze a manifest.
Normalize CRS handling for the 60k coordinates and assign them to one or more sheets.
Build a weak-label miner from blue-thresholding, contour filters, and optional vector-assisted pseudo-labels.
Review a pilot set in CVAT and export COCO.
Train a YOLOv8s detector on 1024px tiles with source-aware metadata.
Add a mask branch only if needed for fill-style ambiguity.
Train a coordinate scorer that converts local detections into calibrated probabilities.
Freeze a dedicated validation area, audit it thoroughly, and only then expand to the broader corpus.
Package everything reproducibly in Docker/conda, with manifest + splits + model cards + validation report.

If I had to choose one default architecture before seeing a single label, I would choose this: original TIFF+MAP as truth, OpenCV weak-label miner, YOLOv8s as primary detector, calibrated coordinate scorer on top, CVAT for review, and a vector-assisted fallback branch tested early but trusted only after version-mismatch checks. That design fits the source evidence best, keeps annotation cost under control, and preserves a clean path from pilot work to map-wide deployment.

Releases 16

routes-v3 Latest

2026-06-04 10:44:04 +00:00

Languages

Python 74%

Jupyter Notebook 25.9%

readme.md Unescape Escape

Pipeline design for detecting blue rectangles in BGtopoVJ and scoring 60k coordinates

Executive summary

Source triage and acquisition

Data engineering and labeling design

Model options and recommended stack

Evaluation and uncertainty

Deployment, deliverables, and timeline

readme.md