nethq@DESKTOP-H2853B6 MINGW64 ~/vs-code-home-py/garmin-converter-2026-04-14 (master)
$ python stage-2-parse-stage-1/landmarks_csv_to_osmand.py analyze parsed-landmarks/csv-from-stage-1/* --config-out parsed-landmarks/analysis-stage-2/all_config --dedupe-mode off
[info] analyzed 490080 landmarks
[info] wrote config: parsed-landmarks\analysis-stage-2\all_config
nethq@DESKTOP-H2853B6 MINGW64 ~/vs-code-home-py/garmin-converter-2026-04-14 (master)
$ python stage-2-parse-stage-1/landmarks_csv_to_osmand.py build --config parsed-landmarks/analysis-stage-2/all_config --osm parsed-landmarks/osm/all-v7.osm --dedupe-mode off parsed-landmarks/csv-from-stage-1/*.csv
[info] wrote OSM: parsed-landmarks\osm\all-v7.osm
Found the two real problems.
First, your crash:
- minidom.parseString(...) was reparsing the entire generated XML in memory
- one of the landmark names or tag values contains an XML-illegal control character
- on a huge file, that blows up exactly like your ExpatError: not well-formed (invalid token)
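To make the failure concrete, here is a minimal sketch that reproduces the same class of error on a tiny document. The landmark name and the XML layout are made up for illustration; only the control character matters.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical landmark name containing a vertical tab (0x0B),
# a control character that XML 1.0 does not allow.
dirty_name = "Peak\x0bHut"
xml_text = f'<osm><node id="-1"><tag k="name" v="{dirty_name}"/></node></osm>'

error_message = None
try:
    # Reparsing generated XML is where the dirty character gets rejected.
    minidom.parseString(xml_text)
except ExpatError as exc:
    error_message = str(exc)

print("parse failed:", error_message)
```

The message should start with "not well-formed (invalid token)" followed by the position of the offending character.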
Second, the sampling:
- the previous script was taking the first few examples in a group
- now it samples at positions spread across the group, so the examples come from throughout the dataset instead of only the front
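One way to pick spread-out examples is to take evenly spaced indices from the first to the last element. This is a sketch of the idea, not necessarily the exact logic in the script; spread_sample is a hypothetical helper name.

```python
def spread_sample(items, k):
    """Pick up to k items at evenly spaced positions across the list."""
    n = len(items)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        # Fewer items than requested examples: return them all.
        return list(items)
    if k == 1:
        return [items[n // 2]]
    # Integer arithmetic keeps the indices deterministic and distinct,
    # always including the first and last element.
    return [items[i * (n - 1) // (k - 1)] for i in range(k)]


group = [f"landmark-{i}" for i in range(10)]
print(group[:5])               # old behaviour: only the front of the group
print(spread_sample(group, 5)) # spread across the whole group
```

With 10 items and 5 examples, the chosen positions are 0, 2, 4, 6 and 9.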
Updated script: landmarks_csv_to_osmand_v3.py
What changed:
- removed the fragile minidom.parseString() round-trip
- writes XML directly with ElementTree
- sanitizes text and tag values to strip XML-invalid control characters
- supports .gz outputs correctly
- group examples are now spread-out samples
- interactive group renaming still works, but now the shown examples are better distributed
- loader still accepts both raw landmark CSVs and slimmer derived CSVs
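The sanitize-then-write approach can be sketched roughly like this. It is an illustration under assumptions, not the script's actual code: clean and write_osm are hypothetical names, the (lat, lon, tags) tuple layout is invented for the example, and the negative node ids and root attributes are just a common convention for locally generated OSM data.

```python
import gzip
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean(value):
    """Drop control characters that XML 1.0 does not allow."""
    return _XML_ILLEGAL.sub("", value)


def write_osm(path, landmarks):
    """Write (lat, lon, tags) tuples as OSM nodes; gzip when path ends in .gz."""
    root = ET.Element("osm", version="0.6", generator="sketch")
    for number, (lat, lon, tags) in enumerate(landmarks, start=1):
        # Assumption: negative ids mark nodes that do not exist upstream.
        node = ET.SubElement(
            root, "node", id=str(-number), lat=f"{lat:.7f}", lon=f"{lon:.7f}"
        )
        for key, value in tags.items():
            ET.SubElement(node, "tag", k=clean(key), v=clean(value))
    # Serialize straight to the file, with no reparse step in between.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "wb") as fh:
        ET.ElementTree(root).write(fh, encoding="utf-8", xml_declaration=True)
```

For example, write_osm("landmarks.osm.gz", [(42.0, 23.0, {"name": "Peak\x0bHut"})]) would store the name as "PeakHut" in a gzip-compressed file.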
Use it instead of the old one:
python landmarks_csv_to_osmand_v3.py *.csv --osm bgmountains_poi.osm.gz
If you want to inspect grouping before writing:
python landmarks_csv_to_osmand_v3.py *.csv --show-groups --group-examples 5
If you want interactive naming with spread-out examples:
python landmarks_csv_to_osmand_v3.py *.csv --show-groups --interactive-group-names --group-examples 5 --gpx landmarks.gpx
If you want POI-only OSM for OsmAndMapCreator:
python landmarks_csv_to_osmand_v3.py *.csv --osm landmarks.osm.gz
Then:
java.exe -Xms256M -Xmx4096M -cp "OsmAndMapCreator.jar;lib/*" net.osmand.MainUtilities generate-poi C:\path\to\landmarks.osm.gz
The main fix is this: the script now treats dirty text safely instead of trusting all names/tags to already be valid XML.
If you want, a useful next refinement would be a --debug-bad-rows mode that writes out the original rows whose text had to be sanitized, so you can see exactly which source landmarks were malformed.
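Until such a mode exists, a rough standalone check along these lines could locate the offending rows. This is only a sketch of the idea: find_bad_rows is a hypothetical helper, and it assumes the CSVs are UTF-8 and that any field may carry the bad character.

```python
import csv
import re

# Same class of characters the sanitizer would strip: C0 controls
# other than tab, LF and CR.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_bad_rows(csv_path):
    """Return (row_number, row) pairs whose fields contain XML-illegal controls."""
    bad = []
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row_number, row in enumerate(csv.reader(fh), start=1):
            if any(_XML_ILLEGAL.search(field) for field in row):
                bad.append((row_number, row))
    return bad
```

Calling find_bad_rows on each input CSV and printing the result with repr() makes the hidden characters visible, for example 'Peak\x0bHut'.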