nq 5460669a12 v.5.0.6
nethq@DESKTOP-H2853B6 MINGW64 ~/vs-code-home-py/garmin-converter-2026-04-14 (master)
$ python stage-2-parse-stage-1/landmarks_csv_to_osmand.py analyze parsed-landmarks/csv-from-stage-1/* --config-out parsed-landmarks/analysis-stage-2/all_config --dedupe-mode off
[info] analyzed 490080 landmarks
[info] wrote config: parsed-landmarks\analysis-stage-2\all_config

nethq@DESKTOP-H2853B6 MINGW64 ~/vs-code-home-py/garmin-converter-2026-04-14 (master)
$ python stage-2-parse-stage-1/landmarks_csv_to_osmand.py build --config parsed-landmarks/analysis-stage-2/all_config --osm parsed-landmarks/osm/all-v7.osm --dedupe-mode off parsed-landmarks/csv-from-stage-1/*.csv
[info] wrote OSM: parsed-landmarks\osm\all-v7.osm
2026-04-16 02:30:01 -07:00
2026-04-15 04:34:02 -07:00

Found the two real problems.

First, your crash:

  • minidom.parseString(...) was reparsing the entire generated XML in memory
  • one of the landmark names or tag values contains an XML-illegal control character
  • when the reparse reaches that character, expat raises exactly your ExpatError: not well-formed (invalid token); on a file this large, the in-memory round-trip is also expensive
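A minimal reproduction of that failure, assuming the old script built an ElementTree and then round-tripped it through minidom (the reply names only the minidom.parseString(...) step). `reproduce_crash` is a hypothetical helper. ElementTree does not check for XML-illegal characters when serializing, so the control character is written raw and the parser rejects it:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def reproduce_crash():
    """Build one node whose name contains U+0001, then reparse it with minidom."""
    node = ET.Element("node", id="1", lat="42.0", lon="23.0")
    # ElementTree escapes &, <, > and quotes, but writes U+0001 through unchanged.
    ET.SubElement(node, "tag", k="name", v="Hut\x01North")
    raw = ET.tostring(node, encoding="unicode")
    try:
        # Hypothetical stand-in for the old pretty-print round-trip.
        minidom.parseString(raw).toprettyxml()
    except ExpatError as exc:
        return str(exc)
    return "parsed fine"


print(reproduce_crash())
```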

Second, the sampling:

  • the previous script took the first few examples in each group
  • now it picks examples spread across the whole group, so they come from different positions in the dataset instead of only the front
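The reply doesn't show how v3 spaces its samples. One simple way to get spread-out examples is to pick evenly spaced positions from the first member to the last. This is a sketch with a hypothetical `spread_sample` helper, not the script's actual code:

```python
def spread_sample(items, k):
    """Return up to k items taken at evenly spaced positions across items."""
    n = len(items)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        return list(items)  # small group: show every member
    if k == 1:
        return [items[n // 2]]  # a single example comes from the middle
    step = (n - 1) / (k - 1)  # greater than 1 here, so rounded positions never repeat
    # First and last members are always included; the rest are evenly spaced between.
    return [items[round(i * step)] for i in range(k)]
```

For a 100-member group with 5 examples, this picks positions 0, 25, 50, 74 and 99 instead of 0 through 4.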

Updated script: landmarks_csv_to_osmand_v3.py

What changed:

  • removed the fragile minidom.parseString() round-trip
  • writes XML directly with ElementTree
  • sanitizes text and tag values to strip XML-invalid control characters
  • supports .gz outputs correctly
  • group examples are now spread-out samples
  • interactive group renaming still works and now shows those spread-out examples
  • loader still accepts both raw landmark CSVs and slimmer derived CSVs
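The sanitize-then-write approach in that list can be sketched as follows. The sketch assumes each landmark arrives as a dict with `lat`, `lon` and a `tags` mapping, and that sequential positive node ids are acceptable downstream. `sanitize` and `write_osm` are hypothetical names, not the v3 script's functions. The stripped set is what the XML 1.0 `Char` production excludes: C0 controls other than tab, LF and CR, surrogates, U+FFFE and U+FFFF.

```python
import gzip
import re
import xml.etree.ElementTree as ET

# Characters XML 1.0 does not allow anywhere in a document.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def sanitize(text):
    """Drop XML-illegal characters; tab, LF and CR are kept."""
    return _INVALID_XML_CHARS.sub("", str(text))


def write_osm(landmarks, path):
    """Write landmarks as OSM nodes, gzip-compressed when path ends in .gz."""
    root = ET.Element("osm", version="0.6", generator="landmarks-sketch")
    for i, lm in enumerate(landmarks, start=1):
        # Assumption: ids only need to be unique within this file.
        node = ET.SubElement(
            root, "node", id=str(i), version="1",
            lat=f"{lm['lat']:.7f}", lon=f"{lm['lon']:.7f}",
        )
        for key, value in lm["tags"].items():
            ET.SubElement(node, "tag", k=sanitize(key), v=sanitize(value))
    tree = ET.ElementTree(root)
    # Pick the opener by extension so .gz output is really gzip data.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wb") as fh:
        # Written directly: no minidom reparse, so no second parser sees the data.
        tree.write(fh, encoding="utf-8", xml_declaration=True)
```

ElementTree writes the whole document without line breaks between elements. XML parsers don't care about that, and it avoids the pretty-printing round-trip entirely.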

Use it instead of the old one:

python landmarks_csv_to_osmand_v3.py *.csv --osm bgmountains_poi.osm.gz

If you want to inspect grouping before writing:

python landmarks_csv_to_osmand_v3.py *.csv --show-groups --group-examples 5

If you want interactive naming with spread-out examples:

python landmarks_csv_to_osmand_v3.py *.csv --show-groups --interactive-group-names --group-examples 5 --gpx landmarks.gpx

If you want POI-only OSM for OsmAndMapCreator:

python landmarks_csv_to_osmand_v3.py *.csv --osm landmarks.osm.gz

Then:

java.exe -Xms256M -Xmx4096M -cp "OsmAndMapCreator.jar;lib/*" net.osmand.MainUtilities generate-poi "C:\path\to\landmarks.osm.gz"

The main fix is this: the script now sanitizes names and tag values instead of trusting them to already be valid XML.

If you want, the next refinement should be a --debug-bad-rows mode that writes out the original rows whose text had to be sanitized, so you can see exactly which source landmarks were malformed.
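A sketch of what such a mode could do, assuming UTF-8 CSVs with a header row. `--debug-bad-rows`, `find_bad_rows` and `write_bad_rows` are all hypothetical; nothing here is in v3. It copies every record that contains an XML-illegal character into a separate CSV, unchanged, with the record number and the offending column names added:

```python
import csv
import re

# Characters XML 1.0 does not allow (C0 controls except tab/LF/CR, surrogates, U+FFFE/U+FFFF).
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def find_bad_rows(rows):
    """Yield (record_number, bad_field_names, row) for rows with XML-illegal text."""
    for n, row in enumerate(rows, start=1):  # 1 = first data record after the header
        bad = [k for k, v in row.items() if isinstance(v, str) and _INVALID_XML_CHARS.search(v)]
        if bad:
            yield n, bad, row


def write_bad_rows(src_path, out_path, encoding="utf-8"):
    """Copy every record that would need sanitizing into out_path; return how many."""
    with open(src_path, newline="", encoding=encoding) as src:
        reader = csv.DictReader(src)
        hits = list(find_bad_rows(reader))
        columns = ["record", "bad_fields"] + list(reader.fieldnames or [])
    with open(out_path, "w", newline="", encoding=encoding) as out:
        # extrasaction="ignore" drops overflow cells that DictReader stores under a None key.
        writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        for n, bad, row in hits:
            writer.writerow({**row, "record": n, "bad_fields": ";".join(bad)})
    return len(hits)
```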