
Found the two real problems.

First, your crash:

  • minidom.parseString(...) was reparsing the entire generated XML in memory
  • one of the landmark names or tag values contains a control character that is illegal in XML
  • Expat rejects that character during the reparse, which is exactly your ExpatError: not well-formed (invalid token); on a huge file the reparse is also costly in memory
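
Here is a minimal reproduction of that failure mode. The landmark name and the vertical-tab character are made up for illustration; your data may contain a different control character.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical landmark name containing a vertical tab (0x0B),
# which XML 1.0 does not allow anywhere in a document.
node = ET.Element("node")
ET.SubElement(node, "tag", k="name", v="Hut\x0bPeak")

# ElementTree serializes the raw control character without complaint...
xml_text = ET.tostring(node, encoding="unicode")

# ...but reparsing that output with minidom (which uses Expat) rejects it.
try:
    minidom.parseString(xml_text)
    reparse_error = None
except ExpatError as exc:
    reparse_error = exc

print(reparse_error)
```

The point is that the writer happily emits the bad character and only the reparse notices it, so the error surfaces far from the row that caused it.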

Second, the sampling:

  • the previous script was taking the first few examples in a group
  • now it samples at spread-out positions across the group, so the examples come from different parts of the dataset instead of only the front
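
One way to do that kind of spread-out sampling is to pick evenly spaced indices. This is a sketch of the idea, not necessarily the exact code in the script; spread_sample is a name I made up for illustration.

```python
def spread_sample(items, k):
    """Pick up to k items at evenly spaced positions across the list."""
    n = len(items)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        # Fewer items than requested examples: show them all.
        return list(items)
    if k == 1:
        # A single example comes from the middle rather than the front.
        return [items[n // 2]]
    # Evenly spaced indices from the first item (0) to the last (n - 1).
    return [items[i * (n - 1) // (k - 1)] for i in range(k)]


print(spread_sample(list(range(100)), 5))  # [0, 24, 49, 74, 99]
```

Because the indices depend only on the group size, the same group always shows the same examples, which is handy when you rerun with --show-groups.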

Updated script: landmarks_csv_to_osmand_v3.py

What changed:

  • removed the fragile minidom.parseString() round-trip
  • writes XML directly with ElementTree
  • sanitizes text and tag values to strip XML-invalid control characters
  • supports .gz outputs correctly
  • group examples are now spread-out samples
  • interactive group renaming still works, but now the shown examples are better distributed
  • loader still accepts both raw landmark CSVs and slimmer derived CSVs
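
To make the sanitize-then-write idea concrete, here is a rough sketch under some assumptions: the helper names (sanitize, osm_bytes, write_osm), the placeholder negative node ids, and the (lat, lon, tags) input shape are all hypothetical and chosen for illustration, not taken from the script.

```python
import gzip
import re
import xml.etree.ElementTree as ET

# Characters XML 1.0 forbids: C0 controls other than tab/LF/CR,
# lone surrogates, and the noncharacters U+FFFE and U+FFFF.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def sanitize(text):
    """Drop characters that cannot appear in an XML 1.0 document."""
    return _XML_ILLEGAL.sub("", text)


def osm_bytes(pois):
    """Serialize (lat, lon, tags) tuples as OSM XML with ElementTree only."""
    root = ET.Element("osm", version="0.6", generator="sketch")
    for i, (lat, lon, tags) in enumerate(pois, start=1):
        # Placeholder ids; the real script's id scheme is not shown here.
        node = ET.SubElement(root, "node", id=str(-i),
                             lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        for key, value in tags.items():
            ET.SubElement(node, "tag", k=sanitize(key), v=sanitize(value))
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def write_osm(path, pois):
    """Write plain or gzip-compressed output depending on the extension."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wb") as fh:
        fh.write(osm_bytes(pois))
```

Stripping the characters is the simplest policy. Replacing them with a space would keep words from running together, at the cost of changing names slightly more visibly.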

Use it instead of the old one:

python landmarks_csv_to_osmand_v3.py *.csv --osm bgmountains_poi.osm.gz

If you want to inspect grouping before writing:

python landmarks_csv_to_osmand_v3.py *.csv --show-groups --group-examples 5

If you want interactive naming with spread-out examples:

python landmarks_csv_to_osmand_v3.py *.csv --show-groups --interactive-group-names --group-examples 5 --gpx landmarks.gpx

If you want POI-only OSM for OsmAndMapCreator:

python landmarks_csv_to_osmand_v3.py *.csv --osm landmarks.osm.gz

Then:

java.exe -Xms256M -Xmx4096M -cp "OsmAndMapCreator.jar;lib/*" net.osmand.MainUtilities generate-poi C:\path\to\landmarks.osm.gz

The main fix is this: the script now treats dirty text safely instead of trusting all names/tags to already be valid XML.

If you want, a useful next refinement would be a --debug-bad-rows mode that writes out the original rows whose text had to be sanitized, so you can see exactly which source landmarks were malformed.
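
As a rough idea of what that mode could do (it does not exist yet, and find_bad_rows and the sample CSV below are made up for illustration), the check is just a scan for the same control characters before sanitizing:

```python
import csv
import io
import re

# C0 control characters that XML 1.0 forbids (tab, LF, CR are allowed).
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_bad_rows(rows):
    """Yield (row_number, column, offending_chars) for each dirty cell."""
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            # DictReader can yield None or a list for ragged rows,
            # so only string cells are scanned.
            bad = _XML_ILLEGAL.findall(value) if isinstance(value, str) else []
            if bad:
                yield row_number, column, bad


# Hypothetical two-row CSV; the second data row hides a vertical tab.
sample = io.StringIO("name,type\nMusala,peak\nHut\x0bPeak,hut\n")
report = list(find_bad_rows(csv.DictReader(sample)))
print(report)  # [(2, 'name', ['\x0b'])]
```

Row numbers here count data rows after the header, so add one if you want to match line numbers in the file.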