Found the two real problems.

First, your crash:

* `minidom.parseString(...)` was reparsing the entire generated XML in memory
* one of the landmark names or tag values contains an XML-illegal control character
* when the reparse reaches that character, Expat rejects the whole document with exactly your `ExpatError: not well-formed (invalid token)`; on a huge file the round-trip also burns a lot of memory and makes the bad row hard to find

Second, the sampling:

* the previous script took the first few examples in each group
* now it samples **spread out across the group**, so the examples come from different positions in the dataset instead of only the front

Updated script: [landmarks_csv_to_osmand_v3.py](sandbox:/mnt/data/landmarks_csv_to_osmand_v3.py)

What changed:

* removed the fragile `minidom.parseString()` round-trip
* writes XML directly with `ElementTree`
* sanitizes text and tag values to strip XML-invalid control characters (`ElementTree` writes them out unchanged, so this step is required)
* supports `.gz` outputs correctly
* group examples are now **spread-out samples**
* interactive group renaming still works, and the examples it shows are now better distributed
* the loader still accepts both raw landmark CSVs and slimmer derived CSVs

Use it instead of the old one:

```bash
python landmarks_csv_to_osmand_v3.py *.csv --osm bgmountains_poi.osm.gz
```

If you want to inspect grouping before writing:

```bash
python landmarks_csv_to_osmand_v3.py *.csv --show-groups --group-examples 5
```

If you want interactive naming with spread-out examples:

```bash
python landmarks_csv_to_osmand_v3.py *.csv --show-groups --interactive-group-names --group-examples 5 --gpx landmarks.gpx
```

If you want POI-only OSM for OsmAndMapCreator:

```bash
python landmarks_csv_to_osmand_v3.py *.csv --osm landmarks.osm.gz
```

Then:

```bash
java.exe -Xms256M -Xmx4096M -cp "OsmAndMapCreator.jar;lib/*" net.osmand.MainUtilities generate-poi C:\path\to\landmarks.osm.gz
```

The main fix: the script now sanitizes dirty text instead of assuming every name and tag is already valid XML.
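For reference, here is a minimal sketch of the three ideas above: stripping XML-invalid characters, spread-out sampling, and writing OSM XML straight to a plain or gzipped file. It is not the code inside `landmarks_csv_to_osmand_v3.py`; it assumes each landmark is a dict with hypothetical `lat`, `lon` and `tags` keys, and the function names and `generator` value are made up for illustration.

```python
import gzip
import re
import xml.etree.ElementTree as ET

# XML 1.0 only allows tab, LF, CR, and code points from U+0020 up, minus
# surrogates and U+FFFE/U+FFFF. Everything matched here must be removed.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def sanitize_xml_text(value):
    """Return str(value) with XML-illegal characters stripped."""
    return _INVALID_XML_CHARS.sub("", str(value))


def spread_sample(items, k):
    """Pick up to k items at evenly spaced positions from first to last."""
    n = len(items)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        return list(items)
    if k == 1:
        return [items[0]]
    return [items[round(i * (n - 1) / (k - 1))] for i in range(k)]


def build_osm_tree(landmarks):
    """Build an OSM XML tree with one <node> per landmark (hypothetical schema)."""
    root = ET.Element("osm", version="0.6", generator="landmarks-sketch")
    for index, lm in enumerate(landmarks, start=1):
        # Placeholder IDs: negative IDs are the usual OSM convention for
        # objects that are not in the main database.
        node = ET.SubElement(
            root, "node", id=str(-index),
            lat=f"{lm['lat']:.7f}", lon=f"{lm['lon']:.7f}",
        )
        for key, value in lm["tags"].items():
            ET.SubElement(node, "tag", k=sanitize_xml_text(key), v=sanitize_xml_text(value))
    return ET.ElementTree(root)


def write_osm(tree, path):
    """Write the tree as UTF-8; paths ending in .gz are gzip-compressed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wb") as fh:
        tree.write(fh, encoding="utf-8", xml_declaration=True)
```

Usage would be something like `write_osm(build_osm_tree(rows), "landmarks.osm.gz")`. Sanitizing at write time means nothing has to be reparsed afterwards, which is what removes the need for the `minidom` round-trip.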
If you want, the next refinement should be a `--debug-bad-rows` mode that writes out the original rows whose text had to be sanitized, so you can see exactly which source landmarks were malformed.
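A `--debug-bad-rows` mode could look roughly like the sketch below. The flag does not exist yet, and `sanitize_row`, `collect_bad_rows`, `write_bad_rows` and the report columns are all hypothetical. It reuses the same XML 1.0 character filter and records the CSV line on which each dirty record ends, with the original row stored via `repr()` so the offending control characters stay visible as escapes.

```python
import csv
import re

# Same XML 1.0 filter as the converter: control chars except tab/LF/CR,
# lone surrogates, and U+FFFE/U+FFFF.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def sanitize_row(row):
    """Return (clean_row, dirty_fields); dirty_fields lists columns that changed."""
    clean, dirty = {}, []
    for key, value in row.items():
        text = "" if value is None else str(value)
        fixed = _INVALID_XML_CHARS.sub("", text)
        clean[key] = fixed
        if fixed != text:
            dirty.append(str(key))
    return clean, dirty


def collect_bad_rows(fh, source_name):
    """Scan an open CSV file and report every row that needed sanitizing."""
    bad = []
    reader = csv.DictReader(fh)
    for row in reader:
        _, dirty = sanitize_row(row)
        if dirty:
            bad.append({
                "source": source_name,
                "line": reader.line_num,  # physical line where the record ends
                "fields": ";".join(dirty),
                "original": repr(row),    # escapes make \x01 etc. visible
            })
    return bad


def write_bad_rows(bad, out):
    """Write the report as CSV to an open text file."""
    writer = csv.DictWriter(out, fieldnames=["source", "line", "fields", "original"])
    writer.writeheader()
    writer.writerows(bad)
```

Storing `repr(row)` rather than the raw values keeps the report itself free of the characters that broke the XML in the first place.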