Found the two real problems.
First, your crash:
- minidom.parseString(...) was reparsing the entire generated XML in memory
- one of the landmark names or tag values contains an XML-illegal control character
- on a huge file, that blows up exactly like your
ExpatError: not well-formed (invalid token)
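To make the failure concrete, here is a minimal sketch that reproduces the same class of error on a tiny input. The node, its coordinates, and the name value are made up for illustration; the vertical tab simply stands in for whatever control character is hiding in your CSVs.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical POI node; "\x0b" (vertical tab) stands in for dirty CSV text.
node = ET.Element("node", id="-1", lat="42.0", lon="23.0")
ET.SubElement(node, "tag", k="name", v="Musala\x0bpeak")

# ElementTree does not validate characters, so serialization succeeds...
raw = ET.tostring(node, encoding="utf-8")

# ...but reparsing the serialized bytes (the old pretty-print step) fails in expat.
try:
    minidom.parseString(raw)
    reparse_error = None
except ExpatError as exc:
    reparse_error = exc

print(reparse_error)
```

The point is that the XML was already broken when it was generated; the reparse step was only where the problem surfaced.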
Second, the sampling:
- the previous script was taking the first few examples in a group
- now it samples spread out across the group, so the examples are taken from different positions in the dataset instead of only the front
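One way to get spread-out examples is to pick evenly spaced positions across the group. This is only a sketch of the idea; spread_sample is a hypothetical helper name, and the script's actual selection may differ in detail.

```python
def spread_sample(items, k):
    """Return up to k items taken at evenly spaced positions in items."""
    n = len(items)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        # Fewer items than requested: return them all.
        return list(items)
    if k == 1:
        # A single example: take the middle of the group.
        return [items[n // 2]]
    # Evenly spaced indices from the first element to the last, inclusive.
    return [items[(i * (n - 1)) // (k - 1)] for i in range(k)]


# With 100 rows and 5 examples this picks positions 0, 24, 49, 74 and 99
# instead of rows 0 to 4.
print(spread_sample(list(range(100)), 5))
```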
Updated script: landmarks_csv_to_osmand_v3.py
What changed:
- removed the fragile minidom.parseString() round-trip
- writes XML directly with ElementTree
- sanitizes text and tag values to strip XML-invalid control characters
- supports .gz outputs correctly
- group examples are now spread-out samples
- interactive group renaming still works, but now the shown examples are better distributed
- loader still accepts both raw landmark CSVs and slimmer derived CSVs
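For reference, the sanitize-then-write approach can be sketched roughly like this. The helper names clean_xml_text and write_osm, and the sample node, are illustrative assumptions rather than the exact code in the script; the character class follows the XML 1.0 definition of a legal character.

```python
import gzip
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production is illegal in a document.
_XML_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_xml_text(value):
    """Strip characters XML 1.0 does not allow; keep everything else."""
    return _XML_ILLEGAL.sub("", str(value))


def write_osm(root, path):
    """Write the tree to path, gzip-compressing when the name ends in .gz."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "wb") as fh:
        ET.ElementTree(root).write(fh, encoding="utf-8", xml_declaration=True)


# Hypothetical single-node document; real input would come from the CSV rows.
osm = ET.Element("osm", version="0.6", generator="example")
node = ET.SubElement(osm, "node", id="-1", lat="42.0", lon="23.0")
ET.SubElement(node, "tag", k="name", v=clean_xml_text("Musala\x0bpeak"))
```

Calling write_osm(osm, "landmarks.osm.gz") then streams the document straight into the compressed file, with no second parse of the generated XML.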
Use it instead of the old one:
python landmarks_csv_to_osmand_v3.py *.csv --osm bgmountains_poi.osm.gz
If you want to inspect grouping before writing:
python landmarks_csv_to_osmand_v3.py *.csv --show-groups --group-examples 5
If you want interactive naming with spread-out examples:
python landmarks_csv_to_osmand_v3.py *.csv --show-groups --interactive-group-names --group-examples 5 --gpx landmarks.gpx
If you want POI-only OSM for OsmAndMapCreator:
python landmarks_csv_to_osmand_v3.py *.csv --osm landmarks.osm.gz
Then:
java.exe -Xms256M -Xmx4096M -cp "OsmAndMapCreator.jar;lib/*" net.osmand.MainUtilities generate-poi C:\path\to\landmarks.osm.gz
The main fix is this: the script now treats dirty text safely instead of trusting all names/tags to already be valid XML.
If you want, a good next refinement would be a --debug-bad-rows mode that writes out the original rows whose text had to be sanitized, so you can see exactly which source landmarks were malformed.