garmin-img-format-parsing/stage-2-parse-stage-1/readme.md at v.6

nq 3b1087c9ec v.5.0.1 - actually adding them

2026-04-15 04:34:02 -07:00

Found the two real problems.

First, your crash:

- minidom.parseString(...) was reparsing the entire generated XML in memory.
- At least one landmark name or tag value contains an XML-illegal control character.
- The reparse rejects that character, and on a huge file it blows up with exactly your error: ExpatError: not well-formed (invalid token).
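
To see the failure in isolation, here is a minimal sketch. It assumes the old script built the tree with ElementTree and then pretty-printed it through minidom; the function name `reparse_fails` and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def reparse_fails(name: str) -> bool:
    """Serialize one POI-like node, then reparse it the way a minidom round-trip would."""
    node = ET.Element("node")
    ET.SubElement(node, "tag", k="name", v=name)
    # ElementTree escapes &, <, > and quotes, but passes control characters through.
    xml_bytes = ET.tostring(node, encoding="utf-8")
    try:
        minidom.parseString(xml_bytes)
    except ExpatError:
        # This is the "not well-formed (invalid token)" case.
        return True
    return False


if __name__ == "__main__":
    print(reparse_fails("Musala"))       # clean name
    print(reparse_fails("Musala\x01"))   # name carrying a stray control character
```

The serializer happily writes the bad byte; only the parser on the second pass objects, which is why the crash shows up late and far from the offending row.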

Second, the sampling:

- The previous script took only the first few examples in each group.
- It now samples spread out across the group, so the examples come from different positions in the dataset instead of only the front.
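
One way to get spread-out examples is to pick evenly spaced indices across the group. This is only a sketch of the idea; `spread_sample` is a hypothetical helper and the actual script may choose its positions differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def spread_sample(items: Sequence[T], k: int) -> List[T]:
    """Return up to k items taken at evenly spaced positions, first and last included."""
    n = len(items)
    if k <= 0 or n == 0:
        return []
    if n <= k:
        # Small group: show everything.
        return list(items)
    if k == 1:
        # A single example: take the middle rather than the front.
        return [items[n // 2]]
    # Integer positions from index 0 to index n - 1 in k steps.
    return [items[i * (n - 1) // (k - 1)] for i in range(k)]


if __name__ == "__main__":
    print(spread_sample(list(range(100)), 5))
```

Unlike random sampling, this is deterministic, so the same group shows the same examples on every run.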

Updated script: landmarks_csv_to_osmand_v3.py

What changed:

- Removed the fragile minidom.parseString() round-trip.
- Writes XML directly with ElementTree.
- Sanitizes text and tag values to strip XML-invalid control characters.
- Supports .gz outputs correctly.
- Group examples are now spread-out samples.
- Interactive group renaming still works, and the examples it shows are now better distributed.
- The loader still accepts both raw landmark CSVs and slimmer derived CSVs.
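
As a rough illustration of writing directly with ElementTree and honoring a .gz suffix, here is a sketch. The function name `write_poi_osm`, the tuple layout of `pois`, the `generator` value, and the negative node ids (a common convention for not-yet-uploaded OSM objects) are all assumptions, not the script's real internals. Tag values are assumed to be sanitized already.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple

Poi = Tuple[float, float, Dict[str, str]]  # (lat, lon, tags); hypothetical layout


def write_poi_osm(path: str, pois: Iterable[Poi]) -> None:
    """Write POI nodes as OSM XML in one pass, gzip-compressed if path ends in .gz."""
    root = ET.Element("osm", version="0.6", generator="sketch")
    for i, (lat, lon, tags) in enumerate(pois, start=1):
        node = ET.SubElement(
            root, "node", id=str(-i), lat=f"{lat:.7f}", lon=f"{lon:.7f}"
        )
        for key, value in tags.items():
            ET.SubElement(node, "tag", k=key, v=value)
    # Pick the opener from the suffix; both return binary file objects.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wb") as fh:
        ET.ElementTree(root).write(fh, encoding="utf-8", xml_declaration=True)
```

Because the tree is serialized straight to the output stream, nothing ever reparses the generated XML, so the minidom failure mode cannot occur at write time.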

Use it instead of the old one:

python landmarks_csv_to_osmand_v3.py *.csv --osm bgmountains_poi.osm.gz

If you want to inspect grouping before writing:

python landmarks_csv_to_osmand_v3.py *.csv --show-groups --group-examples 5

If you want interactive naming with spread-out examples:

python landmarks_csv_to_osmand_v3.py *.csv --show-groups --interactive-group-names --group-examples 5 --gpx landmarks.gpx

If you want POI-only OSM for OsmAndMapCreator:

python landmarks_csv_to_osmand_v3.py *.csv --osm landmarks.osm.gz

Then feed the result to OsmAndMapCreator:

java.exe -Xms256M -Xmx4096M -cp "OsmAndMapCreator.jar;lib/*" net.osmand.MainUtilities generate-poi C:\path\to\landmarks.osm.gz

The main fix is this: the script now treats dirty text safely instead of trusting all names/tags to already be valid XML.
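
A sketch of what "treating dirty text safely" can look like: strip every character that falls outside the XML 1.0 `Char` production before it reaches the writer. The name `sanitize_xml_text` is hypothetical, and the script may well replace rather than drop such characters.

```python
import re
from typing import Optional

# Matches anything NOT allowed by XML 1.0: tab, LF, CR, and the ranges
# U+0020..U+D7FF, U+E000..U+FFFD, U+10000..U+10FFFF are the legal characters.
_XML_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_xml_text(value: Optional[str]) -> str:
    """Drop XML-illegal characters; treat a missing value as an empty string."""
    if value is None:
        return ""
    return _XML_ILLEGAL.sub("", value)


if __name__ == "__main__":
    print(repr(sanitize_xml_text("Musala\x00\x0b peak")))
```

Legitimate non-ASCII text such as Cyrillic names passes through untouched; only control characters, lone surrogates, and the two non-characters U+FFFE and U+FFFF are removed.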

If you want, a useful next refinement would be a --debug-bad-rows mode (not implemented yet) that writes out the original rows whose text had to be sanitized, so you can see exactly which source landmarks were malformed.
