Show HN: Open-source text-to-geolocation models

  • This is _really_ cool. Early in the pandemic I released a local news aggregation tool that collected COVID-related content and scored it for relevance using an ensemble of ML classification models, including one that would attempt to infer an article's geographic coordinates. Accuracy peaked at roughly 70-80%, which wasn't quite high enough for this use case. With a large enough dataset of geotagged documents I'm pretty sure we could have improved that by another 10-15%, which would likely have been "good enough" for our purposes. But one of the surprising things I took away from the project was that there isn't a well-established name for this category of classification problem, and as a result there are few datasets or benchmarks to encourage progress.

  • This does look interesting, but as other comments have pointed out, without data or weights it's not clear how well it works. The training notebook seems to suggest the model isn't actually improving all that much even on the training data.

  • Depending upon your use case, you can get pretty good results by using spaCy for named entity recognition and then matching the extracted entities against the titles of Wikipedia articles that have coördinates.
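
    A rough sketch of that pipeline, assuming spaCy's en_core_web_sm model is installed; WIKI_COORDS stands in for a lookup table built from geotagged Wikipedia titles (the entries below are illustrative only):

      import spacy

      # Assumes: pip install spacy && python -m spacy download en_core_web_sm
      nlp = spacy.load("en_core_web_sm")

      # Hypothetical gazetteer: Wikipedia article title -> (lat, lon).
      WIKI_COORDS = {
          "Lisbon": (38.7223, -9.1393),
          "Porto": (41.1579, -8.6291),
      }

      def geolocate(text):
          """Return (entity, coords) pairs for place-like entities we can match."""
          doc = nlp(text)
          hits = []
          for ent in doc.ents:
              if ent.label_ in ("GPE", "LOC", "FAC"):
                  coords = WIKI_COORDS.get(ent.text)
                  if coords is not None:
                      hits.append((ent.text, coords))
          return hits

      # Should print something like [('Lisbon', (38.7223, -9.1393))]
      print(geolocate("The mayor of Lisbon announced new transit funding."))

    The exact-title match is the weak point: ambiguous place names ("Springfield", "Georgia") need some disambiguation on top of this.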

  • There are no weights and no data, only some code that builds a PyTorch character-based network and trains it (roughly the kind of model sketched below). Will you provide weights or data in the future? Do you have any benchmarks against Nominatim or Google Maps?

    I think something like this (but with more substance) could be helpful for some people, especially in the social sciences.
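
    Not the repo's actual architecture, just a guess at what a character-based geolocation model might look like in PyTorch: byte IDs go through an embedding and a GRU, and a linear head regresses latitude/longitude. All layer sizes and the encoding scheme here are assumptions.

      import torch
      import torch.nn as nn

      class CharGeolocator(nn.Module):
          """Embeds byte IDs, encodes them with a GRU, regresses (lat, lon)."""
          def __init__(self, vocab_size=128, embed_dim=32, hidden_dim=128):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, embed_dim)
              self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
              self.head = nn.Linear(hidden_dim, 2)  # latitude, longitude

          def forward(self, char_ids):           # char_ids: (batch, seq_len)
              emb = self.embed(char_ids)         # (batch, seq_len, embed_dim)
              _, h = self.rnn(emb)               # h: (1, batch, hidden_dim)
              return self.head(h.squeeze(0))     # (batch, 2)

      def encode(text, max_len=256):
          # Naive ASCII/byte encoding, truncated or zero-padded to max_len.
          ids = [min(ord(c), 127) for c in text[:max_len]]
          ids += [0] * (max_len - len(ids))
          return torch.tensor(ids, dtype=torch.long)

      model = CharGeolocator()
      batch = torch.stack([encode("10 Downing Street, London")])
      pred = model(batch)  # untrained, so the prediction is meaningless
      loss = nn.functional.mse_loss(pred, torch.tensor([[51.5034, -0.1276]]))

    (MSE on raw degrees is just a placeholder here; a haversine-style loss would match the geometry better.)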

  • This would have been tremendously useful in a project I worked on a few years ago.

    It's a genuinely difficult task to parse text at scale and tag it with accurate geographic information.

  • Probably should combine this with DELFT.

  • Has anyone gotten this working? Curious whether someone could PR a dependencies file that can be used to run this.
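
    In case it helps anyone experimenting: a guess at a minimal requirements.txt. Only PyTorch is actually implied elsewhere in the thread; anything else here is my own assumption, not taken from the repo.

      # hypothetical requirements.txt -- contents are a guess, not from the repo
      torch
      numpy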