HypheNN-De: German Hyphenation with Neural Networks

  • Did you consider putting the "center" of the detector somewhere other than in the middle of the vector? What would happen if you had 6 before and 2 after, or 5 before and 3 after?

    Another thought I had: for performance reasons, it might be nice to have something more compact than a one-hot vector for each letter. Have you looked at determining sets of characters which have a similar impact on hyphenation, and encoding them together?

    PS: Do you have the extracted list of Wiktionary hyphenations sitting in a text file somewhere that you could put up? I'm fixin' to quickly compare the accuracy to TeX's German hyphenation (once the 30+GiB TeXLive repository finishes downloading).

    PPS: You could improve the display of code blocks on your site on desktop by adding

        display: block;
        max-width: 710px;
        width: 80%;
        margin-left: auto;
        margin-right: auto;
    
    to your `.post-content pre code` rule. Or maybe slightly indent it by reducing the max width a small amount below that of the body text.
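To make the off-center window question above concrete, here is a minimal sketch of how the context around each candidate hyphenation point could be cut out with a configurable number of letters before and after it. The function name `windows`, the padding character `_`, and the default of 4 letters on each side are assumptions for illustration, not taken from the article.

```python
def windows(word, before=4, after=4, pad="_"):
    """Yield (gap, context) for every gap between two letters of `word`.

    `gap` is the number of letters to the left of the candidate break.
    `context` holds `before` letters left of the gap and `after` letters
    right of it; positions outside the word are filled with `pad`.
    """
    padded = pad * before + word + pad * after
    width = before + after
    for gap in range(1, len(word)):
        # In `padded` the gap sits at index before + gap, so the window
        # starts `before` characters earlier, i.e. at index `gap`.
        yield gap, padded[gap:gap + width]


# Same total width (8), but with the "center" shifted to 6 before / 2 after.
for gap, context in windows("silbe", before=6, after=2):
    print(gap, context)
```

Each context string would then be encoded (one-hot or otherwise) and fed to the classifier, so comparing 4/4 against 6/2 or 5/3 only requires changing the two parameters and retraining.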

  • If you love spelling and hyphenation, you should star this issue in Chrome(ium):

    https://code.google.com/p/chromium/issues/detail?id=20667

    There are lots of spelling and hyphenation libraries, e.g. for Finnish, but it is not possible to get them working in Chrome because there is no extension capability for it. It's a real shame, since these less common languages will probably never get support from the Chrome team itself.

  • TeX implements a very good hyphenation engine that is driven by patterns [1]. I would expect it to be very difficult to improve on this, and as far as I can see, the article doesn't include a comparison.

    [1]: https://tex.stackexchange.com/questions/262588/how-are-hyphe...

  • This blog inspired me to hunt for some obscure machine learning papers from the 80s and 90s that I could replicate and improve. Any idea where to start?