Show HN: Hck – a fast and flexible cut-like tool

  • I wrote something similar (but never really finished it), called 'gut', in Go a few years back. Funny thing is that I literally never use it. I thought splitting on regexes and that stuff would be super useful, but it turns out that I just use Perl one-liners instead. And Perl is available on something like 99.99% of all *nix machines, which my own 'cut'-substitute isn't.

    Still a good exercise for me to write it, and I assume for OP too.

  • Love seeing these modern alternatives to coreutils! Ripgrep, fd, hyperfine, bat, exa, bottom, gdu, wc, sd, hexyl...

    Yet to find a GNU 'tr' alternative though

  • Nice work!

    I don't know whether anyone here has used Rexx. The 'parse' instruction in Rexx was incredibly powerful, breaking up text by field/position/delimiter and assigning to variables all in one line.

    I've often wondered if there was a command-line equivalent. Awk is great but you have to 'program' the parsing spec, rather than declare it.

  • It is interesting to note how it compares to "choose" (also in Rust) in the benchmarks.

    single character

        hck           1.494 ± 0.026s
        hck (no-mmap) 1.735 ± 0.004s
        choose        4.597 ± 0.016s
    
    multi character

        hck           2.127 ± 0.004s
        hck (no-mmap) 2.467 ± 0.012s
        choose        3.266 ± 0.011s
    
    The single pass optimization trick[1] seems to be helping a lot in the single character case.

    Of course, doing away with a pass is supposed to give 2x, and I am wondering whether the regex constraint led to this "side-effect".

    [1] fast mode - https://github.com/sstadick/hck/blob/master/src/lib/core.rs#... https://github.com/sstadick/hck/blob/master/src/lib/core.rs#...

  • I saw `hck` recently on Twitter and was impressed to see support for compressed files. From the current todo list, I hope complement is implemented for sure.

    I see negative indexing is currently "unlikely". I'm writing a similar tool [0], but with bash+awk. I solved negative index support with a `-n` option, which changes the range syntax to use `:` instead of the `-` character.

    My biggest trouble came with literal field separator [1], because FS can only be specified as a string in awk and backslash is a metacharacter for both string and regexp.

    [0] https://github.com/learnbyexample/regexp-cut

    [1] https://learnbyexample.github.io/escaping-madness-awk-litera...
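
    For anyone curious why switching the range separator helps: once `-` can be a sign, a range like `2--1` gets awkward to parse, while `2:-1` stays unambiguous. A minimal Python sketch of that idea follows. The `resolve` and `select` names and the exact spec rules are my own assumptions for illustration, not how regexp-cut or hck actually behave.

```python
def resolve(index, n):
    """Map a 1-based index (negative counts from the end) to a 0-based position."""
    if index == 0:
        raise ValueError("field numbers start at 1")
    return index - 1 if index > 0 else n + index


def select(fields, spec):
    """Select fields using a comma-separated list of 'N' or 'A:B' items.

    Hypothetical spec format: 'A:B' is inclusive on both ends, and either
    side may be omitted to mean "from the first" or "to the last" field.
    """
    n = len(fields)
    out = []
    for item in spec.split(","):
        if ":" in item:
            a, b = item.split(":", 1)
            start = resolve(int(a), n) if a else 0
            end = resolve(int(b), n) if b else n - 1
            # Clamp both bounds so out-of-range negatives select nothing extra
            out.extend(fields[max(start, 0):max(end + 1, 0)])
        else:
            pos = resolve(int(item), n)
            if 0 <= pos < n:
                out.append(fields[pos])
    return out
```

    With `fields = ["a", "b", "c", "d"]`, `select(fields, "2:-2")` gives `["b", "c"]` and `select(fields, "-1")` gives `["d"]`.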

  • <offtopic> I have implemented a `_split` command to split a line by a separator and `_stat` command that does basically `sort | uniq -c | sort -nr` counting elements and sorting by frequency. Really useful operations for me.

    When my one-liners become 2-3 lines long I need to switch to a regular script, but I also log all my shell commands going back years and have something a bit better than `history | grep word` to search them.</>
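
    As a rough illustration of what a `sort | uniq -c | sort -nr` style counter does, here is a minimal Python sketch. The `stat` function name is hypothetical and this is not the commenter's `_stat` implementation; ties are broken alphabetically here, which may differ from the shell pipeline's order.

```python
from collections import Counter


def stat(lines):
    """Count identical lines and return (count, line) pairs, most frequent first."""
    counts = Counter(line.rstrip("\n") for line in lines)
    # Sort by descending count, then by line text to keep ties deterministic
    return sorted(((n, s) for s, n in counts.items()), key=lambda p: (-p[0], p[1]))
```

    For example, `stat(["a\n", "b\n", "a\n"])` returns `[(2, "a"), (1, "b")]`.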

  • The README and description should not assume the reader knows what `cut` is or what it's used for. Maybe reference it and then ELI5

  • Nice one, OP. It's probably mostly due to my lack of Rust knowledge, but the code is not as easy to read as Go. Does anyone feel the same? (BTW, this has nothing to do with how OP wrote it, but rather with the language itself.)

  • Yay, no more piping multiple cuts when you have multiple delimiters.
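
    To show what that buys you, here is a small Python sketch of selecting fields when the delimiter is a regex that covers several separators at once. The `cut_fields` helper is hypothetical and only illustrates the idea; it says nothing about how hck does it internally.

```python
import re


def cut_fields(line, delimiter_pattern, wanted):
    """Split a line on a regex delimiter and return the 1-based fields in `wanted`."""
    parts = re.split(delimiter_pattern, line)
    # Silently skip field numbers that fall outside the line
    return [parts[i - 1] for i in wanted if 1 <= i <= len(parts)]
```

    For example, `cut_fields("a,b;c  d", r"[,;]|\s+", [1, 3, 4])` returns `["a", "c", "d"]`, which would otherwise take a chain of `cut` calls with different `-d` values.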

  • Heck