Hacker News

RegExpBuilder – Create regular expressions using chained methods

by jrullmann on 2/11/2015, 2:59:23 PM with 15 comments

by draegtun on 2/11/2015, 6:21:59 PM
Thought this might be of interest; here is how the examples provided would look in Rebol:
```
    digits: digit: charset "0123456789"

    rule: [
        thru "$"
        some digits
        "."
        digit
        digit
    ]

    parse "$10.00" rule    ;; true


    pattern: [
        some "p"
        2 "q" any "q"
    ]

    new-rule: [
        2 pattern
    ]

    parse "pqqpqq" new-rule    ;; true
```
Rebol doesn't have regular expressions; instead it comes with a parse dialect, which is a TDPL: http://en.wikipedia.org/wiki/Top-down_parsing_language
Some parse refs: http://en.wikibooks.org/wiki/REBOL_Programming/Language_Feat... | http://www.rebol.net/wiki/Parse_Project | http://www.rebol.com/r3/docs/concepts/parsing-summary.html
by tragomaskhalos on 2/11/2015, 4:21:14 PM
There have been many efforts similar to this in many languages, but most of us seem happy to stick to the more succinct canonical form, supplemented with /x # comments when things get too hairy.
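For readers who have not used the /x style, here is a minimal sketch in Python, whose `re.VERBOSE` flag plays the same role. It reuses the "$10.00" example from the thread; the names `PRICE` and `is_price` are made up for this illustration.

```python
import re

# The "$10.00" example, written in verbose mode so that each piece of the
# pattern can carry its own comment. Whitespace in the pattern is ignored.
PRICE = re.compile(r"""
    \$          # literal dollar sign
    \d+         # one or more digits (whole dollars)
    \.          # literal decimal point
    \d\d        # exactly two digits (cents)
""", re.VERBOSE)


def is_price(text):
    """Return True when the whole string is a price such as "$10.00"."""
    return PRICE.fullmatch(text) is not None
```

The pattern stays a single regular expression; only its layout changes.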
by marktangotango on 2/11/2015, 4:52:08 PM
Generally, I find that if one's regexes are so complex that one needs visualizers or other aids in writing them, one doesn't have a regex problem, but a parsing problem. The method of parsing by recursive descent can often lead to much more understandable (if more verbose) "pattern matching".
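As a rough illustration of the recursive descent approach, here is a minimal Python sketch that checks parentheses nested to any depth, which a classic regular expression cannot express. The function name `balanced` and the tiny grammar in the comment are assumptions made for this example.

```python
def balanced(text):
    """Return True when every "(" in text has a matching ")"."""
    pos = 0

    def parse_items():
        # items := ( "(" items ")" | any other character )*
        nonlocal pos
        while pos < len(text) and text[pos] != ")":
            if text[pos] == "(":
                pos += 1                      # consume "("
                parse_items()                 # parse the nested content
                if pos >= len(text) or text[pos] != ")":
                    raise ValueError("missing ')'")
                pos += 1                      # consume ")"
            else:
                pos += 1                      # ordinary character

    try:
        parse_items()
    except ValueError:
        return False
    # A stray ")" stops the loop early, leaving input unconsumed.
    return pos == len(text)
```

It is more verbose than a pattern, but each grammar rule maps to a few lines of ordinary code that can be stepped through in a debugger.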
by UnoriginalGuy on 2/11/2015, 5:02:43 PM
Looks like Linq (from .Net/C#). Pretty sexy way to write Regular Expressions if you ask me.
I've "learned" regular expressions multiple times but it just never sticks, I have no idea why. It certainly doesn't help that there are several different incompatible syntaxes (so what I remember and think "should" work doesn't).
I'd prefer to write regexes in this style, though I would pay attention to performance (not that regular expressions are high performance, but I wouldn't want to see a large performance loss either).
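On the performance question, one plausible design is that the chained calls only assemble a pattern string, which is then compiled once, so matching runs at the speed of an ordinary regex. The Python sketch below shows that idea with a hypothetical `PatternBuilder` class; it is not RegExpBuilder's actual API, and every method name here is made up.

```python
import re


class PatternBuilder:
    """Hypothetical chained builder (not the RegExpBuilder API)."""

    def __init__(self):
        self._parts = []

    def literal(self, text):
        # Escape the text so characters like "$" and "." match literally.
        self._parts.append(re.escape(text))
        return self

    def digits(self, at_least=1):
        self._parts.append(r"\d{%d,}" % at_least)
        return self

    def digit(self):
        self._parts.append(r"\d")
        return self

    def source(self):
        # The chain produces nothing more than a plain pattern string.
        return "".join(self._parts)

    def compile(self):
        return re.compile(self.source())


builder = PatternBuilder().literal("$").digits().literal(".").digit().digit()
price = builder.compile()
```

Under this assumption the builder adds cost only at construction time; whether a given library works this way would need to be checked against its source.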
by chris-at on 2/11/2015, 3:12:53 PM
Thanks, this is a lot better than writing this (even if the formatting worked here):
```
(?xi)
\b
(                                       # Capture 1: entire matched URL
  (?:
    [a-z][\w-]+:                        # URL protocol and colon
    (?:
      /{1,3}                            # 1-3 slashes
      |                                 #   or
      [a-z0-9%]                         # Single letter or digit or '%'
                                        # (Trying not to match e.g. "URI::Escape")
    )
    |                                   #   or
    www\d{0,3}[.]                       # "www.", "www1.", "www2." … "www999."
    |                                   #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/          # looks like domain name followed by a slash
  )
  (?:                                   # One or more:
    [^\s()<>]+                          # Run of non-space, non-()<>
    |                                   #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                                   # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                                   #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]      # not a space or one of these punct chars
  )
)
```
by jluxenberg on 2/11/2015, 6:02:53 PM
S-expressions are a natural fit for construction of regular expressions, see http://community.schemewiki.org/?scheme-faq-programming#H-1w...
e.g.
```
  (: (or (in ("az")) (in ("AZ"))) 
    (* (uncase (in ("az09")))))
```
by jgalt212 on 2/11/2015, 5:18:53 PM
Definitely a debuggable way to write regexes. Whenever I have to maintain a hairy regex, I like to plot the regex as a railroad diagram.
These web-based tools can do it:
https://www.debuggex.com/
http://jex.im/regulex/
by dkarapetyan on 2/11/2015, 4:56:00 PM
Generalize just a little bit and you got parser combinators.
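To make the parser combinator idea concrete, here is a minimal Python sketch. Each parser is a function taking `(text, pos)` and returning the new position or `None` on failure. All names (`lit`, `any_of`, `seq`, `some`, `matches`) are assumptions for this example rather than any particular library's API, and `some` is greedy with no backtracking.

```python
def lit(s):
    """Parser matching the exact string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def any_of(chars):
    """Parser matching one character drawn from chars."""
    def parse(text, pos):
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse


def seq(*parsers):
    """Run parsers one after another; fail as soon as one fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def some(p):
    """One or more repetitions of p (greedy, no backtracking)."""
    def parse(text, pos):
        end = p(text, pos)
        if end is None:
            return None
        while True:
            nxt = p(text, end)
            if nxt is None or nxt == end:
                return end
            end = nxt
    return parse


def matches(parser, text):
    """True when parser consumes the whole of text."""
    return parser(text, 0) == len(text)


# The two examples from the thread, rebuilt from the combinators.
digit = any_of("0123456789")
price = seq(lit("$"), some(digit), lit("."), digit, digit)
pq = seq(some(lit("p")), lit("q"), some(lit("q")))
twice = seq(pq, pq)
```

Because the pieces are ordinary values, rules such as `pq` can be named, reused, and nested, which is the generalization over a chained builder.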
by zzzcpan on 2/11/2015, 10:55:38 PM
Regexpes exist to avoid cumbersome code like this, to make it less error prone. Makes me sad to see so many upvotes.
I get that some people have a hard time understanding regexpes with all the backtracking and greediness. Yes, the syntax is a bit complicated. Maybe a simplified, predictable default mode could help. But there is no problem with a DSL being used as an abstraction. In fact, we need more DSLs, for everything!
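For anyone unsure what greediness means in practice, here is a small Python sketch. The sample string and variable names are made up for the illustration.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy: ".*" grabs as much as possible, then backtracks just far enough
# for the final "</b>" to match, so both tags end up in a single match.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: ".*?" grabs as little as possible, so each tag matches separately.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```

The two patterns differ by a single `?`, which is part of why this behavior surprises people.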
by psychometry on 2/11/2015, 5:19:12 PM
Now you have three problems.

by kazinator on 2/11/2015, 6:20:36 PM

Yes, regexes can have other syntactic representations, like:

    (compound "$" (1+ :digit) "." :digit :digit)

Run:

    $ txr -p "(regex-compile '(compound \"$\" (1+ :digit) \".\" :digit :digit))"
    #/$\d+\.\d\d/

by epicureanideal on 2/11/2015, 7:45:30 PM
Nice work! I don't know if it'll be ideal for all use cases, but it does add some readability.
by otakucode on 2/11/2015, 10:56:55 PM
Now do an example where you create a regex to parse the IMDB movies.list data file!
by gcao on 2/11/2015, 4:17:18 PM
Great work! This is very intriguing!
by pg_is_a_butt on 2/11/2015, 5:54:13 PM
you know what else can represent all regular expressions? regular expressions.
#dumb