RegExpBuilder – Create regular expressions using chained methods

  • Thought this might be of interest; below shows how the examples provided would look in Rebol:

        digits: digit: charset "0123456789"
    
        rule: [
            thru "$"
            some digits
            "."
            digit
            digit
        ]
    
        parse "$10.00" rule    ;; true
    
    
        pattern: [
            some "p"
            2 "q" any "q"
        ]
    
        new-rule: [
            2 pattern
        ]
    
        parse "pqqpqq" new-rule    ;; true
    
    Rebol doesn't have regular expressions; instead it comes with a parse dialect, which is a TDPL (top-down parsing language): http://en.wikipedia.org/wiki/Top-down_parsing_language

    Some parse refs: http://en.wikibooks.org/wiki/REBOL_Programming/Language_Feat... | http://www.rebol.net/wiki/Parse_Project | http://www.rebol.com/r3/docs/concepts/parsing-summary.html
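    For readers more used to regexes, here is a rough Python counterpart of the two Rebol rules above. It is only a sketch: the names `price` and `pq_twice` are made up for illustration, `thru "$"` is approximated with a lazy `.*?`, and `fullmatch` stands in for parse succeeding only when the whole input is consumed.

```python
import re

# Rough counterpart of `rule`: skip ahead to "$", then digits, ".", two digits.
# `thru "$"` is approximated by a lazy `.*?` followed by a literal dollar sign.
price = re.compile(r".*?\$[0-9]+\.[0-9][0-9]")

# Rough counterpart of `new-rule`: (one or more "p", then two or more "q"), twice.
pq_twice = re.compile(r"(?:p+q{2,}){2}")

# fullmatch only succeeds if the entire string is matched.
print(bool(price.fullmatch("$10.00")))
print(bool(pq_twice.fullmatch("pqqpqq")))
```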

  • There have been many efforts similar to this in many languages, but most of us seem happy to stick to the more succinct canonical form, supplemented via /x # comments when things get too hairy
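    As an illustration of the /x approach, here is a small sketch using Python's `re.VERBOSE` flag, which plays the same role as /x. The price pattern is borrowed from the examples in this thread; the variable names are just for illustration.

```python
import re

# The same price pattern written twice: compact, then with the verbose flag.
compact = re.compile(r"\$[0-9]+\.[0-9]{2}")

verbose = re.compile(
    r"""
    \$          # literal dollar sign
    [0-9]+      # whole units: one or more digits
    \.          # decimal point
    [0-9]{2}    # cents: exactly two digits
    """,
    re.VERBOSE,
)

for text in ("$10.00", "$5.5", "10.00"):
    # Print each input with the verdict of both forms, side by side.
    print(text, bool(compact.fullmatch(text)), bool(verbose.fullmatch(text)))
```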

  • Generally, I find that if one's regexes are so complex that one needs visualizers or other aids in writing them, one doesn't have a regex problem, but a parsing problem. The method of parsing by recursive descent can often lead to much more understandable (if more verbose) "pattern matching".
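    A minimal sketch of what recursive descent can look like, using arbitrarily nested parentheses as the example (something a fixed-depth regex can only approximate). The grammar and the function names are assumptions chosen for illustration, not taken from any library.

```python
def parse_text(s: str, pos: int = 0) -> int:
    """text := (run | group)* ; return the index just past the parsed text."""
    while pos < len(s):
        if s[pos] == "(":
            pos = parse_group(s, pos)
        elif s[pos] == ")":
            break  # let the caller decide whether this ")" closes a group
        else:
            pos += 1  # one character of a run
    return pos


def parse_group(s: str, pos: int) -> int:
    """group := "(" text ")" ; raise ValueError if the group is unclosed."""
    start = pos
    pos = parse_text(s, pos + 1)  # skip "(" and parse the nested text
    if pos >= len(s) or s[pos] != ")":
        raise ValueError(f"unclosed '(' at index {start}")
    return pos + 1  # skip ")"


def is_balanced(s: str) -> bool:
    """True if every parenthesis in s is matched, at any nesting depth."""
    try:
        return parse_text(s) == len(s)
    except ValueError:
        return False


print(is_balanced("a(b(c)d)e"))
print(is_balanced("(()"))
```

    Each grammar rule becomes one function, which is where the extra verbosity and the extra readability both come from.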

  • Looks like LINQ (from .NET/C#). Pretty sexy way to write regular expressions, if you ask me.

    I've "learned" regular expressions multiple times but it just never sticks, I have no idea why. It certainly doesn't help that there are several different incompatible syntaxes (so what I remember and think "should" work doesn't).

    I'd prefer to write regexes in this style, but I would pay attention to performance. Regular expressions aren't high performance to begin with, but I wouldn't want to see a large additional performance loss either.
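    One way the performance concern could play out, sketched in Python: if a builder only assembles a pattern string and compiles it once, matching afterwards runs on an ordinary compiled regex. `PatternBuilder` and its methods are hypothetical names for this sketch and are not the RegExpBuilder API.

```python
import re
from typing import Optional


class PatternBuilder:
    """Hypothetical chained builder; not the RegExpBuilder API."""

    def __init__(self) -> None:
        self._parts = []

    def then(self, literal: str) -> "PatternBuilder":
        # Escape the literal so characters like "$" and "." lose their special meaning.
        self._parts.append(re.escape(literal))
        return self

    def digits(self, minimum: int = 1, maximum: Optional[int] = None) -> "PatternBuilder":
        # An open upper bound renders as {n,}; a closed one as {n,m}.
        upper = "" if maximum is None else str(maximum)
        self._parts.append(f"[0-9]{{{minimum},{upper}}}")
        return self

    def compile(self) -> "re.Pattern[str]":
        # The chaining cost is paid once here; the result is a plain compiled pattern.
        return re.compile("".join(self._parts))


price = PatternBuilder().then("$").digits().then(".").digits(2, 2).compile()
print(price.pattern)
print(bool(price.fullmatch("$10.00")))
```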

  • Thanks, this is a lot better than writing this (even if the formatting worked here):

    ```
    (?xi)
    \b
    (                                       # Capture 1: entire matched URL
      (?:
        [a-z][\w-]+:                        # URL protocol and colon
        (?:
          /{1,3}                            # 1-3 slashes
          |                                 #   or
          [a-z0-9%]                         # Single letter or digit or '%'
                                            # (Trying not to match e.g. "URI::Escape")
        )
        |                                   #   or
        www\d{0,3}[.]                       # "www.", "www1.", "www2." … "www999."
        |                                   #   or
        [a-z0-9.\-]+[.][a-z]{2,4}/          # looks like domain name followed by a slash
      )
      (?:                                   # One or more:
        [^\s()<>]+                          # Run of non-space, non-()<>
        |                                   #   or
        \(([^\s()<>]+|(\([^\s()<>]+\)))\)   # balanced parens, up to 2 levels
      )+
      (?:                                   # End with:
        \(([^\s()<>]+|(\([^\s()<>]+\)))\)   # balanced parens, up to 2 levels
        |                                   #   or
        [^\s`!()\[\]{};:'".,<>?«»“”‘’]      # not a space or one of these punct chars
      )
    )
    ```

  • S-expressions are a natural fit for construction of regular expressions, see http://community.schemewiki.org/?scheme-faq-programming#H-1w...

    e.g.

      (: (or (in ("az")) (in ("AZ"))) 
        (* (uncase (in ("az09")))))

  • Definitely a debuggable way to write regexes. Whenever I have to maintain a hairy regex, I like to plot the regex as a railroad diagram.

    These web-based tools can do it:

    https://www.debuggex.com/

    http://jex.im/regulex/

  • Generalize just a little bit and you got parser combinators.
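    To make that concrete, here is a minimal parser-combinator sketch in Python. Every name in it (`lit`, `one_of`, `seq`, `some`, `matches`) is invented for this example; a parser is assumed to be a function from (text, position) to a new position, or None on failure.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """Match the literal string s."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None


def one_of(chars: str) -> Parser:
    """Match a single character drawn from chars."""
    return lambda text, pos: pos + 1 if pos < len(text) and text[pos] in chars else None


def seq(*parsers: Parser) -> Parser:
    """Match each parser in order, failing if any of them fails."""
    def run(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            pos = result
        return pos
    return run


def some(parser: Parser) -> Parser:
    """Match parser one or more times, greedily and without backtracking."""
    def run(text: str, pos: int) -> Optional[int]:
        end = parser(text, pos)
        if end is None:
            return None
        while end is not None and end > pos:
            pos = end
            end = parser(text, pos)
        return pos
    return run


def matches(parser: Parser, text: str) -> bool:
    """True if parser consumes the whole of text."""
    return parser(text, 0) == len(text)


digit = one_of("0123456789")
price = seq(lit("$"), some(digit), lit("."), digit, digit)
print(matches(price, "$10.00"))
```

    Unlike a chained regex builder, the pieces here are ordinary functions, so a parser may refer to itself and handle nesting.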

  • Regexps exist to avoid cumbersome code like this and to make it less error-prone. Makes me sad to see so many upvotes.

    I get that some people have a hard time understanding regexps, with all the backtracking and greediness. Yes, the syntax is a bit complicated. Maybe a simplified, predictable default mode could help. But there is no problem with a DSL being used as an abstraction. In fact, we need more DSLs, for everything!

  • Now you have three problems.

  • Yes, regexes can have other syntactic representations, like:

        (compound "$" (1+ :digit) "." :digit :digit)
    
    Run:

        $ txr -p "(regex-compile '(compound \"$\" (1+ :digit) \".\" :digit :digit))"
        #/$\d+\.\d\d/

  • Nice work! I don't know if it'll be ideal for all use cases, but it does add some readability.

  • Now do an example where you create a regex to parse the IMDB movies.list data file!

  • Great work! This is very intriguing!

  • you know what else can represent all regular expressions? regular expressions.

    #dumb