Thread: What a regex is
05-06-2010, 02:04 AM   #12
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Quote:
Originally Posted by Worldwalker
I think you have to stretch the definition of "programming language" a lot before you can fit regexes into it.
Let's not go into definitions, OK? They are wider than you would expect. But that's not the point.

Quote:
And while admittedly I don't know what goes on under the hood in the average compiler, how can something entered at runtime be compiled?
Easily enough. Regular expressions describe regular languages, and any such expression can be losslessly transformed into a finite-state machine and back. A finite-state machine can in turn be easily transformed into actual code.
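
To make that less abstract, here is a minimal sketch in Python. It assumes the simple regexp 'https?://' and a transition table I wrote by hand for it (the state numbers and the names TRANSITIONS, dfa_fullmatch and regex_fullmatch are made up for this illustration; a real regexp engine builds something similar, but not necessarily in this form). The regexp itself is compiled at runtime by re.compile.

```python
import re

# Hand-built finite-state machine for the regexp 'https?://'.
# States are plain integers; state 8 is the only accepting state.
TRANSITIONS = {
    (0, "h"): 1,
    (1, "t"): 2,
    (2, "t"): 3,
    (3, "p"): 4,
    (4, "s"): 5,  # the optional 's' ...
    (4, ":"): 6,  # ... or skip it and go straight to ':'
    (5, ":"): 6,
    (6, "/"): 7,
    (7, "/"): 8,
}
ACCEPTING = {8}


def dfa_fullmatch(text):
    """Run the state machine over the whole string."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


# The same regexp, compiled at runtime by the library.
PATTERN = re.compile(r"https?://")


def regex_fullmatch(text):
    """Ask the compiled regexp whether the whole string matches."""
    return PATTERN.fullmatch(text) is not None
```

Both functions should agree on any input you try; the table is just the regexp written out one state at a time.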

Besides, when you think about it, everything that is ever compiled is "entered at runtime". A compiler takes some input (source code) and produces output (compiled code), and it does not really matter whether the compilation was started at the user's request (besides, why wouldn't "entering a regexp into a field and pressing enter" be considered a "user's request"), whether the user typed the code in advance or "on-the-fly" (after all, what's the difference between "typing that regexp into that editbox a symbol at a time" and "pasting the whole regexp from clipboard") and whether the "user" is a real person or another program.

Quote:
However, I think we can agree that they aren't what the ordinary person thinks of as a programming language -- they're just a form of representing data. So the troll was highly misleading, at best, and no doubt intentionally so.
Well, I wouldn't agree. Sorry. Not only don't regexps represent data at all, which isn't really relevant to the discussion anyway, but I have a completely different view of people's perception of programming languages. I think that to most people there really is little difference between machine code, Java source and regular expressions - all of them are a form of magic which they don't understand but which somehow performs these useful (or not so useful) tasks. They may appear slightly different (Java almost seems like a real language, only spoken by someone with a 100-word vocabulary and no grammar; machine code is an incomprehensible mix of numbers and a few letters; regexp is an even more incomprehensible mix of numbers, letters, symbols and various other gizmos no one can understand without being a bit crazy), but the difference is negligible.

Quote:
So even though there can't really be universally useful regexes, I'm very much in favor of some way of providing a good selection of models that can be modified to suit individual requirements. That would make life a lot easier for a lot of people. The suggestions that have come up in this thread seem like a Very Good Thing to me.
It might be worth trying, but I have my doubts. It's been my experience that either you don't understand regexps, and then you will most likely have very limited success adapting one to your particular situation, or you understand them, and then it is often easier to just write one from scratch than to try to understand what the example means.

I understand the value of examples if they are trying to illustrate simple points (as in, "Operator ? means either one or zero of the preceding symbol. So 'https?://' will match both 'http://' and 'https://', but not 'htp://' or 'httpss://' or 'http:www'."), but you can find those in any regular expression tutorial. For actual use, you would need something a lot more complicated, and unfortunately that also means "hard to understand", and thus "hard to adapt".
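
For what it's worth, that simple example is easy to check. A quick sketch in Python (the helper name matches is my own; I use fullmatch so the whole string has to fit the pattern):

```python
import re

# The pattern from the paragraph above: '?' makes the preceding 's' optional.
PATTERN = re.compile(r"https?://")


def matches(text):
    """Return True if the whole string matches the pattern."""
    return PATTERN.fullmatch(text) is not None


# The first two candidates print True, the remaining three print False.
for candidate in ["http://", "https://", "htp://", "httpss://", "http:www"]:
    print(candidate, matches(candidate))
```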

In fact, some time ago I started writing a tool which would take a collection of regexps and apply them to a file or a number of files. I even managed to get the tool to an almost-usable state, but then I started to actually use it and found out that it doesn't really help at all - I would have to adapt the regexps (and I was careful to enhance them in such a way as to make adaptation easy!) for every single file separately and in effect would do more work than simply writing the regexps from scratch every time. So I stopped the task and got a strong feeling of futility about preparing some universal regexp framework.

Of course, I use regexps a lot so it isn't difficult for me to write a new one. Less experienced people might find such a collection beneficial as a starting point. But I still doubt it. With every regexp, you will quickly find a situation where it doesn't work as it is, and without really understanding what's going on, you won't be able to get it to work.
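
A small illustration of what I mean, as a Python sketch. The pattern below is a hypothetical "good enough" e-mail regexp I made up for this post (it is not taken from any library), and it breaks on the first slightly unusual but perfectly valid address:

```python
import re

# Hypothetical "good enough" e-mail pattern, invented for this illustration:
# letters/digits/underscores/dots, an '@', then a dotted host name.
SIMPLE_EMAIL = re.compile(r"[\w.]+@[\w.]+\.\w+")


def looks_like_email(text):
    """Return True if the whole string fits the simple pattern."""
    return SIMPLE_EMAIL.fullmatch(text) is not None


print(looks_like_email("john.doe@example.com"))   # True
print(looks_like_email("first+tag@example.com"))  # False, although the address is valid
```

To fix it you have to know that [\w.] is a character class and that the '+' sign is missing from it. If you can see that, you could probably have written the pattern yourself.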

A rather extreme example:
http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html