Lord of the Regexes

Sunday, December 13th, 2009

Three regexps for the Python-kings under the sky,
Seven for the Ruby lords on their Rails of stone,
Nine for C++ doomed to die,
One for Larry Wall on his dark throne
In the Land of Perl where Obfuscation lies.

One regex to match them all
One regex to find them
One regex to replace them all and in the text file bind them
In the Land of Perl where Obfuscation lies.

Not Substring Regular Expressions

Monday, February 19th, 2007

I’m trying to devise a regular expression that will find all or most img tags that don’t have alt attributes. <img[^>]*/> will find all the img elements (or at least most of them). And I can easily find those that do contain an alt attribute. However, I’m stumped when it comes to finding those that do not contain the substring alt. Any ideas?

bad interpreter: No such file or directory

Thursday, January 11th, 2007

You sometimes see this error message when attempting to run a Perl, Python, or shell script that uses a shebang line to find the interpreter. For example,

$ hello.pl
-bash: hello.pl: command not found


Perl and Multiple Line Ending Characters

Saturday, January 6th, 2007

Perl uses \n (the linefeed) as its default end-of-line character (record separator). You can change this to \r (carriage return), \r\n (carriage return linefeed pair), or something else, either with the -0 option on the command line, which takes a single character's octal or hex code, or by setting $/ in the script for multi-character separators such as \r\n. For example, this command sets the record separator to \r before replacing every occurrence of the string foo with the string bar:

$ perl -pi -015 -e 's/foo/bar/g' test.html

However, my files are a weird mix of Unix, Mac, and Windows conventions. A few files may even use several line ending conventions in one file. Most modern text editors can autodetect and deal with this without any problem, as can XML parsers. However, as near as I can figure, Perl cannot. It expects me to know in advance what kind of file I’m feeding it.

Is there any simple way around this? There’s more than one way to do it, but is there more than one $/?