Manfred B. Perl: One of my most used and least understood perl snippets

Monday, June 18, 2012

One of my most used and least understood perl snippets

# ----- this perl snippet will remove any leading or trailing whitespace
# ----- runs of spaces inside the variable will NOT be reduced to one space

$variable =~ s/^\s*(.*\S)\s*$/$1/; # trm ld/trl whtspc
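A minimal runnable sketch of the snippet in context. It assumes the string to trim is already stored in a declared `$variable` (the sample value is invented). The substitution has to be applied to an existing variable; writing `my $variable =~ s/.../.../` would declare a fresh, empty variable and trim nothing.

```perl
use strict;
use warnings;

# Invented input: leading/trailing blanks plus a run of four inner spaces
my $variable = "   hello    world \t ";

$variable =~ s/^\s*(.*\S)\s*$/$1/; # trm ld/trl whtspc

# Leading and trailing whitespace is gone; the inner run of spaces is kept
print "[$variable]\n";   # prints [hello    world]
```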

18 comments:

Anonymous said...: Wouldn't that be more readable if you made it into two calls?; 2:22 AM
abraxxa said...: The regex matches the start of the string with ^, followed by zero or more whitespace characters, then captures everything up to and including the last non-whitespace character, followed again by zero or more whitespace characters and the end of the string.
$1 holds the string captured by the first pair of capturing parentheses.
Does that explain it for you?; 4:28 AM
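The same pattern can be spread out with Perl's /x modifier so that each piece carries its own comment. This is only a re-layout of the snippet on an invented sample string; under /x, literal whitespace in the pattern is ignored, so blanks must be matched with \s.

```perl
use strict;
use warnings;

my $string = "  some text  ";

# Identical regex to the snippet, laid out with /x; the replacement is still $1.
# Comments inside the pattern avoid sigils and slashes, which Perl would act on.
$string =~ s/
    ^       # start of the string
    \s*     # zero or more leading whitespace characters
    (       # start capture group 1
      .*    #   any characters, greedily ...
      \S    #   ... ending with the last non-whitespace character
    )       # end capture group 1
    \s*     # zero or more trailing whitespace characters
    $       # end of the string, or just before a final newline
/$1/x;

print "[$string]\n";    # prints [some text]
```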
Anonymous said...: I'm probably missing something obvious, but isn't s/^\s+|\s+$//g a simpler RE?; 4:54 AM
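The alternation in that suggestion depends on the /g modifier: each match removes either the leading run or the trailing run, so /g is what lets one statement handle both ends. A small comparison on an invented string:

```perl
use strict;
use warnings;

# Two copies of the same padded, invented input
my $with_g    = "  padded  ";
my $without_g = "  padded  ";

# With /g the substitution repeats, so both the leading and trailing runs go
$with_g =~ s/^\s+|\s+$//g;

# Without /g only the first match (the leading run) is removed
$without_g =~ s/^\s+|\s+$//;

print "[$with_g]\n";      # prints [padded]
print "[$without_g]\n";   # prints [padded  ]
```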
Anonymous said...: I can't find where I read it right now, but I remember a discussion (was it on perlmonks?) saying that it was faster to do it with two substitutions:

$var =~ s/^\s+//;
$var =~ s/\s+$//;

However, that discussion happened several years ago and with the recent changes that occurred in the regexp engine(s), it might be worth double checking.

Anyway, I got used to the two-step dance and when it comes to readability I find it easier to grok.; 5:30 AM
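The two-step approach is easy to wrap in a helper so it can be reused instead of copied. The sub name `trim` below is hypothetical, not something from the post or a core module; it works on a copy and returns the trimmed string.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the front, then strip the back
sub trim {
    my ($var) = @_;
    return $var unless defined $var;   # leave undef alone
    $var =~ s/^\s+//;                  # leading whitespace
    $var =~ s/\s+$//;                  # trailing whitespace
    return $var;
}

my $trimmed = trim("\t  two-step dance \n");
print "[$trimmed]\n";    # prints [two-step dance]
```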
Anonymous said...: Let's take a look:

s/^\s*(.*\S)\s*$/$1/;

Looking at the operation, this is a search and replace, so:

s/stringa/stringb/

would replace the first match of stringa with stringb (add the /g modifier, as in s/stringa/stringb/g, to replace every match).

Looking at the specifics, let's break down the regular expression:

^\s*(.*\S)\s*$

^ - start of the string
\s* - zero or more whitespace characters

The brackets create a capture buffer for zero or more of any character (.*) followed by a non-whitespace character (\S). The contents of this capture buffer are then available as $1.

\s* - zero or more whitespace characters

$ - the end of the string

The capture buffer then allows us to use $1 in the second half of the replace to extract just the bit we want.

Hope that helps :)

Ian; 7:36 AM
Anonymous said...: it's faster to use two separate statements like so: s/^\s+//; s/\s+$//;

there are plenty of benchmarks floating around the web to confirm that.; 2:40 PM
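Rather than relying on old benchmarks, the variants can be timed locally with the core Benchmark module. This is a sketch on an invented input string; the numbers it prints depend on the perl version, the regex engine and the data, so no particular ranking is implied here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented sample input; real data may behave differently
my $input = "   " . ("word " x 50) . "   ";

# Run each candidate for at least 1 CPU second and print a comparison table
cmpthese(-1, {
    one_capture => sub { my $s = $input; $s =~ s/^\s*(.*\S)\s*$/$1/; },
    alternation => sub { my $s = $input; $s =~ s/^\s+|\s+$//g; },
    two_subs    => sub { my $s = $input; $s =~ s/^\s+//; $s =~ s/\s+$//; },
});
```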
A. Sinan Unur said...: perldoc perlfaq4 explains why it is better to do this differently. Namely:

$string =~ s/\A\s+//;
$string =~ s/\s+\z//;

This is both easier to read and more efficient.; 5:01 PM
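The \A and \z anchors differ from ^ and $ in one visible way: $ may also match just before a final newline, while \z matches only at the very end (and neither \A nor \z changes meaning under /m). For trimming, both forms strip the same trailing run, because \s also matches the newline. A small check on invented strings:

```perl
use strict;
use warnings;

# Invented input ending in a newline, as a line read with <STDIN> would
my $line = "foo\n";

# $ may match just before the final newline; \z only at the very end
my $dollar_matches = ($line =~ /foo$/)  ? 1 : 0;
my $z_matches      = ($line =~ /foo\z/) ? 1 : 0;

# When trimming, both anchors remove the same trailing whitespace
(my $trim_dollar = "bar \n") =~ s/\s+$//;
(my $trim_z      = "bar \n") =~ s/\s+\z//;

print "$dollar_matches $z_matches [$trim_dollar] [$trim_z]\n";   # prints 1 0 [bar] [bar]
```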
Anonymous said...: try this:
s/^\s*|\s*$//g; 3:09 PM
stas said...: /^\s+|\s+$//g does the same :); 7:30 AM
stas said...: s/^\s+|\s+$//g does the same :); 7:30 AM
Lyle said...: Abraxxa - that does explain it, thanks!

Those of you recommending two REs - thanks! I may start using that in the future. I found the one I posted maybe 10 years ago and have replicated it in many of my perls. The whtspc comment is how I find it with grep, and I just lazily replicate it year in, year out.

The two calls may actually fix the problem when the variable contains nothing but spaces...; 10:39 PM
Lyle said...: Ian - your explanation is most excellent as well. It breaks it down into small steps which appeals to my RISC brain. Thanks!; 10:42 PM
szabgab said...: The original regex does NOT remove leading (or trailing) white spaces from a string that *only* has white spaces: ' ';; 6:32 AM
Anonymous said...: How about:

$x=~s/^\s*(.+?)\s*$/$1/;

one less char (maybe); 10:20 AM
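The lazy (.+?) variant trims ordinary strings, but it does not fully fix the whitespace-only case: .+? must consume at least one character, and that character may itself be a blank, so a string made only of whitespace comes back as a single space. A small check, using a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the (.+?) variant from the comment
sub lazy_trim {
    my ($x) = @_;
    $x =~ s/^\s*(.+?)\s*$/$1/;
    return $x;
}

for my $input ("  text  ", "   ", " ") {
    print "[$input] -> [", lazy_trim($input), "]\n";
}
```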
Anonymous said...: Let's take YET ANOTHER look:

s/^\s*(.*\S)\s*$/$1/;

This expression presumes that there is at least 1 non-whitespace character in the string. That's not always a safe and reasonable assumption.

Except for the (.+?) variant, which still leaves a single space behind, the other solutions presented above correctly handle all-blank strings such as " ", reducing them to zero-length strings.; 11:33 AM
Unknown said...: $var =~ s/^\s+|\s+$//g;; 11:59 AM
Anonymous said...: XS module is even faster:
benchmark by Sam Graham; 4:29 AM
Lyle said...: szabgab - that's my one complaint with this perl. I've had to account for that numerous times. But I'm so lazy and it's always commented with "whtspc" so it's easy to find, copy & paste.

I think I'll start using the two line solution once I prove to myself that it covers the " " variable.; 8:55 PM
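One way to prove it is to run both versions over the edge cases raised in the comments. The helper names below are hypothetical; they simply wrap the original one-liner and the two-substitution version, and the loop prints lengths because the inputs are mostly invisible whitespace.

```perl
use strict;
use warnings;

# Hypothetical wrappers: the original one-liner and the two-step version
sub trim_capture {
    my ($s) = @_;
    $s =~ s/^\s*(.*\S)\s*$/$1/;
    return $s;
}

sub trim_two_step {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Edge cases from the comments, plus a normal padded string
for my $input (" ", "   ", "\t \n", "", "  x  ") {
    printf "input length %d: capture length %d, two-step length %d\n",
        length $input, length trim_capture($input), length trim_two_step($input);
}
```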
