Pay no attention to the Man(fred) Behind the (Perl) curtain!
Wouldn't that be more readable if you made it into two calls?
The regex matches the start of the line with ^, followed by zero or more whitespace characters. It then captures everything up to and including the last non-whitespace character, followed again by zero or more whitespace characters and the end of the line. $1 holds the string captured by the first pair of capturing parentheses. Does that explain it for you?
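A small sketch of the explanation above, showing what $1 ends up holding for a padded string. The helper name trim_capture is made up for illustration; the regex is the one from the original post.

```perl
use strict;
use warnings;

# Trim using the capture-based regex from the post:
#   ^\s*     skip leading whitespace
#   (.*\S)   capture up to and including the last non-whitespace character
#   \s*$     skip trailing whitespace
sub trim_capture {
    my ($s) = @_;
    $s =~ s/^\s*(.*\S)\s*$/$1/;
    return $s;
}

my $padded = "   hello world \t";
if ($padded =~ /^\s*(.*\S)\s*$/) {
    print "captured: [$1]\n";    # captured: [hello world]
}
print "trimmed:  [", trim_capture($padded), "]\n";    # trimmed:  [hello world]
```

Because .* is greedy and then backs off until \S can match, the capture stops at the last non-whitespace character, so inner spaces are kept.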
I'm probably missing something obvious, but isn't s/^\s+|\s+$//g a simpler RE?
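A quick sketch of the single-substitution alternative, assuming nothing beyond core Perl. It also shows why the /g flag matters: without it, s/// stops after the first match, which is the leading run of whitespace. Both helper names are hypothetical.

```perl
use strict;
use warnings;

# One substitution with alternation: ^\s+ matches leading whitespace and
# \s+$ matches trailing whitespace. /g lets both alternatives be replaced.
sub trim_alternation {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Same pattern without /g: only the first match (the leading run) is removed.
sub trim_without_g {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//;
    return $s;
}

my $padded = "  foo bar  ";
print "[", trim_alternation($padded), "]\n";    # [foo bar]
print "[", trim_without_g($padded), "]\n";      # [foo bar  ]
```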
I can't find where I read it right now, but I remember a discussion (was it on PerlMonks?) saying that it was faster to do it with two substitutions: $var =~ s/^\s+//; $var =~ s/\s+$//; However, that discussion happened several years ago, and with the recent changes to the regexp engine(s) it might be worth double-checking. Anyway, I got used to the two-step dance, and when it comes to readability I find it easier to grok.
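One way to do that double-checking yourself is the core Benchmark module. This is only a sketch: the sample input is an arbitrary assumption, not data from this thread, and the relative speeds will depend on string length, the amount of padding, and your perl version, so no results are shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The three trimming variants discussed in this thread.
sub trim_capture     { my $s = shift; $s =~ s/^\s*(.*\S)\s*$/$1/; return $s }
sub trim_alternation { my $s = shift; $s =~ s/^\s+|\s+$//g;       return $s }
sub trim_two_step    { my $s = shift; $s =~ s/^\s+//; $s =~ s/\s+$//; return $s }

# Assumed sample input: a few hundred characters with padding on both ends.
my $input = "   " . ("word " x 50) . "   ";

# A negative count means "run each variant for at least that many CPU seconds";
# cmpthese then prints a table of rates and relative differences.
cmpthese(-1, {
    capture     => sub { trim_capture($input) },
    alternation => sub { trim_alternation($input) },
    two_step    => sub { trim_two_step($input) },
});
```

Before trusting any timing, it is worth confirming that all the variants return the same string for your inputs.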
Let's take a look:
s/^\s*(.*\S)\s*$/$1/;
Looking at the operation, this is a search and replace, so:
s/stringa/stringb/
would replace the first instance of stringa with stringb (add the /g modifier to replace every instance).
Looking at the specifics, let's break down the regular expression:
^\s*(.*\S)\s*$
^ - start of the string
\s* - zero or more whitespace characters
The brackets create a capture buffer for zero or more of any character except newline (.*) followed by a non-whitespace character (\S). This capture buffer is then used later by the $1.
\s* - zero or more whitespace characters
$ - the end of the string (or just before a newline at the very end)
The capture buffer then allows us to use $1 in the second half of the replace to extract just the bit we want.
Hope that helps :)
Ian
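The same breakdown can live in the code itself. Here is a sketch using Perl's /x modifier, which ignores whitespace in the pattern and allows # comments, so each piece of the regex carries its own note. The behaviour is meant to be identical to s/^\s*(.*\S)\s*$/$1/; the sub name is hypothetical.

```perl
use strict;
use warnings;

# Same regex as s/^\s*(.*\S)\s*$/$1/, spread out with /x so every part
# of the breakdown gets a comment.
sub trim_commented {
    my ($s) = @_;
    $s =~ s/
        ^       # start of the string
        \s*     # zero or more whitespace characters
        (       # start capture buffer 1
          .*    #   zero or more of any character except newline...
          \S    #   ...ending in a non-whitespace character
        )       # end capture buffer 1
        \s*     # zero or more whitespace characters
        $       # end of the string
    /$1/x;
    return $s;
}

print "[", trim_commented("\t  Perl curtain  "), "]\n";    # [Perl curtain]
```

One caveat with /x comments: they must not contain the pattern delimiter (here /), or Perl will end the pattern early.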
It's faster to use two separate statements, like so: s/^\s+//; s/\s+$//; There are plenty of benchmarks floating around the web to confirm that.
The Perl FAQ (perldoc perlfaq4, "How do I strip blank space from the beginning/end of a string?") explains why it is better to do this differently. Namely:
$string =~ s/\A\s+//;
$string =~ s/\s+\z//;
This is both easier to read and more efficient.
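A sketch of why \A and \z are the more precise anchors: $ is allowed to match just before a newline at the end of the string (and, under /m, at every line end), while \z only ever matches at the true end. For trimming, where \s+ will swallow the newline anyway, the result is usually the same; the difference shows up in other patterns. The helper name trim_faq is made up for illustration.

```perl
use strict;
use warnings;

# The FAQ-style trim: \A and \z anchor only at the real start and end
# of the string, regardless of embedded newlines or the /m modifier.
sub trim_faq {
    my ($s) = @_;
    $s =~ s/\A\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

# Where the anchors differ: $ also matches just before a final newline.
my $line = "text\n";
print(($line =~ /text$/)  ? "\$ matches\n"  : "\$ does not match\n");     # $ matches
print(($line =~ /text\z/) ? "\\z matches\n" : "\\z does not match\n");    # \z does not match

print "[", trim_faq("  text \n"), "]\n";    # [text]
```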
s/^\s+|\s+$//g does the same :)
Abraxxa - that does explain it, thanks! Those of you recommending two REs - thanks! I may start using that in the future. I found the one I posted maybe 10 years ago, and have replicated it in many of my Perl scripts. The "whtspc" comment is how I find it with grep, and I just lazily replicate it year in, year out. The two calls may actually fix the problem when the variable contains nothing but whitespace...
Ian - your explanation is most excellent as well. It breaks it down into small steps which appeals to my RISC brain. Thanks!
The original regex does NOT remove leading (or trailing) whitespace from a string that *only* contains whitespace, such as ' '.
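A minimal demonstration of that point, comparing the original regex with the s/^\s+|\s+$//g version on a few all-blank inputs. The helper names are hypothetical.

```perl
use strict;
use warnings;

sub trim_capture     { my $s = shift; $s =~ s/^\s*(.*\S)\s*$/$1/; return $s }
sub trim_alternation { my $s = shift; $s =~ s/^\s+|\s+$//g;       return $s }

# (.*\S) needs at least one non-whitespace character, so on an all-blank
# string the pattern never matches and the substitution leaves it untouched.
for my $input (" ", "   ", "\t\t") {
    my $orig = trim_capture($input);
    my $alt  = trim_alternation($input);
    printf "original: length %d, alternation: length %d\n",
        length $orig, length $alt;
}
```

The original column keeps the full input length (1, 3 and 2), while the alternation version reduces every case to length 0.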
How about: $x =~ s/^\s*(.+?)\s*$/$1/; One less character (maybe).
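A quick check of the lazy (.+?) variant. It trims ordinary padded strings correctly, but because (.+?) must still capture at least one character, an all-blank string is not emptied: the regex backtracks until it can capture a single space, so the result is " ". The helper name is hypothetical.

```perl
use strict;
use warnings;

# The lazy variant suggested above: (.+?) in place of (.*\S).
sub trim_lazy {
    my ($s) = @_;
    $s =~ s/^\s*(.+?)\s*$/$1/;
    return $s;
}

print "[", trim_lazy("  abc  "), "]\n";    # [abc]

# On an all-blank string, (.+?) still has to capture one character,
# so the whole string is replaced by a single space.
print "[", trim_lazy("   "), "]\n";        # [ ]
```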
Let's take YET ANOTHER look: s/^\s*(.*\S)\s*$/$1/; This expression presumes that there is at least one non-whitespace character in the string. That's not always a safe assumption. The s/^\s+|\s+$//g and two-substitution solutions presented above correctly handle all-blank strings such as " ", reducing them to zero-length strings.
$var =~ s/^\s+|\s+$//g;
An XS module is even faster: see the benchmark by Sam Graham.
szabgab - that's my one complaint with this regex. I've had to account for that numerous times. But I'm so lazy, and it's always commented with "whtspc", so it's easy to find, copy, and paste. I think I'll start using the two-line solution once I prove to myself that it covers the " " variable.
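One way to prove it: a small table-driven check of the two-substitution trim against some edge cases, including " ". The case list is an assumption about what matters; extend it with whatever your own data looks like.

```perl
use strict;
use warnings;

# The two-substitution trim recommended earlier in the thread.
sub trim_two_step {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Edge cases to check before replacing the "whtspc" regex.
my %expected = (
    " "       => "",
    "   "     => "",
    ""        => "",
    "abc"     => "abc",
    "  a b  " => "a b",
    "\tabc\n" => "abc",
);

my $failures = 0;
for my $input (sort keys %expected) {
    my $got = trim_two_step($input);
    next if $got eq $expected{$input};
    $failures++;
    warn "mismatch for input of length ", length $input, "\n";
}
print $failures ? "$failures case(s) failed\n" : "all cases passed\n";    # all cases passed
```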