Perl Beginner's Tutorial - Day 3

Page 2: String Matching

One of Perl's most useful features is its powerful string handling. At the heart of this is the regular expression (RE), which is also used by many other UNIX utilities.

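Regular expressions

A regular expression is written between slashes, and matching is done with the =~ operator. The following expression is true if the string the appears in the variable $sentence: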

$sentence =~ /the/

REs are case sensitive, so if

$sentence = "The quick brown fox";

then the match above is false. The !~ operator is used to test for a non-match. In the example above,

$sentence !~ /the/

is true, because the string the does not appear in $sentence.
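The two operators can be tried side by side. This is only a minimal sketch using the sentence from the text; the variable names $matched and $not_matched are illustrative and not part of the original tutorial.

```perl
use strict;
use warnings;

my $sentence = "The quick brown fox";

# =~ is true only when the pattern occurs in the string (case matters).
my $matched     = ($sentence =~ /the/) ? 1 : 0;
# !~ is the negation: true when the pattern does not occur.
my $not_matched = ($sentence !~ /the/) ? 1 : 0;

print "matched: $matched, not matched: $not_matched\n";
```

Because the sentence contains "The" but never a lowercase "the", this prints "matched: 0, not matched: 1".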

The $_ special variable

In the conditional

if ($sentence =~ /under/)
{
	print "We're talking about rugby\n";
}

if we had made either of the following assignments:

$sentence = "Up and under";
$sentence = "Best winkles in Sunderland";

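a message would be printed.

It is often easier, though, to assign the sentence to the special variable $_. If we do this, we can leave out the match and non-match operators, and the example above can be written as: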

if (/under/)
{
	print "We're talking about rugby\n";
}

The $_ variable is the default for many Perl operations and is used very heavily.
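One common way $_ gets filled in is a foreach loop written without a loop variable. The sketch below assumes that setup; the third sentence and the $count variable are added purely for illustration.

```perl
use strict;
use warnings;

my @sentences = (
    "Up and under",
    "Best winkles in Sunderland",
    "The quick brown fox",
);
my $count = 0;

# With no loop variable, foreach puts each element into $_,
# and a bare /under/ tests $_ implicitly.
foreach (@sentences) {
    if (/under/) {
        print "We're talking about rugby: $_\n";
        $count++;
    }
}
print "$count sentences matched\n";
```

The first two sentences match (the second because "Sunderland" contains "under"), so the final line reports 2 sentences.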

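More on REs

REs have a large number of special characters, which give them their power but also make them look complicated. It is best to build up your use of REs slowly; writing them is something of an art.

Here are some special RE characters and their meanings: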

.	# Any single character except a newline
^	# The beginning of the line or string
$	# The end of the line or string
*	# Zero or more of the last character
+	# One or more of the last character
?	# Zero or one of the last character

Here are some example matches. Remember to enclose them in /.../ slashes when you use them:

t.e	# t followed by anything followed by e
	# This will match the
	#                 tre
	#                 tle
	# but not te
	#         tale
^f	# f at the beginning of a line
^ftp	# ftp at the beginning of a line
e$	# e at the end of a line
tle$	# tle at the end of a line
und*	# un followed by zero or more d characters
	# This will match un
	#                 und
	#                 undd
	#                 unddd (etc)
.*	# Any string without a newline. This is because
	# the . matches anything except a newline and
	# the * means zero or more of these.
^$	# A line with nothing in it.
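A quick way to check entries in the list above is to run a few of the words through the patterns. This is a small sketch, not part of the original tutorial; the word list and the @hits array are made up for the purpose.

```perl
use strict;
use warnings;

my @hits;
for my $word ("the", "te", "tale", "un", "undd") {
    # t.e needs exactly one character between the t and the e.
    push @hits, "$word matches t.e"  if $word =~ /t.e/;
    # und* needs "un", then any number of d characters (including none).
    push @hits, "$word matches und*" if $word =~ /und*/;
}
print "$_\n" for @hits;
```

Only "the" matches t.e, and both "un" and "undd" match und*, so three lines are printed.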

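There are more options. Square brackets match any one of the characters inside them. Inside square brackets, a - means "between" and a ^ at the beginning means "not":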

[qjk]		# Either q or j or k
[^qjk]		# Neither q nor j nor k
[a-z]		# Anything from a to z inclusive
[^a-z]		# No lower case letters
[a-zA-Z]	# Any letter
[a-z]+		# Any non-zero sequence of lower case letters
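The bracket forms can be combined with the ^ and $ anchors from earlier. The following sketch assumes three sample strings chosen for illustration; the %all_lower hash is likewise hypothetical.

```perl
use strict;
use warnings;

my %all_lower;
for my $s ("quick", "Fox", "1234") {
    # [qjk] is satisfied by a single q, j or k anywhere in the string.
    my $has_qjk = $s =~ /[qjk]/ ? "yes" : "no";
    # Anchoring with ^ and $ makes [a-z]+ cover the whole string.
    $all_lower{$s} = $s =~ /^[a-z]+$/ ? "yes" : "no";
    print "$s: contains q, j or k: $has_qjk; all lower case: $all_lower{$s}\n";
}
```

Without the anchors, /[a-z]+/ would also accept "Fox", since "ox" is a run of lowercase letters inside it.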

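What has been covered so far is enough for most purposes; the rest is mainly for reference.

A vertical bar | means "or", and parentheses (...) group things together: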

jelly|cream	# Either jelly or cream
(eg|le)gs	# Either eggs or legs
(da)+		# Either da or dada or dadada or...
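To see grouping and alternation at work, the patterns above can be anchored and tried on a few words. This is an illustrative sketch; the word list and the ^...$ anchors are additions, used so that the whole word has to fit the pattern.

```perl
use strict;
use warnings;

my @matched;
for my $s ("eggs", "legs", "pegs", "dada", "dad") {
    # (eg|le)gs: the group picks "eg" or "le", then "gs" must follow.
    push @matched, $s if $s =~ /^(eg|le)gs$/;
    # (da)+: the whole group "da" repeated one or more times.
    push @matched, $s if $s =~ /^(da)+$/;
}
print "matched: @matched\n";
```

This prints "matched: eggs legs dada". The word "dad" fails because the + applies to the whole group "da", leaving a stray "d".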

Here are some more special characters:

\n		# A newline
\t		# A tab
\w		# Any alphanumeric (word) character.
		# The same as [a-zA-Z0-9_]
\W		# Any non-word character.
		# The same as [^a-zA-Z0-9_]
\d		# Any digit. The same as [0-9]
\D		# Any non-digit. The same as [^0-9]
\s		# Any whitespace character: space,
		# tab, newline, etc
\S		# Any non-whitespace character
\b		# A word boundary, outside [] only
\B		# No word boundary
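The word boundary \b is handy for the rugby example from earlier, where "Sunderland" matched /under/ by accident. A minimal sketch, assuming the same two sentences; the "room 101" string is made up to show \w, \s and \d together.

```perl
use strict;
use warnings;

my %whole_word;
for my $s ("Up and under", "Best winkles in Sunderland") {
    my $anywhere = $s =~ /under/ ? "yes" : "no";
    # \b on both sides requires "under" to stand alone as a word.
    $whole_word{$s} = $s =~ /\bunder\b/ ? "yes" : "no";
    print "$s: contains under: $anywhere; as a whole word: $whole_word{$s}\n";
}

# \w+ is a run of word characters, \s+ a run of whitespace, \d+ a run of digits.
my $has_number = "room 101" =~ /\w+\s+\d+/ ? "yes" : "no";
print "room 101 looks like word, space, number: $has_number\n";
```

Both sentences contain "under", but only "Up and under" has it as a separate word.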

Characters such as $, |, [, ), \ and / have special meanings in REs, so to match them literally you must put a backslash in front of them:

\|		# Vertical bar
\[		# An open square bracket
\)		# A closing parenthesis
\*		# An asterisk
\^		# A caret symbol
\/		# A slash
\\		# A backslash
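The difference the backslash makes is easiest to see by comparing an escaped and an unescaped pattern. The strings and variable names in this sketch are invented for illustration.

```perl
use strict;
use warnings;

# Unescaped, * is a quantifier: "zero or more 3s, then a 4".
my $unescaped_on_34 = "34"  =~ /3*4/  ? 1 : 0;
# Escaped, \* is a literal asterisk, so "34" no longer matches...
my $escaped_on_34   = "34"  =~ /3\*4/ ? 1 : 0;
# ...but the text "3*4" does.
my $escaped_on_expr = "3*4" =~ /3\*4/ ? 1 : 0;
# Slashes must be escaped too, because / also ends the pattern.
my $path_ok = "/usr/bin/perl" =~ /^\/usr\/bin/ ? 1 : 0;

print "$unescaped_on_34 $escaped_on_34 $escaped_on_expr $path_ok\n";
```

This prints "1 0 1 1".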

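Some example REs

As mentioned earlier, it is best to build up REs slowly. Here are a few examples; remember to place them in /.../ slashes when you use them.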

[01]		# Either "0" or "1"
\/0		# A division by zero: "/0"
\/ 0		# A division by zero with a space: "/ 0"
\/\s0		# A division by zero with a whitespace:
		# "/ 0" where the space may be a tab etc.
\/ *0		# A division by zero with possibly some
		# spaces: "/0" or "/ 0" or "/  0" etc.
\/\s*0		# A division by zero with possibly some
		# whitespace.
\/\s*0\.0*	# As the previous one, but with decimal
		# point and maybe some 0s after it. Accepts
		# "/0." and "/0.0" and "/0.00" etc and
		# "/ 0." and "/  0.0" and "/   0.00" etc.
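As a rough check of the division-by-zero patterns, one of them can be run over a handful of expressions. This is only a sketch; the sample expressions and the @suspect array are assumptions made for the example.

```perl
use strict;
use warnings;

my @suspect;
for my $expr ("x/0", "x / 0", "x /\t0", "x/ 0.00", "x/5") {
    # \/ is a literal slash, \s* allows any amount of whitespace, then a 0.
    push @suspect, $expr if $expr =~ /\/\s*0/;
}
print scalar(@suspect), " of 5 expressions look like a division by zero\n";
```

Four of the five expressions are flagged. Note that this pattern would also flag "x/0.5", which is not a division by zero, so a real check would need a stricter RE.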


 

 



This article is adapted from related articles by 网猴; copyright belongs to the original author.