Author: empty | Pages: 128 | Publisher: empty
Regular Expression Pocket Reference

Regular expressions are a language used for parsing and manipulating text. They are often used to perform complex search-and-replace operations, and to validate that text data is well-formed.

Today, regular expressions are included in most programming languages, as well as in many scripting languages, editors, applications, databases, and command-line tools. This book aims to give quick access to the syntax and pattern-matching operations of the most popular of these languages so that you can apply your regular-expression knowledge in any environment.

The second edition of this book adds sections on Ruby and Apache web server, common regular expressions, and also updates existing languages.
About This Book

This book starts with a general introduction to regular expressions. The first section describes and defines the constructs used in regular expressions, and establishes the common principles of pattern matching. The remaining sections of the book are devoted to the syntax, features, and usage of regular expressions in various implementations.

The implementations covered in this book are Perl, Java™, .NET and C#, Ruby, Python, PCRE, PHP, Apache web server, vi editor, JavaScript, and shell tools.

Conventions Used in This Book

The following typographical conventions are used in this book:

[...] should be typed literally by the user
Acknowledgments

Jeffrey E. F. Friedl's Mastering Regular Expressions (O'Reilly) is the definitive work on regular expressions. While writing, I relied heavily on his book and his advice. As a convenience, this book provides page references to Mastering Regular Expressions, Third Edition (MRE) for expanded discussion of regular expression syntax and concepts.

Nat Torkington and Linda Mui were excellent editors who guided me through what turned out to be a tricky first edition. This edition was aided by the excellent editorial skills of Andy Oram. Sarah Burcham deserves special thanks for giving me the opportunity to write this book, and for her contributions to the "Shell Tools" section. More thanks for the input and technical reviews from Jeffrey Friedl, Philip Hazel, Steve Friedl, Ola Bini, Ian Darwin, Zak Greant, Ron Hitchens, A.M. Kuchling, Tim Allwine, Schuyler Erle, David Lents, Rabble, Rich Bowan, Eric Eisenhart, and Brad Merrill.
Introduction to Regexes and Pattern Matching

A regular expression is a string containing a combination of normal characters and special metacharacters or metasequences. The normal characters match themselves. Metacharacters and metasequences are characters or sequences of characters that represent ideas such as quantity, locations, or types of characters. The list in "Regex Metacharacters, Modes, and Constructs" shows the most common metacharacters and metasequences in the regular expression world. Later sections list the availability of and syntax for supported metacharacters for particular implementations of regular expressions.

Pattern matching consists of finding a section of text that is described (matched) by a regular expression. The underlying code that searches the text is the regular expression engine. You can predict the results of most matches by keeping two rules in mind:
1. The earliest (leftmost) match wins
   Regular expressions are applied to the input starting at the first character and proceeding toward the last. As soon as the regular expression engine finds a match, it returns. (See MRE 148-149.)

2. Standard quantifiers are greedy
   Quantifiers specify how many times something can be repeated. The standard quantifiers attempt to match as many times as possible. They settle for less than the maximum only if this is necessary for the success of the match. The process of giving up characters and trying less-greedy matches is called backtracking. (See MRE 151-153.)

Regular expression engines have differences based on their type. There are two classes of engines: Deterministic Finite Automaton (DFA) and Nondeterministic Finite Automaton (NFA). DFAs are faster, but lack many of the features of an NFA, such as capturing, lookaround, and nongreedy quantifiers. In the NFA world, there are two types: traditional and POSIX.
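The two rules above can be observed directly in a backtracking engine. The following is a minimal sketch, assuming Python's `re` module as the engine (the choice of Python and the sample strings are illustrative assumptions, not taken from the text). It also shows capturing and a nongreedy quantifier, two of the NFA-only features just mentioned.

```python
import re

# Rule 1: the earliest (leftmost) match wins.
# "a+" could match the longer run "aaa" later in the string, but the
# engine reports the first position at which any match succeeds.
first = re.search(r"a+", "ba baaa")
leftmost = (first.start(), first.group())

# Rule 2: standard quantifiers are greedy.
# ".+" initially consumes the rest of the string, then gives characters
# back until the final ">" can match.
greedy = re.search(r"<.+>", "<b>bold</b>").group()

# Backtracking: "\d+" first takes all four digits, then gives one back
# so that the literal "0" required by the pattern can match.
digits = re.search(r"(\d+)0", "1200").group(1)

# A nongreedy quantifier stops at the shortest text that lets the
# overall match succeed.
lazy = re.search(r"<.+?>", "<b>bold</b>").group()

print(leftmost)  # (1, 'a')
print(greedy)    # <b>bold</b>
print(digits)    # 120
print(lazy)      # <b>
```

The capture group in the third pattern is what makes the backtracking visible: the overall match is still "1200", but the quantified part ends up holding one digit fewer than it first consumed.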
DFA engines
   DFAs compare each character of the input string to the regular expression, keeping track of all matches in progress. Since each character is examined at most once, the DFA engine is the fastest. One additional rule to remember with DFAs is that the alternation metasequence is greedy. When more than one option in an alternation (foo|foobar) matches, the longest one is selected. So, rule No. 1 can be amended to read "the longest leftmost match wins." (See MRE 155-156.)

Traditional NFA engines
   Traditional NFA engines compare each element of the regex to the input string, keeping track of positions where it chose between options in the regex. If an option fails, the engine backtracks to the most recently saved position. For standard quantifiers, the engine chooses the greedy option of matching more text; however, if that option leads to the failure of the match, the engine returns to a saved position and tries a less greedy path. The traditional NFA engine uses ordered alternation, where each option in the alternation is tried sequentially. A longer match may be ignored if an earlier option leads to a successful match. So, here rule #1 can be amended to read "the first leftmost match after greedy quantifiers have had their fill wins." (See MRE 153-154.)

POSIX NFA engines
   POSIX NFA engines work similarly to traditional NFAs with one exception: a POSIX engine always picks the longest of the leftmost matches. For example, the alternation cat|category would match the full word "category" whenever possible, even if the first alternative ("cat") matched and appeared earlier in the alternation. (See MRE 153-154.)

Regex Metacharacters, Modes, and Constructs

The metacharacters and metasequences shown here represent most available types of regular expression constructs and their most common syntax. However, syntax and availability vary by implementation.

Character representations

Many implementations provide shortcuts to represent characters that may be difficult to input. (See MRE 115-118.)

Character shorthands
   Most implementations have specific shorthands for the alert, backspace, escape character, form feed, newline, carriage return, horizontal tab, and vertical tab characters. For example, \n is often a shorthand for the newline character, which is usually LF (012 octal), but can sometimes be CR (015 octal), depending on the operating system. Confusingly, many implementations use \b to mean both backspace and word boundary (position between a "word" character and a nonword character). For these implementations, \b means backspace in a character class (a set of possible characters to match in the string), and word boundary elsewhere.

Octal escape: \num
   Represents a character corresponding to a two- or three-digit octal number. For example, \015\012 matches an ASCII CR/LF sequence.

Hex and Unicode escapes: \xnum, \x{num}, \unum, \Unum
   Represent characters corresponding to hexadecimal numbers. Four-digit and larger hex numbers can represent the range of Unicode characters. For example, \x0D\x0A matches an ASCII CR/LF sequence.

Control characters: \cchar
   Corresponds to ASCII control characters encoded with values less than 32. To be safe, always use an uppercase char; some implementations do not handle lowercase