Book description
This cookbook provides more than 100 recipes to help you crunch data and manipulate text with regular expressions. Every programmer can find uses for regular expressions, but their power doesn't come worry-free. Even seasoned users often suffer from poor performance, false positives, false negatives, or perplexing bugs. Regular Expressions Cookbook offers step-by-step instructions for some of the most common tasks involving this tool, with recipes for C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET. With this book, you will:
- Understand the basics of regular expressions through a concise tutorial
- Use regular expressions effectively in several programming and scripting languages
- Learn how to validate and format input
- Manage words, lines, special characters, and numerical values
- Find solutions for using regular expressions in URLs, paths, markup, and data exchange
- Learn the nuances of more advanced regex features
- Understand how regular expressions' APIs, syntax, and behavior differ from language to language
- Write better regular expressions for custom needs
Whether you're a novice or an experienced user, Regular Expressions Cookbook will help deepen your knowledge of this unique and irreplaceable tool. You'll learn powerful new tricks, avoid language-specific gotchas, and save valuable time with this huge library of proven solutions to difficult, real-world problems.
Table of contents
- Regular Expressions Cookbook
- SPECIAL OFFER: Upgrade this ebook with O’Reilly
- Preface
- 1. Introduction to Regular Expressions
- 2. Basic Regular Expression Skills
- 2.1. Match Literal Text
- 2.2. Match Nonprintable Characters
- 2.3. Match One of Many Characters
- 2.4. Match Any Character
- 2.5. Match Something at the Start and/or the End of a Line
- 2.6. Match Whole Words
- 2.7. Unicode Code Points, Properties, Blocks, and Scripts
- 2.8. Match One of Several Alternatives
- 2.9. Group and Capture Parts of the Match
- 2.10. Match Previously Matched Text Again
- 2.11. Capture and Name Parts of the Match
- 2.12. Repeat Part of the Regex a Certain Number of Times
- 2.13. Choose Minimal or Maximal Repetition
- 2.14. Eliminate Needless Backtracking
- 2.15. Prevent Runaway Repetition
- 2.16. Test for a Match Without Adding It to the Overall Match
- 2.17. Match One of Two Alternatives Based on a Condition
- 2.18. Add Comments to a Regular Expression
- 2.19. Insert Literal Text into the Replacement Text
- 2.20. Insert the Regex Match into the Replacement Text
- 2.21. Insert Part of the Regex Match into the Replacement Text
- 2.22. Insert Match Context into the Replacement Text
- 3. Programming with Regular Expressions
- Programming Languages and Regex Flavors
- 3.1. Literal Regular Expressions in Source Code
- 3.2. Import the Regular Expression Library
- 3.3. Creating Regular Expression Objects
- 3.4. Setting Regular Expression Options
- 3.5. Test Whether a Match Can Be Found Within a Subject String
- 3.6. Test Whether a Regex Matches the Subject String Entirely
- 3.7. Retrieve the Matched Text
- 3.8. Determine the Position and Length of the Match
- 3.9. Retrieve Part of the Matched Text
- 3.10. Retrieve a List of All Matches
- 3.11. Iterate over All Matches
- 3.12. Validate Matches in Procedural Code
- 3.13. Find a Match Within Another Match
- 3.14. Replace All Matches
- 3.15. Replace Matches Reusing Parts of the Match
- 3.16. Replace Matches with Replacements Generated in Code
- 3.17. Replace All Matches Within the Matches of Another Regex
- 3.18. Replace All Matches Between the Matches of Another Regex
- 3.19. Split a String
- 3.20. Split a String, Keeping the Regex Matches
- 3.21. Search Line by Line
- 4. Validation and Formatting
- 4.1. Validate Email Addresses
- 4.2. Validate and Format North American Phone Numbers
- 4.3. Validate International Phone Numbers
- 4.4. Validate Traditional Date Formats
- 4.5. Accurately Validate Traditional Date Formats
- 4.6. Validate Traditional Time Formats
- 4.7. Validate ISO 8601 Dates and Times
- 4.8. Limit Input to Alphanumeric Characters
- 4.9. Limit the Length of Text
- 4.10. Limit the Number of Lines in Text
- 4.11. Validate Affirmative Responses
- 4.12. Validate Social Security Numbers
- 4.13. Validate ISBNs
- 4.14. Validate ZIP Codes
- 4.15. Validate Canadian Postal Codes
- 4.16. Validate U.K. Postcodes
- 4.17. Find Addresses with Post Office Boxes
- 4.18. Reformat Names From “FirstName LastName” to “LastName, FirstName”
- 4.19. Validate Credit Card Numbers
- 4.20. European VAT Numbers
- 5. Words, Lines, and Special Characters
- 5.1. Find a Specific Word
- 5.2. Find Any of Multiple Words
- 5.3. Find Similar Words
- 5.4. Find All Except a Specific Word
- 5.5. Find Any Word Not Followed by a Specific Word
- 5.6. Find Any Word Not Preceded by a Specific Word
- 5.7. Find Words Near Each Other
- 5.8. Find Repeated Words
- 5.9. Remove Duplicate Lines
- 5.10. Match Complete Lines That Contain a Word
- 5.11. Match Complete Lines That Do Not Contain a Word
- 5.12. Trim Leading and Trailing Whitespace
- 5.13. Replace Repeated Whitespace with a Single Space
- 5.14. Escape Regular Expression Metacharacters
- 6. Numbers
- 7. URLs, Paths, and Internet Addresses
- 7.1. Validating URLs
- 7.2. Finding URLs Within Full Text
- 7.3. Finding Quoted URLs in Full Text
- 7.4. Finding URLs with Parentheses in Full Text
- 7.5. Turn URLs into Links
- 7.6. Validating URNs
- 7.7. Validating Generic URLs
- 7.8. Extracting the Scheme from a URL
- 7.9. Extracting the User from a URL
- 7.10. Extracting the Host from a URL
- 7.11. Extracting the Port from a URL
- 7.12. Extracting the Path from a URL
- 7.13. Extracting the Query from a URL
- 7.14. Extracting the Fragment from a URL
- 7.15. Validating Domain Names
- 7.16. Matching IPv4 Addresses
- 7.17. Matching IPv6 Addresses
- 7.18. Validate Windows Paths
- 7.19. Split Windows Paths into Their Parts
- 7.20. Extract the Drive Letter from a Windows Path
- 7.21. Extract the Server and Share from a UNC Path
- 7.22. Extract the Folder from a Windows Path
- 7.23. Extract the Filename from a Windows Path
- 7.24. Extract the File Extension from a Windows Path
- 7.25. Strip Invalid Characters from Filenames
- 8. Markup and Data Interchange
- 8.1. Find XML-Style Tags
- 8.2. Replace <b> Tags with <strong>
- 8.3. Remove All XML-Style Tags Except <em> and <strong>
- 8.4. Match XML Names
- 8.5. Convert Plain Text to HTML by Adding <p> and <br> Tags
- 8.6. Find a Specific Attribute in XML-Style Tags
- 8.7. Add a cellspacing Attribute to <table> Tags That Do Not Already Include It
- 8.8. Remove XML-Style Comments
- 8.9. Find Words Within XML-Style Comments
- 8.10. Change the Delimiter Used in CSV Files
- 8.11. Extract CSV Fields from a Specific Column
- 8.12. Match INI Section Headers
- 8.13. Match INI Section Blocks
- 8.14. Match INI Name-Value Pairs
- Index
- About the Authors
- Colophon
Product information
- Title: Regular Expressions Cookbook
- Author(s):
- Release date: May 2009
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9780596520687