05-20-2009, 06:04 AM   #1
rogue_ronin
Banned
Tyrannosaurus Regex

In a couple of other threads, there was talk about starting a thread for useful regex: stuff we use all the time, or stuff that solves a difficult but rare problem.

Maybe we'd like to do that here.

If you don't know what regex is, it stands for "regular expressions". Perhaps someone below could offer a good explanation, and I'll edit this first post, replacing this paragraph.

I'll suggest the following format for submissions:
========================================
  • What it does: description of changes to text
  • Best used on: Text, HTML, Word, etc.
  • Regex Find:
    Code:
    Search regex code here.
  • Find Translation: Explain what the Find code stands for.
  • Regex Replace:
    Code:
    Substitution regex code here.
  • Replace Translation: Explain what the Replace code stands for.
  • Variants/Comments:
    Code:
    Similar regex for similar problems, or notes/warnings.
========================================

Here's one I use:
  • What it does: Finds Right-Single-Quotes ( &rsquo; ) in contractions and replaces them with the &apos; entity-name.
  • Best used on: HTML
  • Regex Find:
    Code:
    ([a-zI])\&rsquo\;([a-z]+)
  • Find Translation: Find any one lowercase letter or capital "I", followed by the string "&rsquo;", followed by at least one lowercase letter.
  • Regex Replace:
    Code:
    $1&apos;$2
  • Replace Translation: Put that first lowercase letter or "I" back, put the string "&apos;" in, then put back whatever one or more lowercase letters followed the original "&rsquo;".
  • Variants/Comments: For text files, you can use the actual characters instead of the entity names. This works the other way, too; swap the entity names, and you can put a rsquo where an apos (or a literal apostrophe) was.
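
If you want to try this outside an editor, here is a minimal sketch in Python. It assumes the replacement is meant to insert the &apos; entity, as the description says. The function names swap_apostrophes and restore_rsquo are made up for illustration, and the reverse pattern is just one way to read the "works the other way" comment above. Python's re module writes back-references as \1 and \2 rather than $1 and $2, and it doesn't need the backslashes before & and ;.

```python
import re

# One lowercase letter or capital "I", the literal "&rsquo;", then one or
# more lowercase letters. "&" and ";" need no escaping in Python's re module.
CONTRACTION = re.compile(r"([a-zI])&rsquo;([a-z]+)")

# The reverse direction mentioned under Variants/Comments (an assumption
# about how to write it): "&apos;" or a literal apostrophe between letters.
REVERSE = re.compile(r"([a-zI])(?:&apos;|')([a-z]+)")


def swap_apostrophes(html):
    """Replace &rsquo; inside contractions with &apos;."""
    # Python spells the back-references \1 and \2 instead of $1 and $2.
    return CONTRACTION.sub(r"\1&apos;\2", html)


def restore_rsquo(html):
    """Put &rsquo; back where &apos; or ' sits between letters."""
    return REVERSE.sub(r"\1&rsquo;\2", html)


sample = "I&rsquo;m sure it doesn&rsquo;t touch &lsquo;quoted&rsquo; text."
print(swap_apostrophes(sample))
print(restore_rsquo("don't and can&apos;t"))
```

Note that the closing quote in the sample is left alone, because no lowercase letter follows it directly. That is the point of requiring a letter on both sides.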
========================================

Submitted Regex:
  • Swap apostrophes in contractions: Post #1
  • Change quotes using apostrophes to curly quotes: Post #3
  • Un-break hard-returns in a paragraph: Post #12
  • Format simple chapter headings: Post #14
  • Calibre: Import book metadata from filename: Post #17

Last edited by rogue_ronin; 05-29-2009 at 08:46 PM.