Regex
This thread is for those wanting to use Regex and needing some starters.
It isn't intended to teach Regex, but hopefully the expressions can be usefully adapted to your needs.

To understand the purpose of this Regex you should know that I modify novels solely for our own use. Also, my missus and I are happy with the default styles for standard elements such as paragraphs and headings; only minor tweaks are needed.

Why do it? Because I like the idea of something being as trim and efficient as possible. Maybe it improves the speed of loading and page turning.

Here's the CSS stylesheet which is used for all our novels:

Code:
@namespace h "http://www.w3.org/1999/xhtml";

Modify as required; however, its use of defaults makes it fine for most novels. Possible changes could be leading and a bottom margin in the p element. (We prefer first-line indent to paragraph spacing so that we get more text on the page.) Here's what to add to the 'p' block:

Code:
line-height: 1.2em;

After a brief look at the original stylesheet of the novel, in particular to see how italics and images, if any, have been tagged, the original stylesheet is trashed and replaced with the stylesheet above. How?

Code:
Right click on stylesheet.css - "Remove".
Right click on any folder - "Add Existing Items...".
Locate your new 'stylesheet.css'.

Here's where the Regex comes in. (Regex = Regular Expression)

'Match Case' and 'Minimal matching' can be left unchecked. They don't matter here.
Carry out all operations in Code View.
If there are multiple HTML files, select 'All HTML Files' rather than 'Current File'.

The description has turned out to be somewhat long-winded, sorry, but there's a summary for copying at the end.

--------------------------------------------------------------
TO FIND AND DELETE ALMOST ALL calibre TAGS
--------------------------------------------------------------

Here's an example of some original code:

Code:
</head>

Code:
Find: (<\w+) class="calibre(\d+)?"?[^>]*(>)

Code:
</head>

Sigil corrects <p> back to <body> and removes superfluous <p>s. Finally it becomes:

Code:
</head>

The span class="italic" is preserved; however, if the italic style is tagged using calibre tags, you would need to do a Find/Replace prior to the above. Here's how:

In Book View, select some text in italics.
Go to Code View and note the tag applied.

Example: some text <span class="italic calibre7"> some text </span> some text

In THIS case use:

Code:
Find: <span class="italic calibre7">

Code:
<p><img alt="" class="calibre32" src="../Images/00002.jpg" /></p>

--------------------------------------------------------------
INSERT HEADING TAGS PLUS CHAPTER SPLITS IF REQUIRED
--------------------------------------------------------------

If the word 'CHAPTER' or 'Chapter' is present

Note: if 'class' and 'span' are still present they will be removed.

Here are some examples of original code:

Code:
<p>Chapter 5</p>

Here's what happens to the first example:

Code:
<h2>Chapter 5</h2>

Here's what happens to the second example:

Code:
<hr class="sigilChapterBreak" /><h2>Chapter five</h2>

Code:
Find: <(p|h\d)[^>]*>(<span[^>]*>)*((Chapt|CHAPT|chapt)[^</]*)(</span>)*<(/p|/h\d)>

To retain all the classes and spans use the following Find and Replace:

Code:
Find: <p([^>]*>(<span[^>]*>)*(Chapt|CHAPT|chapt)[^</]*(</span>)*</)p>

Code:
<p>2</p>

Code:
Find: <(p|h\d)[^>]*>(<span[^>]*>)*((\d+)[^</]*)(</span>)*<(/p|/h\d)>

Code:
<p class="calibre4"><span class="calibre7">5</span></span></p>

Code:
<hr class="sigilChapterBreak" /><h2>Chapter 2</h2>

Use the same Find code as above:

Code:
Find: <(p|h\d)[^>]*>(<span[^>]*>)*((\d+)[^</]*)(</span>)*<(/p|/h\d)>

To retain all the classes and spans use the following Find and Replace:

Code:
Find: <p([^>]*>(<span[^>]*>)*(\d+)[^</]*(</span>)*</)p>

The CHAPTER heading shown by NUMBERS in WORDS only

This finds a single word, or hyphenated words with no spaces, in p.../p or hx.../hx tags (e.g. Two, Thirty, Forty-five, Sixty-Nine) and puts it in Heading 2 tags.

Here are some examples of original code:

Code:
<p>Twenty-one</p>

Code:
<h2>Twenty-one</h2>

Code:
<h2 class="chapterNumber" id="heading_id_2" style="text-indent: 0%;"><span class="bold">Thirty-four</span></h2>

Code:
<h2>Thirty-four</h2>

Code:
Find: <(p|h\d)[^>]*>(<span[^>]*>)*([A-Za-z]+[\-]?([a-z]*)?)(</span>)*<(/p|/h\d)>

To retain all the classes and spans use the following Find and Replace:

Code:
Find: <p([^>]*>(<span[^>]*>)*[A-Za-z]+[\-]?([a-z]*)?(</span>)*</)p>

--------------------------------------------------------------
INCLUDING THE NEW STYLESHEET
--------------------------------------------------------------

This removes the style descriptions from <head>...</head> and replaces them with the link to the stylesheet. (Remember 'All HTML Files'!)

Code:
Find: (<style)[^</style>].*(</style>)

Here it is again briefly, to put in Notepad for copying and pasting:

--------------------------------------------------------------

Code:
Delete all Calibre tags

Your best approach is to grab a copy of an epub with lots of 'calibre' tags and have fun trying out the expressions.

Remember to use the down arrows at the right-hand side of the Find and Replace boxes to recall recently used expressions.
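If you would rather try two of the expressions on a scrap of markup before running them over a whole book, here is a minimal Python sketch. Treat it as a rough check only: Python's `re` module is not the engine Sigil uses, so behaviour may differ in edge cases. The two Replace strings (`\1\3`, and the `<hr class="sigilChapterBreak" /><h2>\3</h2>` form) are assumptions inferred from the example output shown earlier, and the function names `strip_calibre` and `tag_chapters` are made up for illustration.

```python
import re

# Find expression for deleting calibre classes, as given in the post.
STRIP_CALIBRE = re.compile(r'(<\w+) class="calibre(\d+)?"?[^>]*(>)')
# ASSUMPTION: the Replace keeps only group 1 (the tag name) and group 3 ('>').
STRIP_REPLACE = r'\1\3'

# Find expression for paragraphs or headings that start with "Chapter".
CHAPTER = re.compile(
    r'<(p|h\d)[^>]*>(<span[^>]*>)*((Chapt|CHAPT|chapt)[^</]*)(</span>)*<(/p|/h\d)>'
)
# ASSUMPTION: Replace inferred from the example output (chapter split + h2).
CHAPTER_REPLACE = r'<hr class="sigilChapterBreak" /><h2>\3</h2>'


def strip_calibre(html: str) -> str:
    """Remove class="calibreN" from tags where it is the first attribute."""
    return STRIP_CALIBRE.sub(STRIP_REPLACE, html)


def tag_chapters(html: str) -> str:
    """Turn 'Chapter ...' paragraphs into h2 headings with a chapter break."""
    return CHAPTER.sub(CHAPTER_REPLACE, html)


if __name__ == "__main__":
    sample = (
        '<body class="calibre">\n'
        '<p class="calibre4"><span class="calibre7">Chapter five</span></p>\n'
        '<p class="calibre1">Some <span class="italic">text</span></p>\n'
        '</body>'
    )
    # Headings first, so the span-wrapped chapter line is still recognised.
    print(strip_calibre(tag_chapters(sample)))
```

Note that `<span class="italic calibre7">` is left untouched by `strip_calibre`, because the class value does not begin with `calibre`. That is why the separate Find/Replace for italics has to be done first.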
Thanks for sharing this!!!

I am new to regex, limping along at the most basic level. This will be both useful and instructive.
Many thanks - John
MobileRead.com is a privately owned, operated and funded community.