Book editor help - function for EPUB correction

soucedz · 06-28-2018, 02:02 PM

Hey guys, I need help (since I don't know anything about Python or regex) with creating a function that eliminates the unnecessary paragraph breaks that occur when converting PDFs to EPUB.
I have tried using Find & Replace with a simple expression like:

 </p> <p class="calibre2">[a-z]

since correct paragraphs start with a capital letter. The problem is that I don't want it to select the matched lowercase letter, so I tried something like:

 </p> <p class="calibre2">?([a-z])

But the matched lowercase letter still gets selected.

Thanks in advance.
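What the question asks for can be done with a lookahead: `(?=[a-z])` tests for the lowercase letter without making it part of the match, so only the tags are selected and replaced by a space. A minimal sketch in Python's `re` module; the sample HTML is made up and the class name `calibre2` is assumed from the example above. The same `(?=...)` syntax should also work in the calibre editor's regex mode.

```python
import re

# Made-up sample of PDF-to-EPUB output: one sentence split across two <p>
# elements, followed by a genuine paragraph break.
html = ('<p class="calibre2">The storm broke over the</p>\n'
        '<p class="calibre2">harbour at dawn.</p>\n'
        '<p class="calibre2">Nobody slept.</p>')

# (?=[a-z]) requires a lowercase letter after the opening tag but does not
# consume it, so the letter is not part of the match and stays in place.
join_lower = re.compile(r'</p>\s*<p class="calibre2">(?=[a-z])')

fixed = join_lower.sub(' ', html)
print(fixed)
```

The break before "Nobody" is left alone because the lookahead fails on the capital N.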

theducks · 06-28-2018, 03:52 PM

I have a series of 'cleanups' I use
Note: This is copied from Sigil's saved searches. Ignore the leading numbers and the lines with Name= (they describe what each search does), and the escaped backslash (\\) should be a single \

Code:

80\Name=Cleanup/Joins/Join to lower
80\Find="([[:alpha:],]\x201d*)</p>\\s*<p\\b[^>]*>([a-z\x201c])"
80\Replace=\\1 \\2
81\Name=Cleanup/Joins/Join to upper
81\Find="([[:alpha:],]\x201d*)</p>\\s*<p\\b[^>]*>([A-Z\x201c])"
81\Replace=\\1 \\2
87\Name=Cleanup/Joins/Honorifics
87\Find="(Mr|Mrs|Ms|Dr|Prof)\\.</p>\\s+<p class=\"calibre\\d+\">([A-Z])"
87\Replace=\\1. \\2
88\Name=Cleanup/Joins/de BR w/punct
88\Find="([[:punct:]])<br class=\"calibre4\" />\\s+(\"*[A-Za-z\x201c])"
88\Replace="\\1</p><p class=\"calibre4\">\\2"

* Adjust the parts originally marked in red to match your own files.
Note: I kept it simple and just put the captured groups back in the replacement.

(Wishlist PI: Import Sigil saved searches)
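For anyone who wants to try "Join to lower" and "Honorifics" outside Sigil, here is a rough translation into Python's `re` module. Assumptions: `\x201d` and `\x201c` in the ini file are, I believe, the settings file's escapes for the curly quotes ” (U+201D) and “ (U+201C), so they are written here as `\u201d` and `\u201c`; Python's `re` has no POSIX classes, so `[[:alpha:],]` is approximated with `(?:[^\W\d_]|,)`. The `cleanup` function name and the sample HTML are made up.

```python
import re

# Assumed equivalents of the Sigil saved searches:
#   \x201d / \x201c in the ini file -> curly quotes U+201D / U+201C
#   [[:alpha:],] (POSIX class, not supported by re) -> (?:[^\W\d_]|,)
JOIN_TO_LOWER = re.compile(
    r'((?:[^\W\d_]|,)\u201d*)</p>\s*<p\b[^>]*>([a-z\u201c])')
HONORIFICS = re.compile(
    r'(Mr|Mrs|Ms|Dr|Prof)\.</p>\s+<p class="calibre\d+">([A-Z])')

def cleanup(html):
    """Apply 'Join to lower', then 'Honorifics', to an HTML string."""
    html = JOIN_TO_LOWER.sub(r'\1 \2', html)
    return HONORIFICS.sub(r'\1. \2', html)

# Made-up sample: a sentence split after a closing quote, and one after "Mr."
sample = ('<p class="calibre1">He said, \u201cGo now,\u201d</p>\n'
          '<p class="calibre1">and walked off with Mr.</p>\n'
          '<p class="calibre1">Jones.</p>')
print(cleanup(sample))
```

The order matters a little: "Join to lower" cannot touch the break after "Mr." (it is preceded by a period and followed by a capital), which is exactly the case "Honorifics" exists for.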

Divingduck · 06-29-2018, 03:04 PM

It may be better to check your conversion preferences first. Your problem is a very common symptom of a wrong conversion setup for PDF.

Reduce the line-unwrapping factor in the PDF input preferences from its default of 0.45 to a value between 0.12 and 0.25.

You will find that this reduces most of the problem to a minimum.
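If you convert from the command line instead of the GUI, the same setting is, as far as I know, the PDF input option `--unwrap-factor` of calibre's `ebook-convert` (default 0.45; lowering it should make calibre join more lines into paragraphs). A small Python sketch that only builds the command; the helper name and file names are placeholders, and 0.2 is just one value from the suggested range.

```python
import shlex

def pdf_to_epub_command(src, dest, unwrap_factor=0.2):
    """Build an ebook-convert command line for a PDF-to-EPUB conversion.

    unwrap_factor is passed to the PDF input option --unwrap-factor,
    which takes a decimal between 0 and 1.
    """
    if not 0.0 < unwrap_factor < 1.0:
        raise ValueError("unwrap factor must be between 0 and 1")
    return ["ebook-convert", src, dest, "--unwrap-factor", str(unwrap_factor)]

cmd = pdf_to_epub_command("book.pdf", "book.epub", 0.2)
print(shlex.join(cmd))
# To run it (requires calibre's ebook-convert on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```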

DNSB · 06-29-2018, 03:36 PM

Quote:

Originally Posted by soucedz

Hey guys, I need help (since I don't know anything about Python or regex) with creating a function that eliminates the unnecessary paragraph breaks that occur when converting PDFs to EPUB.
I have tried using Find & Replace with a simple expression like:

 </p> <p class="calibre2">[a-z]

since correct paragraphs start with a capital letter. The problem is that I don't want it to select the matched lowercase letter, so I tried something like:

 </p> <p class="calibre2">?([a-z])

But the matched lowercase letter still gets selected.

Thanks in advance.

I've used a regex similar to your second example, with the replacement string being " \1" (a space followed by whatever lowercase letter was matched by the search).
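This works because the captured letter is written straight back: it is selected, but `\1` replaces it with itself after the space. A sketch with Python's `re.sub` on a made-up snippet, again assuming the `calibre2` class name:

```python
import re

html = '<p class="calibre2">It was a dark</p>\n<p class="calibre2">and stormy night.</p>'

# Group 1 captures the lowercase letter; the replacement " \1" puts a space
# in place of the tags and then re-inserts the captured letter.
fixed = re.sub(r'</p>\s*<p class="calibre2">([a-z])', r' \1', html)
print(fixed)
```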

deback · 06-30-2018, 07:23 PM

"Reduce the line-unwrapping factor in the PDF input preferences from its default of 0.45 to a value between 0.12 and 0.25."

Enable Heuristics and change the line unwrap factor to 0.22. This will help to keep paragraphs together, so editing will be minimal.
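On the command line, heuristic processing is switched on with `ebook-convert`'s `--enable-heuristics`. The post doesn't say whether 0.22 is meant for the PDF input factor (`--unwrap-factor`) or for the separate line un-wrap factor used by heuristic processing (`--html-unwrap-factor`), so this sketch assumes the former and leaves the latter optional. The helper name and file names are placeholders.

```python
import shlex

def heuristics_command(src, dest, unwrap_factor=0.22, heuristic_factor=None):
    """Build an ebook-convert command with heuristic processing enabled.

    unwrap_factor sets the PDF input --unwrap-factor (assumed to be the
    value meant in the post); heuristic_factor, if given, sets the
    separate --html-unwrap-factor used by heuristic processing.
    """
    cmd = ["ebook-convert", src, dest,
           "--enable-heuristics",
           "--unwrap-factor", str(unwrap_factor)]
    if heuristic_factor is not None:
        cmd += ["--html-unwrap-factor", str(heuristic_factor)]
    return cmd

print(shlex.join(heuristics_command("book.pdf", "book.epub")))
```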
