Old 06-28-2018, 01:02 PM   #1
soucedz
Book editor help - function for EPUB correction

Hey guys, I need help (since I don't know anything about Python or regex functions) with creating a function that eliminates the unnecessary paragraph breaks that occur when converting PDFs to EPUB.
I have tried using Find & Replace with a simple expression like:

</p> <p class="calibre2">[a-z]

since correct paragraphs begin with a capital letter, but the problem is that I don't want it to select the matched lowercase letter. I tried something like:

</p> <p class="calibre2">?([a-z])

But the matched lower case letter still gets selected.

Thanks in advance.
Old 06-28-2018, 02:52 PM   #2
theducks
I have a series of 'cleanups' I use
Note: This is copied from Sigil's saved searches. Ignore the leading numbers and the lines with Name= (they describe what each search does). Each escaped backslash (\\) should be a single \
Code:
80\Name=Cleanup/Joins/Join to lower
80\Find="([[:alpha:],]\x201d*)</p>\\s*<p\\b[^>]*>([a-z\x201c])"
80\Replace=\\1 \\2
81\Name=Cleanup/Joins/Join to upper
81\Find="([[:alpha:],]\x201d*)</p>\\s*<p\\b[^>]*>([A-Z\x201c])"
81\Replace=\\1 \\2
87\Name=Cleanup/Joins/Honorifics
87\Find="(Mr|Mrs|Ms|Dr|Prof)\\.</p>\\s+<p class=\"calibre\\d+\">([A-Z])"
87\Replace=\\1. \\2
88\Name=Cleanup/Joins/de BR w/punct
88\Find="([[:punct:]])<br class=\"calibre4\" />\\s+(\"*[A-Za-z\x201c])"
88\Replace="\\1</p><p class=\"calibre4\">\\2"
* adjust the RED
Note: I kept it simple and replace the capture

(Wishlist PI: Import Sigil saved searches)
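
For anyone who wants to try the "Join to lower" search outside Sigil, here is a rough Python sketch of it. Several things are assumptions on my part: that \x201d and \x201c in the saved-search file stand for the curly quotes ” and “, that `[[:alpha:]]` can be approximated with `[^\W\d_]` (Python's `re` module has no POSIX classes), and the function name `join_to_lower` is made up for the example.

```python
import re

# Rough Python equivalent of the "Join to lower" saved search.
# [[:alpha:],] becomes (?:[^\W\d_]|,) because Python's re has no POSIX classes.
# \u201d and \u201c are the closing and opening curly quotes (assumed meaning
# of \x201d and \x201c in the saved-search file).
JOIN_TO_LOWER = re.compile(
    r'((?:[^\W\d_]|,)\u201d*)</p>\s*<p\b[^>]*>([a-z\u201c])'
)


def join_to_lower(html):
    """Join a paragraph to the next one when the next starts in lowercase."""
    # Put both captures back, separated by a single space.
    return JOIN_TO_LOWER.sub(r'\1 \2', html)


sample = (
    '<p class="calibre2">The quick brown fox</p>\n'
    '<p class="calibre2">jumped over the dog.</p>'
)
print(join_to_lower(sample))
```

A paragraph that ends in a full stop is left alone, because the first group only accepts a letter or a comma before the closing tag.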
Old 06-29-2018, 02:04 PM   #3
Divingduck
It may be better to check your conversion preferences first. Your problem is a very common result of a wrong conversion setup for PDF.

Reduce the line unwrapping factor in the PDF input preferences from the default of 0.45 to a value between 0.12 and 0.25.

You will find that this reduces most of your problem to a minimum.
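
If you convert from the command line rather than the GUI, the same setting can be sketched like this. It assumes calibre's `ebook-convert` tool is on your PATH and that its `--unwrap-factor` PDF input option corresponds to the preference described above; the helper name `build_convert_command` and the file names are made up for the example.

```python
def build_convert_command(pdf_path, epub_path, unwrap_factor=0.2):
    """Build an ebook-convert command line with a lower line unwrap factor."""
    # The factor is a decimal between 0 and 1; reject anything else early.
    if not 0 < unwrap_factor < 1:
        raise ValueError("unwrap_factor must be between 0 and 1")
    return [
        "ebook-convert",
        pdf_path,
        epub_path,
        "--unwrap-factor",
        str(unwrap_factor),
    ]


command = build_convert_command("book.pdf", "book.epub", 0.2)
print(" ".join(command))
# To actually run the conversion (requires calibre to be installed):
# import subprocess
# subprocess.run(command, check=True)
```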
Old 06-29-2018, 02:36 PM   #4
DNSB
I've used a regex similar to your second example, with the replacement string being " \1" (a space followed by whichever lowercase letter the search captured).
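
As a minimal sketch of that idea in Python, with two assumptions of mine: the stray "?" after the ">" is dropped, and the whitespace between the tags is matched with \s* instead of a single space. The second pattern is an alternative that uses a lookahead, so the lowercase letter has to be there but is never part of the match. The function names are made up for the example.

```python
import re

# Capture version: the letter is matched, then put back with \1.
CAPTURE = re.compile(r'</p>\s*<p class="calibre2">([a-z])')

# Lookahead version: (?=[a-z]) checks for the letter without consuming it,
# so only the tags are selected and replaced.
LOOKAHEAD = re.compile(r'</p>\s*<p class="calibre2">(?=[a-z])')


def join_with_capture(html):
    """Replace the tags and the letter with a space plus the same letter."""
    return CAPTURE.sub(r' \1', html)


def join_with_lookahead(html):
    """Replace only the tags with a space; the letter is left untouched."""
    return LOOKAHEAD.sub(' ', html)


broken = 'the line was</p> <p class="calibre2">broken here.</p>'
print(join_with_capture(broken))
print(join_with_lookahead(broken))
```

Both give the same result on this sample. The lookahead form is closer to what was asked for, since the letter is not selected at all.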
Old 06-30-2018, 06:23 PM   #5
deback
"Reduce the standard line unwrapping factor of 0.45 at PDF input preferences to a value between 0.25 to 0.12."

Enable Heuristics and change the line unwrap factor to 0.22. This will help to keep paragraphs together, so editing will be minimal.

All times are GMT -4.