Old 07-04-2019, 04:45 AM   #1
ogassav
Junior Member
Posts: 5
Karma: 10
Join Date: Jul 2019
Device: Android
PDF -> ePUB: deleting <BR>s Best Practices

Dear All,

I'm new to Calibre, but those of you who are not surely know the problem of broken lines when converting PDF to ePUB. <BR> tags appear seemingly at random and split the text into thousands of short passages, which looks weird.

This article (https://dearauthor.com/ebooks/calibr...nversion-tips/) suggests using Heuristic Processing during conversion to get rid of <BR>s, but it didn't work for me - I used the range from 0.4 to 0.6 with absolutely no result.

The same article proposes using the Search & Replace function, and that was a solution in my case! I used the following expression: \. +<br>(*SKIP)(*FAIL)|\<br>|\d +<br>

I assumed that <BR>s after a dot (".") were an author-defined start of a new passage, so I didn't touch them (\. +<br>(*SKIP)(*FAIL)), while standalone <BR>s (\<br>) and <BR>s which follow any word (\d +<br>) were replaced with nothing (= deleted), as they were almost always breaking a sentence into useless fragments.

Everything would have been perfectly fine except for one thing: the above-mentioned algorithm also deletes "useful" <BR>s after headlines, which are usually marked with a <b> tag (<b>THIS IS HEADLINE </b><br>), and after paragraph (chapter???) anchors, which are marked with an <a id> tag (<a id="p8"></a> <br>).

So what I need is to add an exception to my algorithm so that <BR>s are not deleted when they follow </a> and </b> tags. I played around with quite a number of variants, but still can't find my Grail. Possibly the (*SKIP)(*FAIL) construct does not allow for multiple skip rules: I already skip 1 pattern and want to add 2 more, so 3 in total.
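For illustration only, here is a minimal sketch of how several "skip" cases can sit side by side. It is written for Python's standard `re` module, which has no (*SKIP)(*FAIL), so the protected cases are captured in a group and put back unchanged by a callback. The names `BR_PATTERN`, `strip_unwanted_brs` and `sample` are made up for the example, and `\s*` is used instead of ` +` as an assumption about the spacing. The `\d +<br>` alternative is left out because `\d` matches a digit rather than a word, and the bare `<br>` alternative already covers that case.

```python
import re

# Group 1 lists the <br>s to KEEP: after a full stop, after </b> (headline)
# and after </a> (anchor). Any other <br> falls through to the second
# alternative and is deleted. \s* between context and <br> is an assumption.
BR_PATTERN = re.compile(
    r"((?:\.|</b>|</a>)\s*<br>)"  # protected <br>, returned unchanged
    r"|<br>",                     # every other <br>, removed
    re.IGNORECASE,
)


def strip_unwanted_brs(html: str) -> str:
    """Delete <br> tags except those in one of the protected contexts."""
    return BR_PATTERN.sub(lambda m: m.group(1) or "", html)


sample = (
    '<b>THIS IS HEADLINE </b><br>'
    '<a id="p8"></a> <br>'
    'A sentence that was <br>broken in the middle. <br>'
    'Next passage<br> continues.'
)
print(strip_unwanted_brs(sample))
```

In an engine that does support (*SKIP)(*FAIL), the same grouping could presumably be written as a single pattern such as `(?:\. +|</b> *|</a> *)<br>(*SKIP)(*FAIL)|<br>`; that variant is untested here.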

Any thoughts?

Last edited by ogassav; 07-04-2019 at 04:49 AM.
Old 07-04-2019, 09:55 PM   #2
theducks
Well trained by Cats
Posts: 30,370
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
My advice is NOT to try to clean up complex issues during conversion. Convert to EPUB or AZW3 and use the editor's Search & Replace to SELECTIVELY remove BRs (some are wanted, as in the headings). Then there may also be the case of BR BR, which may be a scene break and need different treatment (do this first, then the singles).
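The "doubles first, then singles" order can be sketched roughly as below, using Python's standard `re` module. This is only an assumption about what such a cleanup might look like: `SCENE_BREAK` and `two_pass_cleanup` are hypothetical names, the `<hr>` marker is an arbitrary choice, and the second pass is deliberately naive (it joins every remaining single break), whereas a real cleanup would be selective.

```python
import re

# Hypothetical marker for a scene break; any markup you prefer would do.
SCENE_BREAK = '<hr class="scene-break"/>'

# Matches <br>, <BR>, <br/> and <br />.
BR = r"<br\s*/?>"


def two_pass_cleanup(html: str) -> str:
    # Pass 1: two or more <br>s in a row (whitespace allowed between them)
    # are treated as a scene break.
    html = re.sub(rf"(?:{BR}\s*){{2,}}", SCENE_BREAK, html, flags=re.IGNORECASE)
    # Pass 2: each remaining single <br>, with surrounding whitespace,
    # is replaced by one space so the broken line is rejoined.
    return re.sub(rf"\s*{BR}\s*", " ", html, flags=re.IGNORECASE)


print(two_pass_cleanup("end of scene.<br><br>New scene starts <br/>here."))
```

Running the passes in the opposite order would destroy the double breaks before they could be recognised, which is why the doubles go first.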
Old 07-05-2019, 02:26 AM   #3
ogassav
Dear theducks,

While I totally agree with you regarding the flaws of "bulk" removal of BRs with the Search & Replace function, I'm fine with certain mistakes left in the text, as it is intended for my personal use only.

Do you have any idea how to add additional skip logic to the formula I mentioned above?

Last edited by ogassav; 07-05-2019 at 02:28 AM.
Old 07-05-2019, 09:02 AM   #4
theducks
Quote:
Originally Posted by ogassav
Do you have any idea how to add additional skip logic to the formula I mentioned above?
Nope.
I had no reason to develop automated tools. I have a library of saved searches (in Sigil) that I draw from (past efforts), since it seems every book needs something slightly different anyway.
Old 07-05-2019, 09:49 AM   #5
kovidgoyal
creator of calibre
Posts: 44,336
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
What you need for this kind of thing are lookbehind assertions in the regular expression.
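As a rough sketch of that idea (not a tested calibre recipe), negative lookbehinds can express "a <br> that is NOT preceded by a full stop, </b> or </a>". The example uses Python's standard `re` module, where each lookbehind must have a fixed width, so every protected context gets its own assertion. `UNWANTED_BR`, `join_broken_lines` and the sample text are invented for the illustration, and replacing the match with a single space is an assumption about the desired result.

```python
import re

# Each (?<!...) says "the match must not be preceded by ...".
# The whitespace before <br> is part of the match, and \s is included in the
# first lookbehind so a match cannot start after the spaces in ". <br>" or
# "</a> <br>", which would otherwise slip past the other checks.
UNWANTED_BR = re.compile(
    r"(?<![.\s])(?<!</b>)(?<!</a>)\s*<br>\s*",
    re.IGNORECASE,
)


def join_broken_lines(html: str) -> str:
    """Replace unwanted <br>s (plus surrounding whitespace) with one space."""
    return UNWANTED_BR.sub(" ", html)


sample = (
    "<b>Chapter One </b><br>"
    "It was a dark <br>and stormy night. <br>"
    "The rain fell<br>in torrents."
)
print(join_broken_lines(sample))
```

The pattern itself contains nothing specific to Python, so it may well work when pasted into a search box with a single space as the replacement, but that is an assumption rather than something verified here.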
Old 07-05-2019, 12:47 PM   #6
ogassav
Quote:
Originally Posted by kovidgoyal
What you need for this kind of thing are lookbehind assertions in the regular expression.
Mmm, are they described in the Calibre help somewhere? I couldn't find them. Google says these assertions are used in Java and Python, and I'm not a programmer at all...
Old 07-05-2019, 02:01 PM   #7
theducks
Quote:
Originally Posted by ogassav
Mmm, are they described in the Calibre help somewhere? I couldn't find them. Google says these assertions are used in Java and Python, and I'm not a programmer at all...
They are part of the PCRE flavor of regex. That is where to look.
There is an app called RegexBuddy (for Windows). It ain't free ($40), but if you are short on hair...
Old 07-05-2019, 10:12 PM   #8
kovidgoyal
https://manual.calibre-ebook.com/regexp.html
Old 07-06-2019, 03:28 AM   #9
Divingduck
Wizard
Posts: 1,165
Karma: 1410083
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
In addition, this is helpful as well:
https://www.regular-expressions.info/lookaround.html
Old 07-06-2019, 03:31 AM   #10
ogassav
OK guys, looks like there's a misunderstanding here. I know perfectly well what I need to implement in my formula: logic which excludes 2 types of <BR>s. Call it skip logic, lookbehind assertions, ignore principles - whatever.

The problem is that I don't know how to translate this logic into Calibre's regular expression language. So the message of my post is "Is anyone here familiar with this kind of programming? I've worked on a formula and got stuck at a certain stage - need your help badly". And believe me, I've already studied the Calibre regex help and tried several variants with no result, as I wrote in my very first post. I tried to do something myself before asking for help, so just pointing me to the User Manual is not what I expect from the community in cases like this.

Last edited by ogassav; 07-06-2019 at 03:34 AM.
Old 07-06-2019, 10:27 AM   #11
theducks
Calibre uses the PCRE dialect of REGEX
Old 07-13-2019, 08:51 AM   #12
deback
Book E d i t o r
Posts: 432
Karma: 288184
Join Date: May 2015
Device: Laptop
Quote:
This article (https://dearauthor.com/ebooks/calibr...nversion-tips/) suggests using Heuristic Processing during conversion to get rid of <BR>s, but it didn't work for me - I used the range from 0.4 to 0.6 with absolutely no result.
Try using 0.22 as the factor under Heuristic Processing. You should see a difference in the result regarding split paragraphs.
All times are GMT -4.