MobileRead Forums


skammer1974 03-06-2013 02:27 PM

I need help with text wrap...
 
Would someone help me fix a text wrap issue I get when I open an EPUB file in Sigil that was converted from PDF using calibre? For example, here is a sample:

text text text text text text text text text text
text text text text text te
xt text text text text text t
ext text text

As you can see, the text is broken apart at the right margin. Is there a way to fix this efficiently without having to use the backspace key on every line?

I hope this makes sense. skammer1974

Doitsu 03-06-2013 03:25 PM

Quote:

Originally Posted by skammer1974 (Post 2446330)
Would someone assist me in fixing the text wrap issue I am having when I bring in epub file converted from PDF using Calibre.

You most likely didn't enable the Heuristic Processing options (Heuristic Processing > Enable heuristic processing), which are disabled by default. Try converting the .pdf file again with all heuristic options enabled.

This should take care of some of the issues.
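For anyone who prefers the command line, here is a rough sketch of the same idea using calibre's `ebook-convert` tool with its `--enable-heuristics` option. It assumes calibre's command-line tools are on your PATH; the file names and the two helper functions are made up for illustration.

```python
import subprocess


def build_convert_command(pdf_path, epub_path):
    """Build an ebook-convert command line with heuristic processing enabled."""
    # --enable-heuristics switches on calibre's heuristic processing,
    # which is off by default.
    return ["ebook-convert", pdf_path, epub_path, "--enable-heuristics"]


def convert(pdf_path, epub_path):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_convert_command(pdf_path, epub_path), check=True)


# Hypothetical usage (requires calibre to be installed):
# convert("book.pdf", "book.epub")
```

As far as I know, the individual heuristic actions are on once heuristics are enabled, but check `ebook-convert --help` for your calibre version before relying on that.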

Toxaris 03-06-2013 05:36 PM

Also learn the basics of regex search and replace (S&R). That will help.

Tex2002ans 03-06-2013 08:28 PM

Converting a document from PDF is the WORST case scenario. It will take a lot of elbow grease to fix the document after conversion.

I personally use these two Regexes to help combine broken paragraphs:

Search #1:

Code:

-</p>\s+<p>

Replace #1: (empty)

Search #2:

Code:

([^>”\?\!\.])</p>\s+<p>

Replace #2: (note the single space following the \1)

Code:

\1

Search #1 finds a line that ends with a hyphen, erases the hyphen, and joins the line with the next one. (You may or may not want to keep the hyphen; I replace one match at a time to check whether the hyphen is needed.)

Search #2 looks for a paragraph that does NOT end with any of the characters inside the square brackets (>, ”, ?, !, or .) and joins it with the next paragraph.
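If you would rather test the two searches outside Sigil first, here is a minimal Python sketch that applies them in the same order. The function name and the sample text are made up for illustration, and Python's `re` module stands in for Sigil's regex engine, so treat it as an approximation of what Sigil does.

```python
import re

# Search #1: a paragraph ending in a hyphen, followed by a new paragraph.
HYPHEN_BREAK = re.compile(r'-</p>\s+<p>')
# Search #2: a paragraph whose last character is NOT one of > ” ? ! .
SOFT_BREAK = re.compile(r'([^>”\?\!\.])</p>\s+<p>')


def join_broken_paragraphs(html):
    """Merge paragraphs that were split at the right margin."""
    # Replace #1 is empty: drop the hyphen and the paragraph break.
    html = HYPHEN_BREAK.sub('', html)
    # Replace #2 is "\1 ": keep the last character and add one space.
    html = SOFT_BREAK.sub(r'\1 ', html)
    return html


sample = (
    "<p>The quick brown fox jum-</p>\n"
    "<p>ps over the lazy</p>\n"
    "<p>dog.</p>\n"
    "<p>Next paragraph.</p>"
)
print(join_broken_paragraphs(sample))
```

Unlike stepping through matches by hand, this removes every end-of-line hyphen blindly, so genuinely hyphenated words that happened to break at the margin lose their hyphen too.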

For cleaning up directly from calibre's output you may need to use these Regexes for search instead:

Code:

-</p>\s+<p class="calibre[0-9]+">

Code:

([^>”\?\!\.])</p>\s+<p class="calibre[0-9]+">
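The calibre variants can be tried out the same way. This is only a sketch: the function name and sample markup are invented, and it assumes the replacements stay the same as before (empty for the hyphen search, `\1` plus a space for the other).

```python
import re

# Opening tag as calibre typically writes it, e.g. <p class="calibre1">.
P_OPEN = r'<p class="calibre[0-9]+">'

HYPHEN_BREAK = re.compile(r'-</p>\s+' + P_OPEN)
SOFT_BREAK = re.compile(r'([^>”\?\!\.])</p>\s+' + P_OPEN)


def join_calibre_paragraphs(html):
    """Merge split paragraphs in markup that uses calibre's class names."""
    html = HYPHEN_BREAK.sub('', html)    # drop hyphen and paragraph break
    html = SOFT_BREAK.sub(r'\1 ', html)  # keep last character, add a space
    return html


sample = (
    '<p class="calibre1">A sentence that was bro-</p>\n'
    '<p class="calibre1">ken across three</p>\n'
    '<p class="calibre2">lines.</p>\n'
    '<p class="calibre1">Next one.</p>'
)
print(join_calibre_paragraphs(sample))
```

Note that the merged paragraph keeps the class of its first fragment; the class on the swallowed `<p>` tag is discarded along with the tag.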

