MobileRead Forums > E-Book Software > Calibre > Conversion
Old 06-15-2014, 04:28 PM   #1
R71986
Member
R71986 began at the beginning.
 
Posts: 12
Karma: 10
Join Date: May 2014
Device: None
Narrow text width, double spacing

The book came in PDF format. I converted it to EPUB. Both versions show a narrow text width of about 6 words per line. Is there an easy, reliable and quick way to make the lines fill the screen width, but only where the text should run that long? (Some of the lines should be very short, such as "it arrived"; others need to be full width.)

For some reason the line spacing is double. I would like to reduce that to single.
Old 06-17-2014, 08:23 PM   #2
R71986
Member
R71986 began at the beginning.
 
Posts: 12
Karma: 10
Join Date: May 2014
Device: None
I looked in the manual and could not see an easy way to reformat the book, which has come to me with double spacing and very short lines. Is there a way to do it that does not involve editing every line manually?
Old 06-17-2014, 08:24 PM   #3
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
 
Posts: 79,786
Karma: 146391129
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Can you post a sample of the HTML code?
Old 06-17-2014, 09:10 PM   #4
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 31,068
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
When you convert the PDF, try a slightly smaller 'Line unwrap factor' (on the conversion settings screen).
LUF tries
to join lines

so you should now see: LUF tries to join lines

If it was .45, try .42 and see what that produces. It may take a few tries. Overkill is not a good idea.
Old 06-19-2014, 12:38 PM   #5
R71986
Member
R71986 began at the beginning.
 
Posts: 12
Karma: 10
Join Date: May 2014
Device: None
Sample HTML

This is from the file browser that shows the epub html that was converted from the pdf. Note also the page numbers that appear in the text.

Code:
<p class="calibre1">He was invited to stay for dinner</p>
<p class="calibre1">but he was expected with his sister</p>
<p class="calibre1">and her family out in the yuppie</p>
<p class="calibre1">188/1467</p>
<p class="calibre1">That morning he had also had an</p>
<p class="calibre1">invitation to celebrate Christmas</p>
<p class="calibre1">jöbaden. He said no, but thank you, </p>
<p class="calibre1">certain that there was a limit to</p>
<p class="calibre1">Beckman’s indulgence and quite</p>
<p class="calibre1">sure that he had no ambition to find</p>
<p class="calibre1">out what that limit might be. </p>
<p class="calibre1">Instead he was knocking on the</p>
<p class="calibre1">door where Annika Blomkvist, now</p>
<p class="calibre1">Italian-born husband and their two</p>
<p class="calibre1">children. With a platoon of her hus-</p>
<p class="calibre1">band’s relatives, they were about to</p>
<p class="calibre1">carve the Christmas ham. During</p>
<p class="calibre1">dinner he answered questions about</p>
<p class="calibre1">the trial and received much well-</p>
<p class="calibre1">meaning and quite useless advice. </p>
<p class="calibre1">The only one who had nothing to</p>
<p class="calibre1">say about the verdict was his sister, </p>
<p class="calibre1">although she was the only lawyer in</p>
<p class="calibre1">189/1467</p>
<p class="calibre1">the room. She had worked as clerk</p>
<p class="calibre1">of a district court and as an assistant</p>
<p class="calibre1">prosecutor for several years before</p>
<p class="calibre1">she and three colleagues opened a</p>
<p class="calibre1">law firm of their own with offices on</p>
<p class="calibre1">having taken stock of its happening, </p>
<p class="calibre1">his little sister began to appear in</p>
<p class="calibre1">newspapers as representing battered</p>
<p class="calibre1">or threatened women, and on panel</p>
<p class="calibre1">discussions on TV as a feminist and</p>
<p class="calibre1">wome
Old 06-19-2014, 12:45 PM   #6
R71986
Member
R71986 began at the beginning.
 
Posts: 12
Karma: 10
Join Date: May 2014
Device: None
PDF formatting

Quote:
Originally Posted by theducks
When you convert the PDF, try a slightly smaller 'Line unwrap factor' (on the conversion settings screen).
LUF tries
to join lines

so you should now see: LUF tries to join lines

If it was .45, try .42 and see what that produces. It may take a few tries. Overkill is not a good idea.
Looking at the PDF, it too has very short lines and double spacing! I hoped there were some tools to sort this out automatically, but a voice in my head says it cannot be done if the original is the same.

This is from the pdf

Quote:
“Handwriting?”

“Same as always, all in capitals.

Upright, neat lettering.”

With that, the subject was ex-

hausted, and not another word was

exchanged for almost a minute. The

retired policeman leaned back in his

kitchen chair and drew on his pipe.

He knew he was no longer expected

to come up with a pithy comment or

any sharp question which would

10/1467
Old 06-19-2014, 12:53 PM   #7
itimpi
Wizard
itimpi ought to be getting tired of karma fortunes by now.
 
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Normally, if you enable heuristic processing and experiment with reducing the line-unwrap value, the conversion from PDF can combine short lines in a satisfactory way.
Old 06-19-2014, 12:59 PM   #8
R71986
Member
R71986 began at the beginning.
 
Posts: 12
Karma: 10
Join Date: May 2014
Device: None
Heuristics

Quote:
Originally Posted by itimpi
Normally, if you enable heuristic processing and experiment with reducing the line-unwrap value, the conversion from PDF can combine short lines in a satisfactory way.
Could you please tell me where the heuristic setting and the line unwrap value are to be found?

Based on the sample what line-unwrap value should I first try?
Old 06-19-2014, 01:05 PM   #9
itimpi
Wizard
itimpi ought to be getting tired of karma fortunes by now.
 
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Quote:
Originally Posted by R71986
Could you please tell me where the heuristic setting and the line unwrap value are to be found?

Based on the sample what line-unwrap value should I first try?
When you select a book and click the Convert option, it brings up the Convert dialog. Heuristic processing is one of the areas shown in the left-hand panel for which you can set conversion settings. Tick the 'Enable Heuristic processing' checkbox to enable the other settings related to heuristic processing. The line unwrap setting normally starts at 0.40, so try reducing it (e.g. to 0.35) and see if it improves the conversion.
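
For anyone who would rather script repeated attempts than click through the dialog each time, here is a rough sketch that builds a command line for calibre's ebook-convert tool. It assumes the heuristic unwrap setting corresponds to the --enable-heuristics and --html-unwrap-factor options; the file names and the helper function name are made up for illustration, so check ebook-convert --help on your own install before relying on it.

```python
import shlex


def build_convert_command(src, dst, unwrap_factor=0.35):
    """Build an argument list for calibre's ebook-convert tool.

    Assumption: --enable-heuristics and --html-unwrap-factor are the
    command-line equivalents of the Heuristic processing settings.
    """
    if not 0 < unwrap_factor < 1:
        raise ValueError("unwrap factor must be a decimal between 0 and 1")
    return [
        "ebook-convert", src, dst,
        "--enable-heuristics",
        "--html-unwrap-factor", f"{unwrap_factor:.2f}",
    ]


# Hypothetical file names, only for the example.
cmd = build_convert_command("book.pdf", "book.epub", 0.35)
print(shlex.join(cmd))

# To actually run it (needs calibre installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Changing only the last argument between runs makes it easy to compare the output of 0.40, 0.35, 0.30 and so on.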
Old 06-19-2014, 04:54 PM   #10
R71986
Member
R71986 began at the beginning.
 
Posts: 12
Karma: 10
Join Date: May 2014
Device: None
Quote:
Originally Posted by itimpi
When you select a book and click the Convert option, it brings up the Convert dialog. Heuristic processing is one of the areas shown in the left-hand panel for which you can set conversion settings. Tick the 'Enable Heuristic processing' checkbox to enable the other settings related to heuristic processing. The line unwrap setting normally starts at 0.40, so try reducing it (e.g. to 0.35) and see if it improves the conversion.
The result is a bit erratic. I tried your heuristic setting, of course, gradually bringing it down to 0.20, but it still produced this. Is there anything else I can do?

Quote:
that they had to be asked and answered. This is how it is to be a criminal, he thought. On the other side of

the

microphone.

He

straightened up and tried to smile.

The reporters gave him friendly, almost embarrassed greetings.

“Let’s

see … Aftonbladet,

Ex-

pressen, TT wire service, TV4, and … where are you from? … ah yes, Dagens Nyheter. I must be a celebrity,”
Old 06-19-2014, 08:18 PM   #11
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 31,068
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Those are the perils of PDF.

There is no perfect conversion.
You are lucky if there is even a close conversion.

May your regex-fu get stronger, because that is what you need (using the calibre Editor or Sigil).

It takes pass after pass of carefully thought-out searches. (If you do them in the wrong order, you make later pattern matches more difficult.)
First I would remove the standalone page number lines.

Then I would remove as much junk as can be done with a single pattern.
BACKUP before each new cleaning pass, so that if you get it WRONG you can discard the bad edit.
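
As a rough illustration of that first pass, here is a small Python sketch that drops paragraphs containing nothing but a page marker such as 188/1467. It assumes the markup looks like the sample in post #5 (every line wrapped in a p tag with class calibre1); the function name is made up. Since \d+ matches one or more digits, a single pattern covers page numbers of any length.

```python
import re

# A paragraph that holds only "digits/digits", plus its trailing line break.
PAGE_NUMBER_P = re.compile(r'[ \t]*<p class="calibre1">\d+/\d+</p>\r?\n?')


def strip_page_numbers(html):
    """Remove standalone page-number paragraphs such as 188/1467."""
    return PAGE_NUMBER_P.sub("", html)


sample = (
    '<p class="calibre1">and her family out in the yuppie</p>\n'
    '<p class="calibre1">188/1467</p>\n'
    '<p class="calibre1">That morning he had also had an</p>\n'
)
cleaned = strip_page_numbers(sample)
print(cleaned)
```

The same expression should work in the Search field of the calibre Editor or Sigil with regex mode on and an empty Replace field, though the class name has to match whatever your own conversion produced.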
Old 06-20-2014, 12:02 PM   #12
R71986
Member
R71986 began at the beginning.
 
Posts: 12
Karma: 10
Join Date: May 2014
Device: None
The longer lines are acceptable now.

I managed to remove all the built-in page numbers by repeating the same regex with one fewer \d each time I ran Replace All. The replacements are shockingly fast.

I don't really understand how to do the rest of the cleanup on the page I showed in post #10 above. What type of regex will distinguish between a single word that should be the only one on the line, such as "Hello", and a single word that should be joined to the next line?

One way would be to join all words into one continuous line until a full stop is found, but is that level of control possible?
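
To make the "join until a full stop" idea concrete, here is a minimal Python sketch of what such a pass might look like. It assumes every line is its own p tag with class calibre1, as in the sample in post #5, and it treats ., !, ? or … (optionally followed by a closing quote or bracket) as the end of a sentence. The names are invented for the example.

```python
import re

P_TEXT = re.compile(r'<p class="calibre1">(.*?)</p>')
# Sentence-ending punctuation, optionally followed by closing quotes/brackets.
SENTENCE_END = re.compile(r'[.!?…]["”’)]*\s*$')


def join_until_sentence_end(html):
    """Merge consecutive paragraphs until one ends with sentence punctuation."""
    out, buf = [], []
    for text in P_TEXT.findall(html):
        buf.append(text.strip())
        if SENTENCE_END.search(text):
            out.append(" ".join(buf))
            buf = []
    if buf:  # leftover lines with no closing punctuation
        out.append(" ".join(buf))
    return "\n".join(f'<p class="calibre1">{t}</p>' for t in out)


sample = (
    '<p class="calibre1">He</p>\n'
    '<p class="calibre1">straightened up and tried to smile.</p>\n'
    '<p class="calibre1">The reporters gave him friendly, almost embarrassed greetings.</p>\n'
)
joined = join_until_sentence_end(sample)
print(joined)
```

This cannot tell a deliberate one-word line like "Hello" with no punctuation from a stray fragment, and it starts a new paragraph wherever a source line happened to end with a full stop, so the result still needs checking by eye.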
Old 06-20-2014, 12:20 PM   #13
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 31,068
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by R71986
The longer lines are acceptable now.

I managed to remove all the built-in page numbers by repeating the same regex with one fewer \d each time I ran Replace All. The replacements are shockingly fast.

I don't really understand how to do the rest of the cleanup on the page I showed in post #10 above. What type of regex will distinguish between a single word that should be the only one on the line, such as "Hello", and a single word that should be joined to the next line?

One way would be to join all words into one continuous line until a full stop is found, but is that level of control possible?
(NB: I work in Code View when I use Sigil. Calibre removes the BV (Book View) temptation.)

I have 5 different 'Join' saved searches. 3 are basic and run as is. The other 2 need to be tweaked case by case, because they must match the current class= portion to fine-tune greediness.

Almost all are run in Replace Next mode (using Find to skip a match).
Removing line-ending hyphens is a plague.
It could be a hyphenated word (join with no space), or it could be a pseudo em dash (--), where context is everything in the break/no-break decision.

There are examples in the stickies (and other places) over in the Sigil forum. For the most part they also work in Calibre.
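
As one possible shape for a hyphen 'Join' search (a sketch, not one of the saved searches mentioned above), the Python below merges a paragraph ending in a single hyphen with the next one when the next starts with a lowercase letter. It assumes the calibre1 markup from post #5. The keep_hyphen set is a made-up stand-in for the human judgment you apply with Replace Next: words listed there keep their hyphen. A double hyphen (--) is left alone because the pattern needs a word character right before the single hyphen.

```python
import re

# word + "-" at the end of one paragraph, lowercase word at the start of the next
HYPHEN_BREAK = re.compile(
    r'(\w+)-</p>\s*<p class="calibre1">([a-z]\w*)'
)


def join_hyphenated(html, keep_hyphen=frozenset()):
    """Rejoin words split by a line-ending hyphen.

    keep_hyphen: lowercase compounds (e.g. "well-meaning") whose hyphen
    is real and must survive the join.
    """
    def repl(match):
        head, tail = match.group(1), match.group(2)
        if f"{head}-{tail}".lower() in keep_hyphen:
            return f"{head}-{tail}"
        return head + tail

    return HYPHEN_BREAK.sub(repl, html)


sample = (
    '<p class="calibre1">children. With a platoon of her hus-</p>\n'
    '<p class="calibre1">band’s relatives, they were about to</p>\n'
    '<p class="calibre1">the trial and received much well-</p>\n'
    '<p class="calibre1">meaning and quite useless advice. </p>\n'
)
fixed = join_hyphenated(sample, keep_hyphen={"well-meaning"})
print(fixed)
```

Without the keep_hyphen entry the second pair would come out as "wellmeaning", which is why stepping through matches one at a time is safer than Replace All.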