Old 12-27-2011, 03:46 PM   #1
mkelley
Enthusiast
mkelley began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Convert from PDF to ePub puts paragraph spaces

I've got some very fine PDF files coming in to Calibre but when I try and convert them to ePub it ends up spacing between paragraphs, which is what I *don't* want.

I've tried checkmarking the "Remove Spacing between paragraphs" (even though the original PDF has no spacing) but that doesn't help. When I look at the PDFs in Calibre they look fine (no spacing) so I'm not sure what's going on here -- other file conversions (like LIT to ePub) don't have this problem.

Any ideas?
Old 12-27-2011, 03:55 PM   #2
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 14,854
Karma: 5654321
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by mkelley View Post
I've got some very fine PDF files coming in to Calibre but when I try and convert them to ePub it ends up spacing between paragraphs, which is what I *don't* want.

I've tried checkmarking the "Remove Spacing between paragraphs" (even though the original PDF has no spacing) but that doesn't help. When I look at the PDFs in Calibre they look fine (no spacing) so I'm not sure what's going on here -- other file conversions (like LIT to ePub) don't have this problem.

Any ideas?
Read the sticky (about PDF) at the top of this forum
Old 12-27-2011, 05:24 PM   #3
mkelley
Enthusiast
mkelley began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Uh, not to sound stupid (but I *feel* stupid) but could you be a little more specific? There are three stickies at the top of the Calibre sub-forum but none about PDF. In the PDF forum there are TONS of stickies (and I don't even know where to start).
Old 12-27-2011, 05:28 PM   #4
mkelley
Enthusiast
mkelley began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Just from reading *some* of those stickies I'm guessing the idea is that a PDF doesn't contain true paragraph breaks. If so, I understand that much, but what I don't understand is why Calibre converts the end of a paragraph into a break PLUS a line break.

IOW, it would work perfectly if it just ignored the line break (or at least didn't pad the paragraph break with a space). It's intelligent enough to know when to break the paragraph (it's doing that consistently throughout my PDF file) but not enough to know not to add a line break as well?
Old 12-27-2011, 07:48 PM   #5
jackie_w
Wizard
jackie_w ought to be getting tired of karma fortunes by now.
 
Posts: 2,834
Karma: 4199513
Join Date: Sep 2009
Location: UK
Device: Sony PRS-350, PB360, Kobo Glo/AuraHD/Aura6"/AuraH2O
Calibre probably isn't adding line breaks; it's just creating CSS for your paragraphs with large top and bottom margins, probably something like:
.calibrenn {
display: block;
margin-bottom: 1em;
margin-left: 0;
margin-right: 0;
margin-top: 1em
}
You could manually edit these to zero.

However, the easier option is to try running your conversion with Convert - Look&Feel - Remove spacing between paragraphs checked. See if you like the resulting epub better.
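
For anyone who wants to try the manual route instead, here is a minimal Python sketch of zeroing the top and bottom margins. It assumes the stylesheet text has already been pulled out of the epub, and the class name `.calibre1` is only a stand-in for whatever calibre actually generated in your book. Note that it rewrites every rule in the stylesheet, so headings would lose their vertical spacing too.

```python
import re

# Matches "margin-top: <value>" or "margin-bottom: <value>" up to the
# terminating semicolon, closing brace, or end of line.
VERTICAL_MARGIN = re.compile(r"(margin-(?:top|bottom)\s*:\s*)[^;}\n]+")


def zero_vertical_margins(css):
    """Return the stylesheet text with every top/bottom margin set to 0."""
    return VERTICAL_MARGIN.sub(r"\g<1>0", css)


# Hypothetical rule, modelled on the example in the post above.
sample = """.calibre1 {
display: block;
margin-bottom: 1em;
margin-left: 0;
margin-right: 0;
margin-top: 1em
}"""

print(zero_vertical_margins(sample))
```

If you only want to change paragraph rules, it is probably safer to edit the one or two classes used on your `<p>` tags by hand rather than run a blanket replacement like this.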
Old 12-27-2011, 07:56 PM   #6
mkelley
Enthusiast
mkelley began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
I already tried removing spacing (see my first message) but that doesn't do anything (most likely because there aren't any paragraph breaks of the <p> or <div> kind).

And it's getting even more complicated -- I went to using ODT files (on the theory that PDFs were somehow just evil :>) and now I can get rid of the paragraph breaks (by editing in Open Office the ODT file) but I can't now get a paragraph indent -- the one that OO puts in there is ignored by any conversion, and there is no way to add one (because, again, the only way you can get an indent is to remove spacing, and I don't have any spacing).

Sigh -- this is very complicated and seemingly needlessly so.
Old 12-27-2011, 08:03 PM   #7
mkelley
Enthusiast
mkelley began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
And how do I edit the paragraph css -- I don't see this as an option anywhere in Calibre (but I'm pretty stupid)?
Old 12-27-2011, 08:17 PM   #8
jackie_w
Wizard
jackie_w ought to be getting tired of karma fortunes by now.
 
Posts: 2,834
Karma: 4199513
Join Date: Sep 2009
Location: UK
Device: Sony PRS-350, PB360, Kobo Glo/AuraHD/Aura6"/AuraH2O
Quote:
Originally Posted by mkelley View Post
And how do I edit the paragraph css -- I don't see this as an option anywhere in Calibre (but I'm pretty stupid)?
Sorry, mkelley, I failed to read your OP properly and my response wasn't helpful. My only excuse is that it's 1am.

There's nothing automated, but if you use the Tweak epub option (just highlight the epub and press T) your epub will be exploded into its constituent parts. You can view the .css file in your preferred text editor. You should probably open up some of the html content files in the text editor as well so you can see how the paragraphs have been constructed. Of course, this does pre-suppose that you know a little about html and css.
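
An epub is an ordinary ZIP archive, so the same inspection can also be done outside calibre. Below is a rough Python sketch using only the standard `zipfile` module. The tiny in-memory "book" and its file names are made up purely for the demo; in practice you would pass the path to your own .epub file.

```python
import io
import zipfile


def read_stylesheets(epub):
    """Return {member name: CSS text} for every .css file in the archive.

    `epub` may be a file path or a binary file-like object.
    """
    sheets = {}
    with zipfile.ZipFile(epub) as book:
        for name in book.namelist():
            if name.lower().endswith(".css"):
                sheets[name] = book.read(name).decode("utf-8")
    return sheets


# Build a hypothetical, minimal archive in memory to demonstrate the call.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("mimetype", "application/epub+zip")
    demo.writestr("stylesheet.css", ".calibre1 { margin-top: 1em }")
    demo.writestr("index.html", '<p class="calibre1">Hello</p>')
buffer.seek(0)

sheets = read_stylesheets(buffer)
for name, text in sheets.items():
    print(name, "->", text)
```

This only reads the stylesheets; writing a changed file back into the archive is better left to Tweak epub or a dedicated editor, since a valid epub has packaging rules a naive re-zip can break.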

PDFs can be very difficult to convert satisfactorily. There is no magic method, but if you've read the PDF sticky I guess you already know that.
Old 12-27-2011, 08:17 PM   #9
mkelley
Enthusiast
mkelley began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Somehow I found out how to look at the ePub document in XHTML form (I think) and it's showing paragraph breaks (<p>) between paragraphs. So I don't get it -- I don't know why it's inserting them, and I don't know how to get rid of them (I'm off now to try and convert the ePub back to something and then back to ePub -- I feel like Alice in Wonderland. This is more confusing than programming).
Old 12-27-2011, 08:17 PM   #10
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
 
Posts: 37,682
Karma: 18475502
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
Quote:
Originally Posted by mkelley View Post
And how do I edit the paragraph css -- I don't see this as an option anywhere in Calibre (but I'm pretty stupid)?
Go to the Sigil forum and read about Sigil. Then download and install Sigil. Take the ePub with the paragraph spaces and load it into Sigil and you can then edit the ePub to fix the paragraph spaces.
Old 12-27-2011, 08:22 PM   #11
jackie_w
Wizard
jackie_w ought to be getting tired of karma fortunes by now.
 
Posts: 2,834
Karma: 4199513
Join Date: Sep 2009
Location: UK
Device: Sony PRS-350, PB360, Kobo Glo/AuraHD/Aura6"/AuraH2O
If the unwanted blank lines are always coded exactly the same way you could investigate the Convert - Search&Replace option to try and delete them during the conversion. A little Regex knowledge is usually required.
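
As a rough illustration of the kind of regex involved, here is a small Python sketch (calibre's Search&Replace uses Python-style regular expressions). The markup of the blank paragraphs is a guess: it assumes they are `<p>` elements containing nothing but whitespace, `&nbsp;` or a `<br>`. Check the actual html in your book first, because the real pattern may differ.

```python
import re

# A <p ...> element whose content is only whitespace, &nbsp; entities or
# <br> tags, plus any whitespace that follows it.
EMPTY_PARAGRAPH = re.compile(
    r"<p\b[^>]*>(?:\s|&nbsp;|<br\s*/?>)*</p>\s*",
    re.IGNORECASE,
)


def strip_empty_paragraphs(html):
    """Remove paragraphs that contain no visible text."""
    return EMPTY_PARAGRAPH.sub("", html)


# Hypothetical converter output with one blank paragraph in the middle.
sample = (
    '<p class="calibre1">First paragraph.</p>\n'
    '<p class="calibre1">&nbsp;</p>\n'
    '<p class="calibre1">Second paragraph.</p>'
)

cleaned = strip_empty_paragraphs(sample)
print(cleaned)
```

In the conversion dialog you would enter only the pattern itself as the search expression and leave the replacement empty; the surrounding Python is just there so the pattern can be tried on a sample first.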
Old 12-27-2011, 08:24 PM   #12
mkelley
Enthusiast
mkelley began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Hey, if it's 1am there you ought to go to bed (I know I would).

It appears I have two choices -- I can remove all the paragraph breaks from the ODT file and this does convert just fine to an ePub document without paragraph breaks between paragraphs... except there is no indent at the start of each paragraph.

I can fix that by adding one paragraph break in the ODT file but then I get the indent AND an additional paragraph break in the resultant ePub file. So I'm darned if I do and darned if I don't. There doesn't appear to be any way to eat your cake and have it too.

I'm guessing there is one additional thing I don't know, but I'm going to look at some of the files which ARE correct and see if I can figure out what it is (my guess would be some sort of indent character that needs to be added somewhere).
Old 12-27-2011, 08:26 PM   #13
mkelley
Enthusiast
mkelley began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Sorry, I cross posted to both of your replies.

I'll look at Sigil and see but first I'm going to try and pursue why the indent isn't working. Thanks for your help, though.
Old 12-27-2011, 08:50 PM   #14
mkelley
Enthusiast
mkelley began at the beginning.
 
Posts: 26
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Ah -- don't worry about it. I think I got it sussed out.

While this would only be of interest to idiots like me struggling with this, I'll still put it down in case some other clueless individual finds this thread. I was using OpenOffice's add-in Alternative Searching in order to get rid of extraneous paragraph feeds. The problem is that it's pretty non-standard in how it refers to things (there are three different ways to refer to a paragraph, for example, among them end of paragraph and empty paragraph, both listed BEFORE just plain ole paragraph).

The bottom line is that as long as I made sure each paragraph had a paragraph tag (in OO Alternative searching it's a /p, shown on screen as the typographer's big "P" with the double stem, i.e. the pilcrow ¶) and then checkmarked "Remove spacing between paragraphs" with a proper indent in Calibre's conversion, I ended up with what I want.

Phew!
Old 12-28-2011, 07:33 AM   #15
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.
 
Posts: 8,861
Karma: 12755553
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by mkelley View Post
I've got some very fine PDF files coming in to Calibre but when I try and convert them to ePub it ends up spacing between paragraphs, which is what I *don't* want.
PDF files aren't the best source file to use for conversions.

Quote:
Originally Posted by mkelley View Post
doing other file conversions (like LIT to ePub) don't have this problem.
This is because at their core the other formats use HTML (html, epub, lit, mobi), XML or plain text. PDF for the most part uses the PostScript language to create a page. From Adobe's site:

"PDF is a particular file format, like EPS or native Illustrator files. It just so happens that PDF is built largely on the PostScript language"


The same site describes PostScript as follows:

"So, we've established that PostScript is a language, like BASIC, Fortran, or C++. But unlike these other languages, PostScript is a programming language designed to do one thing: describe extremely accurately what a page looks like.

Every programming language needs a processor to run or execute the code. In the case of PostScript, this processor is a combination of software and hardware which typically lives in a printer, and we call it a RIP - a Raster Image Processor. A RIP takes in PostScript code and renders it into dots on a page. So a PostScript printer is a device that reads and interprets PostScript programs, producing graphical information that gets imaged to paper, film, or plate."


Quote:
Originally Posted by mkelley View Post
Uh, not to sound stupid (but I *feel* stupid) but could you be a little more specific? There are three stickies at the top of the Calibre sub-forum but none about PDF.
You posted in the Calibre - Conversion sub-forum and there are 4 sticky posts at the top of this sub-forum. One is titled "Read this before Posting PDF Questions"

Quote:
Originally Posted by mkelley View Post
While this would only be of interest to idiots like me struggling with this, I'll still put it down in case some other clueless individual finds this thread. I was using OpenOffice
If you are using OpenOffice you should just save the doc as ODT or html or use the Writer2ePub OpenOffice extension to save your file as ePub and add any of those to calibre. If possible you should never use PDF as an intermediate format in your conversion workflow.
All times are GMT -4.