#1
Enthusiast
Posts: 31
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Convert from PDF to ePub puts paragraph spaces
I've got some very fine PDF files coming into Calibre, but when I try to convert them to ePub it ends up adding spacing between paragraphs, which is what I *don't* want.
I've tried checking "Remove spacing between paragraphs" (even though the original PDF has no spacing), but that doesn't help. When I look at the PDFs in Calibre they look fine (no spacing), so I'm not sure what's going on here. Other conversions (like LIT to ePub) don't have this problem. Any ideas?
#2
Well trained by Cats
Posts: 30,876
Karma: 59840450
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
#3
Enthusiast
Posts: 31
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Uh, not to sound stupid (but I *feel* stupid) but could you be a little more specific? There are three stickies at the top of the Calibre sub-forum but none about PDF. In the PDF forum there are TONS of stickies (and I don't even know where to start).
#4
Enthusiast
Posts: 31
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Just from reading *some* of those stickies I'm guessing the idea is that a PDF doesn't contain true paragraph breaks. If so, I understand that much, but what I don't understand is why Calibre converts the end of a paragraph into a break PLUS a line break.
IOW, it would work perfectly if it just ignored the line break (or at least didn't pad the paragraph break with a space). It's intelligent enough to know when to break the paragraph (it's doing that consistently throughout my PDF file), but not enough to know not to add a line break as well?
#5
Grand Sorcerer
Posts: 6,246
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Calibre probably isn't adding line breaks; it's just creating CSS for your paragraphs with large top and bottom margins, probably something like:
.calibrenn { display: block; margin-bottom: 1em; margin-left: 0; margin-right: 0; margin-top: 1em }
You could manually edit these to zero. However, the easier option is to try running your conversion with Convert - Look & Feel - Remove spacing between paragraphs checked. See if you like the resulting ePub better.
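To make the "edit these to zero" suggestion concrete, here is a minimal Python sketch that rewrites the vertical margins in a stylesheet string. It assumes the margins are written as simple single-token values such as `1em`; the class name `.calibre1` just stands in for whatever `calibrenn` class your conversion actually produced, and the function name is hypothetical.

```python
import re

# Matches "margin-top: <value>" or "margin-bottom: <value>" where the value is a
# single token such as "1em" or "12pt" (compound values are not handled).
VERTICAL_MARGIN = re.compile(r"(margin-(?:top|bottom)\s*:\s*)[^;}\s]+")


def zero_vertical_margins(css_text: str) -> str:
    """Return css_text with every margin-top / margin-bottom value set to 0."""
    return VERTICAL_MARGIN.sub(r"\g<1>0", css_text)


if __name__ == "__main__":
    # Hypothetical rule modeled on the one quoted above.
    css = (".calibre1 { display: block; margin-bottom: 1em; margin-left: 0; "
           "margin-right: 0; margin-top: 1em }")
    print(zero_vertical_margins(css))
```

This touches every rule in the sheet, including headings you may want to keep spaced, so in practice it's usually safer to change only the class that the body paragraphs use.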
#6
Enthusiast
Posts: 31
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
I already tried removing spacing (see my first message), but that doesn't do anything (most likely because there aren't any paragraph breaks of the <p> or <div> kind).
And it's getting even more complicated -- I switched to ODT files (on the theory that PDFs were somehow just evil :>), and now I can get rid of the paragraph breaks (by editing the ODT file in OpenOffice), but I can't get a paragraph indent. The one that OO puts in there is ignored by every conversion, and there is no way to add one (because, again, the only way you can get an indent is to remove spacing, and I don't have any spacing). Sigh -- this is very complicated, and seemingly needlessly so.
#7
Enthusiast
Posts: 31
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
And how do I edit the paragraph CSS? I don't see this as an option anywhere in Calibre (but I'm pretty stupid).
#8
Grand Sorcerer
Posts: 6,246
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Quote:
There's nothing automated, but if you use the Tweak ePub option (just highlight the ePub and press T), your ePub will be exploded into its constituent parts. You can view the .css file in your preferred text editor. You should probably open some of the HTML content files in the text editor as well, so you can see how the paragraphs have been constructed. Of course, this does presuppose that you know a little about HTML and CSS. PDFs can be very difficult to convert satisfactorily. There is no magic method, but if you've read the PDF sticky I guess you already know that.
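Tweak ePub works because an ePub is a ZIP container holding XHTML, CSS, and metadata files. If you want to look at the same parts outside Calibre, a minimal Python sketch using only the standard library could unpack one and list the stylesheets and content files. The function name and the file suffixes it looks for are assumptions for illustration, not anything Calibre itself provides.

```python
import zipfile
from pathlib import Path

# File types worth opening when checking how paragraphs are built (assumed list).
CONTENT_SUFFIXES = (".css", ".html", ".htm", ".xhtml")


def explode_epub(epub_path: str, out_dir: str) -> list[str]:
    """Unpack an ePub into out_dir and return the CSS/(X)HTML member names."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(out_dir)   # write every member under out_dir
        names = zf.namelist()    # member paths as stored in the archive
    return [n for n in names if n.lower().endswith(CONTENT_SUFFIXES)]
```

For example, `explode_epub("book.epub", "book_exploded")` would leave the unpacked files in `book_exploded/`. If you re-zip the folder by hand rather than letting Calibre rebuild it, note that the ePub container format expects the `mimetype` file to be the first entry and stored uncompressed, so a plain "zip everything" may not give a valid ePub.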
#9
Enthusiast
Posts: 31
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Somehow I found out how to look at the ePub document as XHTML (I think), and it's showing paragraph breaks (<p>) between paragraphs. So I don't get it -- I don't know why it's inserting them, and I don't know how to get rid of them. (I'm off now to try converting the ePub to something else and then back to ePub. I feel like Alice in Wonderland. This is more confusing than programming.)
#10
Resident Curmudgeon
Posts: 78,925
Karma: 143098300
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Go to the Sigil forum and read about Sigil. Then download and install Sigil. Take the ePub with the paragraph spaces and load it into Sigil and you can then edit the ePub to fix the paragraph spaces.
#11
Grand Sorcerer
Posts: 6,246
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
If the unwanted blank lines are always coded exactly the same way, you could investigate the Convert - Search & Replace option to try to delete them during the conversion. A little regex knowledge is usually required.
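As an illustration of the kind of expression meant here, the Python sketch below removes paragraphs that contain nothing but whitespace or a non-breaking space. As far as I know Calibre's Search & Replace uses Python-style regular expressions, but the sample markup, the `calibre2` class, and the function name are made up; check how the blank lines are actually coded in your book (e.g. with Tweak ePub) before adapting the pattern.

```python
import re

# An "empty" paragraph: <p> or <p ...attributes...>, then only whitespace or
# &nbsp; / &#160;, then </p>, plus any whitespace (e.g. a newline) after it.
# The (?:\s[^>]*)? part keeps the pattern from matching tags like <pre>.
EMPTY_P = re.compile(r"<p(?:\s[^>]*)?>(?:\s|&nbsp;|&#160;)*</p>\s*")


def drop_empty_paragraphs(html: str) -> str:
    """Delete blank-line paragraphs, leaving paragraphs with text untouched."""
    return EMPTY_P.sub("", html)


sample = ('<p class="calibre2">First paragraph.</p>\n'
          '<p class="calibre2">&nbsp;</p>\n'
          '<p class="calibre2">Second paragraph.</p>')
print(drop_empty_paragraphs(sample))
```

If the gaps come from CSS margins rather than empty elements (the other possibility raised in this thread), a pattern like this finds nothing to remove.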
#12
Enthusiast
Posts: 31
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Hey, if it's 1 am there, you ought to go to bed (I know I would).
It appears I have two choices. I can remove all the paragraph breaks from the ODT file, and this does convert just fine to an ePub document without breaks between paragraphs... except there is no indent at the start of each paragraph. I can fix that by adding one paragraph break in the ODT file, but then I get the indent AND an additional paragraph break in the resulting ePub file. So I'm darned if I do and darned if I don't. There doesn't appear to be any way to have my cake and eat it too. I'm guessing there is one additional thing I don't know, so I'm going to look at some of the files which ARE correct and see if I can figure out what it is (my guess would be some sort of indent character that needs to be added somewhere).
#13
Enthusiast
Posts: 31
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Sorry, I cross-posted with both of your replies.
I'll look at Sigil, but first I'm going to try to pursue why the indent isn't working. Thanks for your help, though.
#14
Enthusiast
Posts: 31
Karma: 10
Join Date: Dec 2011
Device: Kindle and iPad
Ah -- don't worry about it. I think I've got it sussed out.
While this would only be of interest to idiots like me struggling with this, I'll still put it down in case some other clueless individual finds this thread. I was using OpenOffice's Alternative Searching add-in to get rid of extraneous paragraph breaks. The problem is that it's pretty non-standard in how it refers to things (there are three different ways to refer to a paragraph, for example, among them "end of paragraph" and "empty paragraph", both listed BEFORE just plain ole "paragraph"). The bottom line is that as long as I made sure each paragraph had a paragraph tag (in OO Alternative Searching it's a /p, represented on screen by the typographer's pilcrow, the big "P" with a double stroke: ¶) and then checked "Remove spacing between paragraphs" with a proper indent in Calibre's conversion settings, I end up with what I want. Phew!
#15
US Navy, Retired
Posts: 9,888
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Quote:
"PDF is a particular file format, like EPS or native Illustrator files. It just so happens that PDF is built largely on the PostScript language" The same site describes Postscript as follows: "So, we've established that PostScript is a language, like BASIC, Fortran, or C++. But unlike these other languages, PostScript is a programming language designed to do one thing: describe extremely accurately what a page looks like. Every programming language needs a processor to run or execute the code. In the case of PostScript, this processor is a combination of software and hardware which typically lives in a printer, and we call it a RIP - a Raster Image Processor. A RIP takes in PostScript code and renders it into dots on a page. So a PostScript printer is a device that reads and interprets PostScript programs, producing graphical information that gets imaged to paper, film, or plate." Quote:
Quote:
Thread | Thread Starter | Forum | Replies | Last Post |
Paragraph spaces in ePub to Mobi conversion disrupts indent formatting | markpearl | Conversion | 34 | 09-21-2011 02:42 PM |
Huge Sentence and Paragraph Spaces EPub to Mobi | Dasha | Amazon Kindle | 10 | 06-06-2011 06:43 PM |
I'm having a problem with extra paragraph spaces | akosimike | Calibre | 10 | 05-27-2010 06:53 PM |
PDF to ePub creates spaces within sentances | robertpolson | Calibre | 1 | 02-13-2010 06:14 PM |
Paragraph indendation or spaces? | enarchay | Sony Reader | 0 | 05-28-2009 05:18 AM |