12-28-2011, 07:33 AM   #15
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by mkelley View Post
I've got some very fine PDF files coming in to Calibre but when I try and convert them to ePub it ends up spacing between paragraphs, which is what I *don't* want.
PDF isn't the best source format to use for conversions.

Quote:
Originally Posted by mkelley View Post
doing other file conversions (like LIT to ePub) don't have this problem.
This is because, at their core, the other formats are built on HTML (HTML, ePub, LIT, MOBI), XML, or plain text. PDF, for the most part, uses the PostScript language to describe a page. From Adobe's site:

"PDF is a particular file format, like EPS or native Illustrator files. It just so happens that PDF is built largely on the PostScript language"


The same site describes PostScript as follows:

"So, we've established that PostScript is a language, like BASIC, Fortran, or C++. But unlike these other languages, PostScript is a programming language designed to do one thing: describe extremely accurately what a page looks like.

Every programming language needs a processor to run or execute the code. In the case of PostScript, this processor is a combination of software and hardware which typically lives in a printer, and we call it a RIP - a Raster Image Processor. A RIP takes in PostScript code and renders it into dots on a page. So a PostScript printer is a device that reads and interprets PostScript programs, producing graphical information that gets imaged to paper, film, or plate."


Quote:
Originally Posted by mkelley View Post
Uh, not to sound stupid (but I *feel* stupid) but could you be a little more specific? There are three stickies at the top of the Calibre sub-forum but none about PDF.
You posted in the Calibre - Conversion sub-forum, and there are four sticky posts at the top of this sub-forum. One is titled "Read this before Posting PDF Questions".

Quote:
Originally Posted by mkelley View Post
While this would only be of interest to idiots like me struggling with this, I'll still put it down in case some other clueless individual finds this thread. I was using OpenOffice
If you are using OpenOffice, just save the document as ODT or HTML, or use the Writer2ePub OpenOffice extension to save it as ePub, and then add any of those to calibre. Whenever possible, avoid using PDF as an intermediate format in your conversion workflow.
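
If you prefer the command line to the calibre GUI, the same workflow can be scripted. This is only a rough sketch: it assumes calibre's ebook-convert tool is installed and on your PATH, and the function names (build_convert_command, convert) are made up for illustration. It takes an ODT or HTML file saved from OpenOffice and refuses PDF input, in line with the advice above.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_ext=".epub"):
    """Return the ebook-convert argument list for one source file.

    Hypothetical helper: rejects PDF input, since PDF is a poor
    conversion source compared with ODT or HTML.
    """
    src = Path(source)
    if src.suffix.lower() == ".pdf":
        raise ValueError("PDF is a poor conversion source; save as ODT or HTML instead")
    # ebook-convert takes the input file first, then the output file.
    return ["ebook-convert", str(src), str(src.with_suffix(target_ext))]


def convert(source):
    """Run the conversion and return the output path (assumes calibre is installed)."""
    cmd = build_convert_command(source)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("ebook-convert was not found on PATH; is calibre installed?")
    subprocess.run(cmd, check=True)
    return cmd[2]
```

For example, convert("mybook.odt") would write mybook.epub next to the source file, which you can then add to your calibre library.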