04-05-2009, 12:41 PM   #1
Yompan
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2009
Device: Sony PRS 505
Conversion from PDF : missing initials

Hi everyone,

Newbie here and proud owner since yesterday of a brand new PRS 505 (after tons of reading on my loyal PSP with Bookr). Naturally, I've installed Calibre to manage my ebook collection and found it the perfect companion to my new best friend.

But...

I'm trying to convert a PDF file - a novel - formatted with a big initial at the beginning of each chapter, and Calibre just removes them. Of course, it's usually trivial to guess which letter it was, but I find it annoying.

I tried searching the forum and had a quick look at the FAQ without finding an answer. Is there any way to solve this problem?

/Guillaume.
04-05-2009, 05:34 PM   #2
user_none
Sigil & calibre developer
Posts: 2,459
Karma: 986493
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Short answer: No.

Long answer: PDF is a terrible format for conversion. Calibre uses pdftohtml internally to convert a text-based PDF into an HTML document, which is then converted into whatever format you want. This, as with almost all PDF conversion issues in Calibre, is due to the limitations of pdftohtml and the complexity of pulling text out of a PDF file. The only way to solve this issue is for someone to come up with something that works better than pdftohtml (this isn't nearly as easy as it sounds; I've tried, and pdftohtml still produces better results).
04-05-2009, 06:14 PM   #3
kovidgoyal
creator of calibre
Posts: 26,101
Karma: 5101571
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@user_none, have you looked at pdfminer? It's in Python and so may be easier to hack.
04-05-2009, 08:28 PM   #4
user_none
Sigil & calibre developer
@kovidgoyal, looks like I have another project for my todo list. I'll look into pdfminer and into replacing pdftohtml with it.

@Yompan, can you email me (john@nachtimwald.com) the file you're having the first-letter issue with, so I can try pdfminer and see whether I can get the file to convert correctly with it?
04-06-2009, 04:14 AM   #5
Yompan
Junior Member
@Everyone: thanks for your answers. I'll avoid PDF in the future and stick to txt files, as I always did with my PSP.

@user_none: file sent.

In the meantime, I'll convert it to plain text and add the missing initials manually. Fortunately, I just realised that the chapters are really long (only 10 of them), so it's going to be easy and fast. But the same problem is bound to happen with other books; a 50+ chapter novel would be much more tedious.
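For a longer book, the manual fix could probably be scripted once the text is in a plain txt file. Below is a minimal Python sketch, not anything Calibre does: it assumes each chapter starts with a heading line such as "Chapter 1", and that you supply the missing initials yourself, in chapter order. The function name `restore_initials` and the `HEADING` pattern are made up for this illustration and would need adjusting to the actual book.

```python
import re

# Hypothetical heading pattern: a line such as "Chapter 1" or "CHAPTER XII".
# Adjust it to match how chapters are marked in your own text file.
HEADING = re.compile(r"^chapter\s+\S+", re.IGNORECASE)


def restore_initials(text, initials):
    """Prepend one initial to the first non-blank line after each chapter heading."""
    remaining = list(initials)
    out = []
    pending = False  # True while waiting for the first text line of a chapter
    for line in text.splitlines():
        if HEADING.match(line.strip()):
            pending = True
        elif pending and line.strip():
            if remaining:
                line = remaining.pop(0) + line.lstrip()
            pending = False
        out.append(line)
    return "\n".join(out)


sample = "Chapter 1\n\nt was a dark night.\n\nChapter 2\n\nhe rain had stopped."
fixed = restore_initials(sample, ["I", "T"])
print(fixed)
```

You still have to work out each missing letter by reading the chapter opening, so this only saves the editing, not the guessing.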
04-06-2009, 04:55 AM   #6
pdurrant
The Ghost Mouse
Posts: 32,147
Karma: 88589614
Join Date: Jul 2007
Location: Norfolk, England
Device: NOOK ST GlowLight
This is almost certainly because the first letter isn't really a character at all, but a graphic instead.

Quote:
Originally Posted by Yompan
I'm trying to convert a pdf file - a novel - formatted with a big initial at the beginning of each chapter, and Calibre just removes them. Of course, it's usually trivial to guess which letter it was, but I find it annoying.
Tags
conversion, initial, pdf

All times are GMT -4.