04-05-2009, 12:41 PM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2009
Device: Sony PRS 505
|
Conversion from PDF: missing initials
Hi everyone,
Newbie here and proud owner, since yesterday, of a brand new PRS 505 (after tons of reading on my loyal PSP with Bookr). Naturally, I've installed Calibre to manage my ebook collection and found it the perfect companion to my new best friend. But I'm trying to convert a PDF file (a novel) formatted with a large initial at the beginning of each chapter, and Calibre just removes those initials. Of course, it's usually trivial to guess which letter it was, but I find it annoying. I searched the forum and had a quick look at the FAQ without finding an answer. Is there any way to solve this problem? /Guillaume. |
04-05-2009, 05:34 PM | #2 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Short answer: No.
Long answer: PDF is a terrible format for conversion. Calibre uses pdftohtml internally to convert a text-based PDF into an HTML document, which is then converted into whatever format you want. This issue, like almost all PDF conversion issues in Calibre, comes from the limitations of pdftohtml and the complexity of pulling text out of a PDF file. The only way to solve it is for someone to come up with something that works better than pdftohtml (this isn't nearly as easy as it sounds; I've tried, and pdftohtml still produces better results). |
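For anyone who wants to see the intermediate HTML for themselves, here is a minimal Python sketch of the first step described above (PDF to HTML with pdftohtml). It assumes the poppler `pdftohtml` binary is on your PATH. The flags are just a reasonable choice for a single-file result, not necessarily what Calibre passes internally, and the function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftohtml_command(pdf_path, html_path):
    """Return the argument list for a single-file HTML conversion.

    The flag selection is an assumption for this sketch, not Calibre's
    actual invocation.
    """
    return [
        "pdftohtml",
        "-enc", "UTF-8",  # write the extracted text as UTF-8
        "-noframes",      # one HTML file instead of a frameset
        "-q",             # suppress informational messages
        str(pdf_path),
        str(html_path),
    ]


def pdf_to_html(pdf_path, html_path):
    """Run pdftohtml and return the path of the HTML file it was asked to write."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml was not found on PATH")
    subprocess.run(build_pdftohtml_command(pdf_path, html_path), check=True)
    return Path(html_path)
```

Calling `pdf_to_html("novel.pdf", "novel.html")` and opening the result in a browser shows whether the initials are already missing at this stage, before any further conversion.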
04-05-2009, 06:14 PM | #3 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@user_none, have you looked at pdfminer? It's written in Python, so it may be easier to hack.
|
04-05-2009, 08:28 PM | #4 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
@kovidgoyal, looks like I have another project for my todo list. I'll look into pdfminer and into replacing pdftohtml with it.
@Yompan, can you email me (john@nachtimwald.com) the file you're having the first-letter issue with, so I can try pdfminer and see whether the file converts correctly with it? |
04-06-2009, 04:14 AM | #5 |
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2009
Device: Sony PRS 505
|
@Everyone: thanks for your answers. I'll avoid PDF in the future and stick to txt files, as I always did with my PSP.
@user_none: file sent. In the meantime, I'll convert it to plain text and add the missing initials manually. Fortunately, I just realised that the chapters are really long (there are only 10 of them), so it's going to be quick and easy. But the same problem is bound to happen with other books; a novel with 50+ chapters would be much more tedious. |
04-06-2009, 04:55 AM | #6 |
The Grand Mouse 高貴的老鼠
Posts: 71,503
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
This is almost certainly because the first letter isn't really a character at all, but a graphic instead.
|
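If the initials are graphics and never reach the text output, the places that need fixing by hand can at least be located automatically. Below is a rough Python sketch, assuming the book has already been converted to plain text and that each chapter starts with a heading line beginning with "Chapter" (adjust the pattern for your book; both the pattern and the function name are assumptions for illustration). It flags the first text line after each heading when that line starts with a lowercase letter, which is what a dropped initial usually leaves behind.

```python
import re

# Hypothetical heading pattern; change it to match the book's own headings.
CHAPTER_RE = re.compile(r"^\s*chapter\b", re.IGNORECASE)


def find_missing_initials(text):
    """Return (line_number, line) pairs for suspicious chapter openings.

    A chapter opening is suspicious when the first non-blank line after a
    heading starts with a lowercase letter.
    """
    suspects = []
    after_heading = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between heading and body
        if CHAPTER_RE.match(stripped):
            after_heading = True
            continue
        if after_heading:
            if stripped[0].islower():
                suspects.append((number, stripped))
            after_heading = False
    return suspects


sample = (
    "Chapter 1\n\nnce upon a time there was a reader.\nMore text.\n\n"
    "Chapter 2\n\nThe next day it rained.\n"
)
print(find_missing_initials(sample))
```

This only points at the lines to fix; it cannot tell which letter is missing, and it will miss a chapter whose truncated first word happens to start with a capital.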