#1 | ||
Digitally confused
Posts: 500
Karma: 1500000
Join Date: Mar 2010
Location: London, UK
Device: KPW, K2i, Nexus 7 32gb, Kobo Mini
|
learning to convert docs
I'm a bit new at this and have been trying to convert a few PDF books to EPUB to read in FBReader on my old Nokia N800. The PDFs look fine on my computer and on my N800, but I wanted to learn to convert using Calibre. I know regexps etc., but I don't understand these XPath lines and can't see how they apply to non-HTML files.
Problems I'm having:
Mike |
#2 |
Digitally confused
|
Interestingly, I still get all sorts of issues when converting from PDF to TXT. My aim was to just grab the text and then do the formatting with an editor like vi. Strangely, the TXT has many odd artefacts, like double L's appearing as one L followed by a few strange graphic characters.
I do understand that PDFs are very poor as a container of text, but I thought I might be able to convert my PDF files to EPUB (or even just TXT) with the intention of picking a suitable ereader - I guess I'm stuck on getting one that can display the PDFs well. |
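One way to find out what the strange characters actually are, before cleaning them up in an editor, is to list them by code point. The following is only a minimal sketch: the function names are made up for illustration and are not part of calibre, and it assumes the converted text has already been read into a string. NFKC normalisation expands the standard Unicode ligatures (such as ﬁ and ﬂ), but Unicode has no standard "ll" ligature, so the artefact described above is more likely to show up in the report than to be repaired automatically.

```python
import unicodedata
from collections import Counter


def normalize_ligatures(text):
    """Expand standard Unicode ligatures (e.g. U+FB01 'fi') to plain letters."""
    return unicodedata.normalize("NFKC", text)


def suspicious_characters(text):
    """Count control, format, private-use and unassigned characters.

    These categories are an assumption about what the "strange graphic
    characters" might be; adjust the set after looking at real output.
    """
    counts = Counter()
    for ch in text:
        if ch in "\n\r\t":
            continue  # ordinary whitespace is expected in plain text
        if unicodedata.category(ch) in ("Cc", "Cf", "Co", "Cn") or ch == "\ufffd":
            counts[ch] += 1
    return counts


def describe(counts):
    """Return one readable line per suspicious character, most frequent first."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} x{n}"
        for ch, n in counts.most_common()
    ]
```

Once the offending code points are known, a simple search and replace (in vi or in a script) can map each one back to the letters it stands for.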
#3 |
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
http://calibre-ebook.com/user_manual...ture-detection
http://calibre-ebook.com/user_manual...-pdf-documents
As for the double-l glyph, that's a bug, which won't be fixed until calibre's new PDF engine is done. |
#4 |
Digitally confused
|
Yep - I'd read those pages. I also understand HTML and, to a lesser extent, XML. Problem is I'm trying to write small bits of code in Calibre using a language I don't know (XPath) to process the contents of a file I can't see the contents of (PDF), and for some strange reason I seem to be having problems.
If I could just view the text then I could write a little program to stitch things back together. Are there converters that perhaps perform OCR on the PDF and just output the text? Mike |
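For readers in the same position: the XPath expressions apply to non-HTML input because calibre converts the source to XHTML internally before structure detection runs, so the expressions are matched against that intermediate markup, not against the PDF itself. The sketch below shows the idea using only Python's standard library. The sample markup and the `chapter` class are invented for illustration (real PDF output will look different), and `xml.etree.ElementTree` supports only a subset of XPath, which is why the expression starts with `.//` where a full XPath engine would accept `//h:h2[@class='chapter']`.

```python
import xml.etree.ElementTree as ET

# Prefix "h" mapped to the XHTML namespace, so "h:h2" means "an XHTML h2 element".
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

# Invented sample of intermediate XHTML; not taken from a real conversion.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h2 class="chapter">Chapter 1</h2>
    <p>First paragraph.</p>
    <h2 class="chapter">Chapter 2</h2>
    <p>Second paragraph.</p>
    <h2>Notes</h2>
  </body>
</html>"""


def chapter_titles(xhtml):
    """Return the text of every h2 element whose class attribute is 'chapter'."""
    root = ET.fromstring(xhtml)
    matches = root.findall(".//h:h2[@class='chapter']", XHTML_NS)
    return [element.text for element in matches]


print(chapter_titles(SAMPLE))
```

The expression reads as "any `h2` in the XHTML namespace, at any depth, whose `class` is `chapter`". Which tags and classes are worth matching depends entirely on what the intermediate XHTML for a given book looks like.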
#5 |
creator of calibre
|
Read this: http://calibre-ebook.com/user_manual...rsion.html#id7
In particular, see the section on the debug option, which will allow you access to the text in the intermediate stages of conversion. |
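As a rough illustration of the debug option from the command line: calibre's `ebook-convert` tool accepts a `--debug-pipeline` option followed by a folder in which it saves the output of the conversion stages. The sketch below only builds and prints such a command; the file and folder names are placeholders, and the helper function is hypothetical, not part of calibre.

```python
import subprocess


def build_debug_command(source, output, debug_dir):
    """Build an ebook-convert command that saves intermediate stages to debug_dir."""
    return ["ebook-convert", source, output, "--debug-pipeline", debug_dir]


def run_debug_conversion(source, output, debug_dir):
    """Run the conversion; requires calibre's ebook-convert to be on the PATH."""
    subprocess.run(build_debug_command(source, output, debug_dir), check=True)


# Placeholder names; nothing is executed here, the command is only printed.
print(" ".join(build_debug_command("book.pdf", "book.epub", "debug")))
```

After a real run, the HTML files inside the debug folder can be opened in any editor or browser, which is what makes it possible to see the markup that structure-detection expressions are tested against.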