|  04-27-2010, 06:11 AM | #1 | 
| Digitally confused            Posts: 500 Karma: 1500000 Join Date: Mar 2010 Location: London, UK Device: KPW, K2i, Nexus 7 32gb, Kobo Mini | 
learning to convert docs

I'm a bit new at this and have been trying to convert a few PDF books to EPUB to read in FBReader on my old Nokia N800. The PDFs look fine on my computer and on my N800, but I wanted to learn how to convert them using Calibre. I know regexps etc., but I don't understand these XPath lines and can't see how they apply to non-HTML files. Problems I'm having:
Mike |
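On the XPath question: calibre first converts every input format, PDF included, into an intermediate XHTML form, and its structure-detection XPath expressions are evaluated against that markup, not against the PDF itself. The sketch below illustrates the idea with Python's standard `xml.etree.ElementTree`, which supports only a small XPath subset (calibre accepts full XPath expressions that this subset cannot evaluate). The markup and the `find_headings` helper are hypothetical examples, not anything calibre produces verbatim.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of the intermediate XHTML a PDF might be turned into.
SAMPLE = """
<html>
  <body>
    <h1>The Book</h1>
    <h2>Chapter 1</h2>
    <p>It was a dark and stormy night.</p>
    <p class="chapter">Chapter 2</p>
    <p>The storm continued.</p>
  </body>
</html>
"""

def find_headings(markup: str) -> list[str]:
    """Return the text of every <h2> and every element with class="chapter".

    ElementTree's XPath subset has no 'or', so the two conditions are
    queried separately and the results concatenated.
    """
    root = ET.fromstring(markup.strip())
    h2s = root.findall(".//h2")                      # like //h2 in full XPath
    marked = root.findall(".//*[@class='chapter']")  # like //*[@class='chapter']
    return [el.text for el in h2s + marked]

print(find_headings(SAMPLE))  # ['Chapter 1', 'Chapter 2']
```

The `<h1>` title is not matched because neither query selects it; in calibre you would express both conditions in a single XPath expression instead.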
|  04-29-2010, 07:52 PM | #2 | 
| Digitally confused            Posts: 500 Karma: 1500000 Join Date: Mar 2010 Location: London, UK Device: KPW, K2i, Nexus 7 32gb, Kobo Mini | 
Interestingly, I still get all sorts of issues when converting from PDF to TXT. My aim was just to grab the text and then do the formatting in an editor like vi. Strangely, the TXT has many odd artefacts, such as a double L appearing as one L followed by a few strange graphic characters. I do understand that PDF is a very poor container for text, but I thought I might be able to convert my PDF files to EPUB (or even just TXT) and then pick whatever ereader suited me; otherwise I guess I'm stuck finding one that can display PDFs well. |
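A likely source of such artefacts is ligatures: PDFs often store letter pairs such as "fi", "fl" or "ff" as single glyphs, and a text extractor may emit them as one special character. When the extractor outputs the standard Unicode ligature characters, a compatibility normalization pass turns them back into plain letters. This is a generic cleanup sketch, not a calibre feature, and the `clean_ligatures` name is hypothetical; it cannot repair a glyph that was mapped to the wrong characters in the first place (Unicode has no standard "ll" ligature, for instance).

```python
import unicodedata

def clean_ligatures(text: str) -> str:
    """Replace Unicode presentation-form ligatures with plain letters.

    NFKC compatibility normalization maps characters such as U+FB01
    (LATIN SMALL LIGATURE FI) to the two-letter sequence "fi". It also
    rewrites other compatibility characters (e.g. superscript digits,
    full-width letters), which is usually acceptable for plain text.
    """
    return unicodedata.normalize("NFKC", text)

sample = "\ufb01nd the \ufb02ower in the o\ufb03ce"
print(clean_ligatures(sample))  # find the flower in the office
```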
|  04-29-2010, 08:42 PM | #3 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
See:
http://calibre-ebook.com/user_manual...ture-detection
http://calibre-ebook.com/user_manual...-pdf-documents
As for the double "ll" glyph, that's a bug, which won't be fixed until calibre's new PDF engine is done. |
|  04-30-2010, 04:09 AM | #4 | 
| Digitally confused            Posts: 500 Karma: 1500000 Join Date: Mar 2010 Location: London, UK Device: KPW, K2i, Nexus 7 32gb, Kobo Mini | 
Yep, I'd read those pages, and I also understand HTML and, to a lesser extent, XML. The problem is that I'm trying to write small bits of code in Calibre, in a language I don't know (XPath), to process a file whose contents I can't see (PDF), and for some strange reason I seem to be having problems. If I could just view the text, I could write a little program to stitch things back together. Are there converters that perform OCR on the PDF and just output the text?
Mike |
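A minimal sketch of the kind of "stitching" program described above, assuming the extracted text has hard line breaks inside paragraphs, blank lines between paragraphs, and words hyphenated across line ends. Those assumptions and the `unwrap` helper are hypothetical; real PDF output varies, and a hyphen at a line end is sometimes a genuine hyphen.

```python
import re

def unwrap(text: str) -> str:
    """Rejoin hard-wrapped lines into paragraphs.

    Assumes paragraphs are separated by one or more blank lines and that a
    word split across lines ends the first line with a hyphen.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    result = []
    for para in paragraphs:
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        joined = ""
        for line in lines:
            if joined.endswith("-") and line[:1].islower():
                # Looks like a word broken at the line end: drop the hyphen and glue.
                joined = joined[:-1] + line
            elif joined:
                joined += " " + line
            else:
                joined = line
        result.append(joined)
    return "\n\n".join(result)

raw = ("It was a dark and storm-\ny night; the rain\nfell in torrents.\n\n"
       "Except at occa-\nsional intervals.")
print(unwrap(raw))
```

Checking that the next line starts with a lowercase letter is a cheap guard against merging a real hyphen before a capitalized word; it will still misjudge some cases.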
|  04-30-2010, 06:44 AM | #5 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
Read this page, in particular the section on the debug option, which gives you access to the text at the intermediate stages of conversion:
http://calibre-ebook.com/user_manual...rsion.html#id7 |
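For reference, calibre's command-line converter `ebook-convert` exposes this as the `--debug-pipeline` option, which saves the output of each conversion stage into a directory you name. The Python sketch below only assembles such a command; the file names are placeholders, the helper names are hypothetical, and actually running it requires calibre's `ebook-convert` on your PATH.

```python
import shlex
import subprocess

def debug_convert_cmd(src: str, dest: str, debug_dir: str) -> list[str]:
    """Build an ebook-convert command that saves intermediate stages.

    --debug-pipeline makes calibre write the output of each stage of the
    conversion pipeline into debug_dir, so the extracted text can be inspected.
    """
    return ["ebook-convert", src, dest, "--debug-pipeline", debug_dir]

def run_debug_convert(src: str, dest: str, debug_dir: str) -> None:
    """Run the conversion; raises if ebook-convert fails or is missing."""
    subprocess.run(debug_convert_cmd(src, dest, debug_dir), check=True)

# Placeholder file names; substitute your own book.
cmd = debug_convert_cmd("book.pdf", "book.epub", "debug")
print(shlex.join(cmd))  # ebook-convert book.pdf book.epub --debug-pipeline debug
```

After a run, the stage subdirectories inside `debug` hold the intermediate files, which is where the extracted text can be examined before any XPath is applied.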
|  Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post | 
| Language learning | Kumabjorn | General Discussions | 5 | 07-28-2010 12:33 PM | 
| e-learning | irenas | Astak EZReader | 42 | 03-03-2010 11:56 AM | 
| Seriously thoughtful Learning a new language | GraceKrispy | Lounge | 159 | 11-22-2009 08:38 AM | 
| Plucker Fails to convert HTML docs via Word | evwool | Reading and Management | 8 | 05-10-2009 01:23 PM | 
| Convert word DOCs when you don't have WORD ? heheh | macthekitten | Calibre | 9 | 01-30-2009 07:41 AM |