| 
			
			 | 
		#1 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
			
			 
				
				pdf2lrf
			 
			
			
			Part of libprs500 v0.3.81. It extracts the text from PDF files and converts them to LRF. Preserves bold and italics. See attached demo. 
		
	
		
		
			It doesn't support embedded images and results are not going to be satisfactory for complex PDF files. But for converting simple novels, it works great. Linux users: If you want support for PDF links then you need to install poppler from CVS. To use: 
	Code: 
	pdf2lrf "mybook.pdf"
	Last edited by kovidgoyal; 07-30-2007 at 06:02 PM.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#2 | 
| 
			
			
			
			 Resident Curmudgeon 
			
			Posts: 80,784 
				Karma: 150249619 
				Join Date: Nov 2006 
				Location: Roslindale, Massachusetts 
				
				
				Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Would it be possible to have PDF2HTML so we can then edit the text/book how we want and then use html2lrf to create a properly formatted book? 
		
	
		
		
		
		
		
		
		
		
		
		
	
	And this is a great step forward for PDF conversion without the need for Acrobat.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#3 | 
| 
			
			
			
			 creator of calibre 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			pdftohtml is on your path in Windows. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	Code: 
	pdftohtml mybook.pdf  | 
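	The PDF to HTML, hand-edit, then html2lrf workflow asked about above could be scripted roughly as follows. This is only a sketch: it assumes both pdftohtml and html2lrf are on your path, that your pdftohtml build accepts -noframes (one HTML file instead of a frameset), and that the output lands next to the PDF as mybook.html. The helper names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def pdftohtml_command(pdf_path):
    # -noframes (assumed available in your build) requests one HTML file, not a frameset.
    return ["pdftohtml", "-noframes", str(pdf_path)]


def html2lrf_command(html_path):
    # html2lrf ships with libprs500; only the input file is passed here.
    return ["html2lrf", str(html_path)]


def run_if_available(command):
    # Run the command only when its executable is on PATH; report whether it ran.
    if shutil.which(command[0]) is None:
        print(f"{command[0]} not found on PATH, skipping")
        return False
    subprocess.run(command, check=True)
    return True


pdf = Path("mybook.pdf")         # hypothetical file name
html = pdf.with_suffix(".html")  # assumed output name of pdftohtml -noframes
print(" ".join(pdftohtml_command(pdf)))
print(" ".join(html2lrf_command(html)))
```

	To use it, call run_if_available(pdftohtml_command(pdf)), edit the HTML by hand, then call run_if_available(html2lrf_command(html)).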
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#4 | 
| 
			
			
			
			 Resident Curmudgeon 
			
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#5 | 
| 
			
			
			
			 creator of calibre 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Incidentally, is there some reason this thread isn't being made a sticky?
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#6 | 
| 
			
			
			
			 The Introvert 
			
			Posts: 8,307 
				Karma: 1000077497 
				Join Date: Jan 2007 
				Location: United Kingdom 
				
				
				Device: Sony Reader PRS-650 & 505 & 500 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Are there any instructions on how to use this feature? Some sort of help or FAQ?
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#7 | 
| 
			
			
			
			 creator of calibre 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Start up a terminal (Start->Run and type cmd.exe) 
		
	
		
		
		
		
		
		
		
		
		
		
	
	Change to the directory of your PDF file, then run pdf2lrf on it. 
	Code: 
	cd "c:\my directory"
	pdf2lrf mybook.pdf  | 
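	If you would rather not open a terminal each time, the same two steps (change directory, then run pdf2lrf) can be sketched in Python. This assumes pdf2lrf is on your path. The function names and the example file path are hypothetical.

```python
import subprocess
from pathlib import Path


def pdf2lrf_invocation(pdf_path):
    # Mirror 'cd <directory>' followed by 'pdf2lrf <file>':
    # return the working directory and the command to run there.
    pdf = Path(pdf_path)
    return pdf.parent, ["pdf2lrf", pdf.name]


def convert(pdf_path):
    # Run pdf2lrf inside the PDF's own directory.
    # Raises FileNotFoundError if pdf2lrf is not on PATH,
    # and CalledProcessError if the conversion exits with an error.
    workdir, command = pdf2lrf_invocation(pdf_path)
    subprocess.run(command, cwd=workdir, check=True)
```

	Passing the command as a list means a directory or file name containing spaces needs no extra quoting, for example convert(r"c:\my directory\mybook.pdf").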
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#8 | 
| 
			
			
			
			 Resident Curmudgeon 
			
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#9 | 
| 
			
			
			
			 creator of calibre 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			thanks.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#10 | 
| 
			
			
			
			 Member 
			
			Posts: 13 
				Karma: 10 
				Join Date: Aug 2007 
				
				
				
				Device: Kindle 4 
				
				
				 | 
	
	
	
		
		
			
			 
				
				foreign characters
			 
			
			
			Could you explain how to get correct non-English characters from a PDF? I get strange results with Polish. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	The word "CZĘŚĆ" is converted into: <b>CZ </b><br> <b>E ´S ´</b><br> <b>C I</b><br>  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#11 | 
| 
			
			
			
			 creator of calibre 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Try the -enc switch of pdftohtml?
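			A minimal sketch of passing that switch from a script, assuming pdftohtml is on your path. The encoding name "UTF-8" is an assumption: which names are accepted depends on the pdftohtml build, so treat it as a starting point rather than a known-good value. The function names are made up for illustration.

```python
import subprocess


def pdftohtml_enc_command(pdf_path, encoding="UTF-8"):
    # -enc sets pdftohtml's output text encoding; "UTF-8" is an assumed name.
    return ["pdftohtml", "-enc", encoding, str(pdf_path)]


def convert_with_encoding(pdf_path, encoding="UTF-8"):
    # Returns True if pdftohtml exits successfully.
    # Raises FileNotFoundError if pdftohtml is not on PATH.
    result = subprocess.run(pdftohtml_enc_command(pdf_path, encoding))
    return result.returncode == 0
```

			On the command line this corresponds to: pdftohtml -enc UTF-8 mybook.pdf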
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#12 | 
| 
			
			
			
			 Member 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Thanks, I tried it, but I can only get the error message: 
		
	
		
		
		
		
		
		
		
		
		
		
	
	Error: Couldn't find unicodeMap file for the 'iso-8859-2' encoding
	Is there a list of encoding names?  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#13 | 
| 
			
			
			
			 creator of calibre 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I don't know; you'll have to contact the author of pdftohtml.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#14 | 
| 
			
			
			
			 Member 
			
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Thanks for your input. I discovered that my PDF has an embedded font without a Unicode map, which may be the cause of all the problems, and there is no easy way of fixing it :-(
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#15 | 
| 
			
			
			
			 Evangelist 
			
			Posts: 415 
				Karma: 510423 
				Join Date: Nov 2006 
				
				
				
				Device: Sony PRS-505 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			This is a MESS.  
		
	
		
		
		
		
		
		
		
		
		
		
	
	Line breaks ignored. Page breaks after 1-2 lines on a page, IN THE MIDDLE of the sentence. 
		 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 