MobileRead Forums - View Single Post

nrapallo · 02-27-2009, 04:22 PM

Quote:

Originally Posted by AnthonyPaulO

Hi!

Nick: I'm writing an article on media conversions for the Kindle and would like to devote some of it to your PDFRead application, if that's okay with you.

That's fine with me!

However, please review the first post in this thread, where I refer to the previous version's thread, "PDFRead 1.7 released", as that is where this software "matured". With PDFRead 1.8 I enhanced a good thing, mainly because I wanted colour images to be processed and retained. ashkulz had already written the colour support routines but never ended up releasing his code. We had exchanged some emails about adding colour that could be used by the REB 1200 (his ebook reader and mine). That's how I got involved, in fact, before I joined mobileread. It was ashkulz who pointed me this way in late 2007...

Quote:

I was wondering something though... would it be possible to expand the program a bit so that it would take HTML files as well? In other words, since PDFRead is essentially a two step process (or three with unpaper) where you 1) extract HTML from PDF and 2) create .prc file from HTML, would it be possible to allow for people who already have the HTML (extracted from, let's say, a .CHM file or saved via a web-browser) and have them skip step one and perform only step two? This should get double the audience for your app!

Regards,

Anthony

As far as I know, there is no perfect .pdf to .html/.txt routine that works for all types of .pdf ebooks (Word/text based or scanned images). Mobipocket Creator's import function handles Word/text based .pdfs quite well, but not .pdfs containing just a bunch of scanned book page images. The only option to get those converted to .html/.txt is OCR.

PDFRead is an alternative method for reading the latter type of .pdf (scanned images) on small screen ebook readers. Everything gets converted to images, and .html is only used to bind them together sequentially to create an ebook. No text/OCR conversion is done.
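
To illustrate the "bind images together with .html" idea, here is a rough Python sketch. It is not PDFRead's actual code; the function name build_binding_html and the page_NNNN.png file names are made up for the example, and it assumes the page images have already been produced in reading order.

```python
from html import escape


def build_binding_html(image_names, title="Untitled"):
    """Return an HTML page that shows each page image in the given order.

    Hypothetical helper for illustration only: no text is extracted and
    no OCR is done, the HTML is just a wrapper around the images.
    """
    parts = [
        "<html>",
        f"<head><title>{escape(title)}</title></head>",
        "<body>",
    ]
    for name in image_names:
        # One image per paragraph, in the order supplied by the caller.
        parts.append(f'<p><img src="{escape(name, quote=True)}" /></p>')
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


# Assumed file names; a real run would list the images actually generated.
pages = ["page_0001.png", "page_0002.png", "page_0003.png"]
html_text = build_binding_html(pages, title="Scanned book")
```

The resulting .html plus the image files is what an ebook builder would then package; the text of the book never exists anywhere except inside the pictures.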

CHM files are compiled HTML and are easily exploded into near perfect .html, requiring only minor TOC/index editing. They are text based, not image based.

PDFRead specifically helps read, in ebook form, all image based .pdf/.djvu/.jpg/etc... documents. It's very focused in its intent and purpose. I wouldn't want it to do other types of conversion, especially when you have the grand-daddy of all converters, Calibre, being so well developed and maintained right here at mobileread.com.

I have also written/maintained software that converts .prc/.mobi/.pdb/.lit to my reader's .imp format. Any text/chm/html ebooks would be converted using those software packages (Mobi2IMP/Lit2SB), while all image based ebooks would be converted by PDFRead.

Does that make some sense?