12-31-2007, 06:56 AM | #1 |
Member
Posts: 20
Karma: 46
Join Date: Nov 2007
Location: Germany
Device: Sony PRS-505
|
From PDF to LRF via Mobipocket Creator and BD - works great :-)
Hi all,
since my results converting PDF to LRF with Book Designer were often not as good as I expected (lost italics, page numbers not detected and removed, lost images), I was looking for a way to get better results. I knew that Mobipocket handles PDFs really well, so I tried converting my PDFs with Mobipocket. But I had no luck turning those really good-looking PRCs into LRF, as Book Designer didn't recognize them. So I gave Mobipocket Creator a try. Same result with its PRCs ... but ... Creator produces a temporary HTML file which works very well with BD (at least in my tests).
So that's the way to go:
1. Install Mobipocket Creator from http://www.mobipocket.com/en/DownloadSoft/default.asp with the advanced features enabled.
2. Start it, choose "Import From Existing File" and import your PDF.
3. Click on "Build".
4. Check "Open folder containing eBook" and click "OK". There you will find an HTML file.
5. Open this file in Book Designer and save your LRF - that's all.
I know there are a few tools for converting PDF to LRF or PDF to HTML, but this is the easiest way with the best results I have found so far (at least for standard eBooks - technical documents may not work as well) - give it a try!
Stefan
Last edited by shen; 12-31-2007 at 08:18 AM. |
01-01-2008, 08:35 AM | #2 |
The Introvert
Posts: 8,307
Karma: 1000077497
Join Date: Jan 2007
Location: United Kingdom
Device: Sony Reader PRS-650 & 505 & 500
|
Does it keep italics?
|
01-01-2008, 08:37 AM | #3 |
Member
Posts: 20
Karma: 46
Join Date: Nov 2007
Location: Germany
Device: Sony PRS-505
|
Yes, italics are kept :-)
Stefan |
01-01-2008, 08:38 AM | #4 |
The Introvert
Posts: 8,307
Karma: 1000077497
Join Date: Jan 2007
Location: United Kingdom
Device: Sony Reader PRS-650 & 505 & 500
|
|
01-02-2008, 11:45 AM | #5 |
The Introvert
Posts: 8,307
Karma: 1000077497
Join Date: Jan 2007
Location: United Kingdom
Device: Sony Reader PRS-650 & 505 & 500
|
|
01-02-2008, 12:34 PM | #6 |
Connoisseur
Posts: 62
Karma: 133
Join Date: Oct 2007
Location: Minnesota, USA
Device: Kobo Aura Edition 2
|
I've used this method before and it works fairly well. You may want to copy the file called "pdf2xml.exe" from the "Mobipocket Reader" program folder into the "Mobipocket Creator" program folder. The latest version of the Reader (6.1) ships a more recent version of this file, which works better in some cases.
astra_lestat, look in the My Documents\My Publications folder. That is where the Creator will store the files by default. |
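The two housekeeping steps in this post (swapping in the newer pdf2xml.exe and locating the HTML that Creator wrote) could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and the install and publication paths in the comments are placeholders that will differ from system to system.

```python
import shutil
from pathlib import Path
from typing import Optional


def replace_with_backup(newer: Path, older: Path) -> Path:
    """Copy `newer` over `older`, keeping the previous file as `<name>.bak`."""
    backup = older.with_name(older.name + ".bak")
    if older.exists() and not backup.exists():
        # Keep the original so the swap can be undone by hand later.
        shutil.copy2(older, backup)
    shutil.copy2(newer, older)
    return backup


def newest_html(folder: Path) -> Optional[Path]:
    """Return the most recently modified .html/.htm file under `folder`, if any."""
    candidates = [
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


# Example calls (placeholder paths, adjust to your own installation):
# replace_with_backup(Path(r"<Reader folder>\pdf2xml.exe"),
#                     Path(r"<Creator folder>\pdf2xml.exe"))
# print(newest_html(Path.home() / "Documents" / "My Publications"))
```

Keeping a .bak copy means the older pdf2xml.exe can be restored if the newer one turns out to work worse for a particular PDF.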
01-02-2008, 05:29 PM | #7 |
Groupie
Posts: 182
Karma: 1078201
Join Date: Sep 2007
Device: iPad Air 2
|
Has anyone compared the results of this method to that of the pdf2lrf tool in libprs500?
I would be interested in hearing the results before going ahead and installing this myself. |
01-02-2008, 08:36 PM | #8 |
Member
Posts: 20
Karma: 46
Join Date: Nov 2007
Location: Germany
Device: Sony PRS-505
|
I tested libprs500 in the past using the GUI, and in most cases I was not very happy with the results.
So I switched over to Book Designer and PdfLrf. Depending on my source PDF, one of them gave me acceptable results. Not perfect in most cases, but the LRFs were usable enough for me. But there were a few cases where the results were not something I was willing to accept - no fun to read. That's why I was looking for alternatives. I've tested nearly all the known tools and methods from this forum and was looking for something new. That's how I found this Mobipocket Creator / BD combo, which gave me surprisingly better results on PDFs I had had no luck with.
I'm not looking for the 100% perfect conversion tool, nor for 100% perfect output (I know that this can hardly be done - source PDFs differ so much in quality and layout). And I'm not willing to spend much time correcting conversions, e.g. in Book Designer. I just want to convert my PDFs fast and with very little user interaction, put the LRF on my Sony, read it, and that's all.
What really does the trick here is the conversion and text reformatting from PDF to HTML, which Mobipocket does a great job on. Try it and open the output HTML in your browser. As PDFs are organized in hard-coded individual pages, they first have to be converted into a single flowing text on one large page, including images and dealing with italics and bold text, and eliminating page numbers, headers, footers and so on. MP handles these tasks very well - fast and automatically. Once you have such an HTML page, it's easy to create a good-looking LRF, which I do with Book Designer. Of course other utilities can also be used once you have an HTML page, but I found that this combo does a great job on my literary PDFs - and they are German in most cases. This may not apply to all PDFs, but I've tried a few and nothing gave me better results in such a short time. Try it, it's worth a try - especially if you are not happy with the results you're getting right now, whatever tools you're using at the moment.
For a quick check, you can also install Mobipocket Reader, open your PDF and read the output on the screen. That gives you a fast preview of the conversion you can expect from MP. And after a conversion to LRF in BD, the resulting LRF is in most cases at least that good, if not better. I'm very happy with that. And to be honest ... there's not much left for Book Designer to do; most of the conversion tricks BD does have already been done by MP. But MP keeps images and italics and eliminates page numbers, which BD didn't do very well in some cases. At least you can define your preferred fonts, font sizes and other formatting options in BD. So reformatting the text is done mostly in MP, and the look and feel is done mostly by BD.
Too bad that there's no converter from Mobipocket PRCs to Sony LRFs, because this could be a great combo.
Stefan
Last edited by shen; 01-02-2008 at 08:54 PM. |
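Besides opening the output HTML in a browser, a rough automated check of what survived the conversion (italics, bold, images, paragraphs) can be sketched with Python's standard html.parser module. The helper names and the choice of tags to count are assumptions made for illustration; the HTML that Creator actually writes may mark up formatting differently (for example via CSS), in which case the counts would need adapting.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening tags so surviving formatting can be checked at a glance."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; self-closing tags also end up here.
        self.counts[tag] += 1


def summarize(html_text):
    """Return counts of italic, bold, image and paragraph tags in `html_text`."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    c = parser.counts
    return {
        "italic": c["i"] + c["em"],
        "bold": c["b"] + c["strong"],
        "images": c["img"],
        "paragraphs": c["p"],
    }


# Example call (the file name is a placeholder):
# with open("book.html", encoding="utf-8", errors="replace") as f:
#     print(summarize(f.read()))
```

If the italic or image count is zero for a book that clearly has both, the formatting was probably lost before Book Designer ever saw the file.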
01-07-2008, 06:07 AM | #9 |
The Introvert
Posts: 8,307
Karma: 1000077497
Join Date: Jan 2007
Location: United Kingdom
Device: Sony Reader PRS-650 & 505 & 500
|
I have tried it and I didn't like it, sorry.
Too many broken paragraphs, and at the same time too many paragraphs merged into one huge paragraph. |
01-20-2008, 06:53 AM | #10 |
Junior Member
Posts: 6
Karma: 10
Join Date: Nov 2007
Device: PRS 505
|
Stefan,
thanks a lot, your suggestion is the best one I have found. Now I can have a good book in just 2 minutes. Sure, some page breaks are missing, but the result is very good and, most of all, FAST! Now we are all waiting for the next firmware update (I have read it will be out in the first quarter of 2008), because IMHO this device only needs to read PDF files ... but perfectly. Thanks again, Stefan |
01-21-2008, 09:55 AM | #11 | |
Enthusiast
Posts: 49
Karma: 299
Join Date: Oct 2007
Location: South Wales, UK
Device: PRS-505 (Blue)/PRS-505 (Red)/iPhone 3GS
|
Quote:
At the moment, I'm using the libprs500-produced LRF and ignoring the little mis-formattings, and would recommend the conversion procedure for anyone who is having problems getting a readable PDF conversion. I've successfully used PDFLRF and other PDF converters in the past for different docs, but this particular manual was a real pain in the backside, and Shen's method was the only one which gave me an acceptable document. Irene |
|
02-01-2008, 05:36 AM | #12 |
Junior Member
Posts: 6
Karma: 10
Join Date: Nov 2007
Device: PRS 505
|
The last problem for me now is code samples.
If a book contains well-indented code examples, the indentation is lost in the conversion, and the code is really unreadable throughout the book. Does anyone know a fast way to set up code in BD or Mobi? |
02-01-2008, 12:25 PM | #13 |
Junior Member
Posts: 6
Karma: 22
Join Date: Nov 2007
Device: Sony PRS-505
|
Does MobiPocket convert the text by interpreting it with OCR?
The book I tried initially appeared to convert well (nice formatting), but the text accuracy was absolutely miserable. I ended up using the PDF cut-and-paste and manual fix method, since that at least preserved the correct text. |
02-01-2008, 01:20 PM | #14 |
Zealot
Posts: 127
Karma: 9856
Join Date: Dec 2007
Location: Ontario, Canada
Device: Sony PRS-300/Kindle Keyboard/iPad Mini
|
Anybody got a sample to test? I've got a sneaking suspicion that WordPerfect would handle it better/more easily - Reveal Codes is the WP user's friend. I've cleaned up plenty of ASCII-text from mailing list posts by running it through WP. What I'd suspect would work for the PDF would be to either open it in WP or cut-&-paste it in, turn on Reveal Codes, see what codes are being used at the end of lines versus end/beginning of paragraphs, and go from there. Regardless of whether there's a blank line between paragraphs or if the paragraphs are indicated by indentation alone, there will be something unique about the coding that separates them. Search and replace that with some sort of unique indicator word/phrase. Then search and replace the hard line feeds with the soft line feed code. One more search and replace to turn the indicator back into the proper paragraph separation code, then a quick once-over to confirm that things look good.
At that point, I'd probably run a macro to just go ahead and do the HTML conversion (mainly just a series of search-&-replaces to replace WP code with HTML for bold, italic, underline, etc.), add in any desired extra HTML coding, then save out as plain text. Rename the txt file to html and you're good to go. Why not just let WP save as HTML, you may ask. Simple - the same reason that I highly recommend not letting Word save as HTML - they both do a lousy job and include way too much unnecessary junk. |
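The same mark, replace, restore sequence described for WordPerfect can be sketched in Python for plain text pasted out of a PDF. This is an illustration of the idea, not a WordPerfect macro: the sentinel string, the function name and the rule that an indented line starts a new paragraph are all assumptions, and words hyphenated across line ends are not rejoined.

```python
import re

# Arbitrary sentinel, assumed never to occur in the text being cleaned.
MARKER = "\x00PARA\x00"


def reflow(text, indent_marks_paragraph=True):
    """Join hard-wrapped lines into paragraphs separated by one blank line."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 1: tag the real paragraph boundaries with the sentinel.
    text = re.sub(r"\n[ \t]*\n+", MARKER, text)            # blank line(s)
    if indent_marks_paragraph:
        text = re.sub(r"\n(?=[ \t]+\S)", MARKER, text)     # indented first line
    # Step 2: every remaining line feed is a hard wrap; turn it into a space.
    text = text.replace("\n", " ")
    # Step 3: turn the sentinel back into paragraph breaks and tidy whitespace.
    paragraphs = [re.sub(r"\s+", " ", p).strip() for p in text.split(MARKER)]
    return "\n\n".join(p for p in paragraphs if p)
```

Setting indent_marks_paragraph=False suits sources that separate paragraphs with blank lines only, where indentation may be incidental.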
05-04-2009, 01:04 PM | #15 |
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
|
Complex PDF to HTML
I wrote a Python script to convert the output of pdf2xml (from Mobipocket Creator) to HTML that is suitable for converting to ebook formats. I wrote it specifically to handle code indentation properly. It uses the same source that Mobipocket Creator uses and tries to do an even better job. It is open source (GPL), so you can tweak it if you know Python. I posted about it at http://talkings.org/2009/05/03/complex-pdf-html/. The download link is there as well.
|
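For readers wondering how code indentation can be recovered from a positional PDF dump at all, here is a minimal sketch of one possible approach. It is not the script linked above, and it does not parse real pdf2xml output: it assumes the lines of one code listing have already been extracted as hypothetical (left_offset, text) pairs, and the char_width value is a guess that would have to match the font used in the PDF.

```python
from html import escape


def lines_to_pre(lines, char_width=6.0):
    """Rebuild indentation from left offsets and wrap the listing in <pre>.

    `lines` is an iterable of (left_offset, text) pairs (a hypothetical
    input shape); `char_width` is the assumed width of one monospace
    character in the same units as the offsets.
    """
    lines = list(lines)
    # The leftmost non-blank line defines column zero.
    offsets = [x for x, text in lines if text.strip()]
    base = min(offsets, default=0.0)
    out = []
    for x, text in lines:
        if not text.strip():
            out.append("")  # keep blank lines inside the listing
            continue
        columns = max(0, round((x - base) / char_width))
        out.append(" " * columns + escape(text.strip()))
    return "<pre>" + "\n".join(out) + "</pre>"
```

Because the spaces are emitted inside a pre element, later conversion steps that collapse whitespace in ordinary paragraphs should leave the listing's indentation alone.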