Old 03-12-2008, 08:11 AM   #1
Prospect
Other
Posts: 143
Karma: 644
Join Date: Jan 2008
Location: Norway
Device: Cybook, Kindle
PDF extraction – what is the best tool?

When converting PDFs to MobiPocket for my Cybook I have so far used MobiPocket Creator and Adobe Acrobat v6.

I think that the best result is achieved if I export the PDF to HTML from Acrobat and then convert the HTML file to .prc using MobiPocket Creator, instead of converting directly from PDF to .prc in MobiPocket Creator.

As far as I know I could also use BookDesigner for this task.

The conversion is never perfect and there are always issues with formatting.

How do you extract your PDFs? What is the best current tool/process? Will I achieve better results if I update Acrobat to the latest version?
Old 03-12-2008, 09:04 AM   #2
RWood
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
For me the best tool is ABBYY PDF Transformer. As I remember, it is about $99. It creates MS Word documents that can be edited and loaded into BookDesigner (BD).
Old 03-29-2008, 10:55 AM   #3
wgrimm
Addict
Posts: 230
Karma: 334908
Join Date: Oct 2006
Device: multiple
Quote:
Originally Posted by Prospect
When converting PDFs to MobiPocket for my Cybook I have so far used MobiPocket Creator and Adobe Acrobat v6.

I think that the best result is achieved if I export the PDF to HTML from Acrobat and then convert the HTML file to .prc using MobiPocket Creator, instead of converting directly from PDF to .prc in MobiPocket Creator.
I have tried a great many software packages for PDF conversion. I own the latest Adobe Acrobat, and it is one of my least favorites for this task. DocUnPDF works very well (Mac and Win versions available, $60 or so) and will output in many formats, including HTML and LRF. The company's reps are pretty nice and will work with customers, for example by issuing license codes for 2 installations of the software when you buy it, so you can have one install at home and one at work.

My other favorite is Gemini, by Iceni, a British software company. Its PDF-to-HTML output is the best I have seen, but you do pay a price: $159 when I bought it. It's absolutely top-of-the-line.

Gemini and UnPDF are the only two packages out there I would recommend for this task.
Old 03-29-2008, 12:15 PM   #4
wallcraft
reader
Posts: 6,975
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
Quote:
Originally Posted by wgrimm
DocUnPDF works very well
I can't find DocUnPDF. Did you mean deskUNPDF?
Old 03-29-2008, 01:41 PM   #5
Prospect
Other
Posts: 143
Karma: 644
Join Date: Jan 2008
Location: Norway
Device: Cybook, Kindle
I downloaded the demo version of Gemini and I agree that it is a great tool that works better than both Adobe Acrobat and Mobipocket Creator.

Thanks!
Old 03-29-2008, 05:40 PM   #6
wgrimm
Addict
Posts: 230
Karma: 334908
Join Date: Oct 2006
Device: multiple
Quote:
Originally Posted by wallcraft
I can't find DocUnPDF. Did you mean deskUNPDF?
Sorry, you are right. They have an upgrade offer now, and a couple of bundle specials.
Old 04-14-2008, 07:04 PM   #7
tomsheeley
Junior Member
Posts: 3
Karma: 10
Join Date: Mar 2008
Device: Palm TX
I have had great luck with the different converters from ABC Amber (http://www.processtext.com/).

They have a PDF converter that converts to almost any format you can think of, for only $12.95.

I use the company's "MS Lit" converter almost every day, as so many ebooks are released in Lit format, which my Palm TX cannot read.

Hope it helps!
Old 04-24-2008, 06:45 PM   #8
JSWolf
Resident Curmudgeon
Posts: 73,660
Karma: 127838196
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
ABC Amber Lit converter doesn't work well as it's based on a buggy version of ConvertLIT.
Old 04-25-2008, 02:17 PM   #9
DDHarriman
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
Acrobat Pro 8.0 (export as text and/or Word) and OmniPage Pro 16 (OCR the PDF file and save as text or Word).
Old 04-25-2008, 04:12 PM   #10
WillAdams
Wizard
Posts: 1,233
Karma: 3350652
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
The best tool I've found for this is Marcel Weiher's TextLightning.app, available from www.metaobject.com (obligatory disclosure: I was a beta-tester). Although it's a Mac OS X app, it's also available for Linux and could probably be compiled for Windows using the recently improved Windows support that GNUstep (www.gnustep.org) affords.

William
Old 08-20-2009, 08:10 AM   #11
stilliremain
Junior Member
Posts: 9
Karma: 10
Join Date: Sep 2008
Device: Cybook Gen3 eBoo
LRF conversion seems to have been removed from Docudesk deskUNPDF Professional version 3.0. Can anyone confirm? I've downloaded trials of versions 2 and 3, and this seems to be the case...
Old 08-24-2009, 05:54 AM   #12
Christina789
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2009
Device: none
Quote:
Originally Posted by Prospect
When converting PDFs to MobiPocket for my Cybook I have so far used MobiPocket Creator and Adobe Acrobat v6.

I think that the best result is achieved if I export the PDF to HTML from Acrobat and then convert the HTML file to .prc using MobiPocket Creator, instead of converting directly from PDF to .prc in MobiPocket Creator.

As far as I know I could also use BookDesigner for this task.

The conversion is never perfect and there are always issues with formatting.

How do you extract your PDFs? What is the best current tool/process? Will I achieve better results if I update Acrobat to the latest version?
When I only need to extract some text content from a PDF file, I use the freeware AnyBizSoft PDF to Text, which extracts text from PDF files.
Since you want to retain the formatting, I think converting PDF to Word or HTML, and then to .prc, could be an option. That said, I think there will be formatting problems whenever a file is converted two or more times with different tools.
Old 08-24-2009, 08:46 PM   #13
Elfwreck
Grand Sorcerer
Posts: 5,185
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
I extract PDFs to Word docs (or RTF; the file is the same from my viewpoint), and then edit the Word doc. If I were more fluent in HTML, I'd extract to that, and I'd expect to spend the same amount of time editing the HTML file as I spend on the average PDF-to-Word conversion.

I generally have to fix the page sizes & margins, remove text boxes, change pictures to inline with text, and do odd things to get rid of the page numbers & headers. Then I fix the paragraph settings starting by making them all single-spaced, and removing the right & left margin indents if any; if it's reasonable, I change them all to the same before & after amounts and justification. Then I set the font--make it all one font, use find & replace to fix the sizes, make sure it's all 100% size, not condensed or expanded.

I'd expect HTML files to work better if you normalize the fonts, remove the extra "div" sections and "align" tags, and get rid of tables that force the page structure.
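
The "div" and "align" part of that cleanup could be sketched in Python roughly as follows. This is only an illustration, not what any of the converters in this thread do: the function name simplify_html is made up for the example, and regular expressions are a blunt instrument for HTML (for instance, the "align" pattern would also hit matching text outside of tags), so treat it as a starting point for simple converter output.

```python
import re

def simplify_html(html: str) -> str:
    """Strip <div> wrappers and align attributes from converter output (hypothetical helper)."""
    # Drop opening and closing <div> tags but keep whatever they wrap.
    html = re.sub(r"</?div\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Remove align="...", align='...' and unquoted align=value attributes.
    html = re.sub(
        r"""\s+align\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""",
        "",
        html,
        flags=re.IGNORECASE,
    )
    return html

sample = '<div class="p1"><p align="justify">Hello</p></div>'
print(simplify_html(sample))  # prints <p>Hello</p>
```

Font normalization and layout tables are left out here because they depend heavily on what a particular converter emits.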

Basic novels should transfer nicely. Of course, basic novels probably transfer fine from the original PDF straight to Mobi. It's when there are other formatting aspects that the conversion breaks down, and none of the auto-converters shines as the best one, because PDF wasn't designed to be a convert-from format.
Old 09-26-2009, 03:12 PM   #14
orion2001
Groupie
Posts: 162
Karma: 24658
Join Date: Sep 2009
Device: PRS-505
Hi Elfwreck,

I posted in another thread regarding this, but you seem to have a lot of experience with PDF->Word conversions. You outlined a lot of postprocessing that you do. Does your converter insert paragraph breaks at the end of a page even if a sentence is continued on the next? If so, do you go in and manually delete every spurious paragraph break for each page? I can't figure out if there is software smart enough to not include these breaks at the end of a page, or if there is an easy way to correct for it.
Thanks!
Old 09-26-2009, 05:13 PM   #15
Elfwreck
Grand Sorcerer
Posts: 5,185
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
Quote:
Originally Posted by orion2001
I posted in another thread regarding this, but you seem to have a lot of experience with PDF->Word conversions.
An insane amount. I've been working with PDF conversions for 10 years. (I still miss some features of Acrobat 4 that got dropped in later updates.) (Not that I want to go back. I just wish they'd change those few features.)

Quote:
You outlined a lot of postprocessing that you do. Does your converter insert paragraph breaks at the end of a page even if a sentence is continued on the next? If so, do you go in and manually delete every spurious paragraph break for each page? I can't figure out if there is software smart enough to not include these breaks at the end of a page, or if there is an easy way to correct for it.
Thanks!
Yes, it keeps the original page breaks, which means adding paragraph breaks in those spots. If it's short, I sometimes scroll through & manually remove the page breaks/paragraph breaks at the ends of each page.

Otherwise, I look for ways to identify paragraph breaks in the wrong places. This starts with removing unwanted page breaks; sometimes I remove them all (replace with a space); sometimes I try to keep them before chapter breaks, if chapter headers have identifiable typographical features that I can search for.

Then: Search for [any letter]^p (or [any letter][space]^p), replace with [find what text]qqq, then replace ^pqqq with [space]. (In Word's Find and Replace codes, [any letter] is ^$ and [find what text] is ^&.)

This doesn't work if some paragraphs are supposed to end with letters instead of punctuation (like tables), so it may involve some checking & manual touch-up. And it won't catch the case where a sentence ends at the bottom of one page but the first line of the next page is supposed to be part of the same paragraph.
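
For anyone working on plain-text exports instead of Word files, the same two-step replace can be sketched in Python. This is an illustration only, under a few assumptions: each paragraph break is a single newline, the text does not already contain the placeholder "qqq", and the function name rejoin_broken_paragraphs is made up for the example. Unlike the Word recipe, it collapses an optional trailing space so that only one space is left at the join.

```python
import re

def rejoin_broken_paragraphs(text: str) -> str:
    """Join lines that were split after a letter, using the 'qqq' placeholder trick."""
    # Step 1: mark every paragraph break that directly follows a letter
    # (or a letter plus one space) by appending the placeholder "qqq".
    marked = re.sub(r"([A-Za-z]) ?\n", r"\1\nqqq", text)
    # Step 2: replace each marked break with a single space.
    return marked.replace("\nqqq", " ")

page_split = "The rain fell on the\nroof all night.\nShe slept.\n"
print(rejoin_broken_paragraphs(page_split))
# prints:
# The rain fell on the roof all night.
# She slept.
```

The same caveats apply as in Word: lines that legitimately end in a letter get joined too, and a break that follows punctuation is left alone.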

Sometimes I can search for tabs or indentation of first line--often, anything that's not indented is either a chapter header or should be part of the previous page. So, semi-manual: search, then manually fix.
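
The "search, then manually fix" step could be sketched in Python as a helper that only lists candidates and leaves the decision to a human. It assumes a plain-text export in which real paragraphs start with a tab or a space; the function name unindented_lines is hypothetical.

```python
def unindented_lines(text: str) -> list:
    """Return (line_number, line) pairs for non-empty lines with no leading tab or space."""
    flagged = []
    for number, line in enumerate(text.split("\n"), start=1):
        # An unindented line is either a chapter header or a continuation
        # of the previous page's paragraph; a person has to decide which.
        if line and not line.startswith(("\t", " ")):
            flagged.append((number, line))
    return flagged

sample = "CHAPTER ONE\n\tIt was a dark night and the rain\nfell in torrents.\n\tA new paragraph."
for number, line in unindented_lines(sample):
    print(number, line)
# prints:
# 1 CHAPTER ONE
# 3 fell in torrents.
```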

It gets faster with practice. It's always a bit choppy, and never as good as a page-by-page QC, although I find it plenty acceptable for personal reading. Since most of the PDFs I convert this way are either not legal to distribute, or only of interest to a very limited crowd (I convert legal rulings from PDF to neatly-formatted Word docs for friends), I've not had to develop anything that works more smoothly.

Last edited by Elfwreck; 09-26-2009 at 05:15 PM.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.