Old 01-25-2016, 11:23 AM   #1
1v4n0
Groupie
Posts: 171
Karma: 40000
Join Date: Oct 2013
Device: kindle
pdf to doc: best way?

Hello guys.

What is the best and (possibly) easiest way to convert from pdf (text) to doc/odt/rtf? And I mean with the line breaks in the right places too, not just saving to doc from acrobat. The final goal is an epub.

I did it this way (acrobat pro->save as doc) once, and then I bulk-corrected all the line breaks with perfect epub. Not sure if I missed anything.

EDIT apparently I did. It doesn't undo line breaks where the line ends with punctuation and the next one starts with a guillemet «, and it erases the dash where the new line starts with one (this is probably due to perfect epub's hyphenation regex). Also, it doesn't undo line breaks if a page ends with a period and the next one starts with a capital letter.
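
For anyone who wants to script the line-break cleanup instead, here is a minimal Python sketch of the cases described above. It is not what perfect epub does internally; the function name, the set of sentence-ending characters and the rule that a leading dash opens a new paragraph are all assumptions for illustration. A break after a period followed by a capital letter or a guillemet is inherently ambiguous, so the sketch simply guesses "new paragraph" there and the result still needs proofreading.

```python
# Characters that plausibly end a sentence (an assumption; tune per language).
SENTENCE_END = ".!?…»\"”"
# Dashes that may open a line of dialogue (assumed to start a new paragraph).
DASHES = "—–-"


def unwrap(text):
    """Merge hard-wrapped lines into paragraphs using simple heuristics."""
    paragraphs = []
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Blank line: explicit paragraph break.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if not current:
            current = line
        elif current.endswith("-") and line[0].islower():
            # End-of-line hyphenation: drop the hyphen and rejoin the word.
            # Note: this also eats the hyphen of real compounds split at a break.
            current = current[:-1] + line
        elif line[0] in DASHES:
            # Leading dash: keep it and start a new paragraph.
            paragraphs.append(current)
            current = line
        elif current[-1] in SENTENCE_END and (line[0].isupper() or line[0] == "«"):
            # Sentence ended and a new one starts: guess a paragraph break.
            paragraphs.append(current)
            current = line
        else:
            # Everything else (including a comma followed by «) is a soft wrap.
            current += " " + line
    if current:
        paragraphs.append(current)
    return paragraphs
```

Feed it the plain text copied out of the pdf; it returns one string per paragraph, which can then be wrapped in <p> tags for the epub.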

EDIT2 acrobat pro->save as HTML does a pretty good job.

One other time I passed the pdf through finereader, but it was more complicated.

Thanks.

Last edited by 1v4n0; 02-14-2017 at 08:38 AM.
Old 07-24-2016, 01:45 AM   #2
EbokJunkie
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
1v4n0

Aiseesoft PDF Converter Ultimate converts pdf->docx (text only) surprisingly well.
Old 08-07-2016, 10:44 AM   #3
calum.kane
Junior Member
Posts: 4
Karma: 23332
Join Date: Aug 2016
Location: Columbia
Device: Kobo Aura
It depends on which version of office you are using. MS Office 2007 has a full utility for converting and editing PDF files.
Old 08-07-2016, 03:41 PM   #4
1v4n0
Groupie
Posts: 171
Karma: 40000
Join Date: Oct 2013
Device: kindle
I'm on OpenOffice.
Old 08-07-2016, 05:31 PM   #5
Pablo
Guru
Posts: 970
Karma: 4999999
Join Date: Mar 2009
Location: Rosario, Argentina
Device: SONY PRS-505, PRS-T2
Conversion of pdf files is not at all easy, and every pdf is different. In all cases, the resulting file has to be parsed and corrected manually.

My first step with text pdf files is always BRISS to remove headers and footers, and then Mobipocket Creator Professional Edition (free software). Of the several output formats, I always use the html file. If the html is in good shape, I open it with Sigil. If a lot of corrections are needed, I sometimes use Notepad++ for a first cleanup and then open it with Sigil.

I've tried other solutions, including saving to html with Adobe Standard, but I always come back to Mobipocket Creator, which has always produced the best results for me.

Hope this helps.

Last edited by Pablo; 08-07-2016 at 05:34 PM.
Old 08-07-2016, 07:34 PM   #6
EbokJunkie
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
Aiseesoft PDF Converter Ultimate->docx->proofreading->applying Heading style to chapter titles, generate TOC->Calibre->ePub
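
The last step of that chain (Calibre->ePub) can also be run from a script, since calibre ships a command-line tool called ebook-convert that takes an input file and an output file. A minimal Python sketch, assuming calibre's command-line tools are installed and on PATH; the helper names and the file names in the usage note are made up for illustration, and no extra conversion options are passed.

```python
import shutil
import subprocess


def build_convert_command(source, target):
    """Return the argument list for calibre's ebook-convert CLI."""
    # Basic form: ebook-convert input_file output_file
    return ["ebook-convert", str(source), str(target)]


def convert(source, target):
    """Run the conversion; return False if ebook-convert is not installed."""
    if shutil.which("ebook-convert") is None:
        return False
    # check=True raises CalledProcessError if calibre reports a failure.
    subprocess.run(build_convert_command(source, target), check=True)
    return True
```

For example, convert("book.docx", "book.epub") would write book.epub next to the script. The proofreading and Heading styles still have to be done in the docx first.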

Last edited by EbokJunkie; 08-07-2016 at 07:37 PM.
Old 08-11-2016, 11:11 PM   #7
willus
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by calum.kane View Post
It depends on which version of office you are using. MS Office 2007 has a full utility for converting and editing PDF files.
I had not realized that recent versions of MS Word would directly read/convert PDF files. Thank you for pointing this out. I just tried loading a few random PDF files into MS Word 2016 (Office 365 v16) and it did an amazingly good job. I tried scanned documents (no OCR layer) with inset graphics, and it correctly found and OCR'd the text, using the correct font, and also inset the graphics perfectly within the text. A two-column scientific paper (computer generated, not scanned) was converted perfectly into a Word file--all of the text in the correct font and with all of the figures placed perfectly. Even when I scanned that same 2-column article into a bitmapped PDF file with no OCR layer and used Word to open that bitmapped version, it still correctly rendered the text (using OCR) into two columns and inset the figures. Microsoft has clearly put a lot of effort into turning PDFs into Word documents. I had no idea. This will be my number one recommendation for dealing with PDFs from this point forward.

Edit: I've added some examples. The scanned conversion isn't as good as the conversion of the computer-generated PDF (native.pdf), but it's still very good considering the intelligence that had to go into something like that. Sorry I used a zip file for the word files. The forum attachment system would not let me attach a .docx file. The open_office_format.zip file contains the Word files saved in the .odt format.
Attached Files
File Type: pdf native.pdf (154.0 KB, 1095 views)
File Type: pdf scanned_no_ocr_layer.pdf (883.0 KB, 1028 views)
File Type: zip wordfiles.zip (45.8 KB, 980 views)
File Type: zip open_office_format.zip (43.6 KB, 769 views)

Last edited by willus; 08-12-2016 at 09:56 AM. Reason: Added examples (now with .odt files)
Old 08-12-2016, 04:18 AM   #8
1v4n0
Groupie
Posts: 171
Karma: 40000
Join Date: Oct 2013
Device: kindle
Cool. Open office doesn't have anything like that, right?
Old 08-12-2016, 09:54 AM   #9
willus
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by 1v4n0 View Post
Cool. Open office doesn't have anything like that, right?
I don't know--I haven't tried it. Try loading the above PDF examples into OpenOffice Writer (the OpenOffice counterpart of Word) and see how the conversions compare to what I posted from MS Office 365 v16.
Old 08-12-2016, 09:08 PM   #10
Pyr0
Junior Member
Posts: 6
Karma: 10
Join Date: Apr 2011
Device: Kindle Keyboard 3 wifi, Kindle Voyage
How do you go from there? I got a nice docx, but when I converted it to AZW3 with calibre to use on my kindle, the result was terrible.
Old 08-12-2016, 11:07 PM   #11
willus
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Pyr0 View Post
How do you go from there? I got a nice docx, but when I converted it to AZW3 with calibre to use on my kindle, the result was terrible.
This is probably better asked/searched in the Kindle Format forum.
Old 02-14-2017, 08:41 AM   #12
1v4n0
Groupie
Posts: 171
Karma: 40000
Join Date: Oct 2013
Device: kindle
Sorry for the gravedigging, and feel free to delete this post. I just found out that exporting the pdf to html instead of doc does a pretty good job, at least with acrobat pro. Then you can just edit the html file with whatever text editor you have (OpenOffice, in my case).
All times are GMT -4.

