11-24-2009, 07:42 PM   #1
DSpider
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
Advice on formatting OCR'd printed to digital material.

Hi there! I'm working on a thesis and would like to use a good chunk of printed material. Please help me understand why this is happening and how to fix it.

I used ABBYY FineReader 8, which came with the scanner, and corrected all the mistakes, but the output PDF looks like this in Acrobat:



As you can see, the italic font is out of line. Could it be because ABBYY saves the PDF as v1.4? Or because it doesn't recognize the characters properly? Or maybe because it couldn't find a matching font?

How can I "narrow" it down without sacrificing font size? Is replacing the font the only option?


My next issue is with random tables... What's with these? I only see them when I select everything with Ctrl+A. They seem to serve no function, and the PDF is smaller if I remove them manually one by one (a real pain in the *ss).




Thank you very much.



PS: Acrobat is a very poor PDF editor. No options for bold, italic, strikethrough, etc. Very expensive for such limited functionality. I wish the scanner had been cheaper and hadn't included Acrobat... Can you please suggest a "better" PDF editor? Adobe InDesign? I wonder how much THAT will cost... Pff.
11-24-2009, 08:31 PM   #2
AnemicOak
Bookaholic
Posts: 14,391
Karma: 54969924
Join Date: Oct 2007
Location: Minnesota
Device: iPad Mini 4, AuraHD, iPhone XR +
Quote:
Originally Posted by DSpider
PS: Acrobat is a very poor PDF editor. No options for bold, italic, strikethrough, etc. Very expensive for such limited functionality. I wish the scanner had been cheaper and hadn't included Acrobat... Can you please suggest a "better" PDF editor? Adobe InDesign? I wonder how much THAT will cost... Pff.
Acrobat isn't really meant for editing PDFs the way you're trying to. PDFs aren't designed to be edited like that.

No, InDesign won't do anything as far as editing a PDF goes.

What you want is to work from a source file (RTF, DOC) and, once it looks the way you want, create a PDF from that source file (Acrobat will export to those formats). When I OCR, I OCR to RTF and go from there. If you use Acrobat to export to RTF, you can then use Word, OpenOffice or whatever to get it exactly how you want it, and then create a PDF from the edited RTF if you need one.
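
If you end up repeating the last step (edited RTF to PDF) often, it can be scripted. Below is a minimal Python sketch, not a tested recipe for this exact setup. It assumes a current LibreOffice install whose `soffice` program is on your PATH (that is my assumption, the thread only mentions OpenOffice and Word), and the helper names and the file name `thesis_edited.rtf` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir):
    """Return the soffice command line that converts `source` to PDF.

    Assumes LibreOffice's `soffice` is on the PATH (an assumption,
    not something stated in the thread).
    """
    return [
        "soffice",
        "--headless",             # run without opening the GUI
        "--convert-to", "pdf",    # target format
        "--outdir", str(Path(out_dir)),
        str(Path(source)),
    ]


def convert_to_pdf(source, out_dir="."):
    """Run the conversion and return the expected path of the PDF.

    Raises subprocess.CalledProcessError if soffice exits with an error.
    """
    subprocess.run(build_convert_command(source, out_dir), check=True)
    return Path(out_dir) / (Path(source).stem + ".pdf")
```

Calling `convert_to_pdf("thesis_edited.rtf", "out")` would run the conversion and return the path `out/thesis_edited.pdf`. The layout is still whatever the word processor produces from the RTF, so check the result by eye.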
11-25-2009, 01:50 PM   #3
DSpider
Is there a way to retain the original layout of the scanned material? Line breaks and everything.

Maybe something found in ABBYY FineReader 10 but not in v8? Also, Microsoft Word 2007 seems to display .rtf files better than OpenOffice (what a disappointment). I haven't tried exporting to .doc. Do you think it will make any difference?
11-25-2009, 02:13 PM   #4
Kellhus
Member
Posts: 17
Karma: 20
Join Date: Oct 2009
Device: none
You've got a couple of options here. In FR 8.0, go to:

Tools menu -> Options -> 4. Save
choose "Formats Settings"
In "RTF/DOC/Word XML" you've got something called "Retain layout".

I'd suggest you try either of these.