MobileRead Forums > E-Book Formats > Workshop
06-07-2013, 10:25 PM #16
noork85
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2012
Device: iPad 4
OK, I ran it through ABBYY and then saved the files as a Word doc. Now everything is just centered. Changing the alignment doesn't work either; it's all centered.

BTW, I do like ABBYY much better than PaperPort.
06-08-2013, 01:38 AM #17
Toxaris
Wizard
Posts: 2,862
Karma: 2714881
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
You can save a Word document from ABBYY in several ways. I usually choose 'workable copy', as I think it is called.

Intelligent S&R means smart search-and-replace actions that you can create, usually with wildcards. They closely resemble RegEx (regular expressions).
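To illustrate the idea, here is a minimal Python sketch of a few "intelligent" search-and-replace rules written as regular expressions with the standard `re` module. The rules themselves (removing spaces before punctuation, fixing a `1` misread inside a word, collapsing double spaces) are assumed examples of typical OCR clean-ups, not Toxaris's actual set, and Word's wildcard syntax differs from Python's regex syntax.

```python
import re

# Hypothetical rule set: (compiled pattern, replacement), applied in order.
RULES = [
    (re.compile(r" +([,.;:!?])"), r"\1"),        # "word ," -> "word,"
    (re.compile(r"(?<=[a-z])1(?=[a-z])"), "l"),  # "fee1ing" -> "feeling"
    (re.compile(r" {2,}"), " "),                  # collapse runs of spaces
]


def smart_replace(text: str) -> str:
    """Apply every rule in turn and return the cleaned text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(smart_replace("It was  a fee1ing , nothing more ."))
    # prints: It was a feeling, nothing more.
```

Each rule sees the output of the previous one, so the order of the list matters once rules start to interact.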
06-12-2013, 03:24 AM #18
noork85
Junior Member
Posts: 7
Karma: 10
Join Date: Sep 2012
Device: iPad 4
In reply to Toxaris (post #17):

I didn't understand a word you said about S&R.

BUT... OMG, what a difference using ABBYY has made. I love it! Minimal mistakes so far.

I opened the book in Microsoft Word from ABBYY and created a table of contents there. I transferred the book to my iPad and used PerfectReader/Stanza/iBooks to open the PDF. When I click the icon to view the contents, it says there are none. Yet the second page is the table of contents and is very much 'clickable'. But if I want to go from chapter to chapter, I have to go back to the beginning of the book and click from there.

Any way around this?
06-12-2013, 03:43 AM #19
Jen_Smith
Member
Posts: 23
Karma: 140640
Join Date: Apr 2013
Device: Kobo Aura HD, Sony PRS-T1, Kobo Mini
The way I do it is:

1. Scan.
2. OCR to convert the images to a Word file.
3. Cut and paste the entire Word document into Notepad. This gets rid of all the formatting. I use ABBYY FineReader too, and the OCR package knows the difference between body text and page headings etc. It puts page headings/page numbers into headers/footers in Word, so copying/pasting to Notepad gets rid of them all quickly.
4. Cut and paste back from Notepad into a new Word document.
5. Run spellchecker (as has been commented above, this gets rid of the repeated and obvious errors).
6. Insert page breaks where chapter breaks are supposed to be.
7. Import the Word document into Calibre and convert to epub.
8. Import the epub into Jutoh (which is the epub editing software I use), and go through the whole thing word-by-word fine tuning, adding in italics, links, and any other stuff.
9. Create a new epub.
10. Delete the original epub from Calibre, and import the new, corrected, version into Calibre to replace it. Sort out the metadata (author, series etc).
11. Done.
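Steps 3 and 6 can also be scripted if the plain text is saved to a file. The sketch below is hypothetical and assumes chapter headings stand alone on a line in the form `Chapter 1` or `CHAPTER TWELVE`; it splits the text at those headings and emits simple HTML with an `<h1>` per chapter and a `<p>` per blank-line-separated paragraph, which can then be imported into Calibre. Text before the first heading (title page, dedication) is ignored in this sketch.

```python
import html
import re

# Assumed heading style: a line holding only "Chapter" plus one word or number.
CHAPTER_RE = re.compile(r"^[ \t]*chapter[ \t]+\w+[ \t]*$", re.IGNORECASE | re.MULTILINE)


def split_chapters(text: str) -> list[tuple[str, str]]:
    """Split plain text into (heading, body) pairs at each chapter heading."""
    matches = list(CHAPTER_RE.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group().strip(), text[match.end():end].strip()))
    return chapters


def to_html(chapters: list[tuple[str, str]]) -> str:
    """Render each chapter as an <h1> followed by one <p> per paragraph."""
    parts = []
    for heading, body in chapters:
        parts.append(f"<h1>{html.escape(heading)}</h1>")
        for para in re.split(r"\n\s*\n", body):  # a blank line separates paragraphs
            if para.strip():
                # Join the lines of one paragraph and normalise whitespace.
                parts.append(f"<p>{html.escape(' '.join(para.split()))}</p>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "Chapter 1\nIt began.\n\nThen more.\nChapter 2\nThe end."
    print(to_html(split_chapters(sample)))
```

Books with other heading styles (numbers only, roman numerals, titled chapters) would need a different `CHAPTER_RE`.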

There are lots of people on here who are more knowledgeable than I am and understand code and things like that, so their way might be better than mine.

My ABBYY FineReader is version 9; it came packaged with the scanner I use. It's been updated since, but what I've got works pretty well.

And as far as I'm concerned, S&R means 'search and rescue'. Intelligent S&R actions presumably wear a lifejacket and keep your control centre updated on where you are and how it's going?
06-12-2013, 04:50 AM #20
Toxaris
Wizard
Posts: 2,862
Karma: 2714881
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
I use the following method:
1. Scan the book
2. Run through ABBYY, save as DOCX, HTML (for images and to see if text is seen as image by mistake) and PDF/A (to simplify searching the original scan)
3. Start Word, load the DOCX and run my add-in (first procedures) to solve/fix a large number of OCR issues.
4. Before the final two steps, I also run a spellcheck
5. Perform the conversion to HTML and generate the basic ePUB
6. Make final touches to the ePUB via Sigil. Usually that is somewhat more complex formatting and the TOC.
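Toxaris's add-in isn't shown in the thread, so the following is only a hypothetical Python sketch of two common OCR fixes of the kind step 3 describes: deleting lines that contain nothing but a page number, and rejoining words hyphenated across a line break.

```python
import re


def fix_ocr_artifacts(text: str) -> str:
    """Two illustrative OCR clean-ups; a real add-in would apply many more."""
    # 1. Delete lines holding nothing but a page number, e.g. "  127  ".
    text = re.sub(r"^[ \t]*\d{1,4}[ \t]*(?:\n|\Z)", "", text, flags=re.MULTILINE)
    # 2. Rejoin a word split by a hyphen at the end of a line: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return text


if __name__ == "__main__":
    page = "The exam-\n127\nple was clear.\nNext page text."
    print(fix_ocr_artifacts(page))
```

The order matters: removing page-number lines first lets a word hyphenated across a page break (`exam-`, page number, `ple`) be rejoined too. Both patterns can misfire, for example on a genuinely hyphenated compound such as `well-known` that happens to break at a line end, or on a line holding only a year, so the spellcheck and proofreading passes are still needed.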
06-16-2013, 01:24 AM #21
MissGoat
Junior Member
Posts: 1
Karma: 10
Join Date: May 2013
Device: Ipad
Great tips, everyone! My workflow is similar to Jen_Smith's.

I use ABBYY FineReader Express, as it is the only option available for Mac. Unfortunately, it is limited in its functions compared to the PC version.

I scan two pages of the book in one go.

I recently started converting to HTML and found that there are fewer conversion issues than when converting to text. After conversion, the HTML looks just as I scanned it: two pages side by side.

Once converted, I copy everything and paste it into my text editor on the Mac. The downside is that all the formatting (italics etc.) is lost.

I have MS Word, but I can't figure it out. How can I open the HTML file in MS Word? I have tried copying and pasting into Word, but it pastes exactly as I scanned it. How do I get it into one column instead of two?

I am trying to reduce one step in the process by not having to find and add in italics.

Any help is appreciated. Thanks.
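One way to keep italics when going through a plain-text editor is to convert the HTML yourself and turn each italic run into a placeholder, here `_underscores_` (an arbitrary choice), that a later search-and-replace can turn back into `<i>` tags. This is a minimal sketch using Python's standard `html.parser`; it assumes the exported HTML marks italics with `<i>`/`<em>` tags or a `<span>` whose `style` contains `italic`, which may not match what FineReader Express actually writes. It does not address the two-column layout.

```python
import re
from html.parser import HTMLParser

MARK = "_"  # arbitrary placeholder for italics; pick another if the text uses "_"


class ItalicKeepingExtractor(HTMLParser):
    """Collect the visible text of an HTML page, wrapping italic runs in MARK."""

    SKIP = {"style", "script", "title"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.span_stack: list[bool] = []  # True if that open <span> was italic
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in ("i", "em"):
            self.parts.append(MARK)
        elif tag == "span":
            italic = "italic" in (dict(attrs).get("style") or "")
            self.span_stack.append(italic)
            if italic:
                self.parts.append(MARK)
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ("i", "em"):
            self.parts.append(MARK)
        elif tag == "span" and self.span_stack:
            if self.span_stack.pop():
                self.parts.append(MARK)
        elif tag == "p":
            self.parts.append("\n\n")  # paragraph break

    def handle_data(self, data):
        if not self.skip_depth:
            # Source line breaks inside a paragraph are just spaces in HTML.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_marked_text(source: str) -> str:
    """Return plain text with italics marked and paragraphs separated by blank lines."""
    parser = ItalicKeepingExtractor()
    parser.feed(source)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    return "\n".join(line.strip() for line in lines).strip()
```

For example, `<p>She was <i>very</i> tired.</p>` comes out as `She was _very_ tired.`, which survives a trip through any plain-text editor.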
09-01-2013, 11:18 PM #22
simsurfer
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2013
Device: Kindle Paperwhite, Nexus 7
Hello all:

First post here. I recently sent one of my favorite books to 1DollarScan and got the PDF with OCR back from them. I have run that through ABBYY and it looks fantastic! I cleaned up a bunch of issues in ABBYY and saved the result as HTML. I then ran that HTML file through Calibre, and while the results are good, I am getting short lines of text, as one of the other users mentioned. I would love to get the book as close as possible to the original.

That being said, once I run the PDF through ABBYY, should I then save it to OpenOffice, clean up the formatting issues, save that as an HTML file and then bring it into Calibre?

Thanks a lot for any assistance on this. I am a total noob at this, but I spent the entire weekend experimenting.
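Short lines after conversion often mean that each printed line ended up as its own line or paragraph. Calibre's Heuristic processing has an "Unwrap lines" option aimed at this; the Python sketch below only shows the general idea behind such unwrapping, under assumed rules: a line ends a paragraph only if it ends with sentence punctuation and is noticeably shorter than the longest line (the `factor` of 0.8 is an arbitrary guess), while a blank line always ends one.

```python
# Line endings that suggest a paragraph really ends there (an assumption).
SENTENCE_END = (".", "!", "?", ":", '"', "\u201d")


def unwrap_lines(lines: list[str], factor: float = 0.8) -> list[str]:
    """Merge OCR line fragments back into paragraphs.

    A line closes a paragraph when it ends with sentence punctuation AND is
    shorter than `factor` times the longest line; a blank line always does.
    """
    stripped = [line.strip() for line in lines]
    threshold = factor * max((len(line) for line in stripped), default=0)
    paragraphs: list[str] = []
    current = ""
    for line in stripped:
        if not line:
            if current:
                paragraphs.append(current)
                current = ""
            continue
        current = f"{current} {line}" if current else line
        if line.endswith(SENTENCE_END) and len(line) < threshold:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


if __name__ == "__main__":
    page = [
        "The rain had not stopped for three days and",
        "the river was rising faster than anyone had",
        "expected.",
        "Nobody slept that night.",
    ]
    for paragraph in unwrap_lines(page):
        print(paragraph)
```

The length test keeps a sentence that happens to end at the right margin from splitting a paragraph; the trade-off is that a paragraph whose last line runs the full width gets merged with the next one, so the output still needs a read-through.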
09-01-2013, 11:29 PM #23
AnemicOak
Bookaholic
Posts: 10,020
Karma: 28126419
Join Date: Oct 2007
Location: Minnesota
Device: HDX 8.9, AuraHD, Nook HD+, Kindle 2,3,T , Opus, Nexus7, iPhone5, etc
In reply to simsurfer (post #22):
If you're cleaning things up in OpenOffice anyway, I'd just use the Writer2ePub plugin to create the ePub. Or you could open the Calibre ePub you have and clean it up in Sigil.
09-02-2013, 12:26 AM #24
simsurfer
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2013
Device: Kindle Paperwhite, Nexus 7
In reply to AnemicOak (post #23):
But I use a Kindle. I would prefer not to go through ePub and Sigil, only because that is yet another format and piece of software I have to learn.

Can't I just go from OpenOffice Writer (or MS Word) directly to HTML and then into Calibre, or will the formatting still go weird on me?

Thanks.
09-02-2013, 04:13 AM #25
Toxaris
Wizard
Posts: 2,862
Karma: 2714881
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
It will probably be weird. Note that most makers create their MOBI files from an ePUB.
09-02-2013, 11:18 AM #26
simsurfer
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2013
Device: Kindle Paperwhite, Nexus 7
I just did a quick test: I saved to OpenOffice Writer from ABBYY, edited "some" pages in Writer first (I think the first 9 pages), then ran that through Calibre. A really big difference. I haven't compared them side by side, and I'm off to bed soon (I work nights), but I just wanted to report that the formatting is really good (not 100%, more like 80%). It might even be good enough for my tastes. I will keep you informed when I'm back at work and have more time to read.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.