Old 03-15-2011, 03:16 PM   #1
NASCARaddicted
Addict
 
Posts: 312
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch
Need help with Abbyy Finereader 10 (linebreaks)

Hello.

Can someone help me? I have a problem with Abbyy Finereader 10 Professional Edition.

I started to scan a few pages from a book. The problem is that when a page ends but the paragraph does not and continues on the next page, Abbyy Finereader still adds a linebreak.

To show what I mean, I have attached a picture of my Abbyy screen: at the top you see the original page of the book (instead of scanning, I tried it with a camera, and it works) and below is the text that Abbyy made out of it.

You can see where it says "that is a long distance," ... this is the end of the page. But the sentence continues on the next page with "but even if ...". This happens on every page. Of course I tried to remove the linebreak manually, but I can't (or I haven't found out how).

Another thing: this book has more than 400 pages. So even if I could remove the linebreak manually, it would be very tedious to do it on almost all 400 pages. Is there a way to tell Abbyy to remove this linebreak? Every new paragraph starts with a text indent, and Abbyy recognizes this if it is in the middle of a page, but when it comes to a new page ... it's like every new page is a new document.

I hope someone can help me.
Attached image: Scan 2.jpg (167.8 KB)

Last edited by NASCARaddicted; 03-15-2011 at 10:55 PM.
Old 03-16-2011, 04:22 AM   #2
Iain
Enthusiast
 
Posts: 49
Karma: 14
Join Date: Jul 2010
Location: Harrogate, England
Device: iPad
Abby and Breaks

Hi. I found the same thing.

My approach was to convert to Word 2007 and then process the Word file to remove most of the incorrect line breaks whilst converting it to ePub format.

I'd be happy to let you have a copy of the program, but there are two things you should be aware of. Firstly, its success rate is currently around 80% (that's a guess, by the way), so it corrects about 8 out of 10 of these spurious breaks. I need to go back and update it.

Secondly, it's been up on blocks with its wheels off (metaphorically) for the last 3 months or so, waiting for me to finish off modifications to identify chapter breaks better than FineReader does. Unfortunately, a heap of other things have stopped me moving forward on this and I'm not sure when I'll get back to it.

If this is interesting let me know and I'll add you to my list of 'beta testers'.

It also helps sort out some of the hyphenation errors.


Iain
Old 03-16-2011, 01:27 PM   #3
NASCARaddicted
Addict
 
Posts: 312
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch
Hello Iain,

Thanks for your offer, I will give it a try.

But I am kinda surprised about Abbyy. It seems as if this is really not possible in FR10. I may be wrong, but I think I heard or saw somewhere that it was possible in FR8 and FR9. Can anyone confirm this?
Old 03-16-2011, 05:20 PM   #4
adv_dp_fan
Connoisseur
 
Posts: 95
Karma: 57138
Join Date: May 2010
Device: Sony 505, iPad 1 & 3, Galaxy Note 8.1
I use FR10 and yeah, sometimes it correctly detects that the paragraph continues on the next page and sometimes it doesn't. I've taken to going back through the book in an external editor after I'm done, with the physical copy at hand, checking the beginning of each paragraph to make sure it's correct and using delete/space to fix any that aren't. As I do this as part of some other general proofing/cleanup of the final file, it doesn't add much to my workflow.

This also lets me catch another error that sometimes pops up with FR10: the incorrect joining of multiple paragraphs into one within a page.

I used to use FR9 and it had the same issues.

All in all though, it gets most of the paragraphs correct for me.
Old 03-27-2011, 06:48 AM   #5
DSpider
Evangelist
 
Posts: 413
Karma: 326969
Join Date: Nov 2009
Location: Romania
Device: iPod touch 2G (16 GB)
You could save it as HTML from Word (or FineReader), open it up in your favourite web browser and quickly pass it over. They'll stand out like nails, especially if you zoom out a bit. Then, when you find one, you simply go back to Word and add a line break (Shift+Enter) where you want. Easy.

Just don't leave spaces before a line break, though. Toggle this little guy before you check it:

[image]

... else there may be a few double spaces when (or if) you decide to convert it to ePUB or some other format.

Also, you could try clicking the middle mouse button when you view the HTML (in the web browser) and dragging the mouse down so it scrolls automatically and your finger won't get tired. But not too fast, so you don't miss any of them.

Last edited by DSpider; 03-27-2011 at 07:00 AM.
Old 03-27-2011, 07:58 AM   #6
Toxaris
Wizard
 
Posts: 3,002
Karma: 3440001
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
You could also use a RegExp in Word for this. Search for a small letter or a hyphen combined with a line break and replace that line break with a space.
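
For anyone who prefers to do this outside Word, the same rule could be sketched in Python. This is only a rough illustration, not Word's own search syntax: it assumes the text has been exported as plain text with one newline per break, and the function name join_spurious_breaks is made up for the example.

```python
import re

# Assumption: the OCR output was saved as plain text, so each break
# produced by the OCR step is a single newline character.
# The lookbehind matches a newline that directly follows a lowercase
# letter or a hyphen, which is the rule described in the post above.
SPURIOUS_BREAK = re.compile(r"(?<=[a-z-])\n")


def join_spurious_breaks(text):
    """Replace each break after a lowercase letter or hyphen with a space."""
    return SPURIOUS_BREAK.sub(" ", text)


if __name__ == "__main__":
    sample = "it was a long\nway home.\nA new paragraph."
    print(join_spurious_breaks(sample))
```

Note that a break after a comma (as in the "that is a long distance," example from the first post) is not caught by this rule, and a word hyphenated across the break ends up as "well- known", so the result still needs a proofreading pass.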
Old 03-27-2011, 12:24 PM   #7
NASCARaddicted
Addict
 
Posts: 312
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch
Thanks, guys, for your help. Right now, before I convert from HTML to EPUB, I use a regex search. If a new paragraph starts without one of the characters .!? in front of it, the break is probably wrong, so finding them is not really a problem.
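
A check along these lines might look like the Python sketch below. The exact regex used is not given in the post, so this is an assumption-laden illustration: it assumes every paragraph is a simple <p>...</p> element, and the function name suspicious_paragraphs is invented for the example.

```python
import re

# Assumption: paragraphs are plain <p>...</p> elements with no nested <p>.
PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL | re.IGNORECASE)
# Used to drop inline markup such as <i> or <span> before checking the ending.
TAG = re.compile(r"<[^>]+>")


def suspicious_paragraphs(html):
    """Return the text of every paragraph that does not end in '.', '!' or '?'."""
    flagged = []
    for match in PARAGRAPH.finditer(html):
        text = TAG.sub("", match.group(1)).strip()
        if text and text[-1] not in ".!?":
            flagged.append(text)
    return flagged


if __name__ == "__main__":
    page = (
        "<p>That is a long distance,</p>\n"
        "<p>but even if it rains.</p>\n"
        '<p class="x">"Fine."</p>'
    )
    for text in suspicious_paragraphs(page):
        print(text)
```

With only .!? treated as valid endings, dialogue that closes with a quotation mark is flagged as well, so the list is a set of candidates to look at rather than a list of definite errors.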

But I thought Abbyy 10 should be able to detect this by itself. It seems as if it can't ... and you can't even fix it manually, which is what surprised me most ...
Old 03-27-2011, 02:27 PM   #8
Toxaris
Wizard
 
Posts: 3,002
Karma: 3440001
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
I must say it surprises me. In a whole book it happens only in about 10 sentences for me. The rest is correctly identified by Abbyy.