#1
Addict
Posts: 298
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch
Need help with ABBYY FineReader 10 (line breaks)
Can someone help me? I have a problem with ABBYY FineReader 10 Professional Edition. I started to scan a few pages from a book, and the problem is this: the page ends, but the paragraph does not. It continues on the next page. However, FineReader adds a line break at the end of the page.
To show what I mean, I am adding a picture of my FineReader screen: at the top you see the original page of the book (instead of scanning, I tried it with a camera, and it works), and below it is the text that FineReader made out of it. Where it says "that is a long distance," is the end of the page, but the sentence continues on the next page with "but even if ...". This happens on every page.
Of course I tried to remove the line break manually, but I can't (or I haven't found out how). Besides, this book has more than 400 pages, so even if I could remove the line break manually, it would be very tedious to do it on almost all 400 pages.
So is there a way to tell FineReader to remove this line break? Every new paragraph starts with a text indent, and FineReader recognizes this when it is in the middle of a page, but when it comes to a new page, it's as if every new page were a new document.
I hope someone can help me.
Last edited by NASCARaddicted; 03-15-2011 at 10:55 PM.
#2
Enthusiast
Posts: 48
Karma: 14
Join Date: Jul 2010
Location: Harrogate, England
Device: iPad
ABBYY and Breaks
Hi. I found the same thing.
My approach was to convert to Word 2007 and then process the Word file to remove most of the incorrect line breaks whilst converting it to ePub format. I'd be happy to let you have a copy of the program, but there are two things you should be aware of.
Firstly, it's currently running at around 80% (that's a guess, by the way), so it corrects about 8 out of 10 of these spurious breaks. I need to go back and update it.
Secondly, it's been up on blocks with its wheels off (metaphorically) for the last 3 months or so, waiting for me to finish off modifications to identify chapter breaks better than FineReader does. Unfortunately, a heap of other things have stopped me moving forward on this, and I'm not sure when I'll get back to it.
If this is interesting, let me know and I'll add you to my list of 'beta testers'. It also helps sort out some of the hyphenation errors.
Iain
#3
Addict
Posts: 298
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch
Thanks for your offer, I will give it a try. But I am kinda surprised by ABBYY. It seems this really is not possible in FR10. I may be wrong, but I think I heard and saw that it was possible in FR8 and FR9. Can anyone confirm this?
#4
Connoisseur
Posts: 81
Karma: 37574
Join Date: May 2010
Device: Sony 505 and iPad
I use FR10 and yeah, sometimes it correctly detects that the paragraph continues on the next page and sometimes it doesn't. Once I'm done, I go back through the book in an external editor with the physical copy at hand, check the beginning of each paragraph to make sure it's correct, and just use delete/space to fix the wrong ones. As I do this as part of some other general proofing/cleanup of the final file, it doesn't add much to my workflow.
This also lets me catch another error that sometimes pops up with FR10: the incorrect joining of multiple paragraphs into one within a page. I used to use FR9 and it had the same issues. All in all, though, it gets most of the paragraphs correct for me.
#5
Addict
Posts: 368
Karma: 298951
Join Date: Nov 2009
Location: Romania
Device: iPod touch 2G (16 GB)
You could save it as HTML from Word (or FineReader), open it up in your favourite web browser and quickly skim through it. The bad breaks will stand out like nails, especially if you zoom out a bit. Then, when you find one, you simply go back to Word and add a line break (Shift+Enter) where you want. Easy.
Just don't leave spaces before a line break, though. Toggle this little guy before you check it: ![]() ... else there may be a few double spaces when (or if) you decide to convert it to ePub or some other format.
Also, you could try clicking the middle mouse button when you view the HTML (in the web browser) and dragging the mouse down so that it scrolls automatically and your finger won't get tired. But not too fast, so as not to miss any of them.
Last edited by DSpider; 03-27-2011 at 07:00 AM.
#6
Wizard
Posts: 2,095
Karma: 927511
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
You could also use a RegExp in Word for this. Search for a lowercase letter or a hyphen followed by a line break, and replace that line break with a space.
__________________
Creator and maintainer of the e-Book Tools Word add-in. Creator and maintainer of the Clean HTML macro for MS Word.
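For anyone who would rather try this rule outside Word, here is a minimal Python sketch of the same idea, assuming the OCR output has been saved as plain text. The function name `join_spurious_breaks` and the pattern are illustrative only; they are not taken from Word, FineReader, or any program mentioned in this thread.

```python
import re

# Rule from the post above: a break that directly follows a lowercase letter
# or a hyphen is treated as spurious and replaced by a single space.
# Assumption: breaks are newline characters, possibly surrounded by spaces/tabs.
SPURIOUS_BREAK = re.compile(r"(?<=[a-z-])[ \t]*\n+[ \t]*")


def join_spurious_breaks(text: str) -> str:
    """Replace breaks that follow a lowercase letter or hyphen with a space."""
    return SPURIOUS_BREAK.sub(" ", text)


sample = "it was a long walk\n\nhome that day.\n\nThe next morning it rained."
print(join_spurious_breaks(sample))
```

Breaks after a full stop are left alone, so genuine paragraph ends survive. Note that the rule as stated does not catch a page that ends in a comma (as in the "that is a long distance," example from the first post), and a hyphen at the break is kept, so "well-" followed by "known" becomes "well- known" and still needs a second pass.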
#7
Addict
Posts: 298
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch
Thanks, guys, for your help. Right now, before I convert from HTML to ePub, I use a regex search. If a new paragraph starts without one of the characters .!? in front of it, the break is probably wrong, so finding these is not really a problem.
But I thought ABBYY 10 should be able to detect this by itself. It seems it can't ... but you can't even do it manually, and that is what surprised me most ...
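The check described in this post can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the exported HTML is assumed to wrap each paragraph in plain `<p>...</p>` tags, closing quotes or brackets after the final .!? are assumed to be acceptable, and the name `suspicious_paragraphs` is made up for this example. The poster's actual regex is not shown in the thread.

```python
import re
from html import unescape

# Assumption: every paragraph is wrapped in <p>...</p> in the exported HTML.
PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")
# Assumption: closing quotes or brackets may follow the final . ! or ?
ENDS_SENTENCE = re.compile(r"[.!?][\"'”’)\]]*$")


def suspicious_paragraphs(html: str) -> list:
    """Return the text of each <p> that does not end with . ! or ?"""
    flagged = []
    for match in PARAGRAPH.finditer(html):
        # Drop inline tags, decode entities, trim surrounding whitespace.
        text = unescape(TAG.sub("", match.group(1))).strip()
        if text and not ENDS_SENTENCE.search(text):
            flagged.append(text)
    return flagged


page = (
    "<p>that is a long distance,</p>"
    "<p>but even if it rains.</p>"
    '<p>"Really?"</p>'
)
print(suspicious_paragraphs(page))  # ['that is a long distance,']
```

The function only lists candidates; headings, captions and list items that legitimately end without punctuation will be flagged too, so the result still needs a quick look before anything is merged.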
#8
Wizard
Posts: 2,095
Karma: 927511
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
I must say it surprises me. In a whole book it happens only in about 10 sentences for me. The rest is correctly identified by ABBYY.