03-15-2011, 03:16 PM | #1 |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
Need help with ABBYY FineReader 10 (line breaks)
Hello.
Can someone help me? I have a problem with ABBYY FineReader 10 Professional Edition. I started scanning a few pages from a book, and the problem is this: the page ends, but the paragraph does not; it continues on the next page. However, FineReader inserts a line break at the page boundary anyway.
To show what I mean, I've attached a picture of my ABBYY screen. On top you see the original page of the book (instead of scanning it, I photographed it with a camera, which also works), and below is the text ABBYY made out of it. Where it says "that is a long distance," is the end of the page, but the sentence continues on the next page with "but even if ...". This happens on every page.
Of course I tried to remove the line break manually, but I can't (or I haven't found out how). Besides, this book has more than 400 pages, so even if I could remove the breaks manually, doing it on almost all 400 pages would be very tedious. Is there a way to tell ABBYY to remove this line break? Every new paragraph starts with a text indent, and ABBYY recognizes this in the middle of a page, but at a new page it's as if every page were a new document. I hope someone can help me. Last edited by NASCARaddicted; 03-15-2011 at 10:55 PM. |
03-16-2011, 04:22 AM | #2 |
Enthusiast
Posts: 49
Karma: 14
Join Date: Jul 2010
Location: Harrogate, England
Device: iPad
|
ABBYY and Breaks
Hi. I found the same thing.
My approach was to convert to Word 2007 and then process the Word file with a program that removes most of the incorrect line breaks while converting it to ePub format. I'd be happy to let you have a copy of the program, but there are two things you should be aware of. First, it currently works about 80% of the time (that's a guess, by the way), so it corrects roughly 8 out of 10 of these spurious breaks; I need to go back and update it. Second, it has been up on blocks with its wheels off (metaphorically) for the last 3 months or so, waiting for me to finish modifications that identify chapter breaks better than FineReader does. Unfortunately a heap of other things have stopped me moving forward on this, and I'm not sure when I'll get back to it. If this sounds interesting, let me know and I'll add you to my list of 'beta testers'. It also helps sort out some of the hyphenation errors. Iain |
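Iain's program itself isn't shown in the thread, so the following is only a minimal Python sketch of the kind of heuristic such a post-processor might use, assuming the Word text has already been exported as a list of paragraph strings. The rule and the name `merge_paragraphs` are hypothetical: a paragraph is merged into the previous one when the previous one doesn't end in `.`, `!` or `?` (optionally followed by a closing quote or bracket) and the next one starts with a lowercase letter. If the previous paragraph ends in a hyphen, the hyphen is dropped and the halves are joined directly, which assumes it was a line-end hyphen rather than part of a compound such as "well-known".

```python
import re

# Hypothetical rule: a paragraph "ends properly" if its last character is
# ., ! or ?, optionally followed by closing quotes or a bracket.
SENTENCE_END = re.compile(r'[.!?]["\u201d\u2019)]*$')


def merge_paragraphs(paragraphs: list[str]) -> list[str]:
    """Rejoin paragraphs that OCR split at a page boundary (illustrative heuristic)."""
    merged: list[str] = []
    for para in paragraphs:
        para = para.strip()
        if not para:
            continue  # skip empty paragraphs
        prev = merged[-1] if merged else None
        if prev and not SENTENCE_END.search(prev) and para[0].islower():
            if prev.endswith("-"):
                # Assume a line-end hyphen: drop it and join the word halves.
                merged[-1] = prev[:-1] + para
            else:
                # Spurious break mid-sentence: join with a single space.
                merged[-1] = prev + " " + para
        else:
            merged.append(para)
    return merged


pages = [
    "It was a long way to the next",
    "village, so they set off early.",
    "The path grew steep-",
    "er with every mile.",
]
for p in merge_paragraphs(pages):
    print(p)
```

This prints two paragraphs: "It was a long way to the next village, so they set off early." and "The path grew steeper with every mile." A paragraph that genuinely starts with a lowercase letter after an unpunctuated line (a heading, say) would be merged too, so the result still needs a proofreading pass.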
03-16-2011, 01:27 PM | #3 | |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
Quote:
Thanks for your offer, I will give it a try. But I am kinda surprised by ABBYY. It seems this really isn't possible in FR10. I may be wrong, but I think I heard and saw that it was possible in FR8 and FR9. Can anyone confirm this? |
|
03-16-2011, 05:20 PM | #4 |
Zealot
Posts: 103
Karma: 57138
Join Date: May 2010
Device: Sony 505, iPad 1 & 3, Galaxy Note 8.1
|
I use FR10, and yes, sometimes it correctly detects that the paragraph continues on the next page and sometimes it doesn't. Once I'm done, I've taken to going back through the book in an external editor, checking the beginning of each paragraph against the physical copy to make sure it's correct, and using Delete/Space to fix the ones that aren't. Since I do this as part of some other general proofing/cleanup of the final file, it doesn't add much to my workflow.
This also lets me catch another error that sometimes pops up with FR10: multiple paragraphs on a page incorrectly joined into one. I used to use FR9 and it had the same issues. All in all, though, it gets most of the paragraphs right for me. |
03-27-2011, 06:48 AM | #5 |
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
|
You could save it as HTML from Word (or FineReader), open it in your favourite web browser and quickly skim through it. The broken paragraphs will stand out like nails, especially if you zoom out a bit. When you find one, simply go back to Word and add a line break (Shift+Enter) where you want it. Easy.
Just don't leave spaces before a line break, though. Turn on Word's formatting marks (the ¶ button) before you check, or else you may end up with a few double spaces when (or if) you convert it to ePUB or some other format. Also, you can click the middle mouse button while viewing the HTML in the web browser and move the mouse down so the page scrolls automatically and your finger won't get tired. Just not too fast, so you don't miss any of them. Last edited by DSpider; 03-27-2011 at 07:00 AM. |
03-27-2011, 07:58 AM | #6 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
You could also use a RegExp in Word for this: search for a lowercase letter or a hyphen followed by a line break, and replace that line break with a space.
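Word's Find and Replace uses its own wildcard syntax rather than standard regular expressions, so here is the same rule as a minimal Python sketch instead. It assumes plain text in which every line break (`\n`) is a paragraph break; the name `join_broken_lines` is made up for this example.

```python
import re

# The rule described above: a lowercase letter or hyphen directly before a
# line break means the break is spurious. The lookbehind keeps that character
# and only the break (one or more "\n") is replaced with a space.
JOIN_AFTER = re.compile(r"(?<=[a-z-])\n+")


def join_broken_lines(text: str) -> str:
    return JOIN_AFTER.sub(" ", text)


print(join_broken_lines("the road went on\nand on.\nThe End"))
# the road went on and on.
# The End
```

Taken literally, the hyphen case turns "well-\nknown" into "well- known"; for words split across pages you would probably want to drop the hyphen and the space instead. Also, `[a-z]` does not cover accented lowercase letters such as ä or ö, so for German text they would need to be added to the class.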
|
03-27-2011, 12:24 PM | #7 |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
Thanks for your help, guys. Right now, before I convert from HTML to ePub, I use a regex search: if a new paragraph isn't preceded by one of the characters . ! or ?, the break is probably wrong, so finding them isn't really a problem.
But I thought ABBYY 10 would be able to detect this by itself. It seems it can't... and you can't even fix it manually in ABBYY, which is what surprised me most... |
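A minimal Python sketch of the check described above, assuming the book has been exported to plain text with one paragraph per line. `suspect_lines` is a hypothetical helper, not part of any tool mentioned in the thread.

```python
import re

# A line whose last character is not ., ! or ? probably ends a paragraph that
# OCR cut short. re.MULTILINE lets $ match at every line end; \n is excluded
# from the class so empty lines are not reported.
SUSPECT = re.compile(r"[^.!?\n]$", re.MULTILINE)


def suspect_lines(text: str) -> list[int]:
    """Return 1-based line numbers whose paragraph probably continues below."""
    return [text.count("\n", 0, m.start()) + 1 for m in SUSPECT.finditer(text)]


text = "That is a long distance,\nbut even if ...\nDone!"
print(suspect_lines(text))  # [1]
```

Paragraphs ending in a closing quote (e.g. `.”`) or a colon would also be flagged, so expect a few false positives to skim past.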
03-27-2011, 02:27 PM | #8 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
I must say that surprises me. In a whole book it happens only about 10 times for me; the rest is correctly identified by ABBYY.
|
01-19-2017, 12:30 PM | #9 | |
Enthusiast
Posts: 28
Karma: 27226
Join Date: May 2016
Device: Kobo glo hd
|
Quote:
Or any other solutions? |
|
01-19-2017, 01:08 PM | #10 |
A Hairy Wizard
Posts: 3,094
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Here is a quick explanation from the MR wiki... but basically, regex is a way to put variables (patterns) into a search and replace. For example:
find: "w.*?d" (without the quotes) matches a 'w' followed by the shortest run of characters up to the next 'd', so it finds a word or group of words that begins with 'w' and ends with 'd', like "word", "wad", or "why are you a nerd". Note that the 'w' doesn't have to start a word: in "saw a dog" it finds "w a d".
It's very powerful once you learn how to use it properly! Most text-editing tools have some form of regex search/replace. I'm pretty sure FineReader does as well; however, you would need to check the FineReader user's manual to see which commands it supports. You can also search here at MR and you will find a lot of examples, like this one that talks about this specific issue. Last edited by Turtle91; 01-19-2017 at 01:15 PM. |
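To see how the lazy `*?` differs from a greedy `*`, here is a small Python sketch. Python's `re` module is used purely for illustration; FineReader's own search syntax, if it has one, may differ.

```python
import re

sample = "a word, a wad, and why are you a nerd"

# Lazy: each match stops at the first 'd' after a 'w'.
lazy = re.findall(r"w.*?d", sample)
# Greedy: one match running from the first 'w' to the last 'd'.
greedy = re.findall(r"w.*d", sample)
# The pattern is not tied to word starts: the 'w' of "saw" matches here.
inside = re.findall(r"w.*?d", "I saw a dog")
# \b anchors to word boundaries and \w* stays inside a single word.
whole_words = re.findall(r"\bw\w*d\b", "I saw a dog and a word")

print(lazy)         # ['word', 'wad', 'why are you a nerd']
print(greedy)       # ['word, a wad, and why are you a nerd']
print(inside)       # ['w a d']
print(whole_words)  # ['word']
```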
01-19-2017, 04:00 PM | #11 |
Enthusiast
Posts: 28
Karma: 27226
Join Date: May 2016
Device: Kobo glo hd
|
Hmm, thank you.
Now I need a regex search string for paragraphs starting with a lowercase letter and paragraphs ending with "-". |
01-19-2017, 04:10 PM | #12 |
Enthusiast
Posts: 28
Karma: 27226
Join Date: May 2016
Device: Kobo glo hd
|
OK, I found a regex search string for paragraphs starting with a lowercase letter and paragraphs ending with "-".
If anyone needs it, this is the string: /^[a-zğüşıçö]|-$/gm |
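The `/.../gm` notation is JavaScript-style: `g` means "find all matches" and `m` makes `^` and `$` match at every line start and end. A minimal Python equivalent, assuming plain text with one paragraph per line (`flag_lines` is a hypothetical helper name):

```python
import re

# JavaScript /^[a-zğüşıçö]|-$/gm in Python: re.MULTILINE replaces the m flag,
# and finditer() plays the role of the g flag.
PATTERN = re.compile(r"^[a-zğüşıçö]|-$", re.MULTILINE)


def flag_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines that start lowercase or end with '-'."""
    flagged = {text.count("\n", 0, m.start()) + 1 for m in PATTERN.finditer(text)}
    return sorted(flagged)


sample = "Line one ends with a hyphen-\nated word.\nNew sentence.\nşimdi devam"
print(flag_lines(sample))  # [1, 2, 4]
```

Only lowercase Turkish letters are in the class (uppercase `Ç` or `Ş` are not caught, which is the intent); for German text you would swap in `äöüß` instead.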
|