Old 03-15-2011, 03:16 PM   #1
NASCARaddicted
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
Need help with ABBYY FineReader 10 (line breaks)

Hello.

Can someone help me? I have a problem with ABBYY FineReader 10 Professional Edition.

I started to scan a few pages from a book. The problem is that when a page ends but the paragraph does not (it continues on the next page), FineReader still adds a line break.

To show what I mean, I have attached a picture of my FineReader screen. On top you see the original page of the book (instead of scanning, I tried it with a camera, and it works), and below is the text that FineReader made out of it.

Look where it says "that is a long distance,": this is the end of the page. But the sentence continues on the next page with "but even if ...". This happens on every page. Of course I tried to remove the line break manually, but I can't (or I haven't found out how).

Another thing: this book has more than 400 pages. So even if I could remove the line break manually, it would be very tedious to do it on almost all 400 pages. Is there a way to tell FineReader to remove this line break? Every new paragraph starts with a text indent, and FineReader recognizes this when it is in the middle of a page, but when it comes to a new page ... it's as if every new page were a new document.

I hope someone can help me.
Attached image: Scan 2.jpg (167.8 KB)

Last edited by NASCARaddicted; 03-15-2011 at 10:55 PM.
Old 03-16-2011, 04:22 AM   #2
Iain
Enthusiast
Posts: 49
Karma: 14
Join Date: Jul 2010
Location: Harrogate, England
Device: iPad
ABBYY and Breaks

Hi. I found the same thing.

My approach was to convert to Word 2007 and then process the Word file to remove most of the incorrect line breaks whilst converting it to ePub format.

I'd be happy to let you have a copy of the program, but there are two things you should be aware of. Firstly, its success rate is currently around 80% (that's a guess, by the way), so it corrects about 8 out of 10 of these spurious breaks. I need to go back and update it.

Secondly, it's been up on the blocks with its wheels off (metaphorically) for the last 3 months or so, waiting for me to finish off modifications to identify chapter breaks better than FineReader does. Unfortunately a heap of other things have stopped me moving forward on this, and I'm not sure when I'll get back to it.

If this is interesting let me know and I'll add you to my list of 'beta testers'.

It also helps sort out some of the hyphenation errors.


Iain
Old 03-16-2011, 01:27 PM   #3
NASCARaddicted
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
Hello Iain,

Thanks for your offer, I will give it a try.

But I am kind of surprised by ABBYY. It seems this really is not possible in FR10. I may be wrong, but I think I heard or read that it was possible in FR8 and FR9. Can anyone confirm this?
Old 03-16-2011, 05:20 PM   #4
adv_dp_fan
Zealot
Posts: 103
Karma: 57138
Join Date: May 2010
Device: Sony 505, iPad 1 & 3, Galaxy Note 8.1
I use FR10 and yeah, sometimes it correctly detects that the paragraph continues on the next page and sometimes it doesn't. Once the OCR is done, I go back through the book in an external editor with the physical copy at hand, check the beginning of each paragraph, and fix any wrong break with delete/space. Since I do this as part of the general proofing and cleanup of the final file, it doesn't add much to my workflow.

This also lets me catch another error that sometimes pops up with FR10: several paragraphs on a page incorrectly joined into one.

I used to use FR9 and it had the same issues.

All in all, though, it gets most of the paragraphs correct for me.
Old 03-27-2011, 06:48 AM   #5
DSpider
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
You could save it as HTML from Word (or FineReader), open it in your favourite web browser and skim through it. The bad breaks will stand out like nails, especially if you zoom out a bit. Then, when you find one, you simply go back to Word and add a line break (Shift+Enter) where you want it. Easy.

Just don't leave spaces before a line break, though. Toggle this little guy before you check it:

... else there may be a few double spaces when (or if) you decide to convert it to ePUB or some other format.

Also, when viewing the HTML in the web browser, you could click the middle mouse button and drag the mouse down so the page scrolls automatically and your finger won't get tired. Just not so fast that you miss any of them.

Last edited by DSpider; 03-27-2011 at 07:00 AM.
Old 03-27-2011, 07:58 AM   #6
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
You could also use a RegExp in Word for this. Search for a lowercase letter or a hyphen followed by a line break, and replace that line break with a space.
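
Outside Word, the same idea can be sketched in Python. This is only a rough illustration, assuming the OCR result was saved as plain text with one paragraph per line; the function name `join_broken_paragraphs` and the sample text are made up for the example. It follows the rule literally, so a hyphen at the end of a line is kept and followed by a space, and `[a-z]` only covers unaccented lowercase letters.

```python
import re

# Assumption: plain-text input, one paragraph per line.
# A break counts as spurious when the character before it is a lowercase
# ASCII letter or a hyphen. The break, plus any spaces or tabs around it,
# is replaced by a single space.
SPURIOUS_BREAK = re.compile(r"(?<=[a-z\-])[ \t]*\n+[ \t]*")


def join_broken_paragraphs(text: str) -> str:
    """Replace line breaks that follow a lowercase letter or hyphen with a space."""
    return SPURIOUS_BREAK.sub(" ", text)


sample = "He walked to the old\nhouse on the hill.\nShe waited."
print(join_broken_paragraphs(sample))
```

Breaks after a full stop are left alone, so real paragraph ends survive. A page that ends on a comma, as in the first post, would not be caught by this rule and still needs a manual check.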
Old 03-27-2011, 12:24 PM   #7
NASCARaddicted
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
Thanks, guys, for your help. Right now, before I convert from HTML to EPUB, I use a regex search. If a new paragraph starts and the text in front of it does not end with one of the characters . ! ?, the break is probably wrong, so finding these is not really a problem.
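
For anyone who wants to try this kind of check, here is a minimal sketch in Python. It is not the exact search used above; it assumes the paragraphs are already available as a list of plain strings (HTML tags stripped), and the names `ENDS_SENTENCE` and `suspicious_paragraphs` are invented for the example.

```python
import re

# A paragraph "ends properly" if its last non-space character is . ! or ?
ENDS_SENTENCE = re.compile(r"[.!?]\s*$")


def suspicious_paragraphs(paragraphs):
    """Return the indexes of paragraphs whose predecessor does not end in . ! or ?"""
    flagged = []
    for i in range(1, len(paragraphs)):
        previous = paragraphs[i - 1]
        # Skip empty predecessors; flag the rest when no sentence end is found.
        if previous.strip() and not ENDS_SENTENCE.search(previous):
            flagged.append(i)
    return flagged


paragraphs = [
    "It was late.",
    "that is a long distance,",
    "but even if we walk all night.",
    "They set off.",
]
print(suspicious_paragraphs(paragraphs))
```

Paragraphs that end with a closing quotation mark or a colon will also be flagged, so the result is a list of candidates to look at rather than a list of certain errors.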

But I thought ABBYY 10 should be able to detect this by itself. It seems it can't ... and you can't even fix it manually, which is what surprised me most ...
Old 03-27-2011, 02:27 PM   #8
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
I must say it surprises me. In a whole book it happens only in about 10 sentences for me. The rest is correctly identified by Abbyy.
Old 01-19-2017, 12:30 PM   #9
pefilix
Enthusiast
Posts: 28
Karma: 27226
Join Date: May 2016
Device: Kobo glo hd
Quote:
Originally Posted by NASCARaddicted
Right now, before I convert from HTML to EPUB, I use a regex search. If a new paragraph starts and the text in front of it does not end with one of the characters . ! ?, the break is probably wrong, so finding these is not really a problem.

I have the same problem, but I don't understand your solution. What is a regex search, and how do I search for the wrong paragraph breaks?

Or are there any other solutions?
Old 01-19-2017, 01:08 PM   #10
Turtle91
A Hairy Wizard
Posts: 3,394
Karma: 20212733
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Here is a quick explanation from the MR wiki... but basically, regex is a way to put one or more variables into a search and replace. For example:

find: "w.*?d" (without the quotes)
would find any text that begins with 'w' and runs to the next 'd', whether that is a word, part of a word, or a group of words.
like: "word", "wad", "why are you a nerd"
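
To see this pattern in action, here is a small sketch using Python's `re` module (just one of many tools with regex support; the list of test strings is made up for the demonstration). The `.*?` part is "lazy", so each match stops at the first 'd' it can reach.

```python
import re

# 'w', then as few characters as possible, then 'd'
pattern = re.compile(r"w.*?d")

for text in ["word", "wad", "why are you a nerd", "window", "why"]:
    match = pattern.search(text)
    # Show the matched part, or None when the pattern does not occur
    print(repr(text), "->", match.group(0) if match else None)
```

Note that "window" matches only as far as "wind", and "why" does not match at all because it contains no 'd'.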

It's very powerful when you learn how to use it properly!


Most text editing tools have some form of regex search/replace. I'm pretty sure FineReader does as well. However, you would need to check the FineReader user manual to see specifically which commands it supports.

You can also search here at MR and you will find a lot of examples like this one that talks about this specific issue.

Last edited by Turtle91; 01-19-2017 at 01:15 PM.
Old 01-19-2017, 04:00 PM   #11
pefilix
Enthusiast
Posts: 28
Karma: 27226
Join Date: May 2016
Device: Kobo glo hd
Hmm, thank you.

Now I need a regex search string for paragraphs starting with a lowercase letter and paragraphs ending with "-".
Old 01-19-2017, 04:10 PM   #12
pefilix
Enthusiast
Posts: 28
Karma: 27226
Join Date: May 2016
Device: Kobo glo hd
OK, I found a regex search string for paragraphs starting with a lowercase letter and paragraphs ending with "-".

If anyone needs it, this is the string:

/^[a-zğüşıçö]|-$/gm
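
The string above is written in the /pattern/flags style. As a rough sketch, the same pattern could be used in Python like this, assuming a plain-text file with one paragraph per line; `flag_lines` and the sample text are hypothetical names and data for the example. The extra letters ğüşıçö extend the lowercase range beyond a-z.

```python
import re

# Same pattern as in the post. The /m flag maps to re.MULTILINE; it is
# redundant here because each line is tested on its own, but it is kept
# to mirror the original. The /g flag has no direct equivalent: checking
# every line in a loop plays that role.
PATTERN = re.compile(r"^[a-zğüşıçö]|-$", re.MULTILINE)


def flag_lines(text):
    """Return (line_number, line) for lines that start lowercase or end with '-'."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


sample = (
    "The road was long and the trav-\n"
    "ellers were tired.\n"
    "Şimdi gidiyoruz.\n"
    "çok uzak bir yer."
)
for number, line in flag_lines(sample):
    print(number, line)
```

Lines 1, 2 and 4 of the sample are reported: the first ends with a hyphen, and the other two start with a lowercase letter.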