04-23-2011, 05:44 PM | #1 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Apr 2011
Location: Berlin, Germany
Device: Android Tablets
OCR software/Abbyy Finereader: highlighting, exporting PDFs with notes and highlighted passages
I thought I should open a new thread collecting all the problems I have encountered when reading OCR-converted texts on my eBook reader. In my case, the OCR software is Abbyy Finereader and the eBook reader is a Sony PRS-650.
1) When the Sony Reader displays PDF files that have been converted from image to text with OCR, it first shows a JPG image layer, then a text layer and finally, very often though not always, a blank page. This odd sequence repeats throughout the whole book, so in a book with 200 pages you end up turning as many as 600 pages, which makes for a rather inconvenient and cumbersome reading experience. Is there a way to solve this problem? To my mind, the best solution would probably be to let the user decide which of the two layers he wants to see. See also https://www.mobileread.com/forums/sho...d.php?t=105696
2) When I mark a short passage of an article or book that has been converted from image to text with OCR, it seems as if I am highlighting the whole page instead of the particular sentence I intended to highlight. In the Reader Library the highlighted passage appears exactly as I wanted, yet on the Reader itself the whole page, from that sentence upwards, is shaded dark grey. What causes this problem? And is there anything one can do to prevent it?
3) The Reader Library does allow viewing the text with all its marks, bookmarks and annotations. But they won't show up in Adobe Acrobat X or in any other program, as they are saved separately as XML files. And unfortunately, there is still no support for exporting the notes via calibre. As a consequence, there is no way to back up the texts together with all the notes, let alone to continue working with books and articles one has already read in other software that is less cumbersome than the Reader Library. In my own case, after a series of freezes (probably caused by my SD card), I have already had to erase the memory of my reader, thereby losing all my notes and highlights.
Although I am certainly very happy to have this feature, I have to say that I am not very happy with the way notes and highlights are actually exported. I think the user needs to have some say in it: for instance, he should be able to decide whether each entry is accompanied by the exact date on which the note was taken or the text highlighted. Furthermore, he should be the one to decide whether the text and the corresponding notes begin with page 1 or rather with page 371 or page 71 (as might be the case for some articles). Is there any way to export not only the highlighted passages but the whole text, i.e. a PDF file, with all the highlights and all the notes?
There has already been a discussion on this subject: https://www.mobileread.com/forums/showthread.php?t=33199 https://www.mobileread.com/forums/sho...d.php?t=109939 https://www.mobileread.com/forums/sho...33#post1236133
And here too: https://www.mobileread.com/forums/sho...54&postcount=2 https://www.mobileread.com/forums/sho...t=55079&page=6
wonderose
Last edited by wonderose; 05-01-2011 at 12:00 AM.
04-23-2011, 05:58 PM | #2 |
Wizard
Posts: 2,888
Karma: 5875940
Join Date: Dec 2007
Device: PRS505, 600, 350, 650, Nexus 7, Note III, iPad 4 etc
Don't save the jpg image of the page...
04-25-2011, 04:48 AM | #3 |
PRS+ author
Posts: 1,637
Karma: 2446233
Join Date: Dec 2007
Device: Sony PRS-300, 505, 600, 650, 950
A bit off-topic, but I'm wondering: what happens to handwritten remarks when you change the zoom level? (In EPUBs/LRFs in particular.)
04-25-2011, 05:39 PM | #4 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Apr 2011
Location: Berlin, Germany
Device: Android Tablets
As for the awkward reading experience with a two-layered PDF file, there is certainly the option of converting the original PDF file to an EPUB via HTML.
While this gives a nice text flow, there is a drawback: one loses the ability to verify the converted text by comparing it with the original (in case errors crept in during the conversion). Still, being able to do both within a single file would, I think, be quite nice. Even more important, to my mind, is that the EPUB file also loses the ability to cite the correct page number, and that is something I consider quite important for my rather academic purposes. This matters especially when I highlight passages, take notes and eventually export them via Reader Library. In that case the EPUB would have to account for the many peculiar exceptions, e.g. the Roman-numeral page numbers of a preface, or articles that begin at p. 137 rather than at p. 1. But perhaps there is a solution to this problem that I haven’t seen.
Last edited by wonderose; 04-25-2011 at 05:43 PM.
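To illustrate the page-numbering bookkeeping described above, here is a minimal Python sketch. It assumes a document whose first `front_matter_pages` pages are numbered in lowercase Roman numerals (a preface, say) and whose body then continues from `first_body_page` (e.g. 137 for an article). The function names are hypothetical; this is not how the Sony Reader, Reader Library or any EPUB reading system actually computes page numbers, just a sketch of the mapping such software would need.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral (i, ii, iii, ...)."""
    values = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
              (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
              (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, numeral in values:
        # Greedily take the largest numeral that still fits.
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def printed_label(reader_page: int, front_matter_pages: int, first_body_page: int) -> str:
    """Map a 1-based page position (as the reader counts it) to the printed page label.

    front_matter_pages: number of leading pages numbered in Roman numerals.
    first_body_page: printed number of the first page after the front matter.
    """
    if reader_page < 1:
        raise ValueError("reader pages start at 1")
    if reader_page <= front_matter_pages:
        return to_roman(reader_page)
    # Body pages count on from first_body_page.
    return str(first_body_page + (reader_page - front_matter_pages - 1))
```

For a book with a 12-page preface, `printed_label(13, 12, 1)` gives `"1"`; for an article that starts at p. 137 with no front matter, `printed_label(1, 0, 137)` gives `"137"`.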
04-27-2011, 10:41 PM | #5 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Apr 2011
Location: Berlin, Germany
Device: Android Tablets
My thanks to elcreative! As he suggested, it is indeed possible to save just the text layer. I had initially dismissed this possibility because Acrobat X evidently doesn’t offer it (http://forums.adobe.com/thread/546374?tstart=0). But Abbyy Finereader does. I had overlooked it, even though it appears plainly among the options of the “Save as” menu.
As for exporting annotations, highlights and bookmarks, I have found out that Adobe Digital Editions can save documents with all their notes on your computer. And yet this, again, is a dead end: from there, there is no way to pass them on to Acrobat X and no way to print them to a PDF file. You are stuck with ADE, which for my needs is far from comfortable (e.g. no highlighting, no copy and paste), even less so than Sony’s Reader Library. But at least ADE makes it possible to create a backup copy on the desktop. That is a good thing, since I once had to erase all the files on my Sony PRS-650 and, although I had saved the XML files, I wasn’t able to restore the lost documents with my notes.
I now have a new question regarding the page numbers of PDF files. When I export my notes via Sony’s Reader Library, I have to rely on the page numbers being correct, which is rather important for citation purposes. Yet the Sony PRS-650 ignores the page numbers I assign to my documents in Acrobat X. Thus an article on the reader starts not at the correct page 272, but at page 1, and as a result the page numbers in the exported notes differ considerably from those in the original article. Unlike Sony’s Reader Library, Adobe Digital Editions does show the correct page numbers as well as all the notes I have taken, but to my great disappointment it won’t let me export them. The only solution seems to be to insert blank pages with Acrobat X: 271 of them, to be exact. Or can you think of any other remedy?
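One possible alternative to inserting 271 blank pages is to post-process the exported notes: add an offset of (first printed page − 1) to every page number, so reader page 1 becomes p. 272. Below is a minimal Python sketch. It assumes, hypothetically, that each exported entry refers to its page as "Page N". The actual Reader Library export format may look different, in which case the regular expression would need adjusting. `remap_pages` and `PAGE_RE` are illustrative names, not part of any Sony or Adobe tool.

```python
import re

# Hypothetical pattern: assumes exported notes refer to pages as "Page N".
# Adjust it to whatever the real export file actually contains.
PAGE_RE = re.compile(r"\bPage (\d+)\b")


def remap_pages(notes_text: str, first_printed_page: int) -> str:
    """Shift every "Page N" reference so that reader page 1 becomes first_printed_page."""
    offset = first_printed_page - 1  # e.g. 272 - 1 = 271
    return PAGE_RE.sub(lambda m: f"Page {int(m.group(1)) + offset}", notes_text)


if __name__ == "__main__":
    exported = "Page 1: introduction\nPage 3: key argument"
    # Prints "Page 272: introduction" and "Page 274: key argument" on two lines.
    print(remap_pages(exported, 272))
```

This only fixes a constant offset. Roman-numeral front matter or a document assembled from several articles would need a page-by-page lookup table instead.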