09-15-2012, 12:43 PM | #16 |
Linux User
Posts: 2,279
Karma: 6123806
Join Date: Sep 2010
Location: Heidelberg, Germany
Device: none
|
Totally off topic now, but I wish the New Posts -> Your Posts link would show threads instead of posts.
Last edited by frostschutz; 09-15-2012 at 04:14 PM. |
09-19-2012, 06:34 PM | #17 | |
Biotechnologist
Posts: 38
Karma: 499330
Join Date: Jun 2009
Device: 1st Gen Kindle; Sony PRS-T1
|
Quote:
Thanks, Schauberger |
|
09-19-2012, 07:14 PM | #18 |
Linux User
Posts: 2,279
Karma: 6123806
Join Date: Sep 2010
Location: Heidelberg, Germany
Device: none
|
Open Office with http://extensions.services.openoffic...ject/pdfimport makes the text visible.
|
09-19-2012, 09:11 PM | #19 | |
Biotechnologist
Posts: 38
Karma: 499330
Join Date: Jun 2009
Device: 1st Gen Kindle; Sony PRS-T1
|
Quote:
Could you instruct me on what to do next? Thanks, Schauberger |
|
09-20-2012, 08:12 AM | #20 |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
I used pdfclean from the MuPDF distribution to decompress the streams and then just edited the PDF file in a text editor to delete the bitmap and make the text visible. If you can be patient, I can probably write you a little Windows program that will do this automatically.
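For anyone wondering what "decompress the streams" amounts to: most PDF streams are stored with the FlateDecode filter, which is ordinary zlib/deflate. The snippet below is only a minimal sketch of that one step, assuming you have already cut the raw bytes between `stream` and `endstream` out of an object whose dictionary says `/Filter /FlateDecode` with no predictor. The function name `inflate_stream` is made up for illustration; it is not part of pdfclean or MuPDF.

```python
import zlib


def inflate_stream(raw: bytes) -> bytes:
    """Inflate the body of a single FlateDecode stream.

    Assumption: `raw` is exactly the stream body (no `stream` /
    `endstream` keywords) and no /DecodeParms predictor is in use.
    """
    return zlib.decompress(raw)


# Round trip with a tiny fake content stream, just to show the idea.
sample = b"BT 3 Tr /F1 12 Tf (hello) Tj ET"
packed = zlib.compress(sample)
unpacked = inflate_stream(packed)
```

A tool like pdfclean does this for every stream in the file and rewrites the file structure around it, which is why it is the more practical route for a whole document.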
|
09-20-2012, 08:57 AM | #21 |
Linux User
Posts: 2,279
Karma: 6123806
Join Date: Sep 2010
Location: Heidelberg, Germany
Device: none
|
In OO/pdfimport you can do it by clicking the background image element and deleting it with the Del key. Cumbersome if you have lots of images, but at least you actually see what you are doing and what you're getting... and pdfimport might even let you manually fix a typo here and there?
In vim, you find the image element (it's the largest one, 500 KB or whatever), note the line number where it starts, search for endstream, note the line number where it ends, and then use the command :line1,line2d to delete the whole range. Then you save and the image is gone. I could write a script that does the vim step automatically, if I find the time, and if you use Linux/Python cmdline. I don't know what to edit to make the text visible, though. You'd have to replace the font somehow, but how? Might have a go at willus's PDF later to see what he changed :P |
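A rough Python sketch of the vim step described above, under some loud assumptions: the PDF has already been decompressed, the page bitmap really is the largest stream in the file, and it is acceptable to empty the stream body rather than delete the whole object. The function names are hypothetical. Removing bytes shifts the offsets recorded in the cross-reference table and leaves the object's /Length stale, so the output may need to be repaired by a tool that rebuilds the file structure before every viewer will open it.

```python
import re

# A stream body sits between the `stream` and `endstream` keywords.
# The lookbehind keeps the tail of `endstream` from matching as `stream`.
_STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def largest_stream_span(pdf: bytes):
    """Return (start, end) byte offsets of the largest stream body, or None."""
    best = None
    for match in _STREAM.finditer(pdf):
        span = (match.start(1), match.end(1))
        if best is None or span[1] - span[0] > best[1] - best[0]:
            best = span
    return best


def drop_largest_stream(pdf: bytes) -> bytes:
    """Return a copy of `pdf` with the largest stream body emptied."""
    span = largest_stream_span(pdf)
    if span is None:
        return pdf
    start, end = span
    return pdf[:start] + pdf[end:]
```

Usage would be along the lines of reading the file with `open(path, "rb").read()`, calling `drop_largest_stream`, and writing the result to a new file so the original stays untouched.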
09-20-2012, 08:44 PM | #22 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
SeeText v1.01
Quote:
Usage: Save the attachment to your desktop, unzip it, and drag your PDF file onto the icon (works kind of like k2pdfopt) or just double-click the icon to see the command-line usage. Last edited by willus; 09-20-2012 at 11:18 PM. Reason: Clarification; updated program |
|
09-20-2012, 09:21 PM | #23 | |
Biotechnologist
Posts: 38
Karma: 499330
Join Date: Jun 2009
Device: 1st Gen Kindle; Sony PRS-T1
|
Quote:
Thanks a lot! Schauberger |
|
09-20-2012, 11:12 PM | #24 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Mod to make PDF OCR text visible
Quote:
3 Tr
"Tr" stands for Text Render, and 3 means invisible. If you change the 3 to a 0, the text becomes visible. The only tricky part is decompressing the stream first, since it's usually compressed. See p. 246 of Adobe's PDF Reference. |
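The 3-to-0 change can be sketched in a few lines of Python. This is not the SeeText source, just an illustration under stated assumptions: the input is an already-decompressed content stream (or a whole decompressed PDF), and the helper name `make_text_visible` is invented here. Because `3 Tr` and `0 Tr` have the same length, the edit does not move any byte offsets in the file.

```python
import re

# Match a standalone operand 3 followed by the Tr operator.
# The lookbehind rejects operands such as 13, 0.3 or -3; the
# lookahead rejects longer tokens that merely start with "Tr".
_INVISIBLE_TR = re.compile(rb"(?<![\w.+-])3(\s+)Tr(?!\w)")


def make_text_visible(content: bytes) -> bytes:
    """Rewrite every `3 Tr` (invisible text) to `0 Tr` (filled text)."""
    return _INVISIBLE_TR.sub(rb"0\g<1>Tr", content)


before = b"BT 3 Tr /F1 12 Tf (hello) Tj ET"
after = make_text_visible(before)
```

One caveat: a plain regular expression cannot tell an operator from the same characters inside a text string such as `(3 Tr)`, so a real tool would want a proper content-stream tokenizer.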
|
09-20-2012, 11:17 PM | #25 |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Glad it worked. I made a little update to the program that should make the visible OCR text a little clearer--just re-download the same attachment from my earlier post.
|
|