Old 09-15-2012, 12:43 PM   #16
frostschutz
Linux User
Posts: 2,279
Karma: 6123806
Join Date: Sep 2010
Location: Heidelberg, Germany
Device: none
Totally off topic now, but I wish the New Posts -> Your Posts link would show threads instead of posts.

Last edited by frostschutz; 09-15-2012 at 04:14 PM.
Old 09-19-2012, 06:34 PM   #17
Schauberger
Biotechnologist
Posts: 38
Karma: 499330
Join Date: Jun 2009
Device: 1st Gen Kindle; Sony PRS-T1
Quote:
Originally Posted by willus View Post
Did you ever figure this out? What system are you running on? Mac? PC? I was able to make the OCR'd text visible and remove the bitmap from your PDF file using a couple tools that I have (see attached). The OCR is excellent. PDF X-change does a nice job.
Excellent, that's exactly what I was trying to achieve. I run Windows, how did you do this?

Thanks,
Schauberger
Old 09-19-2012, 07:14 PM   #18
frostschutz
Linux User
OpenOffice with http://extensions.services.openoffic...ject/pdfimport makes the text visible.
Old 09-19-2012, 09:11 PM   #19
Schauberger
Biotechnologist
Quote:
Originally Posted by frostschutz View Post
Sorry for my lack of reply. This forum makes it hard to follow threads you've replied to... (unless you want to be bombarded with notification mails)

Here's what you end up with when you remove the image from your sample.pdf. It's a blank page. But you can still select and copy text out of it. I haven't tested it on the eReader, but with some luck the text will show up when you use Reflow.

The original PDF is really just an image (2079x2840 px) with a text layer on top that uses an "invisible font". I'm not sure whether it would be possible to make the font visible and get something resembling the original layout back; the result would not look good, though.

What could be done is a resized image, since the current one is too large for eReaders. I attached that too. Of course the quality is horrible.

Resizing was done with Ghostscript; the image was removed with qpdf (converting the PDF to QDF) and vim. I'm sure there are better tools... but is this result useful at all?

If the goal is reflow you could just as well convert it to txt in the first place, as that's really all there is once you remove the image.
I've followed your directions and converted my document with qpdf, but now I'm unsure what to do next. I assume I have to replace some text string in the file using vim, but I'm not sure.

Could you explain what to do next?

Thanks,
Schauberger
Old 09-20-2012, 08:12 AM   #20
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Schauberger View Post
Excellent, that's exactly what I was trying to achieve. I run Windows, how did you do this?

Thanks,
Schauberger
I used pdfclean from the MuPDF distribution to decompress the streams and then just edited the PDF file in a text editor to delete the bitmap and make the text visible. If you can be patient, I can probably write you a little Windows program that will do this automatically.
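pdfclean does the decompression properly. To get a rough idea of what is inside such a file without MuPDF, here is a minimal stdlib-only Python sketch that finds every `stream ... endstream` block, tries to inflate it with zlib, and reports its size, which is usually enough to tell the big bitmap apart from the OCR text stream. It is not how pdfclean works: it assumes Flate (zlib) compression, locates streams by scanning for keywords instead of reading the xref table, and does not rewrite the file. The name `decoded_streams` is made up for this example.

```python
import re
import zlib
from typing import List, Optional, Tuple

# "stream", its end-of-line marker, the data, an optional EOL, then "endstream".
# The lookbehind keeps the "stream" inside "endstream" from starting a match.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)(?:\r?\n)?endstream", re.DOTALL)


def decoded_streams(pdf: bytes) -> List[Tuple[int, int, Optional[bytes]]]:
    """Return (offset, raw_size, inflated_data_or_None) for each stream found."""
    found = []
    for m in STREAM_RE.finditer(pdf):
        raw = m.group(1)
        try:
            data = zlib.decompress(raw)
        except zlib.error:
            # Not Flate-compressed (a JPEG image, say) or not decodable as-is.
            data = None
        found.append((m.start(1), len(raw), data))
    return found
```

For example, `sorted(decoded_streams(open("sample.pdf", "rb").read()), key=lambda s: s[1], reverse=True)` lists the largest stream (normally the page bitmap) first; a decoded entry containing `Tr` and `Tj` operators is a good candidate for the OCR text layer.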
Old 09-20-2012, 08:57 AM   #21
frostschutz
Linux User
In OO/pdfimport you can do it by clicking the background image element and deleting it with the Del key. That's cumbersome if you have lots of images, but at least you can see what you are doing and what you're getting... and pdfimport might even let you fix a typo here and there manually?

In vim you find the image element (it's the largest one, around 500 KB or so), note the line number where it starts, search for endstream, note the line number where it ends, and then use the command :line1,line2d to delete the whole range. Then you save and the image is gone.
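The vim step can be approximated in Python. This minimal sketch assumes the file was already expanded with `qpdf --qdf` (so streams can be found by keyword), treats the largest stream as the image, as described above, and empties its data instead of deleting lines, so the object numbering stays intact. `blank_largest_stream` is a hypothetical helper, not part of qpdf. Afterwards the stream's /Length entry is stale; qpdf ships a `fix-qdf` utility intended to repair lengths and the xref table of edited QDF files, and some viewers may still complain about an image with no data.

```python
import re
from typing import Tuple

# Locate streams by keyword: "stream" + EOL, the data, an optional EOL, "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)(?:\r?\n)?endstream", re.DOTALL)


def blank_largest_stream(pdf: bytes) -> Tuple[bytes, int]:
    """Empty the data of the largest stream (assumed to be the page image).

    Returns the edited file and the number of bytes removed. The stream's
    /Length and the xref offsets are NOT updated here.
    """
    matches = list(STREAM_RE.finditer(pdf))
    if not matches:
        return pdf, 0
    largest = max(matches, key=lambda m: len(m.group(1)))
    start, end = largest.span(1)
    return pdf[:start] + pdf[end:], end - start
```

Typical use would be: read the QDF file as bytes, call `blank_largest_stream`, write the result to a new file, then run that file through `fix-qdf`.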

I could write a script that does the vim step automatically - if I find the time, and if you use Linux/Python cmdline.

I don't know what to edit to make the text visible though. You'd have to replace the font somehow but how? Might have a go at willus PDF later to see what he changed :P
Old 09-20-2012, 08:44 PM   #22
willus
Fuzzball, the purple cat
SeeText v1.01

Quote:
Originally Posted by Schauberger View Post
Excellent, that's exactly what I was trying to achieve. I run Windows, how did you do this?

Thanks,
Schauberger
Try the attached program. If you don't need the converted files to be compatible with Adobe Reader, you can make the output files much smaller by removing the bitmaps (the converted files will still be readable with SumatraPDF, for example).

Usage: Save the attachment to your desktop, unzip it, and drag your PDF file onto the icon (works kind of like k2pdfopt) or just double-click the icon to see the command-line usage.
Attached Files
File Type: zip seetext_v101.zip (141.2 KB, 357 views)

Last edited by willus; 09-20-2012 at 11:18 PM. Reason: Clarification; updated program
Old 09-20-2012, 09:21 PM   #23
Schauberger
Biotechnologist
Quote:
Originally Posted by willus View Post
Try the attached program. If you don't need Adobe compatibility, it can make the output files much smaller by removing the bitmaps. Save the attachment to your desktop, unzip it, and drag your PDF file onto the icon (works kind of like k2pdfopt).
Fantastic, that worked perfectly, removing the bitmap, rendering the text visible, and reducing the size of my file considerably.
Thanks a lot!
Schauberger
Old 09-20-2012, 11:12 PM   #24
willus
Fuzzball, the purple cat
Mod to make PDF OCR text visible

Quote:
Originally Posted by frostschutz View Post
I don't know what to edit to make the text visible though. You'd have to replace the font somehow but how? Might have a go at willus PDF later to see what he changed :P
If you decompress the content stream that renders the OCR text, you'll see this line (or these two tokens) near the beginning of the stream:

3 Tr

"Tr" sets the Text Render mode, and mode 3 means invisible (the glyphs are neither filled nor stroked). If you change the 3 to a 0 (fill), the text becomes visible. The only tricky part is that the stream is usually compressed, so you have to decompress it first. See p. 246 of Adobe's PDF Reference.
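Here's a minimal Python sketch of that edit, assuming you've already pulled out the raw bytes of the content stream (the function names are made up for this example). It rewrites every `3 Tr` as `0 Tr`; the Flate variant inflates the stream with zlib first and re-compresses it afterwards. Because `3` and `0` are the same length, editing an already-decompressed file (e.g. pdfclean output) this way leaves byte offsets and /Length values intact, whereas a re-compressed stream usually changes size, so its /Length and the xref table would need updating.

```python
import re
import zlib

# "3 Tr" selects text rendering mode 3 (invisible); "0 Tr" selects mode 0 (fill).
# The lookbehind skips operands such as "13 Tr" or "0.3 Tr"; the captured
# whitespace is kept so the edit never changes the stream's length.
TR_INVISIBLE = re.compile(rb"(?<![\d.])3(\s+)Tr\b")


def make_text_visible(content: bytes) -> bytes:
    """Rewrite every '3 Tr' in a decoded content stream as '0 Tr'."""
    return TR_INVISIBLE.sub(rb"0\g<1>Tr", content)


def make_flate_stream_visible(compressed: bytes) -> bytes:
    """Inflate a FlateDecode content stream, flip the mode, and deflate it again."""
    return zlib.compress(make_text_visible(zlib.decompress(compressed)))
```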
Old 09-20-2012, 11:17 PM   #25
willus
Fuzzball, the purple cat
Quote:
Originally Posted by Schauberger View Post
Fantastic, that worked perfectly, removing the bitmap, rendering the text visible, and reducing the size of my file considerably.
Thanks a lot!
Schauberger
Glad it worked. I made a small update to the program that should make the visible OCR text a little clearer. Just re-download the same attachment from my earlier post.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.