MobileRead Forums > E-Book Formats > PDF

Old 08-08-2019, 03:30 AM   #1
fredthefork
Device: Kindle Paperwhite
Cropping PDFs for EPUB conversion using BRISS, Ghostscript and/or Calibre

Hello! I'm new to this so please forgive me if this is basic knowledge.

I have a PDF file which has been OCRed, and I would like to convert it to EPUB. The main problem is that I'd like to crop the PDF so that I don't get duplicate headers or page numbers in the EPUB. I first tried OS X's Preview, then Briss, and then ran the result through calibre's EPUB conversion. That didn't work. I then used Ghostscript to extract the text:
Code:
gs -sDEVICE=txtwrite -o extractedText%d.txt input.pdf
But this doesn't work either: I'm still getting all the headers. Although the PDF is clearly cropped, the cropped content does not seem to have been deleted permanently.

Then I read on here that

If you run the Briss PDF output through Ghostscript to generate a new PDF, I believe it will permanently get rid of the cropped-out material so that it won't come back in calibre.

This user suggested this command:
Code:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
Although it does produce a PDF, running that through my first Ghostscript command or through the standard calibre conversion is to no avail: I still get the headers and page numbers. I've also tried different PDFs, just to be sure.

What am I missing here? This can't be so difficult, can it?
Old 08-09-2019, 12:36 AM   #2
asleyam
Device: kindle voyage, Kobo Forma, Kobo Aura One
Have you tried ScanTailor? It is free and open source. I have a Mac, so I use ScanTailor via CrossOver. If you have MacPorts installed, though, ScanTailor is easy to install. Unfortunately, Homebrew does not have a cask for it yet.

http://scantailor.org/

It is designed as a preprocessing tool, so it works on batches of scanned images. If you already have a PDF, simply export the pages as images and load them into ScanTailor. Then use the various settings to crop the headers and page numbers, deskew, set margins, etc. It outputs TIFF files.

I have not found an easy one-click method to batch-crop extraneous material out of scanned images.
Old 08-09-2019, 01:04 PM   #3
dwig
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
Quote:
Originally Posted by fredthefork
Hello! I'm new to this so please forgive me if this is basic knowledge.

I have a PDF file which is OCRed. I would like to convert it to epub. ...
What am I missing here? This can't be so difficult, can it?
Yes.

One, "cropping" tools like Briss don't delete anything. They just set a new page size for viewing. The old data is still there; it's just off the page and out of view.

Two, the PDF was OCRed before it was cropped. The headers and similar "junk" are still in the text layer from the OCR process and still "visible" to the format converter, so they end up in the EPUB.
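
To make the two points above concrete, here is a toy sketch, not a real PDF tool. It assumes a hypothetical text layer written as "x y word" lines with made-up coordinates and a made-up crop box. A converter that ignores the crop box still sees every word; only one that filters by the box itself would lose the header and page number.

```shell
# Hypothetical text layer: one word per line as "x y word"
# (coordinates in points, origin at the bottom-left; all values are invented).
text_layer() {
  printf '%s\n' \
    '72 770 CHAPTER' \
    '72 700 Call' \
    '100 700 me' \
    '300 30 17'
}

# What a converter sees when it ignores the crop box: every word.
all_words() {
  text_layer | awk '{ print $3 }' | paste -sd' ' -
}

# What it would see if it kept only words inside the box x0 y0 x1 y1.
words_in_box() {
  text_layer | awk -v x0="$1" -v y0="$2" -v x1="$3" -v y1="$4" \
    '$1 >= x0 && $1 <= x1 && $2 >= y0 && $2 <= y1 { print $3 }' | paste -sd' ' -
}

all_words                    # header and page number are still there
words_in_box 50 60 560 740   # only the body text survives
```

The names `text_layer`, `all_words` and `words_in_box` are made up for this illustration; whether a given converter honours the crop box is something you would have to check for that tool.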

You might be more successful if you "crop" the PDF first and then do the OCR. This might prevent the OCR process from "seeing" the parts that were trimmed.
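
A different workaround, not something suggested in this thread: since the txtwrite command in the first post writes one text file per page, you could strip the header and page number from the extracted text instead of from the PDF. The sketch below assumes the header is always the first non-blank line of a page and the page number always the last one, which will not hold for every book. The function name `strip_page_edges` is made up.

```shell
# Drop the first and the last non-blank line of one page of extracted text.
# Reads the page from stdin and writes the remaining lines to stdout.
strip_page_edges() {
  awk '
    { lines[NR] = $0 }                                   # remember every line
    /[^[:space:]]/ { if (!first) first = NR; last = NR } # track non-blank edges
    END {
      for (i = 1; i <= NR; i++)
        if (i != first && i != last) print lines[i]
    }'
}

# Small demonstration with an invented page.
printf '%s\n' 'MOBY DICK' 'Call me Ishmael.' 'Some years ago' '  42' | strip_page_edges
```

To apply it to real output you would loop over the per-page files, for example `for f in extractedText*.txt; do strip_page_edges < "$f"; done > body.txt`. Note that the shell sorts `extractedText10.txt` before `extractedText2.txt`, so a zero-padded pattern such as `extractedText%03d.txt` in the Ghostscript command keeps the pages in order.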
Tags
briss, conversion from .pdf, ghostscript, pdf and calibre

All times are GMT -4.

