03-06-2015, 07:02 AM | #1 |
Addict
Posts: 218
Karma: 29322
Join Date: Mar 2015
Location: Norway
Device: Android-phone, HTC Desire Z
|
Stripping a file from header text?
Hello,
I prefer reading epub over free websites, so I bought this literary guide as an ebook for almost nothing. It turns out it was only delivered as a pdf, so I'll have to convert it. The pdf has many unwanted elements, though: a page number at the bottom, a logo on the right, and a copyright text on the left. I'd like to be left with only the relevant text so the conversion goes more smoothly (if smooth is a relevant word when dealing with pdf files). Here's the file: https://dl.dropboxusercontent.com/u/...animalfarm.pdf Do you know of an easy way to strip unwanted elements from pdf files like this?
03-06-2015, 11:02 AM | #2 |
Banned
Posts: 488
Karma: 1080260
Join Date: Sep 2012
Device: sony prs t1 kindle dx ipad
|
Freeware apps: pdf scissors, briss, k2pdfopt.
http://sourceforge.net/projects/briss/
https://sites.google.com/site/pdfscissors/
http://www.willus.com/k2pdfopt/

k2pdfopt is the more powerful app, but in your case it's better to use briss or pdf scissors, because they use intuitive visual rectangles for cropping instead of commands. It takes about 10 seconds to load that 38-page file and ten more seconds to crop it once you know what to do.

Since the cropped file is 15 cm wide, we can also try reading it in landscape mode on a 6" reader (12 cm wide). To adjust the cropped pdf for landscape viewing (if the reader is poor or slow at it), we can run it through k2pdfopt beforehand: load the cropped pdf into k2pdfopt and choose fitwidth mode, and after a couple of minutes we get the third file in the attachment. If the letters are very small (as in this case), we can use k2pdfopt's reflow mode instead (the default mode, with the reflow box checked), which produces the fourth file in the attachment after a couple of minutes.

Last edited by markom; 03-06-2015 at 12:38 PM. |
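The width figures in the post above (a 15 cm cropped page shown on a 12 cm wide landscape screen) imply a shrink factor when the page is fitted to the screen width. A minimal sketch of that arithmetic follows; the function name and the 10 pt font size are hypothetical and only serve as illustration, and this is not how k2pdfopt itself computes anything.

```python
def fit_width_scale(page_width_cm: float, screen_width_cm: float) -> float:
    """Scale factor applied when a page is fitted to the screen width."""
    if page_width_cm <= 0 or screen_width_cm <= 0:
        raise ValueError("widths must be positive")
    return screen_width_cm / page_width_cm


# Figures taken from the post: 15 cm cropped page, 12 cm landscape screen.
scale = fit_width_scale(15.0, 12.0)
print(f"fit-width scale: {scale:.2f}")

# Hypothetical 10 pt body font, to show how much smaller the letters get.
print(f"a 10 pt font would render at about {10 * scale:.1f} pt")
```

A scale below 1 means the letters shrink when the page is fitted to the screen, which is why reflow mode is suggested when the letters are already small.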
03-06-2015, 01:21 PM | #3 |
Addict
Posts: 218
Karma: 29322
Join Date: Mar 2015
Location: Norway
Device: Android-phone, HTC Desire Z
|
That's great. Thanks. Can you get the last step to run smoothly as well, from pdf to epub?
|
03-06-2015, 08:03 PM | #4 |
Fuzzball, the purple cat
Posts: 1,274
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
The latest release of k2pdfopt (v2.32, MS Windows version) can now also use visual rectangles to apply a cropping box.
|
03-06-2015, 08:17 PM | #5 | |
Banned
Posts: 488
Karma: 1080260
Join Date: Sep 2012
Device: sony prs t1 kindle dx ipad
|
Here I used Abbyy 11 to quickly convert the cropped animalfarm pdf to epub.

Personally, I always use pdf optimization only, without conversion to epub, because I want 100% exactness (in recognition and formatting) for every character. That is impossible to achieve automatically and quickly with epub for scanned material, because epub reflows only the imperfect OCR text layer, not the pdf image itself. So I'd use a reflowed pdf (made with k2pdfopt) if the letters were too small for comfortable reading on a 6" screen.

A5-sized pdfs and 2-column A4 pdfs pose no problem even on a 6" screen after cropping, so it's usually only one-column A4 pdfs that need to be reflowed for reading on a 6" reader in landscape (2 or 3 screens per pdf page). k2pdfopt reflows the pdf image itself, not just the imperfect OCR layer behind the image. It also retains pictures and tables, unlike the pdf reflow built into e-readers, and is usually a lot faster to flip through, because the e-reader's processor doesn't have to compute the reflow itself but only has to show the already reflowed page.

Last edited by markom; 03-07-2015 at 02:07 AM. |
|
03-07-2015, 02:21 PM | #6 |
Addict
Posts: 218
Karma: 29322
Join Date: Mar 2015
Location: Norway
Device: Android-phone, HTC Desire Z
|
Beautiful. Thanks!
|
03-07-2015, 05:56 PM | #7 |
Addict
Posts: 218
Karma: 29322
Join Date: Mar 2015
Location: Norway
Device: Android-phone, HTC Desire Z
|
I just bought Hitchens's Why Orwell Matters and ran it through Briss, but then a strange thing happened. This image, which was not visible when I viewed the original pdf file, suddenly popped up.
Here's the pdf file. Here's the epub. What I did was open Briss, set no rectangle at all for the first 3 pages, and then run the result through Calibre. Does the same thing happen when you run it through your scissors?
03-08-2015, 12:07 PM | #8 | |
Banned
Posts: 488
Karma: 1080260
Join Date: Sep 2012
Device: sony prs t1 kindle dx ipad
|
When we crop or delete a pdf page, we don't genuinely crop or delete the original page; we just mask it, telling the reader to show only part of the page or not to show it at all. Genuinely deleting the cropped part of a pdf page isn't easy or straightforward, even with Adobe Acrobat.

Also, as already mentioned, this pdf book, like most belles-lettres, is in A5 format or smaller, so there is really no need for epub/mobi conversion. It is easily and quickly readable, searchable, and annotatable (you can scribble on it too) as a cropped or zoomed pdf in landscape mode on a 6" eink screen, at two or three screens per page depending on the letter size we want. We can also get a much bigger letter size than in the original paper book, because a 6" screen is 12 cm wide and the text width in this book is 10 cm without margins.

Last edited by markom; 03-10-2015 at 05:40 AM. |
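The masking behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Page` and `PageObject` classes, the box coordinates, and the object names are all hypothetical, and no real pdf library is involved. Real PDFs express the same idea with a CropBox stored next to the MediaBox, while the page content itself stays in the file.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A rectangle as (left, bottom, right, top), in arbitrary page units.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """True if rectangles a and b share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


@dataclass
class PageObject:
    name: str
    box: Box


@dataclass
class Page:
    media_box: Box                          # full page area
    objects: List[PageObject] = field(default_factory=list)
    crop_box: Optional[Box] = None          # None means "show the whole page"

    def crop(self, box: Box) -> None:
        """Masking crop: only the visible window changes, content is kept."""
        self.crop_box = box

    def visible_objects(self) -> List[PageObject]:
        """What a viewer that honours the crop box would show."""
        window = self.crop_box or self.media_box
        return [o for o in self.objects if overlaps(o.box, window)]


# Hypothetical page: body text plus a logo in the top-right margin.
page = Page(
    media_box=(0, 0, 15, 21),
    objects=[
        PageObject("body text", (2, 3, 13, 19)),
        PageObject("logo", (13.5, 19.5, 14.5, 20.5)),
    ],
)
page.crop((2, 3, 13, 19))

print([o.name for o in page.visible_objects()])  # viewer honouring the crop
print([o.name for o in page.objects])            # tool reading all content
```

In this model the logo disappears from the viewer's output but is still present in `page.objects`, so a tool that reads the full page content instead of the cropped window would still pick it up. That is one possible explanation for elements reappearing after conversion, not a confirmed diagnosis of the file discussed here.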
|
03-11-2015, 03:49 PM | #9 |
Addict
Posts: 218
Karma: 29322
Join Date: Mar 2015
Location: Norway
Device: Android-phone, HTC Desire Z
|
I read pretty much all my books on an old phone with FBReader (not the latest version, because the phone is too old to handle it), which is why I prefer epub.
|
|