MobileRead Forums > E-Book Software > Calibre > Library Management

09-14-2014, 01:02 AM #1
highstream
Enthusiast
Posts: 30
Karma: 10
Join Date: Jul 2012
Device: Kindle
Converting pdf to mobi question

Calibre did a very effective job of converting a PDF to MOBI for reading on a Kindle. A question: is there any way to remove the text and image(?) that appear between pages, as seen in the screenshot? (The "Yuan Ban..." text comes from using the translation option.) The number at the upper left, "Di 2", is the page number, but it's not important. Thanks,
[Attached screenshot: Calibre - Graham Greene pdf example.JPG]
09-14-2014, 01:13 AM #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
It will require a bit of work:

Sticky: Read this before Posting PDF Questions
Quote:
There are page numbers, headers, or footers in my output
You need to use Calibre's Search and Replace feature when converting from PDF in order to remove any text you don't want. This requires a search syntax called regular expressions. If you are intimidated by regular expressions, many Windows users have reported that Mobipocket Creator is a good alternative for the initial PDF conversion: use Mobipocket Creator to convert the PDF to .mobi, and then use Calibre to convert from MOBI to your final desired format.

I cropped the headers/footers from my pdf with another tool, but Calibre still converts them
Most PDF cropping utilities only change the visible page boundaries of the PDF; they don't actually eliminate the text data.

You need to find a utility that both crops AND deletes hidden text. Very few tools do this; Adobe Acrobat has a 'remove hidden text' option when optimizing PDFs, which can help. The alternative is to use Calibre's Search and Replace function to delete the headers, or to use Sigil after converting to EPUB.
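As a minimal sketch of the search-and-replace approach described in the quoted sticky, the Python snippet below removes stand-alone page numbers from a fragment of converted HTML. It assumes, hypothetically, that each page number ends up alone in its own `<p>` element; the sample markup is invented for illustration, and Calibre's real intermediate HTML may look different. Calibre's Search and Replace uses Python-style regular expressions, so a pattern that behaves as expected here should be a reasonable starting point there.

```python
import re

# Hypothetical fragment of converted HTML: two page numbers sit in their own
# paragraphs between lines of body text.
html = (
    '<p>It was a dark night.</p>\n'
    '<p>12</p>\n'
    '<p>The rain kept falling.</p>\n'
    '<p> 13 </p>\n'
)

# Match a paragraph whose only content is a number (spaces allowed around it),
# plus the newline that follows it, so no empty line is left behind.
page_number = re.compile(r'<p[^>]*>\s*\d+\s*</p>\n?')

cleaned = page_number.sub('', html)
print(cleaned)
```

Paragraphs that contain words are untouched, because `\s*\d+\s*` must account for everything between the opening and closing tags.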
09-14-2014, 02:23 AM #3
highstream
Enthusiast
Posts: 30
Karma: 10
Join Date: Jul 2012
Device: Kindle
Thanks. My oversight. I had looked at the Search and Replace tab under Convert Books, but misunderstood the phrase "Regular Expressions": it's a technical term, and I took it in its plain-language sense of repeating expressions. That FAQ on PDFs helped, up to a point.

I tried Mobipocket Creator, but it doesn't appear to recognize PDFs.

OK, one of the new screenshots below shows how far I've come since the one in the OP; the other shows the underlying code in the PDF. The larger number is the chapter number, which stays. The two pieces I haven't yet figured out how to express as regular expressions are the small Chinese character with the page number that follows it, and the large graphic (below, in the code). For the first, getting rid of the Chinese character is no problem, but my attempts to remove the page numbers using brackets, e.g. [0-9], have failed, and I'm afraid that would also affect the chapter numbers. For the graphic, img src="index-1_1.jpg gets rid of one, but I'm not sure how to write an expression for all of them. Suggestions welcome. Thanks,
[Attached screenshots: Calibre - Graham Greene pdf example after search and replace.JPG; Calibre - Graham Greene pdf example code.JPG]

09-14-2014, 02:31 AM #4
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Perhaps a regex tutorial can give you a better idea of how to write them. I've always found this site extremely helpful: http://www.regular-expressions.info/

Code:
img src="index-\d+_\d+\.jpg
will catch all the numbers.

\d represents the set [0-9], i.e. any single digit. The plus repeats it one or more times, so \d+ represents a whole number of any length.
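To see what `\d+` matches, here is a small Python check. It uses the image pattern above (with the dot before `jpg` escaped, so it matches only a literal period rather than any character) against invented sample markup; the `<img>` tags are hypothetical, and Calibre's real output may carry extra attributes. It also shows that `[0-9]+` behaves the same as `\d+` on ordinary digits, and, as an untested suggestion, a variant that matches the whole tag so no fragment of it is left behind.

```python
import re

# Invented sample: numbered image files like those in the thread's screenshot.
html = '<img src="index-1_1.jpg"/><p>Text</p><img src="index-27_3.jpg"/>'

# \d+ = one or more digits, so it covers 1, 27, 3, ... of any length.
# The dot is escaped (\.) so it matches a literal "." only.
pattern = r'img src="index-\d+_\d+\.jpg'
print(re.findall(pattern, html))

# [0-9]+ finds the same runs of plain ASCII digits as \d+.
print(re.findall(r'\d+', 'index-27_3'), re.findall(r'[0-9]+', 'index-27_3'))

# Variant: match the entire <img ...> tag instead of just part of it.
whole_tag = re.compile(r'<img[^>]*src="index-\d+_\d+\.jpg"[^>]*>')
cleaned = whole_tag.sub('', html)
print(cleaned)
```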
09-14-2014, 02:34 AM #5
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
If you copy and paste the whole header/footer code here, in [CODE][/CODE] tags, I can help you arrive at a suitable regex. I'm not sure where the page numbers with brackets are; I don't see any in the screenshot.

09-14-2014, 02:43 AM #6
BetterRed
null operator (he/him)
Posts: 21,619
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@highstream FWIW: with Mobipocket Creator, don't use the File menu. Start the program and drag and drop the PDF onto the first screen, or click Import From Existing File > Adobe PDF.

BR
09-14-2014, 02:45 AM #7
highstream
Enthusiast
Posts: 30
Karma: 10
Join Date: Jul 2012
Device: Kindle
Thanks for the examples! The brackets are used in the examples in both regular expression FAQs, before they get to more general expressions using \d and such.

Your image regex got rid of all of them except the one on the cover page, which is no big deal. For the page numbers, following your example, I tried "第 \d+" (第, "Di", is the character before each page number) and that worked!
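A quick Python check of why anchoring on the character keeps the chapter numbers safe: `第 \d+` only matches digits that follow 第 and a space, so a bare chapter number is left alone. The sample text is invented and the exact spacing in the real file is an assumption; note that the pattern removes the marker itself but not the line break around it.

```python
import re

# Invented sample: two page markers (第 + page number) and a chapter number.
text = "第 2\n3\nThe chapter begins here.\n第 14\nMore text."

# Only digits preceded by "第 " are matched, so the chapter number "3" survives.
cleaned = re.sub(r"第 \d+", "", text)
print(repr(cleaned))
```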

It'd be nice to get chapters coded so that the Kindle's forward and back buttons recognize them, but I imagine that's asking too much given the current state of PDF-to-MOBI conversion. Many thanks,

09-14-2014, 02:57 AM #8
highstream
Enthusiast
Posts: 30
Karma: 10
Join Date: Jul 2012
Device: Kindle
BetterRed, thanks, I missed that. I gave Mobipocket Creator's Adobe PDF conversion a try. Unless I'm missing something, in this case the result left a lot to be desired compared with a direct conversion in Calibre.
09-14-2014, 03:06 AM #9
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
You can build a ToC manually with the ToC editor. Once you have a ToC, the back/forward buttons will work.
09-14-2014, 03:15 AM #10
BetterRed
null operator (he/him)
Posts: 21,619
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by highstream
BetterRed, thanks, I missed that. I gave Mobipocket Creator's Adobe PDF conversion a try. Unless I'm missing something, in this case the result left a lot to be desired compared with a direct conversion in Calibre.
@highstream, there's not much you can miss in MobiCreator; there are no tweaks or adjustments that I can think of. It's a WYGIATI tool.

I suspect Kovid has done quite a lot of work on the PDF Input plugin since that PDF "Read this first" sticky was written.

BR

WYGIATI - what you get is all there is
09-25-2014, 10:18 PM #11
LadyKate
Fanatic
Posts: 515
Karma: 1470724
Join Date: Jul 2013
Location: Quebec CA
Device: android 4 (samsung tablet and asus tablet)
I have found Mobipocket Creator to be fairly good at conversion and at removing headers and footers.

Because PDF files are created in so many different ways, it is very likely that any PDF conversion will need to be "tweaked". The conversion program is limited by the quality of the PDF it is trying to convert.

Similar Threads
Thread Thread Starter Forum Replies Last Post
Converting a PDF to mobi and having it come out right? bizzybody Kindle Formats 7 08-12-2014 02:20 PM
Converting PDF to MOBI Killiney Colm Workshop 1 07-15-2012 09:59 AM
converting pdf to mobi BeccaPrice Conversion 2 01-03-2012 05:40 AM
Error converting pdf to mobi, and also chm to mobi Neo139 Conversion 10 08-12-2011 09:55 AM
Converting .html to .mobi Question gilvezan Conversion 1 05-30-2011 01:14 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.