Old 03-04-2011, 10:43 PM   #1
pietro99
Connoisseur
pietro99 has learned how to buy an e-book online
 
Posts: 55
Karma: 76
Join Date: Sep 2010
Location: Australia
Device: Kindle 3
Converting pdf for Kindle with Calibre

So far I have been lucky and found everything I want to read in .mobi format. But one book I can only find in PDF, and I can't get Calibre to convert it properly. (I have the latest update.)

The resulting .mobi book is interspersed with unwanted stuff that appears frequently, like this:

54

9781416585855TEXT.indd 54

25/11/09 3:31:56 PM


Changing the various options in Calibre doesn't eradicate it; perhaps I am missing something. The 54 is the page number. Can anyone suggest a way to get rid of all this, please?
Old 03-05-2011, 12:11 PM   #2
bob_tm
Enthusiast
bob_tm began at the beginning.
 
Posts: 32
Karma: 10
Join Date: Jan 2011
Device: Kindle 3 WiFi, Onyx M92
This should be doable using the regular expression replacement feature of Calibre (you can replace 3 expressions - here all of them should be replaced by the empty string). Off the top of my head and from the example you have provided, I would guess the 3 expressions would be:

\[B\]\d+

\d+TEXT\.indd \d+

\d+\/\d+\/+d+ \d+:+d+:+d+ PM\[\/B\]

Since this isn't Perl (which is the variation of regexps I usually use), you may not have to put a "\" in front of a "/" as I have done above. Try to experiment with these strings and, if supported by Calibre, put "^" in front of the expressions to denote beginning of line and "\s*$" at the end of the expressions to denote end of line with possible trailing white space. If the date and time strings are the same in all instances of the unwanted strings, you can use the actual numbers rather than "\d+" (which denotes one or more digits).
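For anyone who wants to try these two points outside Calibre, here is a minimal sketch using Python's `re` module, whose syntax Calibre's regular expressions follow. The sample text is made up from the lines quoted in this thread. Whether Calibre's replacement box treats "^" and "$" as line anchors by default is not something this checks; the inline flag `(?m)` switches that behaviour on explicitly in Python.

```python
import re

# In Python, "/" is an ordinary character, so "\/" and "/" match the same text.
escaped = re.compile(r"\d+\/\d+\/\d+")
plain = re.compile(r"\d+/\d+/\d+")
print(bool(escaped.fullmatch("25/11/09")), bool(plain.fullmatch("25/11/09")))

# (?m) turns on multiline mode, so ^ and $ match at every line boundary.
# \s*$ allows trailing white space before the end of the line.
footer = re.compile(r"(?m)^\d+TEXT\.indd \d+\s*$")

# Made-up sample: a footer line with trailing spaces between two text lines.
sample = "she said.\n9781416585855TEXT.indd 54   \nBernie said.\n"
cleaned = footer.sub("", sample)
print(repr(cleaned))
```

The footer line is replaced by the empty string, which leaves an empty line behind; the surrounding text is untouched.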

Experimentation is the key here and you will learn how to do this. Regexps are great stuff, though they look like Greek to the uninitiated (except for the Greek uninitiated).

-- bob_tm

Last edited by bob_tm; 03-05-2011 at 03:23 PM.
Old 03-05-2011, 02:33 PM   #3
FF2
Wizard
FF2 ought to be getting tired of karma fortunes by now.
 
Posts: 1,105
Karma: 1025784
Join Date: Oct 2010
Device: WiFi Kindle3
I just tried converting some guide books - almost a complete disaster.

They use a format where there is a basic details box containing a list in the outer third of each page, and a more detailed full narrative text taking up the other two thirds. But it all gets mashed together into an unreadable mess.

I was curious and converted to epub - just as much of a mess. So it is not only the Kindle that suffers. PDFs that are not just linear text don't convert very well. (Other readers besides the Kindle may do a better job of showing the PDF natively, with reflow.)
Old 03-05-2011, 02:38 PM   #4
Donr
Junior Member
Donr began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Mar 2011
Device: Kindle 3
I also have been trying to convert PDF files to Kindle format. I have emailed them to freekindle.com and put convert in the subject line. I have yet to receive a reply. I downloaded Calibre and it just transferred the PDF book to my Kindle as a PDF. No Kindle formatting.
What am I doing wrong?
Old 03-05-2011, 03:17 PM   #5
bob_tm
Enthusiast
bob_tm began at the beginning.
 
Posts: 32
Karma: 10
Join Date: Jan 2011
Device: Kindle 3 WiFi, Onyx M92
Quote:
Originally Posted by Donr View Post
downloaded Calibre and it just transferred the PDF book to my Kindle as a PDF. No Kindle formatting.
What am I doing wrong?
In Calibre, convert the PDF to Mobi and then transfer the Mobi file (this will not work on image-based PDFs). The result is likely to suck unless you work on the conversion options (margins to handle line breaks, and regexps to remove headers, footers and page numbers). If it is a multicolumn PDF or a PDF with boxes and drawings, you should not have high hopes of converting it to a readable Mobi.
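The same conversion can also be scripted. The sketch below only builds the command line for Calibre's `ebook-convert` tool and runs it if the tool and the input file are present. The file names `book.pdf` and `book.mobi` are placeholders, the helper `build_convert_command` is made up for illustration, and the `--sr1-search` / `--sr1-replace` options are the search-and-replace options of recent Calibre releases, so an older install may not accept them.

```python
import os
import shutil
import subprocess


def build_convert_command(pdf_path, mobi_path, search_pattern):
    """Return the argument list for one PDF-to-Mobi conversion.

    The search pattern is a regexp; every match is replaced by the empty string.
    """
    return [
        "ebook-convert", pdf_path, mobi_path,
        "--sr1-search", search_pattern,
        "--sr1-replace", "",
    ]


# Placeholder file names; the pattern is one of the footer regexps from this thread.
command = build_convert_command("book.pdf", "book.mobi", r"\d+TEXT\.indd\s+\d+<br>")
print(command)

# Only run the conversion when the tool is on PATH and the input file exists.
if shutil.which("ebook-convert") and os.path.exists("book.pdf"):
    subprocess.run(command, check=True)
```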

bob_tm
Old 03-05-2011, 03:31 PM   #6
susan_cassidy
Wizard
susan_cassidy ought to be getting tired of karma fortunes by now.
 
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
Quote:
Originally Posted by Donr View Post
I also have been trying to convert PDF files to Kindle format. I have emailed them to freekindle.com and put convert in the subject line. I have yet to receive a reply. I downloaded Calibre and it just transferred the PDF book to my Kindle as a PDF. No Kindle formatting.
What am I doing wrong?
Did you enable the email address from which you sent the email on your "Manage Your Kindle" page? Otherwise, the email gets dropped.

I don't know what steps you took in Calibre to convert, so I can't help with that. Did you set the output format to .mobi? You have to tell it to convert, not just send, in case that's the problem.
Old 03-05-2011, 03:32 PM   #7
mr ploppy
Feral Underclass
mr ploppy ought to be getting tired of karma fortunes by now.
 
Posts: 3,622
Karma: 26821535
Join Date: Jan 2010
Location: Yorkshire, tha noz
Device: 2nd hand paperback
PDFtoEpub works better than Calibre; you can crop off things like page numbers before you start.
Old 03-05-2011, 04:33 PM   #8
pietro99
Connoisseur
pietro99 has learned how to buy an e-book online
 
Posts: 55
Karma: 76
Join Date: Sep 2010
Location: Australia
Device: Kindle 3
Quote:
Originally Posted by bob_tm View Post
This should be doable using the regular expression replacement feature of Calibre (you can replace 3 expressions - here all of them should be replaced by the empty string). Off the top of my head and from the example you have provided, I would guess the 3 expressions would be:

\[B\]\d+

\d+TEXT\.indd \d+

\d+\/\d+\/+d+ \d+:+d+:+d+ PM\[\/B\]

Since this isn't Perl (which is the variation of regexps I usually use), you may not have to put a "\" in front of a "/" as I have done above. Try to experiment with these strings and, if supported by Calibre, put "^" in front of the expressions to denote beginning of line and "\s*$" at the end of the expressions to denote end of line with possible trailing white space. If the date and time strings are the same in all instances of the unwanted strings, you can use the actual numbers rather than "\d+" (which denotes one or more digits).

Experimentation is the key here and you will learn how to do this. Regexps are great stuff, though they look like Greek to the uninitiated (except for the Greek uninitiated).

-- bob_tm
You are spot-on! It certainly looks like Greek when you start but I am starting to see how it works. I managed to get rid of page numbers 1-9 with \d but page numbers 10 onwards were still there. So I tried \ddd but that didn't work. What is the secret for that please?

I am slowly getting through the tutorial; just hope I have the patience.

Edit: just worked it out....\d\d\d for all the page numbers.
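A quick way to see what each of these patterns actually describes is to test them against whole strings. This is a small sketch in Python, whose `re` syntax Calibre's regexps follow; the sample strings are made up.

```python
import re

patterns = [r"\d", r"\ddd", r"\d\d\d", r"\d+"]
samples = ["7", "54", "311", "5dd"]

# For each pattern, list the samples it matches from start to end.
results = {
    pattern: [s for s in samples if re.fullmatch(pattern, s)]
    for pattern in patterns
}

for pattern, matched in results.items():
    print(f"{pattern!r:12} fully matches {matched}")
```

Here `\d` describes exactly one digit, `\ddd` describes one digit followed by two literal letters "d", and `\d\d\d` describes exactly three digits. Only `\d+` covers numbers of any length.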

Last edited by pietro99; 03-05-2011 at 04:36 PM. Reason: update
Old 03-05-2011, 08:46 PM   #9
pietro99
Connoisseur
pietro99 has learned how to buy an e-book online
 
Posts: 55
Karma: 76
Join Date: Sep 2010
Location: Australia
Device: Kindle 3
I've been spending a couple of hours with Calibre, and although I got the first line with the page number to work, I just can't get the other 2 and am hoping bob_tm might help.

For the 2nd line:

9781416585855TEXT.indd 57<br>

I come up with:

\d+TEXT\.indd \s\ d+<br>

For the 3rd line:

25/11/09 3:31:56 PM<br>

I come up with:

25/11/09\s\d+\:d+\:d+ PM<br>

(the date stays the same each time)

Neither of these will work for me.
Old 03-06-2011, 06:56 AM   #10
bob_tm
Enthusiast
bob_tm began at the beginning.
 
Posts: 32
Karma: 10
Join Date: Jan 2011
Device: Kindle 3 WiFi, Onyx M92
Quote:
Originally Posted by pietro99 View Post
Edit: just worked it out....\d\d\d for all the page numbers.
I recommend

\d+

which means "one or more digits". That will cover all page numbers regardless of the number of digits. As it stands, however, it will also get rid of all numbers in the whole book, so it should be restricted using markers that make the page numbers unique (like ^\s*\d+\s*$ which means a series of digits on its own on a line with possible white space before and after - also remember to add possible HTML tags that may surround the page number).
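The difference between the bare and the restricted pattern can be seen in a short Python sketch (Python's `re` syntax is what Calibre's regexps follow). The sample text is invented, and the inline flag `(?m)` is what makes "^" and "$" work per line in Python; whether Calibre needs it is an open assumption here.

```python
import re

# Invented sample: body text containing numbers, plus a page number on its own line.
page = (
    "In 1999 she moved to flat 12.\n"
    "  54  \n"
    "He was 40 years old.\n"
)

# Bare \d+ strips every number, including the ones that belong to the story.
everything = re.sub(r"\d+", "", page)

# Restricted: only digits that sit alone on a line, with optional white space around them.
only_page_numbers = re.sub(r"(?m)^\s*\d+\s*$", "", page)

print(repr(everything))
print(repr(only_page_numbers))
```

With the restricted pattern, "1999", "12" and "40" survive and only the line holding "54" is emptied.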

bob_tm
Old 03-06-2011, 06:59 AM   #11
bob_tm
Enthusiast
bob_tm began at the beginning.
 
Posts: 32
Karma: 10
Join Date: Jan 2011
Device: Kindle 3 WiFi, Onyx M92
Quote:
Originally Posted by pietro99 View Post
For the 2nd line:

9781416585855TEXT.indd 57<br>

I come up with:

\d+TEXT\.indd \s\ d+<br>

For the 3rd line:

25/11/09 3:31:56 PM<br>

I come up with:

25/11/09\s\d+\:d+\:d+ PM<br>

(the date stays the same each time)

Neither of these will work for me.
Please copy the source from the conversion window (with surrounding HTML tags) and I'll see if I can help you out. The text as cited by you (as copied from the resulting document) may not be sufficient as a basis for regexps, as it hides the HTML tags.

bob_tm
Old 03-06-2011, 03:46 PM   #12
pietro99
Connoisseur
pietro99 has learned how to buy an e-book online
 
Posts: 55
Karma: 76
Join Date: Sep 2010
Location: Australia
Device: Kindle 3
Quote:
Originally Posted by bob_tm View Post
Please copy the source from the conversion windows (with surrounding HTML tags) and I'll see if I can help you out. The text as cited by you (as copied from the resulting document) may not be sufficient to use as a base regexps, as it hides the HTML tags.

bob_tm
Thanks for your help. The actual text from the conversion window is:

being purposely obtuse?” she said.<br>
<i>71</i><br>
9781416585855TEXT.indd 71<br>
25/11/09 3:31:58 PM<br>
<hr>
<A name=79></a>“Obtuse is purposeful by defi nition,” Bernie said.<br>
Old 03-06-2011, 04:04 PM   #13
bob_tm
Enthusiast
bob_tm began at the beginning.
 
Posts: 32
Karma: 10
Join Date: Jan 2011
Device: Kindle 3 WiFi, Onyx M92
My mistake in the original post (some errors there - sorry).

9781416585855TEXT.indd 57<br>

You suggested:

\d+TEXT\.indd \s\ d+<br>

I suggest:
\d+TEXT\.indd\s+\d+<br>


For:

25/11/09 3:31:56 PM<br>

You suggested:

25/11/09\s\d+\:d+\:d+ PM<br>

I suggest (note the misplaced "\" above that originated from my typos):

25/11/09\s\d+:\d+:\d+\s+PM<br>

Sorry about this. Hopefully the regexps make more sense as written here (though they could be wrong too). All you should need here is normal text, \s and \d for white space and digit, and the '+' suffix on these to denote "one or more occurrences".
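To check the suggested expressions against the source pasted from the conversion window, here is a small sketch in Python (Calibre's regexps follow Python's `re` syntax). The second and third rules are the ones suggested in this post. The first rule, for the italic page number, is only a guess based on the pasted sample and is not taken from the thread.

```python
import re

# Source lines as pasted from the conversion window earlier in the thread.
source = (
    "being purposely obtuse?” she said.<br>\n"
    "<i>71</i><br>\n"
    "9781416585855TEXT.indd 71<br>\n"
    "25/11/09 3:31:58 PM<br>\n"
    "<hr>\n"
    "<A name=79></a>“Obtuse is purposeful by defi nition,” Bernie said.<br>\n"
)

rules = [
    r"<i>\d+</i><br>",                   # page number (assumed rule, see above)
    r"\d+TEXT\.indd\s+\d+<br>",          # file name and page number
    r"25/11/09\s\d+:\d+:\d+\s+PM<br>",   # date and time stamp
]

cleaned = source
counts = []
for rule in rules:
    # subn returns the new text and the number of replacements made.
    cleaned, count = re.subn(rule, "", cleaned)
    counts.append(count)
    print(f"{rule}: {count} replacement(s)")

print(cleaned)
```

Each rule removes exactly one line from this sample, and the two lines of story text are left as they were.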

bob_tm
Old 03-06-2011, 04:31 PM   #14
pietro99
Connoisseur
pietro99 has learned how to buy an e-book online
 
Posts: 55
Karma: 76
Join Date: Sep 2010
Location: Australia
Device: Kindle 3
You are so quick!

We are getting there. The 2nd one found 311 instances but the 3rd one still doesn't find any.

EDIT: Got it! The 3rd line that works is:

25/11/09\s+\d+:\d+:\d+\s+PM<br>

The \s becomes \s+ as I think there are 2 spaces after the date.
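The effect of that one extra "+" can be reproduced in a short Python sketch (Python's `re` syntax is what Calibre's regexps follow). The two sample strings are made up to differ only in the number of spaces after the date.

```python
import re

one_space = "25/11/09 3:31:58 PM<br>"
two_spaces = "25/11/09  3:31:58 PM<br>"

# \s matches exactly one white space character; \s+ matches one or more.
single = re.compile(r"25/11/09\s\d+:\d+:\d+\s+PM<br>")
one_or_more = re.compile(r"25/11/09\s+\d+:\d+:\d+\s+PM<br>")

for label, text in [("one space", one_space), ("two spaces", two_spaces)]:
    print(label, bool(single.search(text)), bool(one_or_more.search(text)))
```

With two spaces after the date, the `\s` version finds nothing, because the second space is not a digit, while the `\s+` version matches both strings.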

That has been a most edifying experience. Thanks for all your input.

Last edited by pietro99; 03-06-2011 at 04:37 PM. Reason: Update