PDF conversion for Kindle

remltr · 06-11-2011, 07:47 PM

Option 1 - Converting pdf to mobi:

Did a search but did not find the specific answer to my question.

I did a search and replace of the page headers when I did the conversion and that helped although I was not able to strip the page numbers themselves. Since the page format does not fit the Kindle perfectly, I get page numbers randomly in the top, middle or wherever on the screen. I would like to get rid of them. Any help there would be appreciated.

Option 2 - Just using the pdf:

Sent the original pdf with drm removed (the reason for using calibre in the first place), and it looked pretty good right out of the gate. Until, that is, you turn to the following pages of any chapter. The first page of each chapter looks great, but all subsequent pages have much smaller text with margins of about 1/3 of the page width on the left and right. So you are looking at a narrow column of very small text. Then the first page of the next chapter looks great again. Not sure whether this is a calibre issue or not. The only weird thing is that the page headers and the page numbers don't display in the pdf format, which is what I wanted in the first place.

I know from reading the pdf sticky that the pdf format is the hardest to deal with, but if someone has some information on how to fix either of these issues, that would be great as I can work with either format.

The converted mobi format is, I guess, the least irritating option at the moment; it just would be nice to remove those page numbers.

Thanks

remltr · 06-11-2011, 09:43 PM

OK I found the string of code that generates the page numbers and it looks like this: <i>5</i><br>. This would be page 5.

Is there a wild card character(s) that I can substitute for the number in the search and replace function that would find all the numbers and then I could replace them with nothing?

I tried # and ? and it didn't return any results in the wizard. I don't understand the code so I am just shooting in the dark here.

ldolse · 06-11-2011, 10:14 PM

Apparently you didn't read the stickies in full - the PDF FAQ includes links to both the search and replace tutorial and the regular expression tutorial:
Search & Replace: https://www.mobileread.com/forums/sho...d.php?t=118570
Regular Expressions: https://www.mobileread.com/forums/sho...d.php?t=118569

Anyway, what you want to wildcard any numbers is \d+, e.g.:

Code:

<i>\d+</i>\s*<br>
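
As a rough illustration of what that pattern does, here is a minimal Python sketch. It assumes calibre's search and replace follows Python's `re` syntax; the sample text is made up, and only the `<i>5</i><br>` form comes from this thread.

```python
import re

# Hypothetical excerpt of the intermediate HTML produced from the PDF.
# Only the "<i>5</i><br>" page-number form is taken from the thread.
sample = (
    "end of one page<br>\n"
    "<i>5</i><br>\n"
    "start of the next page<br>\n"
    "<i>12</i> <br>\n"
    "more text"
)

# \d+ matches one or more digits, \s* matches optional whitespace.
PAGE_NUMBER = re.compile(r"<i>\d+</i>\s*<br>")

# Replacing every match with nothing strips the page numbers.
cleaned = PAGE_NUMBER.sub("", sample)
print(cleaned)
```

Because the pattern includes the surrounding `<i>` and `<br>` tags, ordinary numbers inside the body text are left alone.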

remltr · 06-11-2011, 11:34 PM

ldolse,

That worked great.

I did kind of look over the links that you referenced previously, but not totally understanding the code, my brain went toes up on it.

Looking at the code you provided and going back and reviewing the links again I am still a little confused as to how it works.

What I think I understand:

\d+ = numbers
\s* = white space
<br> = line break

What I don't understand: what are these symbols searching for, or are they just formatting code?

<i> </i>

Thanks for the quick answer.

ldolse · 06-12-2011, 02:18 AM

Glad it worked. <i> </i> is part of the HTML markup - that tells the rendering device to use italics when rendering the text between the opening and closing tags.

Bookworm80 · 06-14-2011, 06:24 PM

I just wanted to report that I downloaded a PDF from a publisher's Website, of a book I had seen mentioned on "Books on the Knob." It didn't work to send the PDF from my@free.kindle.com email, but it worked flawlessly by placing the PDF in my Calibre library. Not only that, but I downloaded the cover and metadata.
Calibre is amazing!

remltr · 06-15-2011, 08:22 AM

I have other books in a series that relates to my original post and they have changed the format of the headers and I can't figure out how to remove them.

This is an example of the title header (the N equals the exact character position of the title) on the odd numbered pages.

THE NNNNN NNN / 3

This is what I find when I view the code via the search & replace feature

<A name=9></a>W3888-Nnnnn Nnn 8/25/03 10:23 AM Page 3<br>
N N N N N N N N N N N<br>
/ 3<br>

And this one is the author info on the even numbered pages.

4 / NNNNNNN NNNNNNNN

This is what I find when I view the code via the search & replace feature

<A name=10></a>W3888-Nnnnn Nnn 8/25/03 10:23 AM Page 4<br>
4<br>
/ N N N N N N N N N N N N N N N<br>

I tried the tags previously described, but they are not working in this situation.

Can you assist me one more time please?

Thank you.

remltr · 06-15-2011, 11:57 PM

OK - I have figured out most of this on my own, but the way the page numbers are done is kind of insidious.

There are letter spaces between the digits of any page number that has more than one digit.

What I am going to show below is only the variable page number portion of the string being searched. The rest remains the same throughout the search, so no need to muddy up what I am trying to point out.

This string \d will find pages 1 thru 9

This string \d+\s\d+ will find pages 1 0 thru 9 9 (notice the letter space)

This string \d+\s\d+\s\d+ will find pages 1 0 0 thru the end of the book

Granted, I don't need the + after the d for this particular pattern, but it doesn't change the outcome with or without it.

Is there a way to put this all together in one search string? It would not be a problem if the search on each page was the same. But the even numbered pages include the author name, while the odd numbered pages include the title. So that means it will take six search strings to clear it all out: three searches for the odd pages, complete the conversion from pdf to mobi, then reopen the newly created mobi and do three searches for the even pages, finally converting once again to mobi. Too bad there are not six search dialog windows.

ldolse · 06-16-2011, 01:41 AM

You can use parentheses with the '|' character:

Code:

(\d+|\d+\s\d+|\d+\s\d+\s\d+)
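
One thing worth knowing about this form, sketched here in Python on the assumption that calibre's search follows Python's `re` rules: alternatives are tried left to right, so on its own the group stops at the first digit run. The longer alternatives only come into play when something after the group, such as the `<br>`, forces the match to extend. The sample line is hypothetical.

```python
import re

ALTERNATION = re.compile(r"(\d+|\d+\s\d+|\d+\s\d+\s\d+)")
ALTERNATION_ANCHORED = re.compile(r"(\d+|\d+\s\d+|\d+\s\d+\s\d+)<br>")

# Hypothetical letter-spaced page number, as described earlier in the thread.
line = "1 0 4<br>"

# Without an anchor, the first alternative (\d+) wins and only "1" is matched.
unanchored_match = ALTERNATION.search(line).group(0)

# With <br> after the group, the shorter alternatives fail and the
# three-part alternative is used, so the whole number is matched.
anchored_match = ALTERNATION_ANCHORED.search(line).group(0)

print(repr(unanchored_match))
print(repr(anchored_match))
```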

You could also probably simplify it further by using this regex:

Code:

(\d+\s?)+

Don't forget to include <br> or <a name=\d+> somewhere in the expression so that numbers in the middle of the text don't get eaten.
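
To show why the anchor matters, here is a small Python sketch comparing the bare pattern with one that requires a trailing `<br>`. It assumes Python `re` behaviour; the sample text, including the year in the sentence, is invented for illustration.

```python
import re

BARE = re.compile(r"(\d+\s?)+")
ANCHORED = re.compile(r"(\d+\s?)+<br>")

# Hypothetical text: a year inside a sentence, then a letter-spaced
# page number on its own line.
sample = (
    "It happened in 1999 near here.<br>\n"
    "1 0 4<br>\n"
    "The story continues.<br>"
)

# The bare pattern removes every run of digits, including the year.
bare_result = BARE.sub("", sample)

# The anchored pattern only removes digits that sit directly before <br>.
anchored_result = ANCHORED.sub("", sample)

print(bare_result)
print(anchored_result)
```

The anchored version keeps "1999" in the sentence and removes only the page number line.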

remltr · 06-16-2011, 09:53 PM

Quote:

Originally Posted by ldolse

You can use parentheses with the '|' character:

Code:

(\d+|\d+\s\d+|\d+\s\d+\s\d+)

You could also probably simplify it further by using this regex:

Code:

(\d+\s?)+

Don't forget to include <br> or <a name=\d+> somewhere in the expression so that numbers in the middle of the text don't get eaten.

This worked pretty well: (\d+\s?)+

I still needed to go in and clean out the pages where a chapter started, but those did not have letter spacing between the digits and had a different string of characters surrounding them. In the end I still needed to do a couple of conversions, but your assistance made it much easier.

I am saving the searches that worked in a spreadsheet for future reference, as there are about 12 books in this author's series. Maybe they will come in handy, if they don't keep changing the structure of the page titles.

Now I need to clear out all the useless searches I created that Calibre saves so I can have a cleaner drop down. Just need to figure out what file that info is stored in.

ldolse thanks once again for your help. I believe that I am understanding the structure for the search tags a little better and hopefully can carry on on my own.

