Old 06-11-2011, 07:47 PM   #1
remltr
Junior Member
remltr began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Jun 2011
Device: Kindle
PDF conversion for Kindle

Option 1 - Converting pdf to mobi:

Did a search but did not find the specific answer to my question.

I did a search and replace of the page headers when I did the conversion and that helped although I was not able to strip the page numbers themselves. Since the page format does not fit the Kindle perfectly, I get page numbers randomly in the top, middle or wherever on the screen. I would like to get rid of them. Any help there would be appreciated.

Option 2 - Just using the pdf:

Sent the original PDF with the DRM removed (the reason for using calibre in the first place), and it looked pretty good right out of the gate, until you turn past the first page of any chapter. The first page of each chapter looks great; all subsequent pages have much smaller text, with left and right margins about 1/3 of the page width each, so you are looking at a narrow column of very small text. Then the first page of the next chapter looks great again. Not sure whether this is a calibre issue or not. The only odd thing is that the page headers and the page numbers don't display in the PDF format, which is what I wanted in the first place.

I know from reading the pdf sticky that the pdf format is the hardest to deal with, but if someone has some information on how to fix either of these issues, that would be great as I can work with either format.

The converted mobi format is, I guess, the least irritating option at the moment; it would just be nice to remove those page numbers.

Thanks

Last edited by remltr; 06-11-2011 at 07:49 PM.
Old 06-11-2011, 09:43 PM   #2
remltr
OK I found the string of code that generates the page numbers and it looks like this <i>5</i><br>. This would be page 5.

Is there a wild card character(s) that I can substitute for the number in the search and replace function that would find all the numbers and then I could replace them with nothing?

I tried # and ? and it didn't return any results in the wizard. I don't understand the code so I am just shooting in the dark here.
Old 06-11-2011, 10:14 PM   #3
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Apparently you didn't read the stickies in full - the PDF FAQ includes links to both the search and replace tutorial and the regular expression tutorial:
Search & Replace: https://www.mobileread.com/forums/sho...d.php?t=118570
Regular Expressions: https://www.mobileread.com/forums/sho...d.php?t=118569

Anyway, what you want to wildcard any numbers is \d+, e.g.:
Code:
<i>\d+</i>\s*<br>
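
For anyone who wants to try the pattern outside the conversion wizard, here is a minimal sketch using Python's `re` module. It assumes calibre's search and replace accepts Python-style regular expressions; the sample HTML is made up for illustration and is not taken from the book in question.

```python
import re

# Hypothetical converter output: page numbers appear as <i>N</i><br>.
sample = "end of one page<br>\n<i>5</i><br>\nstart of the next<br>\n<i>12</i> <br>\nmore text<br>"

# \d+ matches one or more digits; \s* allows optional whitespace before <br>.
page_number = re.compile(r"<i>\d+</i>\s*<br>")

# Replacing each match with an empty string strips the page numbers.
cleaned = page_number.sub("", sample)
print(cleaned)
```

Leaving the replacement empty is the same as "replace with nothing" in the wizard.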
Old 06-11-2011, 11:34 PM   #4
remltr
ldolse,

That worked great.

I did kind of look over the links that you referenced previously, but not totally understanding the code, my brain went toes up on it.

Looking at the code you provided and going back and reviewing the links again I am still a little confused as to how it works.

What I think I understand:

\d+ = numbers
\s* = white space
<br> = line break

What I don't understand: what are these symbols searching for, or are they just formatting code?

<i>
</i>

Thanks for the quick answer.
Old 06-12-2011, 02:18 AM   #5
ldolse
Glad it worked. <i> </i> is part of the HTML markup: it tells the rendering device to use italics for the text between the opening and closing tags. In the search expression those tags are matched literally, just like any other text.
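
As a rough illustration of how the pieces fit together, the same expression can be written in Python's verbose mode with one comment per piece. This is only a sketch; the two test strings are invented.

```python
import re

# The same pattern, spread out so each piece can carry a comment.
page_number = re.compile(
    r"""
    <i>      # literal opening italics tag from the HTML
    \d+      # one or more digits (the page number)
    </i>     # literal closing italics tag
    \s*      # zero or more whitespace characters
    <br>     # literal line-break tag
    """,
    re.VERBOSE,
)

# An italic number followed by a break matches; italic words do not.
print(bool(page_number.search("<i>5</i><br>")))
print(bool(page_number.search("<i>word</i><br>")))
```

Only the \d+ and \s* parts are wildcards; everything else has to appear in the text exactly as typed.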
Old 06-14-2011, 06:24 PM   #6
Bookworm80
Junior Member
Bookworm80 began at the beginning.
 
Posts: 3
Karma: 10
Join Date: May 2011
Device: Kindle 3
I just wanted to report that I downloaded a PDF from a publisher's Website, of a book I had seen mentioned on "Books on the Knob." It didn't work to send the PDF from my@free.kindle.com email, but it worked flawlessly by placing the PDF in my Calibre library. Not only that, but I downloaded the cover and metadata.
Calibre is amazing!
Old 06-15-2011, 08:22 AM   #7
remltr
I have other books in a series that relates to my original post and they have changed the format of the headers and I can't figure out how to remove them.

This is an example of the title header on the odd-numbered pages (each N stands in for one character of the title).

THE NNNNN NNN / 3

This is what I find when I view the code via the search & replace feature

<A name=9></a>W3888-Nnnnn Nnn 8/25/03 10:23 AM Page 3<br>
N N N N N N N N N N N<br>
/ 3<br>

And this one is the author info on the even numbered pages.

4 / NNNNNNN NNNNNNNN

This is what I find when I view the code via the search & replace feature

<A name=10></a>W3888-Nnnnn Nnn 8/25/03 10:23 AM Page 4<br>
4<br>
/ N N N N N N N N N N N N N N N<br>

I tried the tags previously described, but they are not working in this situation.

Can you assist me one more time please?

Thank you.

Last edited by remltr; 06-15-2011 at 09:03 AM.
Old 06-15-2011, 11:57 PM   #8
remltr
OK - I have figured out most of this on my own, but the way the page numbers are done is kind of insidious.

There are letter spaces between the digits of every page number that is longer than one digit.

What I am going to show below is only the variable page number portion of the string being searched. The rest remains the same throughout the search, so there is no need to muddy up what I am trying to point out.

This string \d will find pages 1 thru 9

This string \d+\s\d+ will find pages 1 0 thru 9 9 (notice the letter space)

This string \d+\s\d+\s\d+ will find pages 1 0 0 thru the end of the book

Granted, I don't need the + after the d for this particular pattern, but it doesn't change the outcome with or without it.

Is there a way to put this all together in one search string? It would not be a problem if the search on each page was the same. But the even-numbered pages include the author name, while the odd-numbered pages include the title. So that means it will take six search strings to clear it all out: three searches for odd, complete the conversion from pdf to mobi, then reopen the newly created mobi and do three searches for the even, finally converting once again to mobi. Too bad there are not six search dialog windows.
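
To make the three cases above concrete, here is a small Python sketch that checks which of the three strings matches which kind of letter-spaced page number. The sample page numbers and the pattern labels are invented for the illustration.

```python
import re

# Letter-spaced page numbers as they might appear in the converted text.
samples = ["7", "4 2", "1 0 3"]

patterns = {
    "one digit": r"\d",
    "two digits": r"\d+\s\d+",
    "three digits": r"\d+\s\d+\s\d+",
}

# fullmatch requires the whole string to match, so each sample
# pairs with exactly one of the three patterns.
matches = {
    sample: [name for name, pat in patterns.items() if re.fullmatch(pat, sample)]
    for sample in samples
}
for sample, names in matches.items():
    print(repr(sample), "->", names)
```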
Old 06-16-2011, 01:41 AM   #9
ldolse
You can use parentheses with the '|' character:
Code:
(\d+|\d+\s\d+|\d+\s\d+\s\d+)
You could also probably simplify it further by using this regex:
Code:
(\d+\s?)+

Don't forget to include <br> or <a name=\d+> somewhere in the expression so that numbers in the middle of the text don't get eaten.
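
A quick Python sketch of why the anchor matters. The text fragment is made up; it contains a letter-spaced page number on its own line and some spaced digits inside a sentence.

```python
import re

# Invented fragment: digits in the body text plus a page-number line.
html = "It was 1 9 8 4 when the story began.<br>\n1 0 3<br>\nThe next page continues.<br>"

# The bare expression matches any run of digits, wherever it occurs.
bare = re.compile(r"(\d+\s?)+")
# Requiring <br> directly after the digits limits it to the page-number line.
anchored = re.compile(r"(\d+\s?)+<br>")

print(bare.sub("", html))      # the digits in the sentence are removed too
print(anchored.sub("", html))  # only the page-number line is removed
```

The <br> anchor would still catch a body line that happens to end in a number, so it is worth paging through the matches in the wizard before converting.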

Last edited by ldolse; 06-16-2011 at 01:43 AM.
Old 06-16-2011, 09:53 PM   #10
remltr
Quote:
Originally Posted by ldolse
You can use parentheses with the '|' character:
Code:
(\d+|\d+\s\d+|\d+\s\d+\s\d+)
You could also probably simplify it further by using this regex:
Code:
(\d+\s?)+

Don't forget to include <br> or <a name=\d+> somewhere in the expression so that numbers in the middle of the text don't get eaten.



This worked pretty well: (\d+\s?)+

I still needed to go in and clean out the pages where a chapter started, but those did not have letter spacing between the digits and had a different string of characters surrounding them. In the end I still needed to do a couple of conversions, but your assistance made it much easier.

I am saving the searches that worked in a spreadsheet for future reference, as there are about 12 books in this author's series. Maybe they will come in handy, if they don't keep changing the structure of the page titles.

Now I need to clear out all the useless searches I created that Calibre saves so I can have a cleaner drop down. Just need to figure out what file that info is stored in.

ldolse thanks once again for your help. I believe that I am understanding the structure for the search tags a little better and hopefully can carry on on my own.

Last edited by remltr; 06-16-2011 at 09:57 PM.
All times are GMT -4.