MobileRead Forums > E-Book Software > Calibre

11-05-2010, 08:16 AM   #1
JoTH
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: iPad
Page breaks - PDF input to ePub output

I have done a search but can't see anything mentioning this (sorry if I missed it).

From reading through, I understand that converting from PDFs isn't ideal, but unfortunately this is what I have.

Generally I have got the conversion to work OK (or it seems to at first glance), but it isn't putting page breaks in, so page numbers, headers and footers end up scattered throughout the text. I hope this makes sense; I am not the best at describing things.

Can anyone point me in the right direction, in basic terms that my basic brain can follow, e.g. step 1...? (I don't understand programming languages.)

Thanks.
11-05-2010, 08:35 AM   #2
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Page breaks generally make little sense in reflowable formats like ePub, which is why they are (mostly) removed.
Are you trying to preserve the page breaks as they are in the PDF file, or do you want to get rid of headers/footers and page numbers? If it's the latter, you might want to take a look at the relevant manual pages.
11-05-2010, 11:41 AM   #3
JoTH
Thanks Manichean!

Now to get it to work, so far I am not having much luck.

Would it be possible to give me the correct expression to copy and paste? It is to remove the title on the even pages, the author on the odd pages and the page number (just as the number: 1, 2 etc.).

I don't know if it makes any difference but I am using Calibre version 0.7.25.

I am also wondering if I am putting it in the correct place: Structure Detection, header regular expression, with a tick in remove header.

Sorry to be a pain!
11-05-2010, 01:41 PM   #4
Manichean
Quote:
Originally Posted by JoTH
Thanks Manichean!

Now to get it to work, so far I am not having much luck.

Would it be possible to give me the correct expression to copy and paste? It is to remove the title on the even pages, the author on the odd pages and the page number (just as the number: 1, 2 etc.).

I don't know if it makes any difference but I am using Calibre version 0.7.25.

I am also wondering if I am putting it in the correct place: Structure Detection, header regular expression, with a tick in remove header.

That's the right place. However, I cannot help you create the right expression unless you provide some detail. What you need to post is the code (what you see when you click on the magic wand beside the header/footer regex field) for the page numbers and headers/footers you want to remove. A snippet containing the offending entries, plus a little before and after, suffices. That said, I'd strongly suggest that you read the manual entries I linked to. I'll gladly help if you don't understand something, but you'll gain much more if you learn to write the expressions yourself.
11-05-2010, 02:12 PM   #5
JoTH
Thanks for replying.

I have read the info on both links you gave and done a further search. I thought I understood, but when I try it out it doesn't do anything.

Here is a snippet of the book from the wizard:

The sprawling Eller-Stapleton Inn, a coaching stop for <br>
travelers on the way north, was miles from the nearest town <br>
<hr>
<A name=8></a><i>8 </i><br>
<i>Highland Fling </i><br>
and constable. Ordinarily she and her staff took care of their <br>
own problems. Her capable innkeeper, Mr. Carson, main*<br>

and

“Who are they? Did they not give names?” she asked, <br>
hoping they had refused. By law, an inn’s patrons had to <br>
identify themselves and sign a register to obtain lodgings. <br>
<hr>
<A name=9></a><i>Betina Krahn </i><br>
<i>9 </i><br>
“They give names, all right.” Carson glowered, reaching <br>
for his big leather register and opening it to the current page. <br>


These being the parts I am trying to remove:

<hr>
<A name=8></a><i>8 </i><br>
<i>Highland Fling </i><br>

and

<hr>
<A name=9></a><i>Betina Krahn </i><br>
<i>9 </i><br>



I hope that after I see how the expression should be written I will understand for the future. I generally do after seeing a working example.

Thanks again for your help with this, I really do appreciate it!
11-05-2010, 03:03 PM   #6
Manichean
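Generally, you need to identify the whitespace and the variable parts. Whitespace (spaces and line breaks) is matched with \s+ and the changing page numbers with \d+. Thus, the even-page header (page number and title)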
Code:
<hr>
<A name=8></a><i>8 </i><br>
<i>Highland Fling </i><br>
is removed by the expression
Code:
<hr>\s+<A\s+name=\d+></a><i>\d+\s+</i><br>\s+<i>Highland\s+Fling\s+</i><br>
and the odd-page header (author's name and page number)
Code:
<hr>
<A name=9></a><i>Betina Krahn </i><br>
<i>9 </i><br>
is removed by
Code:
<hr>\s+<A\s+name=\d+></a><i>Betina\s+Krahn\s+</i><br>\s+<i>\d+\s+</i><br>
I'll just let this stand as is, since you have read the manual entries already. If you are unclear on any special point in the expressions, feel free to ask.
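
For anyone who wants to try the two expressions outside Calibre, here is a minimal sketch in plain Python. It assumes the wizard text behaves like an ordinary string and that Calibre's regex syntax matches Python's `re` module for these patterns; the names `sample`, `even_header`, `odd_header` and `strip_headers` are made up for this illustration and are not part of Calibre.

```python
import re

# Sample text assembled from the snippets posted earlier in the thread.
sample = (
    "travelers on the way north, was miles from the nearest town <br>\n"
    "<hr>\n"
    "<A name=8></a><i>8 </i><br>\n"
    "<i>Highland Fling </i><br>\n"
    "and constable. Ordinarily she and her staff took care of their <br>\n"
    "identify themselves and sign a register to obtain lodgings. <br>\n"
    "<hr>\n"
    "<A name=9></a><i>Betina Krahn </i><br>\n"
    "<i>9 </i><br>\n"
    "for his big leather register and opening it to the current page. <br>\n"
)

# The two expressions from the post above, unchanged.
even_header = r"<hr>\s+<A\s+name=\d+></a><i>\d+\s+</i><br>\s+<i>Highland\s+Fling\s+</i><br>"
odd_header = r"<hr>\s+<A\s+name=\d+></a><i>Betina\s+Krahn\s+</i><br>\s+<i>\d+\s+</i><br>"


def strip_headers(text):
    """Remove every match of either header expression from the text."""
    for pattern in (even_header, odd_header):
        text = re.sub(pattern, "", text)
    return text


cleaned = strip_headers(sample)
print(cleaned)
```

Because `\d+` stands in for the page number, the same two expressions should also match the headers on other pages that follow the same layout.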
11-05-2010, 03:36 PM   #7
JoTH
Thank you!

I understand where I went wrong now and have managed to do another one on my own.

Thanks again. Have a great weekend!
11-05-2010, 05:16 PM   #8
Manichean
Glad it worked.

Out of interest, where did you go wrong?
11-12-2010, 05:31 AM   #9
JoTH
I forgot to put the whitespace between the words in the title and author. Daft, I know; I was having a bad day.
11-18-2010, 04:35 PM   #10
Mixx
Zealot
Posts: 138
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
I apologize, but regular expressions are just beyond me (I did try!). Could a kind soul tell me how to eliminate these very simple footer lines with a page number in them? E.g., the break from page 7 to 8 looks like this:

Code:
 text of last line on previous page <br>
 <br>
7<br>
<hr>
<A name=8></a>first line of new page
I am not having much luck with the "Test" facility, it does not seem to work on my system. What regular expression do I put in?

Thanxx, Mixx
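
One possible expression for a footer like this, following the same whitespace and digit approach used earlier in the thread, is sketched below in plain Python. This is only a sketch checked against the five-line sample above, not against Calibre's own Test facility, and the names `page_break`, `footer` and `joined` are made up for the illustration.

```python
import re

# The page break exactly as posted above.
page_break = (
    " text of last line on previous page <br>\n"
    " <br>\n"
    "7<br>\n"
    "<hr>\n"
    "<A name=8></a>first line of new page"
)

# Empty-line <br>, page number, <hr> and the anchor of the next page.
# \s* allows any (or no) whitespace between the pieces; \d+ is the page number.
footer = r"<br>\s*\d+<br>\s*<hr>\s*<A\s+name=\d+></a>"

joined = re.sub(footer, "", page_break)
print(joined)
```

The expression deliberately starts at the `<br>` of the otherwise empty line, so the `<br>` that ends the last line of text on the previous page is left in place.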

MobileRead.com is a privately owned, operated and funded community.