Old 01-08-2011, 06:55 AM   #1
dapex
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2011
Device: kindle
major problems converting pdf

Hi all, I have to admit I am pretty new to ebook readers. My wife bought me a Kindle for Xmas and it's great; however, most of my reading material is in PDF format, and when I use Calibre to convert books I always get a messed-up conversion. I end up with the name of the book randomly inserted into the pages, or quite simply text appears in the converted book that isn't in the PDF.

Can anyone advise why this is happening, or am I simply expecting too much?

I thought I should be able to take a perfectly formatted pdf and convert it to epub or mobi and have the same output?

If it helps I have uploaded a PDF and the mobi conversion in a zip file here
http://www.fileserve.com/file/EQPxZHd

If anyone can help then please do, as it's really ruining the reading experience at present.

Cheers

Edited to add: I just found out what's causing one particular issue, I just don't know how to resolve it. I have a few PDFs, and at the top of each page there is the page number and the title of the book. When I convert these PDFs into either EPUB or MOBI (I'm doing the EPUB conversion for a friend with a Samsung ereader), the page number and book title are made bold and larger and then inserted into the middle of the sentence, so the conversion isn't able to tell that it's the start of a new page. I have no idea how to tell it that this is the start of a new page. Ideally I want it to ignore the page number and the book title, unless it can add them as they are in the PDF.

Any thoughts?

Last edited by dapex; 01-08-2011 at 07:33 AM.
Old 01-08-2011, 08:14 AM   #2
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
You need to write a regular expression to remove the header/footer. Go to Structure Detection under the conversion options, enable either 'remove header' or 'remove footer', and then enter the appropriate regular expression. You can click the magic wand button to pull up a wizard to help you write/test it. There's a tutorial in the Calibre manual and several tutorials online for regular expressions/regex if you're not familiar with them.
Old 01-08-2011, 09:39 AM   #3
dapex
Cheers for that. I tried the remove header and footer options but that didn't seem to do anything. I'll google the tutorials and see if that helps.
Old 01-08-2011, 09:44 AM   #4
ldolse
Found the Calibre tutorial - couldn't find it when I posted before:
https://www.calibre-ebook.com/user_manual/regexp.html
Old 01-12-2011, 08:23 AM   #5
dapex
OK, I read the tutorial you showed me and to be honest it's way over my head. I have had a look at the Structure Detection section in Calibre and found out where the problem is. Below is a section of the PDF file I am currently working on:

When you're just about to be really<br>
mean to someone you love, you could stop and do this. And with<br>
<hr>
<A name=28></a><i>26 Using Your Brain</i><br>
the look that's on your faces right now, who knows what you<br>
could get into . . . .all kinds of fun trouble!<br>

Basically the whole line <A name=28></a><i>26 Using Your Brain</i><br> is at the top of a page, and it's the page number and chapter title. This is at the top of every page, but the A name= number goes up by one each time.
Because the software doesn't realise this is the page number and chapter title, it adds it into the text of the book, which is obviously a tad annoying.

Can anyone tell me how I can tell Calibre either to ignore the <A name=28></a><i>26 Using Your Brain</i><br>, or that this is a page header and so to just put it at the top of the page in smaller text instead of in the middle of a sentence?

Please help, as this is a problem with many of the PDFs I have and it's really bugging me that I can't fix it. (I can fix it by going into a PDF editor and manually removing each page number etc., but as you can imagine, this is a painfully slow process, and when I have loads of PDFs to do it's not really practical.)

Cheers

Dave
Old 01-12-2011, 09:01 AM   #6
itimpi
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
You should use the option to remove headers (and/or footers) in the Structure Detection part of PDF input. Note that despite their names these are really just generic string removal options - it is just that header/footer removal is their commonest usage.

You have to construct a regex expression that is specific to the file in question. However it is quite easy to do in most cases if you take advantage of the wizard. The steps I use are:
- Press the Wizard button alongside the input text box for one of the above options, and select the PDF file
- When the window opens up, find an example of the text you want to remove, and then copy/paste it into the regex box at the top, replacing what is already there.
- Replace anywhere there is a number with \d* to allow for any number of any length. This handles things like the page number varying.
- Replace anywhere there is white space with \s*. This also handles tabs, newlines, etc.
- Press the Test button to make sure the text you want removed is highlighted - if not, you probably got one of the \d* or \s* replacements wrong
- If the correct text was highlighted, scroll down to the next occurrence of similar strings to check it was also highlighted so that you have generalised the expression correctly
- Press OK
- Make sure the checkbox to use the expression just created is ticked.
- Repeat if necessary for the footer box as typically the footers need a different regex to the header.
- Press OK to actually do the conversion
- When conversion completes you can view the results to check they are what you want.

It sounds more complicated than it actually turns out to be, and you do not have to really understand regex to carry out the above steps.

The settings you used for this particular book will be remembered, so if you need to tweak them, the settings you last used will be the new starting point.
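The \d* and \s* substitutions in the steps above can also be tried outside the wizard. The following is only a rough sketch using Python's standard re module, not anything Calibre ships: the generalise helper and the test strings are made up for illustration, and the sample header is the one posted earlier in this thread.

```python
import re

def generalise(sample: str) -> str:
    """Hypothetical helper: turn one literal header into a regex
    by applying the substitutions from the steps above."""
    # Split the sample into runs of digits, runs of whitespace, and other text.
    parts = re.findall(r"\d+|\s+|[^\d\s]+", sample)
    out = []
    for part in parts:
        if part.isdigit():
            out.append(r"\d*")           # any number of any length
        elif part.isspace():
            out.append(r"\s*")           # spaces, tabs, newlines
        else:
            out.append(re.escape(part))  # keep the remaining text literal
    return "".join(out)

pattern = generalise("<i>26 Using Your Brain</i><br>")
print(pattern)

# The equivalent of pressing Test and scrolling to the next occurrence:
# headers from other pages should match, ordinary body text should not.
for line in ["<i>27 Using Your Brain</i><br>",
             "<i>128  Using Your Brain</i><br>",
             "the look that's on your faces right now"]:
    print(bool(re.search(pattern, line)), line)
```

If a header from another page does not match, the usual cause is a literal digit or space that was left in the expression.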
Old 01-12-2011, 09:39 AM   #7
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
There are other ways to remove the header and footer without learning/using regex - google pdfscissors or search for it in this forum.
Old 01-12-2011, 10:13 AM   #8
ldolse
All you need to do is delete the
Code:
<i>26 Using Your Brain</i><br>
references.

The stuff with <A name=....> gets deleted as part of the default processing, so you don't need to particularly worry about that.

The regex should be something like:
Code:
<i>\d+\s*Using\s*Your\s*Brain\s*</i>\s*<br>
There are alternative tools, like the one cybmole mentioned, Mobipocket Creator, etc., which should be able to do a basic conversion as well, perhaps with less pain and suffering on your part. That said, I haven't tried them, so I can't really comment.
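To see what the regex above removes, it can be run against the snippet from post #5. This is only an illustration with Python's re module, not Calibre's own code; the <A name=28></a> anchor is left out of the sample on the assumption that it has already been stripped by the default processing, as described above.

```python
import re

# Sample lines from post #5, minus the <A name=28></a> anchor
# (assumed to be already removed by the default processing).
html = (
    "mean to someone you love, you could stop and do this. And with<br>\n"
    "<hr>\n"
    "<i>26 Using Your Brain</i><br>\n"
    "the look that's on your faces right now, who knows what you<br>\n"
)

# The header regex suggested in this post.
header = re.compile(r"<i>\d+\s*Using\s*Your\s*Brain\s*</i>\s*<br>")

# Delete every match; the surrounding sentence text is left untouched.
cleaned = header.sub("", html)
print(cleaned)
```

Because the page number is written as \d+, the same expression should match the header on every page of the chapter, not just page 26.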
Old 01-12-2011, 06:09 PM   #9
Sydney's Mom
Wizard
Posts: 2,895
Karma: 6995721
Join Date: Dec 2008
Location: Idaho, on the side of a mountain
Device: Kindle Oasis, Fire 3d Gen and 5th Gen and Samsung Tab S
Unfortunately, this is over my head. I use Mobipocket Creator Pro to convert PDFs. Calibre does EPUB flawlessly, but fine-tuning PDF is too much for me. MPC does a really good job - just import, then click build.
Old 01-12-2011, 07:58 PM   #10
vulcan_girl
Groupie
Posts: 156
Karma: 1010345
Join Date: Jun 2009
Device: PRS 350
Like Sydney's Mom, this is way over my head as well. I'm perfectly happy using Briss to crop and not converting the file, especially if it's a book that I'm not planning on reading more than once. If it's one I'd like to keep, I'd probably try and get it in another format.
All times are GMT -4.