#1
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2011
Device: kindle
Major problems converting PDF
Hi all, I have to admit I am pretty new to ebook readers. My wife bought me a Kindle for Christmas and it's great, but most of my reading material is in PDF format, and when I use Calibre to convert books I always get a messed-up conversion. I end up with the name of the book randomly inserted into the pages, or text appears in the converted book that isn't in the PDF.
Can anyone advise why this is happening, or am I simply expecting too much? I thought I should be able to take a perfectly formatted PDF, convert it to EPUB or MOBI and get the same output. If it helps, I have uploaded a PDF and its MOBI conversion in a zip file here: http://www.fileserve.com/file/EQPxZHd
If anyone can help then please do, as it's really ruining the reading experience at present. Cheers

Edited to add: I just found out what's causing one particular issue, I just don't know how to resolve it. I have a few PDFs with the page number and the title of the book at the top of each page. When I convert these PDFs into either EPUB or MOBI (I'm doing an EPUB conversion for a friend with a Samsung ereader), the page number and book title are made bold and larger and then inserted into the middle of the sentence, so the conversion isn't able to tell that it's the start of a new page. I have no idea how to tell it that this is the start of a new page. Ideally I want it to ignore the page number and the book title, unless it can add them as they are in the PDF. Any thoughts?

Last edited by dapex; 01-08-2011 at 07:33 AM.

#2
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
You need to write a regular expression to remove the header/footer. Go to Structure Detection under the conversion options, enable either 'remove header' or 'remove footer', and then enter the appropriate regular expression. You can click the magic wand button to pull up a wizard to help you write and test it. There's a tutorial in the Calibre manual and several tutorials online for regular expressions (regex) if you're not familiar with them.
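As a rough illustration of what those two options do, here is a minimal Python sketch. It assumes, judging by the `<br>`/`<hr>` markup visible later in this thread, that the regex is run against the HTML-like text Calibre extracts from the PDF and that every match is simply deleted. The helper `strip_header_footer` and the sample page text are hypothetical; this is not Calibre's own code.

```python
import re
from typing import Optional


def strip_header_footer(html: str,
                        header_regex: Optional[str] = None,
                        footer_regex: Optional[str] = None) -> str:
    """Delete every match of the header and/or footer pattern.

    Hypothetical helper that mimics the idea behind Calibre's
    'remove header' / 'remove footer' options, not their implementation.
    """
    for pattern in (header_regex, footer_regex):
        if pattern:  # an unset option is skipped, like an unticked checkbox
            html = re.sub(pattern, "", html)
    return html


# Made-up extracted text with a running header and a page-number footer.
page_text = (
    "end of one page<br>\n<hr>\n<i>My Book</i><br>\n"
    "start of the next page<br>\n<b>Page 12</b><br>\n"
)

cleaned = strip_header_footer(
    page_text,
    header_regex=r"<i>My\s*Book</i>\s*<br>",
    footer_regex=r"<b>Page\s*\d+</b>\s*<br>",
)
print(repr(cleaned))
```

Because the header and footer usually look different, each option gets its own pattern; either one can be left empty.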

#3
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2011
Device: kindle
Cheers for that. I tried the remove header and remove footer options but they didn't seem to do anything; I'll google for the tutorials and see if that helps.

#4
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Found the Calibre tutorial - couldn't find it when I posted before:
https://www.calibre-ebook.com/user_manual/regexp.html

#5
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2011
Device: kindle
OK, I read the tutorial you showed me and, to be honest, it's way over my head. I have had a look at the structure detection section in Calibre and found out where the problem is. Below is a section of the PDF file I am currently working on:
Code:
When you're just about to be really<br> mean to someone you love, you could stop and do this. And with<br> <hr> <A name=28></a><i>26 Using Your Brain</i><br> the look that's on your faces right now, who knows what you<br> could get into . . . .all kinds of fun trouble!<br
Basically, the whole line <A name=28></a><i>26 Using Your Brain</i><br> is at the top of a page: it's the page number and chapter title. It is at the top of every page, but the number after A name= goes up by one on each page. Because the software doesn't realise this is the page number and chapter title, it adds it into the text of the book, which is obviously a tad annoying.
Can anyone tell me how to make Calibre either ignore the <A name=28></a><i>26 Using Your Brain</i><br>, or treat it as a page header and just put it at the top of the page in smaller text instead of in the middle of a sentence? Please help, as this is a problem in many of my PDFs and it's really bugging me that I can't fix it. (I can fix it by going into a PDF editor and manually removing each page number etc., but as you can imagine this is a painfully slow process, and when I have loads of PDFs to do it's not really practical.)
Cheers, Dave

#6
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
You should use the option to remove headers (and/or footers) in the Structure Detection part of PDF input. Note that, despite their names, these are really just generic string-removal options; header/footer removal is simply their most common use.
You have to construct a regex that is specific to the file in question. However, it is quite easy to do in most cases if you take advantage of the wizard. The steps I use are:
- Press the Wizard button alongside the input text box for one of the above options, and select the PDF file.
- When the window opens, find an example of the text you want to remove, then copy and paste it into the regex box at the top, replacing what is already there.
- Replace anywhere there is a number with \d* to allow for any number of any length. This handles things like the page number varying.
- Replace anywhere there is white space with \s*. This also handles tabs, newlines, etc.
- Press the Test button to make sure the text you want removed is highlighted. If not, you probably got one of the \d* or \s* replacements wrong.
- If the correct text was highlighted, scroll down to the next occurrence of a similar string and check it was also highlighted, so that you know you have generalised the expression correctly.
- Press OK.
- Make sure the checkbox to use the expression you just created is ticked.
- Repeat if necessary for the footer box, as footers typically need a different regex from the header.
- Press OK to actually do the conversion.
- When the conversion completes, view the results to check they are what you want.

It sounds more complicated than it actually turns out to be, and you do not really have to understand regex to carry out the above steps. The settings you used for this particular book will be remembered, so if you need to tweak them, the settings you last used will be the new starting point.
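The digit and whitespace substitutions in those steps are mechanical, so they can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `generalize_sample` is a hypothetical helper (the wizard does not do this for you; you make the replacements by hand), and it also escapes regex metacharacters such as `.` or `?`, which the steps above don't mention but which matter if the copied text contains them. The printed pattern assumes Python 3.7 or later, where `re.escape` leaves `<`, `>` and `/` alone.

```python
import re


def generalize_sample(sample: str) -> str:
    r"""Build a header/footer regex from one copied example.

    Each run of digits becomes \d*, each run of whitespace becomes \s*,
    and every other character is matched literally.
    Hypothetical helper for illustration, not part of Calibre.
    """
    parts = []
    # Tokenise into digit runs, whitespace runs and single other characters.
    for match in re.finditer(r"(?P<num>\d+)|(?P<ws>\s+)|(?P<other>.)",
                             sample, flags=re.DOTALL):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "num":
            parts.append(r"\d*")
        elif kind == "ws":
            parts.append(r"\s*")
        else:
            # Escape characters such as '.', '?' or '(' that would
            # otherwise act as regex operators.
            parts.append(re.escape(match.group()))
    return "".join(parts)


pattern = generalize_sample("<i>26 Using Your Brain</i><br>")
print(pattern)  # <i>\d*\s*Using\s*Your\s*Brain</i><br>

# The "Test" step: the pattern should also hit the header on the next page.
print(bool(re.search(pattern, "<A name=29></a><i>27 Using Your Brain</i><br>")))  # True
```

`\d*` also matches zero digits, which is harmless here; `\d+` is the stricter choice if a number must always be present.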

#7
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
There are other ways to remove the header and footer without learning or using regex: google pdfscissors or search for it in this forum.

#8
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
All you need to do is delete this part:
Code:
<i>26 Using Your Brain</i><br>
The stuff with <A name=....> gets deleted as part of the default processing, so you don't particularly need to worry about that. The regex should be something like:
Code:
<i>\d+\s*Using\s*Your\s*Brain\s*</i>\s*<br>
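To see that pattern at work on the excerpt from post #5, here is a short Python sketch using the standard `re` module. It assumes the header option behaves like a plain find-and-delete; the anchor cleanup that post #8 attributes to Calibre's default processing is not modelled, which is why `<A name=28></a>` survives in this output.

```python
import re

# The regex suggested above, copied verbatim.
HEADER_REGEX = r"<i>\d+\s*Using\s*Your\s*Brain\s*</i>\s*<br>"

# Part of the excerpt from post #5 (stopping before its cut-off "<br").
snippet = (
    "mean to someone you love, you could stop and do this. And with<br> "
    "<hr> <A name=28></a><i>26 Using Your Brain</i><br> "
    "the look that's on your faces right now, who knows what you<br>"
)

# re.subn returns the new string and the number of replacements made.
cleaned, removed = re.subn(HEADER_REGEX, "", snippet)
print(removed)  # 1
print(cleaned)
```

Because `\d+` accepts any page number and the `\s*` pieces tolerate stray spaces, the same pattern also removes `<i>27 Using Your Brain</i> <br>` on the following page.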

#9
Wizard
Posts: 2,896
Karma: 6995721
Join Date: Dec 2008
Location: Idaho, on the side of a mountain
Device: Kindle Oasis, Fire 3d Gen and 5th Gen and Samsung Tab S
Unfortunately, this is over my head. I use Mobipocket Creator Pro to convert PDFs. Calibre does EPUB flawlessly, but fine-tuning PDF conversion is too much for me. MPC does a really good job: just import, then click Build.

#10
Groupie
Posts: 156
Karma: 1010345
Join Date: Jun 2009
Device: PRS 350
Like Sydney's Mom, I find this way over my head as well. I'm perfectly happy using Briss to crop the PDF and not converting the file, especially if it's a book I'm not planning on reading more than once. If it's one I'd like to keep, I'd probably try to get it in another format.
Thread | Thread Starter | Forum | Replies | Last Post |
problems converting .pdf to .rb for REB1100 | RKnack | Conversion | 5 | 08-15-2011 02:44 AM |
Problems with converting pdf to mobi | Holger | Calibre | 1 | 08-27-2010 11:41 PM |
Problems with converting Palm PDB-PDF files to other formats/show in calibre-viewer | Tobago | Calibre | 7 | 04-29-2010 04:57 PM |
DR1000 two major problems with 2.0 firmware | splendor | iRex | 29 | 04-18-2010 04:11 AM |