08-15-2009, 06:13 AM | #1 |
Member
Posts: 10
Karma: 10
Join Date: Aug 2009
Device: none
|
This is driving me nuts...
Got my new Sony 505 yesterday. Loving the hardware so far.
I have a few PDFs I am trying to convert and it is becoming very frustrating. I put up another post yesterday, but I have moved on a bit since. I want to remove the header and footer. I have done a lot of reading, searching and playing. This is what I have done so far: I use NitroPDF to crop the PDF (all pages option) to remove the header, which has the chapter title in it, and the footer, which has the page numbers in it. I then use calibre to convert them to either EPUB or LRF (tried both several times) and copy the file to the reader using Explorer. THE HEADER AND FOOTER TEXT IS STILL IN THE BOOK ALL THE WAY THROUGH!! If I load my cropped PDF into Adobe Reader, there is no header and footer. Where is it getting the information from? I have also tried reinstalling calibre in case it was picking anything up from when I wasn't using NitroPDF; still no joy. All I want to do is read the text all the way through without the header and footer, LOL
08-15-2009, 07:47 AM | #2 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
The PDF cropping application you are using is not removing the cropped portion from the file. It is just hiding it. Think of it this way. You have a page, within the page is a display area. The application is just making the display area smaller so that the header and footer are not shown when you view the file but they are still there.
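A toy model of what the crop does, as a Python sketch. In PDF terms the visible area is usually the page's CropBox and the whole page is its MediaBox; a crop tool typically shrinks the CropBox and leaves the drawn text alone. The classes and coordinates below are made up for illustration; real PDFs store text in content streams, not in a list like this.

```python
from dataclasses import dataclass

@dataclass
class TextItem:
    text: str
    x: float  # position on the page, origin at the bottom-left (PDF convention)
    y: float

@dataclass
class Page:
    media_box: tuple  # (x0, y0, x1, y1): the whole page
    crop_box: tuple   # (x0, y0, x1, y1): the "display area" a viewer shows
    items: list       # every piece of text drawn on the page

def inside(box, item):
    x0, y0, x1, y1 = box
    return x0 <= item.x <= x1 and y0 <= item.y <= y1

def visible_text(page):
    # What a viewer shows: only text that falls inside the crop box.
    return [item.text for item in page.items if inside(page.crop_box, item)]

def extracted_text(page):
    # What a converter reading the page contents sees: all of it.
    return [item.text for item in page.items]

# A US Letter page cropped to hide the top and bottom margins.
page = Page(
    media_box=(0, 0, 612, 792),
    crop_box=(0, 60, 612, 740),
    items=[
        TextItem("Chapter 1", 300, 760),    # header, above the crop box
        TextItem("Body text...", 72, 400),  # body, inside the crop box
        TextItem("- 12 -", 300, 30),        # footer, below the crop box
    ],
)

print(visible_text(page))    # ['Body text...']
print(extracted_text(page))  # ['Chapter 1', 'Body text...', '- 12 -']
```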
calibre includes the ability to remove headers and footers based on regular expressions. However, I haven't gotten around to producing any GUI tools to make it easier. The process right now is a bit difficult, but it goes like this: run ebook-convert in.pdf .epub --debug-input ./out_dir, look at the HTML produced, create a regex that matches the header or footer, then convert again using --remove-header and --remove-footer, with the header and footer regexes set.
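To check a candidate regex against the HTML the debug step writes, a small Python sketch like this could help. It assumes the debug directory holds the produced pages as .html/.htm files somewhere beneath it (the exact layout isn't stated in the thread), and the function names are made up for illustration. It only counts matches; it doesn't call calibre.

```python
import re
from pathlib import Path

def find_matches(html: str, pattern: str) -> list:
    """Return every full match of `pattern` in `html`."""
    return [m.group(0) for m in re.finditer(pattern, html)]

def scan_debug_dir(out_dir: str, pattern: str) -> dict:
    """Count matches per HTML file under out_dir (file layout is an assumption)."""
    counts = {}
    for path in sorted(Path(out_dir).rglob("*.htm*")):
        if not path.is_file():
            continue
        html = path.read_text(encoding="utf-8", errors="replace")
        counts[str(path)] = len(find_matches(html, pattern))
    return counts

sample = "<p>Chapter One</p>\n<p>Some text.</p>\n<p>- 12 -</p>"
print(find_matches(sample, r"-\s*\d{1,3}\s*-"))  # ['- 12 -']

# Against the real debug output, something like:
# print(scan_debug_dir("./out_dir", r"-\s*\d{1,3}\s*-"))
```

A file with zero matches, or with far more matches than it has pages, is a hint that the regex needs adjusting before the real conversion.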
08-15-2009, 07:52 AM | #3 |
Member
Posts: 10
Karma: 10
Join Date: Aug 2009
Device: none
|
Thanks. That makes sense for the crop. No way of deleting this info, then?
So I should convert my PDF to HTML first, then convert that to EPUB and use remove header and footer?
08-15-2009, 07:55 AM | #4 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Remove header and footer is supported for all input formats. What happens is this: the input (no matter what format) is converted to HTML and run through a preprocessor before being turned into an internal OEB book. The preprocessor applies the regexes to remove the header and footer, as well as doing other things. The OEB book is then turned into whatever your output format is.
PM me your PDF and I'll look at it to see what kind of regex you will need to remove the header and footer. |
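A rough Python sketch of the regex step in that pipeline. The function name and the example regexes are hypothetical, and this is not calibre's actual preprocessor code; it only shows the idea of deleting every header and footer match from the HTML before the rest of the conversion sees it.

```python
import re

def preprocess(html, header_regex=None, footer_regex=None):
    # Hypothetical stand-in for the preprocessing pass: delete every match
    # of the header and footer regexes. Building the OEB book and writing
    # the output format would happen after this and are not shown.
    for pattern in (header_regex, footer_regex):
        if pattern:
            html = re.sub(pattern, "", html)
    return html

html = "<p>CHAPTER ONE</p><p>It was a dark night.</p><p>- 7 -</p>"
result = preprocess(
    html,
    header_regex=r"<p>CHAPTER [A-Z ]+</p>",
    footer_regex=r"<p>-\s*\d{1,3}\s*-</p>",
)
print(result)  # <p>It was a dark night.</p>
```

Because the regexes run on the HTML, they have to match the markup around the header or footer text too, which is why looking at the debug output first matters.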
08-15-2009, 12:19 PM | #5 |
Member
Posts: 21
Karma: 10
Join Date: Jul 2008
Device: EZ Reader Pocket Pro
|
Hi... I used to have the same problem with PDF files. I'm not an expert, but I think the only ways to remove them are either software that converts the PDF to Word, so you can delete the header and footer manually, or a regular expression. Anyway, a long time ago I found a program that converts files to PDB (Palm ebooks) in three steps: first it converts the PDF file to TXT, then it cleans the TXT (the result may sometimes be wrong, but it almost always deletes the header and footer), and then it converts the clean TXT file to PDB. You can take the clean TXT file and import it into calibre instead (that's what I do), so it may suit you. Hope I made myself clear, because my English is not very good. Anyway, here's the link: http://www.reblusoft.com/index.php?o...d=20&Itemid=49
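The program's cleaning rules aren't described, so the following is only a guess at what a "clean the TXT" step might do, in Python: drop lines that are nothing but a page number, and drop lines that repeat on many pages (a running header). `clean_text` and the `min_repeats` threshold are made up for illustration.

```python
import re
from collections import Counter

# A line holding only a page number, optionally wrapped in hyphens: "12", "- 12 -"
PAGE_NUMBER = re.compile(r"^\s*-?\s*\d{1,3}\s*-?\s*$")

def clean_text(text, min_repeats=3):
    """Drop page-number lines and lines repeated at least min_repeats times."""
    lines = text.splitlines()
    counts = Counter(line.strip() for line in lines if line.strip())
    kept = []
    for line in lines:
        stripped = line.strip()
        if PAGE_NUMBER.match(line):
            continue  # looks like a page number
        if stripped and counts[stripped] >= min_repeats:
            continue  # same line on many pages: probably a running header
        kept.append(line)
    return "\n".join(kept)

sample = "\n".join([
    "MY BOOK TITLE", "First page text.", "1",
    "MY BOOK TITLE", "Second page text.", "- 2 -",
    "MY BOOK TITLE", "Third page text.", "3",
])
print(clean_text(sample))
```

Both rules can misfire: a real line that is just a number, or a scene break like "* * *" that occurs three or more times, would be removed too.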
|
08-16-2009, 11:53 AM | #6 | |
Enthusiast
Posts: 25
Karma: 16
Join Date: Aug 2009
Device: Pocketbook 360, Sony PRS-T1
|
- xxx - (xxx = page number, one to three digits)? I tried to use the following (simple) RE:
\- [0-9]+ \-
But it removes only - x - (- 1 - to - 9 -) and - xxx - (- 100 - to - 999 -), not - xx - (- 10 - to - 99 -). What's wrong?
Thanks a lot for your help,
DSP
|
08-16-2009, 03:54 PM | #8 | |
Enthusiast
Posts: 25
Karma: 16
Join Date: Aug 2009
Device: Pocketbook 360, Sony PRS-T1
|
If I use "\d{1,3}", all the page numbers are removed, but naturally the hyphens stay behind. If I use "- \d{1,3} -", all page numbers including the hyphens are removed, except the page numbers that have two digits.
|
08-16-2009, 04:56 PM | #9 |
Member
Posts: 10
Karma: 10
Join Date: Aug 2009
Device: none
|
I sorted it another way, thanks for the help, guys.
I used NitroPDF to export the PDF to Word and chose the option to delete the header and footer info. Then I saved the new doc from Word as a PDF and used calibre to convert that to EPUB. Long-winded, but it worked a treat. Seeing as I will only be converting 4-5 books a month, it's OK.
08-16-2009, 05:37 PM | #10 |
Connoisseur
Posts: 53
Karma: 10
Join Date: Feb 2008
Device: iPad Pro, Kobo Libra 2, PW4
|
I use ABBYY FR9 to scan the PDF, then I export it to HTML. Then I use snowsoft's htmlbookfixer to clean everything up. Works perfectly.
|
08-16-2009, 07:52 PM | #11 | ||
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Code:
"-\s*\d{1,3}\s*-"
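The thread doesn't say why the two-digit page numbers were missed. One plausible cause (an assumption, not confirmed above) is that the extracted text had different whitespace around them, such as two spaces or non-breaking spaces. This Python comparison of the two patterns shows how `\s*` tolerates that; in Python 3, `\s` in a str pattern also matches the non-breaking space. calibre is written in Python, so its regexes probably behave the same way, but that is worth checking too.

```python
import re

strict = re.compile(r"\- [0-9]+ \-")      # exactly one ordinary space on each side
tolerant = re.compile(r"-\s*\d{1,3}\s*-")  # any run of whitespace, including none

samples = [
    "- 7 -",
    "- 42 -",
    "-  42 -",           # two spaces before the number
    "-\u00a042\u00a0-",  # non-breaking spaces instead of ordinary ones
    "- 123 -",
]

results = {s: (bool(strict.search(s)), bool(tolerant.search(s))) for s in samples}
for s, (strict_hit, tolerant_hit) in results.items():
    print(ascii(s), strict_hit, tolerant_hit)
```

The strict pattern misses the two-space and non-breaking-space samples; the tolerant one matches all five. The flip side is that `\s*` also allows no space at all, so a range like "10-12-14" in the body text contains "-12-" and would be removed as well.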
08-16-2009, 07:59 PM | #12 |
01000100 01001010
Posts: 1,889
Karma: 2400000
Join Date: Mar 2009
Device: Polyamorous
|
I have had the same problem and never successfully got Calibre to remove the headers (some of the footers in particular are complicated, with 2 lines of footer). So I used the Adobe Acrobat cropping feature, saved as PDF, and suffered through reading the PDF.
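When a footer spans two lines, the regex has to match both lines and whatever sits between them in the HTML. A Python sketch, assuming (hypothetically) the footer is a copyright line followed by a "Page N" line, each in its own paragraph; the real markup in calibre's debug output may well differ.

```python
import re

# Hypothetical two-line footer: a copyright line followed by a page line.
# \s* between the two paragraphs also matches the line break, so the
# pattern spans both lines without needing re.DOTALL.
TWO_LINE_FOOTER = re.compile(
    r"<p>\s*Copyright [^<]*</p>\s*<p>\s*Page \d{1,4}\s*</p>",
    re.IGNORECASE,
)

html = (
    "<p>Body one.</p>\n"
    "<p>Copyright 2009 Example Press</p>\n"
    "<p>Page 12</p>\n"
    "<p>Body two.</p>"
)
cleaned = TWO_LINE_FOOTER.sub("", html)
print(cleaned)
```

Using `[^<]*` instead of `.*` keeps the match from running past the end of the first paragraph.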
|
08-16-2009, 08:10 PM | #13 |
Hey Trashcan Man
Posts: 66
Karma: 658
Join Date: Jan 2008
Location: So Cal
Device: Nook color, prs 505, Axim x30, psp, Acer Aspire One [running xp]
|
I don't know about the software you're using, but I use Acrobat Pro, and usually I crop the header and footer first. Then I go to the Document tab and choose Examine Document. After it searches the doc, it lists the items it has found. I uncheck metadata and bookmarks, as I don't usually want those removed. I do, however, want to remove hidden data, which includes the headers and footers I can no longer see. After that I can convert it with calibre. The software you're using should offer a way to remove the hidden data once you have cropped your document. I hope this helps.
|
08-17-2009, 10:10 AM | #14 | |
Enthusiast
Posts: 25
Karma: 16
Join Date: Aug 2009
Device: Pocketbook 360, Sony PRS-T1
|
"-\s*\d.+-"
That seems to work... What RE could I use for "normal" page numbers, e.g. 1, 2, 3...? I want to avoid removing other numbers from the document, so I need something like "remove any number of one to three digits that is alone on a line". Could you please give me a hint?
|
09-02-2009, 01:44 AM | #15 | |
Junior Member
Posts: 1
Karma: 32
Join Date: Sep 2009
Device: Kindle 2
|
- D |
|