Hello,
According to "Read this before Posting PDF Questions" (section "There are page numbers, headers, or footers in my output"), using regexes in Search & Replace is the way to remove unwanted chapter names and line numbers. Is there no way to prevent this at the source, when calling Calibre?

Code:
"C:\Program Files\Calibre2\ebook-convert.exe" input_file output_file [options]

--
Edit: If a regex must be used for that, can it be included on the CLI, e.g.

Code:
--remove "^\d+$"

Edit: Tried Briss to first crop the input PDF, ignoring each chapter's first page, but it's stuck at "Loading new file - Creating merged previews".

--
Edit: cpdf is supposed to be able to crop some pages, but I can't figure out how to use the coordinates to trim the top and bottom of the relevant pages.

Code:
cpdf.exe -crop "0 0 600pt 400pt" input.pdf 19-24 -o output.pdf

Edit: Through trial and error…

Code:
# Provided page 25 is one of the pages that need to have its header removed
cpdf -page-info input.pdf 25
MediaBox: 0.000000 0.000000 424.800000 640.800000
CropBox: 0.000000 39.924500 424.147000 640.800000
# Unlike Briss, cpdf seems unable to exclude pages, only include
cpdf.exe -crop "0 0 424pt 600pt" input.pdf 19-24,26-97,99-156 -o cropped.pdf

--
Edit: I don't get it. cpdf seems to crop the PDF just fine, but for some reason, Calibre seems to add the header back into the EPUB. The cropped PDF displays the same way in SumatraPDF and STDU Viewer.

Code:
cpdf.exe -mediabox "0 0 424pt 600pt" input.pdf 19-24,26-97 AND -crop "0 0 424pt 600pt" 19-24,26-97 input.pdf -o output.pdf

Edit: Opening the (supposedly) cropped PDF in LibreOffice shows the stuff's still there. Incidentally, LO doesn't support exporting to EPUB.

--
Edit: At this point, I have found no open-source software that can perform permanent cropping. The different tools I tried only handle "visual cropping", i.e. the content is hidden on screen but the data is still in the PDF, which explains why Calibre includes it in its EPUB. Hence the need for a regex to search for it and remove it.

--
Edit: "Printing" the cropped PDF into a PDF using the CutePDF Writer does finally crop the output… but for some reason, Calibre saves all the pages as pictures instead of text.

Code:
"C:\Program Files\Calibre2\ebook-convert.exe" "test.print.crop.pdf" "test.print.crop.epub"

Edit: It's because the CutePDF Writer driver "prints" as pictures. But when selecting "Print to file" in Chrome's Print dialog, the job is stuck in the print list, although I restricted the job to a few pages. If someone knows of a better way to "print to PDF"…

--
Edit: It seems that no tool (open-source, at least) is available to actually crop/trim a PDF so that the data is definitely removed and Calibre won't see it.

Last edited by Shohreh; 05-12-2023 at 09:29 AM.