Preventing chapter names + line numbers during conversion?

Shohreh · 05-09-2023, 04:02 AM

Hello,

According to the "Read this before Posting PDF Questions" thread (section "There are page numbers, headers, or footers in my output"), running regexes through Search & Replace is the way to go to remove unwanted chapter names and line numbers.

Is there no way to prevent this at the source, when calling Calibre?

Code:

"C:\Program Files\Calibre2\ebook-convert.exe" input_file output_file [options]

Thank you.

--
Edit: If a regex must be used for that, can it be passed on the command line, e.g.:

Code:

--remove "^\d+$"
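As far as I know, ebook-convert has no plain `--remove` option, but the Search & Replace settings are available on the command line as `--sr1-search`/`--sr1-replace` (and `--sr2-search`/`--sr2-replace`, `--sr3-search`/`--sr3-replace`). One catch: the regex runs on the HTML that Calibre generates, not on raw text, so a bare `^\d+$` may never match. The sketch below assumes a lone page number ends up as its own paragraph; the `<p class="calibre1">25</p>` markup is a hypothetical stand-in, so check the real HTML of a first conversion before relying on the pattern.

```python
import re

# Hypothetical sample of the HTML Calibre might produce from a PDF page;
# the class name and layout are assumptions, not real Calibre output.
SAMPLE_HTML = """<p class="calibre1">Some body text.</p>
<p class="calibre1">25</p>
<p class="calibre1">Chapter 3</p>
<p class="calibre1">More body text.</p>"""

# A paragraph whose only content is a number, i.e. a stray page number.
PAGE_NUMBER_RE = r"<p[^>]*>\s*\d+\s*</p>"


def strip_page_numbers(html: str) -> str:
    """Preview locally what the pattern would delete."""
    return re.sub(PAGE_NUMBER_RE, "", html)


def build_convert_cmd(src: str, dst: str, pattern: str) -> list[str]:
    """Build an ebook-convert call that deletes every match of `pattern`.

    An empty --sr1-replace means each match is replaced with nothing.
    """
    return [
        r"C:\Program Files\Calibre2\ebook-convert.exe",
        src,
        dst,
        "--sr1-search", pattern,
        "--sr1-replace", "",
    ]


if __name__ == "__main__":
    print(strip_page_numbers(SAMPLE_HTML))
    cmd = build_convert_cmd("input.pdf", "output.epub", PAGE_NUMBER_RE)
    print(cmd)
    # To actually convert: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one command string also avoids quoting the pattern for cmd.exe, where `<`, `>` and `^` are special characters.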

--
Edit: I tried using Briss to crop the input PDF first, skipping each chapter's first page, but it gets stuck at "Loading new file - Creating merged previews".

--
Edit: cpdf can supposedly crop selected pages, but I can't figure out how to use the coordinates to trim the top and bottom of the relevant pages:

Code:

cpdf.exe -crop "0 0 600pt 400pt" input.pdf 19-24 -o output.pdf
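Going by the cpdf manual, `-crop` takes "x y w h": the lower-left corner of the box, then its width and height, in PDF user space, where the origin is the bottom-left corner of the page and units are points (1/72 inch). So "0 0 600pt 400pt" keeps a 600 by 400 pt area anchored at the bottom-left, which on a 424.8 by 640.8 pt page cuts away roughly the top 240 pt. Trimming the top and bottom therefore means raising y and shrinking h. A small helper to compute the string, assuming the MediaBox starts at 0 0 and that you already know how many points to trim (the function name is made up):

```python
def crop_spec(page_w: float, page_h: float,
              trim_top: float = 0.0, trim_bottom: float = 0.0) -> str:
    """Return an "x y w h" string for cpdf -crop, all values in points.

    Assumes PDF user space with the origin at the page's bottom-left corner
    and a MediaBox starting at 0 0.
    """
    if trim_top < 0 or trim_bottom < 0:
        raise ValueError("trim amounts must be non-negative")
    if trim_top + trim_bottom >= page_h:
        raise ValueError("trimming would remove the whole page")
    x = 0.0
    y = trim_bottom                      # move the bottom edge up
    w = page_w                           # keep the full width
    h = page_h - trim_top - trim_bottom  # what remains between the two edges
    return f"{x:g}pt {y:g}pt {w:g}pt {h:g}pt"


if __name__ == "__main__":
    # Page size as reported by `cpdf -page-info` (MediaBox 424.8 x 640.8 pt);
    # the 40.8 pt header height is only an example value.
    spec = crop_spec(424.8, 640.8, trim_top=40.8)
    print(spec)  # 0pt 0pt 424.8pt 600pt
```

The result then goes into the same kind of command, e.g. cpdf.exe -crop "0pt 0pt 424.8pt 600pt" input.pdf 19-24 -o output.pdf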

--
Edit: Through trial and error…

Code:

# Page 25 is one of the pages whose header needs to be removed
cpdf -page-info input.pdf 25
MediaBox: 0.000000 0.000000 424.800000 640.800000
CropBox: 0.000000 39.924500 424.147000 640.800000

# Unlike Briss, cpdf seems able to include pages only, not exclude them
cpdf.exe -crop "0 0 424pt 600pt" input.pdf 19-24,26-97,99-156 -o cropped.pdf
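Rather than typing the inclusion list by hand, it can be generated from the chapter-opening pages to skip. A minimal sketch; the function name is made up, and pages 25 and 98 are inferred from the ranges in the command above:

```python
def include_ranges(first: int, last: int, exclude: set[int]) -> str:
    """Return a cpdf-style page range covering first..last minus `exclude`.

    Consecutive kept pages are collapsed into "a-b"; single pages stay "a".
    """
    ranges: list[str] = []
    start = None
    # Walk one page past `last` so the final run is always closed.
    for page in range(first, last + 2):
        keep = page <= last and page not in exclude
        if keep and start is None:
            start = page                      # a new run begins
        elif not keep and start is not None:
            end = page - 1                    # the current run just ended
            ranges.append(str(start) if start == end else f"{start}-{end}")
            start = None
    return ",".join(ranges)


if __name__ == "__main__":
    # Chapter-opening pages to leave uncropped (from the command above).
    print(include_ranges(19, 156, {25, 98}))  # 19-24,26-97,99-156
```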

The EPUB built by Calibre is still not OK: MediaBox vs. TrimBox?
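On MediaBox vs. TrimBox: a PDF page can carry up to five boxes (MediaBox, CropBox, BleedBox, TrimBox, ArtBox), each stored as two corners [x1 y1 x2 y2]. None of them deletes content; they only tell viewers and printers which area to show or use. The `-page-info` output quoted above appears to use that corner form (read as x y w h, its CropBox would extend past the page), whereas `-crop` takes x y w h, so the numbers can't be reused as-is. A sketch that parses the quoted output and converts each box; the corner interpretation is an assumption drawn from those numbers:

```python
import re

# Output of `cpdf -page-info input.pdf 25`, as quoted above.
PAGE_INFO = """MediaBox: 0.000000 0.000000 424.800000 640.800000
CropBox: 0.000000 39.924500 424.147000 640.800000"""

NUM = r"(-?\d+(?:\.\d+)?)"
BOX_RE = re.compile(rf"^(\w+Box):\s+{NUM}\s+{NUM}\s+{NUM}\s+{NUM}", re.MULTILINE)


def parse_boxes(text: str) -> dict[str, tuple[float, ...]]:
    """Map each box name to (x1, y1, x2, y2), read as two corner points."""
    return {
        m.group(1): tuple(float(v) for v in m.groups()[1:])
        for m in BOX_RE.finditer(text)
    }


def to_xywh(box: tuple[float, ...]) -> tuple[float, ...]:
    """Convert corner coordinates into the x y w h form that -crop expects."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


if __name__ == "__main__":
    for name, box in parse_boxes(PAGE_INFO).items():
        x, y, w, h = to_xywh(box)
        print(f"{name}: x={x:g} y={y:g} w={w:g} h={h:g}")
```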

--
Edit: I don't get it. cpdf seems to crop the PDF just fine, but for some reason, Calibre adds the header back into the EPUB. Same result in SumatraPDF and STDU Viewer.

Code:

cpdf.exe -mediabox "0 0 424pt 600pt" input.pdf 19-24,26-97 AND -crop "0 0 424pt 600pt" 19-24,26-97 input.pdf -o output.pdf

--
Edit: Opening the (supposedly) cropped PDF in LibreOffice shows the headers are still there. Incidentally, LO doesn't support exporting to EPUB.

--
Edit: So far, I have found no open-source software that can crop permanently. The tools I tried only do "visual cropping", i.e., the content is hidden on screen but the data is still in the PDF, which explains why Calibre includes it in the EPUB. Hence the need for a regex to search for it and remove it.
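To make the "visual cropping" point concrete: setting a crop box only changes what viewers display, while every text object stays in the page's content stream, so a converter that extracts all the text can still pick up the header, consistent with the behaviour described above. A permanent crop would have to drop the text that falls outside the box. The toy model below shows only that filtering step; `TextRun` and the coordinates are made up for illustration, and real PDF text extraction is considerably messier:

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """A hypothetical piece of extracted text and where it sits on the page."""
    text: str
    x: float  # points from the left edge
    y: float  # points from the bottom edge (PDF convention)


def inside(run: TextRun, box: tuple[float, float, float, float]) -> bool:
    """True if the run's anchor point lies within box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return x1 <= run.x <= x2 and y1 <= run.y <= y2


def permanent_crop(runs: list[TextRun], box) -> list[TextRun]:
    """Discard runs outside the box, instead of merely hiding them."""
    return [r for r in runs if inside(r, box)]


if __name__ == "__main__":
    page = [
        TextRun("Chapter 3 - Some Title", 150.0, 615.0),  # running header
        TextRun("Body text...", 50.0, 400.0),
        TextRun("25", 210.0, 50.0),                       # page number
    ]
    # Keep the bottom 600 pt, like -crop "0 0 424pt 600pt" above.
    for run in permanent_crop(page, (0.0, 0.0, 424.0, 600.0)):
        print(run.text)
```

Note that this particular box keeps the page number at the bottom; removing it too would need a raised bottom edge as well.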

--
Edit: "Printing" the cropped PDF to a new PDF with CutePDF Writer does finally crop the output…

… but for some reason, Calibre saves all the pages as pictures instead of text.

Code:

"C:\Program Files\Calibre2\ebook-convert.exe" "test.print.crop.pdf" "test.print.crop.epub"

--
Edit: It's because the CutePDF Writer driver "prints" pages as pictures. But when I select "Print to file" in Chrome's Print dialog, the job gets stuck in the print queue, although I restricted it to a few pages. If someone knows of a better way to "print to PDF"…

--
Edit: It seems that no tool (no open-source one, at least) can actually crop/trim a PDF so that the data is definitely removed and Calibre won't see it.

Last edited by Shohreh; 05-12-2023 at 09:29 AM.
