Old 05-09-2023, 04:02 AM   #1
Shohreh
Preventing chapter names + line numbers during conversion?

Hello,

According to "Read this before Posting PDF Questions" (section "There are page numbers, headers, or footers in my output"), using a regex in Search & Replace is the way to remove unwanted chapter names and line numbers.

Is there no way to prevent this at the source, when calling Calibre?

Code:
"C:\Program Files\Calibre2\ebook-convert.exe" input_file output_file [options]
Thank you.

--
Edit: If a regex must be used for that, can it be included on the CLI, e.g.:
Code:
--remove "^\d+$"
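
To illustrate what a pattern like `^\d+$` would match, here is a minimal Python sketch run on plain text. It assumes the unwanted numbers sit alone on their own line. The function name `strip_number_lines` and the sample text are made up for illustration, and this is not necessarily how Calibre applies its own search and replace, which (as far as I understand) works on the converted markup rather than on plain text lines.

```python
import re

# Matches a line that contains only digits, plus its trailing newline if any.
# re.MULTILINE makes ^ and $ match at every line boundary, not just at the
# start and end of the whole string.
NUMBER_ONLY_LINE = re.compile(r"^\d+$\n?", re.MULTILINE)


def strip_number_lines(text: str) -> str:
    """Remove lines made up solely of digits (hypothetical helper)."""
    return NUMBER_ONLY_LINE.sub("", text)


if __name__ == "__main__":
    sample = "First paragraph.\n12\nRoom 12 was empty.\n13\nSecond paragraph.\n"
    print(strip_number_lines(sample))
```

Numbers inside a sentence ("Room 12 was empty.") are left alone because the pattern is anchored at both ends of the line.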
--
Edit: I tried Briss to crop the input PDF first, skipping each chapter's first page, but it gets stuck at "Loading new file - Creating merged previews"

--
Edit: cpdf is supposed to be able to crop selected pages, but I can't figure out how to use the coordinates to trim the top and bottom of the relevant pages

Code:
cpdf.exe -crop "0 0 600pt 400pt" input.pdf 19-24  -o output.pdf
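
On the coordinate question, a minimal sketch of the arithmetic in Python. It assumes cpdf's `-crop` argument is "x y width height" in points, with the origin at the bottom-left corner of the page, as in the command above. The helper name `crop_rect` and the 40pt/30pt margins are made up for illustration; the page size is the MediaBox that `-page-info` reports further down in this post. It only computes the rectangle and says nothing about whether the hidden content is actually removed.

```python
def crop_rect(page_w: float, page_h: float, top: float, bottom: float) -> str:
    """Build an "x y w h" rectangle that trims `top` points off the top
    and `bottom` points off the bottom of a page (hypothetical helper).

    The origin is assumed to be the bottom-left corner, so trimming the
    bottom raises y, and trimming either edge shrinks the height.
    """
    height = page_h - top - bottom
    if height <= 0:
        raise ValueError("margins are larger than the page height")
    # :g drops trailing zeros and floating-point noise (e.g. 570.8, not 570.79999...)
    return f"0 {bottom:g}pt {page_w:g}pt {height:g}pt"


if __name__ == "__main__":
    # Page size taken from the MediaBox in this post: 424.8 x 640.8 points.
    print(crop_rect(424.8, 640.8, top=40, bottom=30))
```

The resulting string would go in place of the quoted rectangle after `-crop`.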
--
Edit: Through trial and error…

Code:
#Provided page 25 is one of the pages that need to have its header removed
cpdf -page-info input.pdf 25
MediaBox: 0.000000 0.000000 424.800000 640.800000
CropBox: 0.000000 39.924500 424.147000 640.800000

#Unlike Briss, cpdf seems unable to exclude pages, only include
cpdf.exe -crop "0 0 424pt 600pt" input.pdf 19-24,26-97,99-156 -o cropped.pdf
The EPUB built by Calibre is still not OK: is it a MediaBox vs. TrimBox issue?
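
Since the page range has to list what to include rather than what to exclude, the range string can be generated instead of typed by hand. A minimal Python sketch, assuming the range syntax used in the command above (comma-separated pages and `a-b` runs); the helper name `page_range` is made up for illustration.

```python
def page_range(first: int, last: int, excluded: list[int]) -> str:
    """Return a range string covering first..last minus the excluded pages
    (hypothetical helper)."""
    skip = set(excluded)
    runs = []      # list of (start, end) runs of consecutive included pages
    start = None   # start of the run currently being built, if any
    for page in range(first, last + 1):
        if page in skip:
            if start is not None:
                runs.append((start, page - 1))
                start = None
        elif start is None:
            start = page
    if start is not None:
        runs.append((start, last))
    # A run of one page is written as "n", longer runs as "a-b".
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


if __name__ == "__main__":
    # Pages 19 to 156, skipping pages 25 and 98.
    print(page_range(19, 156, [25, 98]))
```

With pages 25 and 98 excluded, this reproduces the range typed by hand in the command above.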

--
Edit: I don't get it. cpdf seems to crop the PDF just fine, but for some reason, Calibre seems to add the header back into the EPUB. The cropped PDF looks the same in both SumatraPDF and STDU Viewer.

Code:
cpdf.exe -mediabox "0 0 424pt 600pt" input.pdf 19-24,26-97 AND -crop "0 0 424pt 600pt" 19-24,26-97 input.pdf -o output.pdf
--
Edit: Opening the (supposedly) cropped PDF in LibreOffice shows that the cropped content is still there. Incidentally, LibreOffice doesn't support exporting to EPUB.

--
Edit: At this point, I have found no open-source software that can perform permanent cropping. The tools I tried only do "visual cropping", i.e. the content is hidden on screen, but the data is still in the PDF, which explains why Calibre includes it in its EPUB. Hence the need for a regex to search for it and remove it.

--
Edit: "Printing" the cropped PDF to a new PDF with the CutePDF Writer does finally crop the output

… but for some reason, Calibre then saves all the pages as pictures instead of text

Code:
"C:\Program Files\Calibre2\ebook-convert.exe" "test.print.crop.pdf" "test.print.crop.epub"
--
Edit: It's because the CutePDF Writer driver "prints" pages as pictures. But when selecting "Print to file" in Chrome's Print dialog, the job gets stuck in the print queue, although I restricted the job to a few pages. If someone knows of a better way to "print to PDF"…

--
Edit: It seems that no tool (no open-source one, at least) is available to actually crop/trim a PDF so that the data is definitely removed and Calibre won't see it

Last edited by Shohreh; 05-12-2023 at 09:29 AM.