02-11-2021, 05:47 AM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Feb 2021
Device: Kindle
|
How do I find the chapter and/or page breaks in a PDF?
Real newbie to Calibre here...
I'm trying to convert a PDF to MOBI. Mostly it works fine, but no matter what I do I can't get it to insert a page break between chapters, or anywhere else where the PDF has a chapter or page break. In the debug output, the Input stage shows an <hr/>, which displays as a line in the web page output, but after that stage it's ignored. What am I doing wrong? There is nothing special in the document at all: no fancy formatting, nothing. I just can't get it to insert page breaks at chapters, which I believe Calibre can do. So where do I start, please? |
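For reference, here is a minimal and untested sketch of one thing that could be tried, assuming the <hr/> seen in the Input stage really does sit where a break is wanted. It only builds an ebook-convert command line that uses the --chapter, --chapter-mark and --page-breaks-before options with an XPath matching <hr/>. The file names and the helper build_convert_command are made up for illustration, and whether the <hr/> survives long enough for the XPath to match will depend on the PDF. If the <hr/> marks every PDF page rather than chapters, an XPath that matches the chapter headings would be the better choice.

```python
import shlex


def build_convert_command(source, target, break_xpath="//h:hr"):
    """Build an ebook-convert command that treats `break_xpath` matches as chapter starts.

    `break_xpath` is an assumption: "//h:hr" matches every <hr/> in the
    intermediate XHTML, which may or may not correspond to chapters.
    """
    return [
        "ebook-convert", source, target,
        # XPath used to detect chapters
        "--chapter", break_xpath,
        # Mark detected chapters with a page break instead of a rule
        "--chapter-mark", "pagebreak",
        # Also insert page breaks before the matched elements
        "--page-breaks-before", break_xpath,
    ]


# Hypothetical file names; print the command instead of running it.
cmd = build_convert_command("book.pdf", "book.mobi")
print(shlex.join(cmd))
```

Printing the command first makes it easy to check the quoting before pasting it into a terminal.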
02-11-2021, 07:39 AM | #2 |
Grand Sorcerer
Posts: 6,678
Karma: 86234809
Join Date: Nov 2011
Location: Charlottesville, VA
Device: Kindles
|
See Read this before Posting PDF Questions, especially the section "My PDF has a table of contents or links/bookmarks, but they weren't used during conversion".
|
01-25-2024, 07:57 PM | #3 |
Groupie
Posts: 172
Karma: 248528
Join Date: Jan 2016
Device: none
|
Hello,
Indeed and sadly, adding bookmarks to the source PDF doesn't help Calibre split chapters correctly when converting to EPUB. Is there no way to help Calibre with this task without having to mess with the HTML in the EPUB output? Last edited by Shohreh; 01-25-2024 at 08:42 PM. |
01-25-2024, 08:07 PM | #4 |
Bibliophagist
Posts: 39,471
Karma: 154108302
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Pretty much no. See the numerous comments about PDF being the worst format to convert from.
|
01-25-2024, 08:13 PM | #5 |
Groupie
Posts: 172
Karma: 248528
Join Date: Jan 2016
Device: none
|
I know, but it's odd that Calibre can't just use the PDF's bookmarks to know where new chapters start.
At this point, the conversion went pretty well after I removed the headers in the source PDF with a bit of Python. I just need to figure out how to have Calibre start a new page with each new chapter.
-- Edit: If Calibre really is unable to split chapters by relying on the PDF's bookmarks, what about splitting the PDF into multiple files (one chapter = one PDF; for this, check qpdf, cpdf, mutool, etc.), having Calibre convert them into EPUB files, and then merging them into a single EPUB?
-- Edit: Is there an option to prevent ebook-convert.exe from adding a "Document Outline" at the end of the EPUB? Last edited by Shohreh; 01-25-2024 at 10:10 PM. |
01-26-2024, 10:39 AM | #6 |
the rook, bossing Never.
Posts: 12,249
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
If you knew how PDFs work you'd not think it odd.
Only actual paper, vellum and stone tablets are a worse conversion source than PDF, and some scanned PDFs are so bad that photographing the paper with a phone is better. It's a waste of time, because what you get working for one PDF (if you ever get it to 'work') may not work on the next PDF. My TCL NxtPaper 11 arrived today, which is my latest solution to reading PDFs. Cheaper than Scribe, Elipsa, Boox etc. Last edited by Quoth; 01-26-2024 at 10:41 AM. |
01-26-2024, 12:15 PM | #7 |
Groupie
Posts: 172
Karma: 248528
Join Date: Jan 2016
Device: none
|
Out of curiosity, why can't Calibre use a PDF's bookmarks to know where each chapter starts instead of guessing while reading the XHTML generated by pdftohtml?
|
01-26-2024, 01:00 PM | #8 |
creator of calibre
Posts: 44,346
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Let's count the reasons:
1) There is no guarantee that bookmark entries are chapter starts.
2) A PDF consists of a bunch of font glyphs placed at absolute co-ordinates on the page. A bookmark, or any link really, is also just another co-ordinate on a page. There is no way to map that to some semantic element reliably. One has to use heuristics. |
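To make point 2 concrete, here is a toy model in Python. It is not how Calibre or pdftohtml work internally; the TextLine class, the sample lines and the 20-point tolerance are all invented for illustration. A bookmark is reduced to a (page, top) pair, and the only way to guess which text it refers to is to look for the nearest line, which is a heuristic that can fail.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    page: int    # 1-based page number
    top: float   # distance from the top of the page, in points
    text: str


def nearest_line(lines, page, top, tolerance=20.0):
    """Return the line on `page` closest to `top`, or None if none is within `tolerance`."""
    candidates = [ln for ln in lines if ln.page == page]
    if not candidates:
        return None
    best = min(candidates, key=lambda ln: abs(ln.top - top))
    return best if abs(best.top - top) <= tolerance else None


# Invented sample data: three lines of text on page 12.
lines = [
    TextLine(12, 70.0, "ends the previous chapter."),
    TextLine(12, 140.0, "Chapter 2"),
    TextLine(12, 180.0, "It was a dark and stormy night."),
]

# A bookmark that points just above the heading can be matched...
close = nearest_line(lines, page=12, top=130.0)
# ...but one that only points at the top of the page cannot.
vague = nearest_line(lines, page=12, top=0.0)
print(close.text if close else "no match")
print(vague.text if vague else "no match")
```

Bookmarks that only point at the top of a page are the awkward case: the chapter could start anywhere on that page, so the guess depends entirely on the tolerance chosen.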
01-26-2024, 01:47 PM | #9 |
the rook, bossing Never.
Posts: 12,249
Karma: 89531599
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
3) Other mad stuff too.
|
01-27-2024, 08:25 PM | #10 |
Groupie
Posts: 172
Karma: 248528
Join Date: Jan 2016
Device: none
|
Thanks for the info.
Suggestion: If it's the user who added the bookmarks to the PDF, they're reliable, and Calibre could offer an option to use that information to find chapters.
Anyway, an alternative that is simpler than adding and removing bookmarks is to:
1) open the PDF in a reader, e.g. SumatraPDF,
2) make a list of the pages/slices that make up the chapters,
3) use it to split the source PDF into sub-PDFs (e.g. with cpdf; one chapter = one PDF),
4) run Calibre to convert them into EPUBs, and
5) finally join them into a single EPUB.
It does nothing to help convert PDFs into clean EPUBs, but at least Calibre will know where each chapter starts and ends. Last edited by Shohreh; 01-27-2024 at 08:58 PM. |
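A rough sketch of steps 2 to 4 in Python, under a few assumptions: the chapter start pages are typed in by hand, the split is done with qpdf (using its --empty --pages form) rather than cpdf, and the output file names are made up. The script only prints the commands so they can be checked before anything is run. Step 5, joining the EPUBs, is left out because it depends on which merge tool is used.

```python
def chapter_ranges(chapter_starts, last_page):
    """Turn 1-based chapter start pages into inclusive (first, last) page ranges."""
    starts = sorted(set(chapter_starts))
    if not starts or starts[0] < 1 or starts[-1] > last_page:
        raise ValueError("chapter starts must lie within 1..last_page")
    # Each chapter ends on the page before the next one starts.
    ends = [s - 1 for s in starts[1:]] + [last_page]
    return list(zip(starts, ends))


def build_commands(source_pdf, chapter_starts, last_page):
    """Build one qpdf split command and one ebook-convert command per chapter."""
    commands = []
    ranges = chapter_ranges(chapter_starts, last_page)
    for n, (first, last) in enumerate(ranges, start=1):
        part_pdf = f"chapter{n:02d}.pdf"    # hypothetical output names
        part_epub = f"chapter{n:02d}.epub"
        commands.append(
            ["qpdf", "--empty", "--pages", source_pdf, f"{first}-{last}", "--", part_pdf]
        )
        commands.append(["ebook-convert", part_pdf, part_epub])
    return commands


# Example: chapters start on pages 1, 15 and 42 of a 60-page PDF.
for command in build_commands("book.pdf", [1, 15, 42], 60):
    print(" ".join(command))
```

Note that any pages before the first chapter start (cover, front matter) are dropped unless page 1 is included in the list.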
|