[PDF to EPUB] Remove dots on every page break?

Red_AM · 01-18-2021, 07:58 AM

Hi! I converted a book from PDF to EPUB and it worked perfectly, except that there are dots at every page break.

example:

Code:

page 1 ends
.
.
.
.
page 2 starts

I clicked on edit to see what the code looks like and found it looks like this:

Code:

<p class="calibre1"><a id="p8"></a><img src="index-8_1.png" class="calibre2"/></p>
<p class="calibre1"><img src="index-8_2.png" class="calibre2"/></p>
<p class="calibre1"><img src="index-8_3.png" class="calibre2"/></p>
<p class="calibre1"><img src="index-8_4.png" class="calibre2"/></p>

So I'm assuming the conversion decided to keep the image files from the PDF that were included with each page break. There are some images in the book that I would like to keep, and I found that the page-break images are each 100 bytes.

Is there a quick way I could search and delete all of these page break images and lines of code?

Maybe something like:
if a line starts with

Code:

<p class="calibre1"><a id="p

delete that whole line and the 3 lines under it.

Is that possible?
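
For what it's worth, the "delete that line and the 3 lines under it" idea could be sketched in Python roughly like this. It assumes the markup looks exactly like the sample above (one `<p>` per line, classes `calibre1` and `calibre2`); the names `PAGE_START`, `IMG_ONLY` and `drop_page_break_lines` are made up for illustration. Note that it also throws away the `<a id="p8">` page anchor, and it cannot tell a 100-byte dot from a real picture that happens to sit directly after a page start, so treat it as a starting point only.

```python
import re

# Prefix taken from the sample markup in the post above.
PAGE_START = '<p class="calibre1"><a id="p'

# A paragraph that holds nothing but one image (optionally preceded by a page anchor).
IMG_ONLY = re.compile(
    r'^<p class="calibre1">(?:<a id="p\d+"></a>)?'
    r'<img src="[^"]+" class="calibre2"/></p>$'
)


def drop_page_break_lines(html, extra_lines=3):
    """Drop each page-start image line plus up to `extra_lines` image-only lines after it."""
    kept = []
    to_skip = 0
    for line in html.splitlines():
        stripped = line.strip()
        if to_skip > 0 and IMG_ONLY.match(stripped):
            # Still inside the run of image-only lines that follows a page start.
            to_skip -= 1
            continue
        # Anything else ends the run, so real text is never skipped.
        to_skip = 0
        if stripped.startswith(PAGE_START) and IMG_ONLY.match(stripped):
            to_skip = extra_lines
            continue
        kept.append(line)
    return "\n".join(kept)
```

Running it on the four sample lines with a text paragraph before and after should leave only the two text paragraphs.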

Red_AM · 01-18-2021, 08:39 AM

Ah, I found a way to fix this. I opened the EPUB in 7-Zip, sorted by size and deleted all the 100-byte images. Then I opened it in calibre's book editor and hit the "Check Book" option (which I'm guessing found all the lines of code referring to images that don't exist anymore and removed them).
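
The 7-Zip step can also be scripted, since an EPUB is a zip archive. Below is a minimal sketch using Python's standard `zipfile` module; the function name `copy_without_tiny_images`, the extension list and the 100-byte threshold are assumptions based on the description above, not anything calibre provides. It writes a new archive rather than editing in place, so the original stays untouched.

```python
import zipfile

# Assumed set of image extensions; adjust to what the book actually contains.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


def copy_without_tiny_images(src, dst, max_bytes=100):
    """Copy the EPUB at `src` to `dst`, skipping images of at most `max_bytes` bytes.

    `src` and `dst` may be paths or file-like objects.
    Returns the archive names of the images that were left out.
    """
    removed = []
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            is_image = info.filename.lower().endswith(IMAGE_EXTENSIONS)
            if is_image and info.file_size <= max_bytes:
                removed.append(info.filename)
                continue
            # Reusing the original ZipInfo keeps the entry order and each
            # entry's compression method (the mimetype entry stays first).
            zout.writestr(info, zin.read(info))
    return removed
```

The HTML and the manifest will still refer to the removed files afterwards, so a pass with "Check Book" in the editor is still needed, as described above.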

That said, if anyone knows a better solution, like finding and fixing the issue during conversion, I'd really appreciate it if you'd let me know!

Thanks!

gbm · 01-18-2021, 10:09 AM

Yes, you can fix it during the conversion process.

Quote:

These options are useful primarily for conversion of PDF documents or OCR conversions, though they can also be used to fix many document specific problems. As an example, some conversions can leave behind page headers and footers in the text.

But I recommend you become familiar with the calibre ebook editor and its search and replace function.

https://manual.calibre-ebook.com/edit.html

https://manual.calibre-ebook.com/edi...search-replace

Bookmark or save a local copy of the "Quick reference for regexp syntax" page.
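
As an illustration of the search and replace route, here is a rough Python sketch for trying out a regular expression on the sample markup before using it in the editor. The pattern is only a guess built from the four lines posted in this thread (a page anchor plus image, followed by exactly three more image-only paragraphs); `PAGE_BREAK_RUN` and `strip_page_break_runs` are illustrative names. Replacing the run with an empty `<div>` that keeps the page anchor is a design choice made here so that links to `p8` and similar ids keep working; it is not something calibre does on its own.

```python
import re

# One page-anchor paragraph followed by exactly three image-only paragraphs.
PAGE_BREAK_RUN = re.compile(
    r'<p class="calibre1">(<a id="p\d+"></a>)'
    r'<img src="[^"]+" class="calibre2"/></p>\s*'
    r'(?:<p class="calibre1"><img src="[^"]+" class="calibre2"/></p>\s*){3}'
)


def strip_page_break_runs(html):
    """Replace each four-image page-break run with a div holding only the page anchor."""
    return PAGE_BREAK_RUN.sub(r'<div>\1</div>\n', html)
```

A page that starts with a different number of images will not match, and a genuine picture inside such a run would be removed too, so check the matches one by one (for example with "Find" before "Replace all") before trusting it on a whole book.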

bernie
