Converting from epub to pdf results in dollar signs

paperbackebook · 09-08-2020, 06:54 AM

Hi,

I've been trying to convert an ebook in epub format to PDF so I can print some parts of it.

I've tried the Adobe ebook software, but although it lets me print, it either crashes, starts printing empty pages, or only prints the first page.

This file is not print protected. I can open it properly in Calibre.
But when I convert it, it takes ages and every page of the resulting PDF has a whole bunch of dollar signs on it. These look like links, but nothing happens when I click them.
There are e.g. 20 lines with 15 dollar signs on each of those lines.
This seems to repeat on all pages.

Any idea where this is coming from? Or how I can get rid of these? The content still seems to be a bit borked but I hope that will be resolved when those dollar signs are removed.

If there are other methods please let me know.

Thanks a lot in advance (this has been quite frustrating)!

Quoth · 09-08-2020, 12:36 PM

Use Calibre to convert to RTF.
Load RTF into LO Writer or MS Word and set a sensible page size, margins and fix styles.

Then print.

Only make a PDF if wanting to do POD or fixed layout on a Tablet. Word & Writer can both make PDFs. Embed fonts if printing on a different computer.

Basically there is no need for epub to PDF ever, unless you own a Sony Digital Paper. Even then Calibre -> RTF -> Wordprocessor & Fix -> Print to Digital paper

paperbackebook · 09-08-2020, 12:45 PM

Quote:

Originally Posted by Quoth

Use Calibre to convert to RTF.
Load RTF into LO Writer or MS Word and set a sensible page size, margins and fix styles.

Then print.

Only make a PDF if wanting to do POD or fixed layout on a Tablet. Word & Writer can both make PDFs. Embed fonts if printing on a different computer.

Basically there is no need for epub to PDF ever, unless you own a Sony Digital Paper. Even then Calibre -> RTF -> Wordprocessor & Fix -> Print to Digital paper

Thanks for the suggestion.
I'll try that.

What's odd is that when I open the xhtml files everything is rendered correctly. Then when printing to PDF the content is borked.
And now I also tried opening those files in the browser and printing from there. Again the content is borked in another way.
So why the *** is it impossible to just print what is rendered...

paperbackebook · 09-08-2020, 01:05 PM

And when I print from Firefox the layout is good, however it prints 13 pages: 1 page with the contents and then 12 with only headers and footers. After disabling headers and footers I get 12 empty pages :s What a sh**show.
Incredible that we could put a man on the moon 60 years ago but still can't get a normal print :s

paperbackebook · 09-08-2020, 01:49 PM

In RTF it's actually worse. The text is parsed correctly but the layout is completely gone.

Deskisamess · 09-08-2020, 01:52 PM

Quote:

Originally Posted by paperbackebook

In RTF it's actually worse. The text is parsed correctly but the layout is completely gone.

Can't you fix the layout once the file is in Word? Or at least fix the parts you want to print.

How about doing simple screen grabs of the bits you want to print?

JSWolf · 09-08-2020, 03:41 PM

Convert to HTMLZ and then unzip the contents and load the Index.html file into Word. Then print the bits you want.

Another solution is to unzip the ePub and load the HTML files you want to print from into Word.

BetterRed · 09-08-2020, 05:26 PM

@paperbackebook - Why convert the PDF to anything? PDF is the worst format to convert to EPUB or most anything else.

Have you tried using a different PDF viewer to Acrobat, not a browser based one like firefox or edge, they're half-baked. I switched to PDF-XChange because of Acrobat frustrations, this is its print dialogue.

[Attachment 181835: Annotation 2020-09-09 065631.jpg, screenshot of the PDF-XChange print dialogue]

Or, if you have a recent edition of Word (2013 or later I think) try opening the PDF directly in it and print from there. It can produce surprisingly good results for some PDFs. On large PDFs it may run for a while (as in 10-20 minutes) and then fail, gracefully though.

IMO, conversion of PDF to HTML, or its derivatives, should only be done when all else fails.

BR

paperbackebook · 09-09-2020, 03:23 AM

Quote:

Originally Posted by Deskisamess

Can't you fix the layout once the file is in Word? Or at least fix the parts you want to print.

How about doing simple screen grabs of the bits you want to print?

Some of the issues are "minor", like the text suddenly switching to a smaller font.
Sometimes a sentence goes from normal text to what look like words in subscript and superscript.
In other places the "margin" between two sentences is suddenly too small, so they partly overlap.

There's 230 pages of these. So even if I only fix these easy ones it'll still cost me a big chunk of time.

In other cases 2 sentences are squashed together: the characters from one sentence are merged with the other and the whole thing is gibberish. In that case I would have to go to the ebook and retype those sentences. From a software point of view I don't get how this happens. The sentences are read from the xhtml, where they are clearly separate. So how Word manages to completely mangle those sentences is beyond me.

I appreciate the suggestions but this is not workable.

paperbackebook · 09-09-2020, 03:32 AM

Quote:

Originally Posted by JSWolf

Convert to HTMLZ and then unzip the contents and load the Index.html file into Word. Then print the bits you want.

Another solution is to unzip the ePub and load the HTML files you want to print from into Word.

I indeed did something like this yesterday. But not using Word. xhtml is html so the browser should be able to properly do it.

So one approach that works, which is time consuming though, is:

- extract the epub file
- change the printer settings in firefox to not print footers and headers
- open the xhtml file in firefox (firefox only: edge, IE, Chrome mess up printing a simple file they render correctly)
- select "pages" so it only prints the first page instead of an extra 12 empty pages
- print
- repeat a gazillion times
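
The extraction step at least can be scripted. Below is a minimal sketch in Python, assuming the epub is an ordinary, DRM-free ZIP archive (which is what an epub is). The function name `extract_xhtml` and the output folder are made up for illustration. It sorts the pages by file name, which may not match the book's real reading order (that order is defined by the spine in the OPF file), so treat the list as a starting point only.

```python
import zipfile
from pathlib import Path


def extract_xhtml(epub_path, out_dir):
    """Unpack an EPUB (a ZIP archive) and return its XHTML/HTML files, sorted by path.

    Hypothetical helper: sorting by path is an assumption and may differ
    from the reading order declared in the book's OPF spine.
    """
    out = Path(out_dir)
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(out)  # creates out_dir and subfolders as needed
    pages = [
        path
        for path in out.rglob("*")
        if path.suffix.lower() in {".xhtml", ".html", ".htm"}
    ]
    return sorted(pages)
```

Calling `extract_xhtml("book.epub", "book_unpacked")` returns the page files, which can then be opened one by one in Firefox as described in the list above. The printing itself still has to be done by hand.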

paperbackebook · 09-09-2020, 03:38 AM

Quote:

Originally Posted by BetterRed

@paperbackebook - Why convert the PDF to anything? PDF is the worst format to convert to EPUB or most anything else.

Have you tried using a different PDF viewer to Acrobat, not a browser based one like firefox or edge, they're half-baked. I switched to PDF-XChange because of Acrobat frustrations, this is its print dialogue.

Attachment 181835

Or, if you have a recent edition of Word (2013 or later I think) try opening the PDF directly in it and print from there. It can produce surprisingly good results for some PDFs. On large PDFs it may run for a while (as in 10-20 minutes) and then fail, gracefully though.

IMO, conversion of PDF to HTML, or its derivatives, should only be done when all else fails.

BR

You've got it the other way around

I want to print an epub document.
Apparently the Adobe Digital Editions that opens the epub can't print properly. So the idea was to convert the epub to PDF and then print that.

It's just bizarre that it can render those pages properly in whatever browser or epub viewer, however once I print they all of a sudden forgot how to render the page and completely screw it up.

Also the conversion takes ages. The only thing that seemed to work a bit quick and mediocre was PDF Candy software.
I also had to switch to the 64bit version of Calibre since it ran out of memory.

So if anyone knows about a decent xhtml viewer/epub viewer that can do a simple print of what is rendered on screen that would be great.

JSWolf · 09-09-2020, 04:48 AM

Quote:

Originally Posted by paperbackebook

I indeed did something like this yesterday. But not using Word. xhtml is html so the browser should be able to properly do it.

So one approach that works, which is time consuming though, is:

- extract the epub file
- change the printer settings in firefox to not print footers and headers
- open the xhtml file in firefox (firefox only: edge, IE, Chrome mess up printing a simple file they render correctly)
- select "pages" so it only prints the first page instead of an extra 12 empty pages
- print
- repeat a gazillion times

That also is a good way to do it so it prints properly.

paperbackebook · 09-09-2020, 05:20 AM

Quote:

Originally Posted by JSWolf

That also is a good way to do it so it prints properly.

I wouldn't call that a good way

rather desperation kicking in

I had hoped there would be a simpler way since every reader and browser can render it correctly.
Just a simple print button or ctrl-p that prints what is rendered in calibre (without pdf conversion, ...) would probably already do the trick (although I know it's probably a lot more complex than that).

However what it does is create a PDF which takes an hour or more and then the content is mangled.

Sorry for the rants

just trying to wrap my head around this decades-long printing battle and how it's still not won.

paperbackebook · 09-09-2020, 09:05 AM

Continuing this saga...

I have looked into the xhtml and made a small reproducer.

What I notice is that every word in the book has a separate span tag with absolute positioning. Is this accepted as normal in the ebook world? It looks ridiculous to me.

When I look at the positions in those span tags I see very high pixel values. Example:
style="position:absolute;top:6113.53px;left:4640px;letter-spacing:-1.29px;"

The decimal pixel values feel ridiculous tbh. Between each line there's 340px. (I removed the other lines). Yet some lines render properly and others seem to be displaced.
When looking at the page the words and lines are normally spaced. So nowhere near 340px diff between lines.
However the catch is this general div style:
transform: scale(0.05)
And the font size is 300px. Yes 300px.

So it seems during this scaling something goes wrong when printing. Still weird why it would render properly and then completely mess up though.
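
To make the arithmetic concrete: with scale(0.05) every authored length shrinks to a twentieth, so 340px between lines ends up as 17px on screen and the 300px font as 15px. A tiny Python sketch of that calculation follows. The helper name `effective_px` is made up, and it assumes the scale applies uniformly to everything inside that div.

```python
def effective_px(authored_px, scale=0.05):
    """Return the on-screen size of a px length inside a container scaled by `scale`."""
    return authored_px * scale


# Values quoted in the post: line spacing, font size and one left offset.
for label, authored in [("line spacing", 340), ("font size", 300), ("left offset", 4640)]:
    print(f"{label}: {authored}px authored -> {effective_px(authored):.2f}px rendered")
```

So the numbers in the file are only huge on paper. After scaling they come out as perfectly ordinary page dimensions.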

paperbackebook · 09-09-2020, 09:12 AM

I was wondering how it would react to the scale being removed so I did the following test:

- divided the font size by 20 (since the scale is 0.05)
- removed the scale part
- divided the px values of each span tag by 20 (2 span tags in my test case)

Guess what... the text is shown properly in the print dialog.
Why they use this transform I have no idea.
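
For a whole book that manual test would have to be automated. Here is a rough sketch in Python of the same idea, under some loud assumptions: every px value in the text sits inside the scaled div, the scale is written literally as `transform: scale(...)`, and no transform-origin offset needs compensating. The names `unscale_css`, `PX_VALUE` and `SCALE_RULE` are made up for illustration. It is a plain regex rewrite, not a real CSS parser, so check the result on one page before trusting it.

```python
import re

# Any length written in px, e.g. "6113.53px" or "-1.29px".
PX_VALUE = re.compile(r"(-?\d+(?:\.\d+)?)px")
# A literal "transform: scale(<number>)" declaration with optional semicolon.
SCALE_RULE = re.compile(r"transform\s*:\s*scale\(\s*[\d.]+\s*\)\s*;?")


def _fmt(value):
    """Format a number with at most 3 decimals and no trailing zeros."""
    text = f"{value:.3f}".rstrip("0").rstrip(".")
    return "0" if text in ("-0", "") else text


def unscale_css(css, scale=0.05):
    """Drop the transform: scale(...) rule and multiply every px length by `scale`.

    Assumption: all px lengths in `css` belong to content inside the scaled
    container, so they all need the same factor (including letter-spacing).
    """
    without_transform = SCALE_RULE.sub("", css)
    return PX_VALUE.sub(
        lambda match: _fmt(float(match.group(1)) * scale) + "px",
        without_transform,
    )
```

Run over the text of each xhtml file (read it, pass it through `unscale_css`, write it back to a copy), this does the three steps from the test in one go. Multiplying by 0.05 is the same as dividing by 20.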
