Old 01-26-2017, 11:18 AM   #1
memeplex
Member
Posts: 16
Karma: 10
Join Date: Jul 2013
Device: none
epub to pdf: unable to select text after conversion

I'm converting an epub file to a pdf file using ebook-convert. The output looks fine, but after opening it with Acrobat Reader I'm unable to select text; it doesn't even recognize word boundaries. It's as if the underlying word structure was lost in translation and the reader now sees just a stream of characters. Why is this so? The input epub has a notion of a stream of words, so why was it lost during conversion?

Thank you
--Carlos

Last edited by memeplex; 01-26-2017 at 11:22 AM.
Old 01-26-2017, 10:41 PM   #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Text selection works fine for me with PDFs converted by calibre. The only case I know of when it will not work is if the epub has characters in it for which a suitable font was not found on the system, which will cause the renderer to draw the text as outlines instead.

Oh and by the way, PDF is literally a stream of characters with no word, line or other semantic information. PDF viewers manage to work around that by using heuristics to guess the extent of words/lines. (The exception is tagged PDF, but those are pretty rare.)
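
To make the heuristic idea concrete, here is a toy sketch of gap-based word grouping. It is not what any particular viewer does; the input format (one glyph per line as "char x width", sorted left to right on one text line), the function name guess_words and the default threshold are all assumptions made for illustration.

```shell
# Toy word-boundary heuristic: glyphs arrive on stdin as "<char> <x> <width>".
# A space is emitted whenever the horizontal gap between the end of the
# previous glyph and the start of the next exceeds a threshold.
guess_words() {
    local gap=${1:-1.5}   # assumed threshold, in the same units as x
    awk -v gap="$gap" '
        NR > 1 && $2 - prev_end > gap { printf " " }   # wide gap: new word
        { printf "%s", $1; prev_end = $2 + $3 }        # remember where this glyph ends
        END { if (NR > 0) printf "\n" }
    '
}
```

For example, piping the four glyphs "H 0 5", "i 5 2", "t 12 3", "o 15 5" through guess_words 2 groups them into two words, because only the gap between "i" and "t" is wider than the threshold. A real viewer also has to handle font size, kerning, rotated text and multiple lines, which is where different readers can disagree.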
Old 01-27-2017, 02:43 PM   #3
memeplex
Ok, I see. Let me put it another way then: for the epub converted to pdf, Adobe Acrobat (I tested it with the Android app) does not show the text selection tool that it shows for every other pdf document I have ever read; the context menu offers only highlighting and drawing tools, as if there were no text content.
Old 01-27-2017, 02:48 PM   #4
JSWolf
Resident Curmudgeon
Posts: 73,983
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Are you sure the PDF wasn't image based?
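
A rough way to check this from a shell is to extract the text and see whether anything other than whitespace comes out. The sketch below assumes pdftotext from poppler-utils is installed (the thread does not mention it); has_visible_text is a hypothetical helper name.

```shell
# Succeeds if stdin contains at least one non-whitespace character.
has_visible_text() {
    grep -q '[^[:space:]]'
}

# Usage (assumes pdftotext is available; "-" sends the text to stdout):
#   pdftotext book.pdf - | has_visible_text && echo "text layer found"
```

If the extracted text is empty or only whitespace, the pages are probably images or outlines; if text comes out, the file has a text layer and the problem is more likely in the viewer.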
Old 01-27-2017, 06:48 PM   #5
BetterRed
null operator (he/him)
Posts: 20,569
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by JSWolf
Are you sure the PDF wasn't image based?
@JSWolf - the OP converted an ePub to PDF, so perhaps the ePub is image based.

@memeplex - try opening the ePub in the calibre book editor (ebook-edit.exe) and have a look inside to check whether it's text based. Also try creating the pdf via the print function of the calibre book viewer (ebook-viewer.exe) - it's the last button on the toolbar.

And maybe try another pdf reader - on Windows, PDF XChange is pretty good.

BR
Old 01-27-2017, 08:21 PM   #6
memeplex
@BetterRed: below is what the content of the epub looks like inside the ebook editor; it seems like pretty standard text inside html paragraphs to me. Converting it using the print function of the viewer didn't solve the problem.

I would like to use the Acrobat Reader Android app on my tablet since it has very good annotation capabilities. Maybe it's not the pdf content itself but some metadata or descriptor that makes the reader think there is no text there. For every other document, a long press on a word selects it, but for the only two documents I produced with ebook-convert no word is selected and I get a menu with functions not directly related to text: note, highlight. Nevertheless, text boundaries are somehow recognized, since the highlight follows character boundaries; it's not a generic rectangle tool. Also, other readers like evince on Linux seem able to select text in both books. All this makes me think it's more a meta-thing than the contents themselves.

<p class="para" id="red0000630">The 75,000 pairs of genes that make and run the average human body find themselves in much the same position as 75,000 human beings inhabiting a small town. Just as human society is an uneasy coexistence of free enterprise and social co-operation, so is the activity of genes within a body. Without co-operation, the town would not be a community. Everybody would lie and cheat and steal his way to wealth at the expense of everybody else and all social activities – commerce, government, education, sport – would grind to a mistrustful halt. Without co-operation between the genes, the body they inhabit could not be used to transmit those genes to future generations because it would never get built.</p>
<p class="para" id="red0000631">A generation ago, most biologists would have found that paragraph baffling. Genes are not conscious and do not choose to co
Old 01-27-2017, 08:23 PM   #7
memeplex
Would it help if I send you the epub file?
Old 01-27-2017, 09:20 PM   #8
BetterRed
ONLY post the epub here if you are 100% sure it's not copyright protected; otherwise post it at ==>> Bugs : calibre with a link back to this thread and mark it private.

If you can select text with evince then I think it's a peculiarity with acrobat for android. Try another android reader, like Google's or Foxit, if they can select text then it is acrobat specific. If they can't then - hmmm.

I doubt it would be the metadata, but if you think it is, then use the calibre editor to strip the <metadata> block in content.opf back to the bare-bones DC elements and try converting again.

BR
Old 01-27-2017, 10:27 PM   #9
memeplex
I tried Foxit and Xodo on Android; both selected text just fine. This has to be some stupid detail that's confusing Adobe Reader. Could it be the PDF 1.4 version? Other documents I'm reading are PDF 1.6.
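
For anyone who wants to compare versions between files, the declared version sits in the first bytes of the file as "%PDF-" followed by major.minor. A minimal sketch (pdf_version is a made-up helper name; head -c is assumed to be available, as it is on GNU and BSD systems):

```shell
# Print the version declared in the PDF header, e.g. "1.4".
pdf_version() {
    # Only the first 8 bytes are needed: "%PDF-" plus "M.N".
    head -c 8 "$1" | sed -n 's/^%PDF-\([0-9]\.[0-9]\).*/\1/p'
}

# Usage:
#   pdf_version book.pdf
```

Note that from PDF 1.4 on, a /Version entry in the document catalog can override the header, so this shows only what the header claims.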
Old 01-27-2017, 10:30 PM   #10
kovidgoyal
Text selection works fine in the desktop version of Adobe Reader for the PDFs generated by calibre. I haven't tried it with the Android version, but if it is not working there it is likely a bug in that app. Report it to Adobe and they will hopefully fix it or at least tell us why it isn't working. There is no way for me to guess what the Android app is choking on.
Old 01-27-2017, 10:59 PM   #11
memeplex
Ok, I was playing with the pdf metadata to no f@#*ing avail. I've reported this to adobe customer service.

As a workaround I opened the generated pdf in evince and then printed it to a pdf file. Adobe reader was happy with it (and as a bonus the file is smaller).
Old 01-28-2017, 12:01 AM   #12
memeplex
Here is another workaround: a handy bash function that passes the output of ebook-convert through gs (Ghostscript). It preserves the outline:

Code:
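topdf() {
    # Derive the temporary and final PDF names from the input file name.
    local in=$1 tmp=${1%.*}-tmp.pdf out=${1%.*}.pdf
    # Convert to an intermediate PDF with calibre; stop here if that fails.
    ebook-convert "$in" "$tmp" --output-profile=ipad \
                               --change-justification=justify || return
    # Rewrite the intermediate PDF with Ghostscript (the actual workaround).
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
       -dNOPAUSE -dBATCH -sOutputFile="$out" "$tmp"
    local status=$?
    # Always clean up the intermediate file, then report gs's exit status.
    rm -f "$tmp"
    return "$status"
}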
Just run it as topdf <input-file>
Old 07-27-2018, 09:22 AM   #13
piovac
Member
Posts: 14
Karma: 10
Join Date: Jan 2014
Device: Kindle Paperwhite, Samsung Note 4, Samsung Tab S, MacBook Pro
I converted several ebooks to PDF on a Mac using default settings and the following fonts:
Serif family: PT Serif
Sans family: PT Sans
Monospace family: Courier New

I get great results and the files can be displayed and annotated. The main issue is that if I select and copy text in the file and paste it into an editor, I just get a series of tab characters, nothing else. I tried several PDF viewers with the same results. I can cut and paste from the ebook with no problem.

Is it a font problem? Do I need to embed the fonts? Suggestions welcome.
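
One thing that can be checked from a terminal is whether the fonts in the PDF carry a ToUnicode map, since a missing map is one common reason copied text comes out as garbage. Whether that is the cause here is not known. The sketch below assumes pdffonts from poppler-utils is installed and that its table ends with the columns emb, sub, uni and a two-part object ID; fonts_without_tounicode is a hypothetical helper name, and font names containing spaces would be truncated.

```shell
# Reads pdffonts output on stdin and prints the names of fonts whose
# "uni" column (ToUnicode map present) says "no".
fonts_without_tounicode() {
    # Skip the two header lines; count columns from the right because the
    # "type" column may itself contain a space (e.g. "CID TrueType").
    awk 'NR > 2 && NF >= 7 && $(NF-2) == "no" { print $1 }'
}

# Usage (assumes pdffonts is available):
#   pdffonts book.pdf | fonts_without_tounicode
```

An empty result means every listed font claims to have a ToUnicode map, in which case the cause lies elsewhere.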