Old 12-31-2018, 06:04 AM   #1
VcSaJen
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2018
Device: none
pdf->epub - using indents as a cue to line-unwrapping

I'm using the latest Calibre to convert PDF to EPUB. With default settings, each of my paragraphs is split into multiple paragraphs. But all paragraphs are clearly indented, so I'm not sure why Calibre is having trouble. I don't want to adjust the "unwrap factor", as it relies on lines ending early and is therefore inexact. How can I force Calibre to take indents into account when determining paragraph breaks? If that's not possible, is there any other tool that has that option?
Old 12-31-2018, 12:21 PM   #2
retiredbiker
Connoisseur
Posts: 66
Karma: 29896
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Ubuntu, Jutoh,Kobo Forma
Quote (originally posted by VcSaJen):
If that's not possible, is there any other tool that has that option?
Read the sticky post about PDF conversion. As for other tools, I use "pdftotext" from the Poppler utilities. The -layout option will give you a text file with all the leading spaces and any extra linefeeds between paragraphs intact. Then a little regex will easily get you to the "real" paragraphs you want. If you are really lucky, there may be, say, 5 linefeeds at each chapter break, so you can get those with regex as well.

Assuming, of course, that what you want exists to start with. As with anything PDF, success depends on what is inside the source file. pdftotext will at least show you what is there, and the results may vary from excellent to impossible. Simple books like novels often work well with this, but if you have double columns or something complex like a science textbook, it's a lot more work.
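For anyone who wants to try the indent-based approach on pdftotext output, here is a minimal sketch in Python. It is only an illustration, not something from calibre or Poppler: the function name unwrap_by_indent and the min_indent threshold are made up for this example, and it assumes each paragraph's first line is indented at least two spaces more than the body lines. It treats blank lines as paragraph breaks too.

```python
import re


def unwrap_by_indent(text: str, min_indent: int = 2) -> list[str]:
    """Join hard-wrapped lines into paragraphs, using indentation as the cue.

    Assumption (hypothetical threshold): a line indented at least
    `min_indent` spaces beyond the left margin starts a new paragraph.
    """
    # pdftotext separates pages with form feeds; drop them so a paragraph
    # that continues onto the next page is not split.
    lines = text.replace("\f", "").splitlines()

    # The left margin is the smallest indent among non-blank lines.
    indents = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip()]
    if not indents:
        return []
    margin = min(indents)

    paragraphs: list[str] = []
    current: list[str] = []
    for line in lines:
        content = line.strip()
        if not content:
            # A blank line always closes the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent - margin >= min_indent and current:
            # An indented line starts a new paragraph.
            paragraphs.append(" ".join(current))
            current = []
        # -layout output can contain runs of spaces inside a line.
        current.append(re.sub(r"\s+", " ", content))
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# Typical use, after running:  pdftotext -layout book.pdf book.txt
# with open("book.txt", encoding="utf-8") as f:
#     for para in unwrap_by_indent(f.read()):
#         print(para + "\n")
```

Running headers, page numbers and words hyphenated across line ends are not handled here and would need their own cleanup pass. Centered headings will also be read as indented paragraph starts.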
Old 12-31-2018, 12:26 PM   #3
VcSaJen
Solved it by opening the PDF in MS Word, then importing the docx into calibre, converting it to epub, and then manually fixing the giant margins and the wrong text-align on a few paragraphs. That preserved all italics, which are all lost if I convert the PDF to plain text. Bonus points for inline images which are actually inline.

Last edited by VcSaJen; 12-31-2018 at 12:29 PM.
Old 12-31-2018, 02:02 PM   #4
BetterRed
null operator
Posts: 11,836
Karma: 10633600
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@VcSaJen - There are a couple of Word add-ins that have tools to help deal with PDF conversions ==>> eBook Tools and TransTools.

There's some discussion on the latter in this thread from post #19 onwards.

BR
All times are GMT -4.