|12-17-2012, 11:23 PM||#1|
Join Date: Dec 2012
Device: Kindle Paperwhite
Extra "<p>" tags when converting to AZW3 from pdf
Is there a way to make Calibre use <p> tags only where there is an actual paragraph? Simple line wrapping should be handled automatically by the reader, as usual.
As a matter of interest, Adobe Acrobat Pro handles this properly when you ask it to save a PDF as an HTML file: it only uses paragraph tags where there is an actual paragraph, and lets the reader handle line wrapping within each paragraph.
Your help is greatly appreciated!
Last edited by MrTanquery; 12-17-2012 at 11:25 PM.
|12-18-2012, 03:12 PM||#2|
Join Date: Jun 2012
You've just discovered the frustration of trying to convert from PDFs. I feel your pain.
First, read the sticky, especially the section titled "Some of my paragraphs are split into multiple paragraphs".
Short answer: PDFs don't have paragraphs; they have lines of text. The information about where one paragraph ends and another begins gets lost in the conversion to PDF, so it's not available for Calibre or any other conversion program to use. Some PDFs use workarounds to preserve that information (e.g. by putting blank lines between paragraphs), and in those cases Calibre is able to guess where to break paragraphs. The one you're working with apparently does not.
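To make the blank-line workaround concrete, here is a minimal sketch in Python. It is not Calibre's code; the function name `paragraphs_from_lines` and the sample text are made up for illustration. It only shows why a converter can recover paragraphs when the PDF leaves blank lines between them, and has nothing to go on when it doesn't.

```python
def paragraphs_from_lines(lines):
    """Group extracted text lines into paragraphs.

    Hypothetical helper: a blank line is the only paragraph separator;
    every other line break is treated as a soft wrap.
    """
    paragraphs, current = [], []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        elif current:
            # Blank line: close the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# Same text, extracted with and without blank lines between paragraphs.
with_gaps = ["It was a dark and", "stormy night.", "", "The rain fell."]
without_gaps = ["It was a dark and", "stormy night.", "The rain fell."]

print(paragraphs_from_lines(with_gaps))     # two paragraphs
print(paragraphs_from_lines(without_gaps))  # everything merges into one
```

Without the blank line, the two paragraphs are indistinguishable from one long wrapped paragraph, which is the situation described above.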
Possible solutions include converting and cleaning up manually afterward (a lot of work), using Calibre's heuristic processing to try to guess where the paragraph breaks are (good, but not perfect), or trying to obtain the original in a different format, like EPUB, MOBI, or HTML. If that last option is possible, I recommend it as the best solution.
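For a feel of what "guessing" means, here is a rough Python sketch of a line-unwrapping heuristic. This is an illustration under my own assumptions, not Calibre's actual algorithm: the name `unwrap_lines`, the `unwrap_factor` parameter, and the punctuation list are all hypothetical. The idea is that a line which is reasonably long and does not end in sentence-ending punctuation was probably wrapped, so it gets joined to the next line.

```python
# Characters that suggest a sentence (and possibly a paragraph) has ended.
TERMINAL = (".", "!", "?", '"', "\u201d", ":")


def unwrap_lines(lines, unwrap_factor=0.45):
    """Join hard-wrapped lines into paragraphs (illustrative heuristic only).

    A line counts as "wrapped" when its length is at least
    unwrap_factor * (longest line length) and it does not end with
    terminal punctuation. Any other line ends the current paragraph.
    """
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines:
        return []
    threshold = unwrap_factor * max(len(ln) for ln in lines)
    paragraphs, current = [], []
    for ln in lines:
        current.append(ln)
        looks_wrapped = len(ln) >= threshold and not ln.endswith(TERMINAL)
        if not looks_wrapped:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = [
    "PDFs store positioned lines of text, so a converter",
    "has to guess which line breaks are real.",
    "Heuristics look at line length and punctuation to",
    "decide, which works often but not always.",
]
result = unwrap_lines(sample)
for paragraph in result:
    print(paragraph)
```

A heuristic like this goes wrong whenever a wrapped line happens to end with a period, or a paragraph's last line has no punctuation, which is why the result is good but not perfect and usually needs some cleanup.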