MobileRead Forums > E-Book Software > Calibre > Conversion

Old 12-17-2012, 11:23 PM   #1
MrTanquery
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2012
Device: Kindle Paperwhite
Extra "<p>" tags when converting to AZW3 from pdf

It's a simple question, but there are hours and hours of mind-numbing work on the line for me if I can't solve it. I'm getting extra "<p>" tags (HTML paragraph tags) when I convert from PDF to AZW3. I'm sure the extra p tags would be there in the other formats too; it seems to be something in how Calibre handles converting PDF formatting to HTML.

Is there a way to modify Calibre to only use p tags when there is an actual paragraph? Simple line wrapping should be handled automatically by the reader, as per usual.

As a matter of interest, Adobe Acrobat Pro handles this properly when you ask it to save a PDF as an HTML file: it only uses paragraph tags when there is an actual paragraph, and lets the reader handle line wrapping within each paragraph.

Your help is greatly appreciated!
C

Last edited by MrTanquery; 12-17-2012 at 11:25 PM.
Old 12-18-2012, 03:12 PM   #2
fidvo
Addict
Posts: 296
Karma: 1599870
Join Date: Jun 2012
Device: none
You've just discovered the frustration of trying to convert from PDFs. I feel your pain.

First, read the sticky, especially the section titled "Some of my paragraphs are split into multiple paragraphs".

Short answer: PDFs don't have paragraphs; they have lines of text. The information about where one paragraph ends and another begins gets lost in the conversion to PDF, so it's not available for Calibre or any other conversion program to use. Some PDFs use workarounds to preserve that information (e.g. by putting blank lines between paragraphs), and Calibre can then guess where to break paragraphs. The one you're working with apparently does not.

Possible solutions include converting and cleaning up manually afterward (a lot of work), using Calibre's heuristic processing to try to guess where the paragraph breaks are (good, but not perfect), or trying to obtain the original in a different format, like EPUB, MOBI, or HTML. If this is possible, I recommend it as the best solution.
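
To give a rough idea of what "guessing" looks like, here is a minimal Python sketch of a line-unwrapping heuristic. This is not Calibre's actual code: the function names and the `min_wrap_ratio` parameter are made up for illustration, and the rule itself (a paragraph ends at a blank line, or at a short line that ends a sentence) is just one plausible assumption about how such guessing could work.

```python
import re
from html import escape

# Sentence-ending punctuation, optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile("[.!?][\"')\u201d\u2019]?$")


def unwrap_lines(lines, min_wrap_ratio=0.8):
    """Merge hard-wrapped lines into paragraphs (illustrative heuristic only).

    A paragraph is closed at a blank line, or at a line that both ends a
    sentence and is shorter than min_wrap_ratio times the longest line.
    """
    stripped = [ln.strip() for ln in lines]
    longest = max((len(ln) for ln in stripped), default=0)

    paragraphs, current = [], []
    for ln in stripped:
        if not ln:
            # Blank line: the one case where the break is unambiguous.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        is_short = len(ln) < min_wrap_ratio * longest
        if is_short and SENTENCE_END.search(ln):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def to_html(paragraphs):
    """Emit one <p> element per guessed paragraph."""
    return "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)


if __name__ == "__main__":
    sample = [
        "It was a simple question, but there were hours of",
        "work on the line.",
        "The second paragraph starts here and also wraps at",
        "the page margin before it ends.",
    ]
    print(to_html(unwrap_lines(sample)))
```

A rule like this fails whenever a paragraph's last line happens to run the full width of the page, or a wrapped line happens to end a sentence early, which is why heuristic results usually still need some manual cleanup.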