
MobileRead Forums > E-Book Software > Calibre

Old 12-01-2010, 08:48 AM   #1
vbdasc
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: none
HTML to TXT and line breaks

Well, the HTML-to-text conversion in calibre apparently ignores line break tags (<br />); that is, it converts them to empty strings. As far as I understand, the reason for this behaviour is that far too many HTML texts use line break tags incorrectly, and this hack works around the problem, allowing the text to reflow properly on any output device. However, I believe the <br /> tag should be converted to a space instead. Consider the following snippet of HTML:

cater<br />pillar

Every web browser will show you two distinct words (on separate lines), but calibre produces text containing only one word, "caterpillar", which is clearly incorrect. Any opinions on this? Thanks in advance.
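To make the difference concrete, here is a minimal sketch using Python's standard `html.parser` module. It is not calibre's conversion code; `TextExtractor`, `html_to_text` and the `br_replacement` parameter are hypothetical names chosen for illustration, where `br_replacement` simply decides what each `<br />` tag turns into.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, substituting a fixed string for each <br> tag."""

    def __init__(self, br_replacement):
        super().__init__()
        self.br_replacement = br_replacement
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Both <br> and <br /> end up here: for self-closing tags, the default
        # handle_startendtag() calls handle_starttag() and then handle_endtag().
        # Tag names are already lowercased by HTMLParser.
        if tag == "br":
            self.parts.append(self.br_replacement)

    def handle_data(self, data):
        # Plain text between tags is kept verbatim.
        self.parts.append(data)


def html_to_text(html, br_replacement=" "):
    """Hypothetical helper: strip tags, turning <br> into br_replacement."""
    parser = TextExtractor(br_replacement)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


snippet = "<p>cater<br />pillar</p>"
print(repr(html_to_text(snippet, br_replacement="")))   # 'caterpillar'
print(repr(html_to_text(snippet, br_replacement=" ")))  # 'cater pillar'
```

A browser actually renders a line break here, so a newline would be an equally defensible replacement; a space is just the smallest change that keeps the two words apart once the text is reflowed.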
Old 12-01-2010, 09:53 AM   #2
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
I haven't seen Calibre get rid of <br />. I think I've seen it get converted to <p></p> in some conversion pipelines, but not removed altogether.

You might want to post some example code that displays the problem you mean, or open a bug with an attached file at bugs.calibre-ebook.com for someone to look at it.
Old 12-01-2010, 10:35 AM   #3
vbdasc
Junior Member
Actually, it is quite easy to reproduce. Try to convert the following HTML file

Code:
<html>
<head></head>
<body>
<p>cater<br />pillar</p>
</body>
</html>
to TXT and you'll see. I'm not opening a ticket for now because I don't think it's a bug, but rather a feature.
Old 12-01-2010, 11:00 AM   #4
ldolse
Wizard
Ah, I didn't catch the fact that this was during text conversion. I think this actually is a bug. I doubt it has anything to do with <br /> being misused in some HTML files.
Old 12-01-2010, 12:35 PM   #5
user_none
Sigil & calibre developer
Posts: 2,433
Karma: 950001
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Looks like a bug... I'll investigate later today (after I get home from work).

Later Edit:

I've made a change that causes br tags to be replaced with spaces. I'll be pushing this change up probably tomorrow.

Last edited by user_none; 12-01-2010 at 08:39 PM.
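One simple way to get the behaviour described in that edit is a pre-pass that rewrites every `<br>` variant to a space before the remaining tags are stripped. This is only a sketch under that assumption, not the actual calibre change; `replace_br_tags` and `strip_tags` are hypothetical helpers, and regular expressions are used here only because the input is a tiny, well-formed snippet (a real converter would work on a parsed document).

```python
import re

# Hypothetical pattern: matches <br>, <br/>, <br />, <BR> and <br class="x" />,
# but not tags that merely start with "br" (the \b requires a word boundary).
BR_TAG = re.compile(r"<br\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def replace_br_tags(html, replacement=" "):
    """Turn every line break tag into `replacement`."""
    return BR_TAG.sub(replacement, html)


def strip_tags(html):
    """Remove all remaining tags, keeping only their text content."""
    return ANY_TAG.sub("", html)


snippet = "<p>cater<br />pillar</p>"
print(strip_tags(snippet))                   # caterpillar (the reported behaviour)
print(strip_tags(replace_br_tags(snippet)))  # cater pillar
```

The order matters: if the tags are stripped first, the information that a line break existed is already gone and cannot be turned into a space afterwards.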
Old 12-04-2010, 11:32 AM   #6
vbdasc
Junior Member
Hmmm, I installed 0.7.32 but the problem still persists... Why is that?
Old 12-04-2010, 11:40 AM   #7
user_none
Sigil & calibre developer
Quote:
Originally Posted by vbdasc
Hmmm, I installed 0.7.32 but the problem still persists... Why is that?
Slight error on my part. I've pushed up a correction. Expect it in the next release.
Similar Threads
Thread Thread Starter Forum Replies Last Post
Removing unnecessary paragraph breaks in .txt citac Other formats 2 10-26-2010 05:16 PM
Chapters and page breaks in TXT files scarab1 Ectaco jetBook 0 03-06-2010 02:08 PM
No line breaks in TXT conversions - is it just me? TMF Calibre 3 09-24-2009 02:46 PM
No line breaks ecpepper Amazon Kindle 3 08-09-2009 06:42 PM
utility to eliminate unwanted line breaks in txt profnachos Workshop 11 11-27-2007 06:24 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.