#1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: none
|
HTML to TXT and line breaks
Well, the HTML to text conversion in calibre apparently ignores line break tags (<br />), i.e. it converts them to empty strings. As far as I understand, the reason for this behaviour is that far too many HTML documents use line break tags incorrectly, and this hack works around that, allowing the text to reflow properly on any output device. However, I believe the <br /> tag should be converted to a space instead. Consider the following snippet of HTML code:
cater<br />pillar
Every web browser out there will show you two distinct words, but calibre will produce text containing only one word: "caterpillar". This is obviously incorrect. Any opinions on this? Thanks in advance.
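To make the difference concrete, here is a minimal Python sketch of the two behaviours described above. It is only an illustration: the regular expression and the function names `br_to_nothing` and `br_to_space` are made up for this example and are not taken from calibre's source.

```python
import re

# Illustrative pattern matching <br>, <br/> and <br /> in any letter case.
# This is an assumption for the example, not calibre's actual code.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_nothing(html: str) -> str:
    """Drop line break tags entirely, which fuses the neighbouring words."""
    return BR_TAG.sub("", html)


def br_to_space(html: str) -> str:
    """Replace each line break tag with a single space, keeping words apart."""
    return BR_TAG.sub(" ", html)


snippet = "cater<br />pillar"
print(br_to_nothing(snippet))  # caterpillar
print(br_to_space(snippet))    # cater pillar
```

The first function reproduces the reported output, the second the output proposed in this post.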
#2 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
I haven't seen Calibre get rid of <br />. I think I've seen it get converted to <p></p> in some conversion pipelines, but not removed altogether.
You might want to post some example code that displays the problem you mean, or open a bug with an attached file at bugs.calibre-ebook.com for someone to look at it.
#3 |
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: none
|
Actually, it is quite easy to reproduce. Try to convert the following HTML file:
Code:
<html>
<head></head>
<body>
<p>cater<br />pillar</p>
</body>
</html>
#4 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Ah, I didn't catch the fact that this was during text conversion. I would think that this is actually a bug. I doubt it has anything to do with <br /> being misused in some HTML files.
|
#5 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Looks like a bug... I'll investigate later today (after I get home from work).
Later Edit: I've made a change that causes br tags to be replaced with spaces. I'll be pushing this change up probably tomorrow.
Last edited by user_none; 12-01-2010 at 08:39 PM.
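For readers curious what "br tags replaced with spaces" can look like in a text extractor, below is a rough sketch built on Python's standard `html.parser` module. It is not the actual calibre change; the class `BrAwareTextExtractor` and the helper `html_to_text` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class BrAwareTextExtractor(HTMLParser):
    """Collects the text of an HTML document, emitting a space for each <br>."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, and its default handle_startendtag
        # forwards self-closing tags here, so <br> and <br /> are both covered.
        if tag == "br":
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = BrAwareTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so indentation in the source file does not leak through.
    return " ".join("".join(parser.parts).split())


print(html_to_text("<html><head></head><body><p>cater<br />pillar</p></body></html>"))
# cater pillar
```

Handling the tag in the parser rather than with a string substitution means only real br elements are affected, at the cost of a little more code.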
#6 |
Junior Member
Posts: 5
Karma: 10
Join Date: Nov 2010
Device: none
|
Hmmm, I installed 0.7.32 but the problem still persists... Why is that?
|
#7 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
|