#1 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Conversion: EPUB to TXT
I've got an EPUB that is filled with paragraphs like this:
Code:
<p>Epiphany Sunday of the new year of 1801 had been on the fourth of January, and the next Term for King's Bench trials had, therefore, waited to open on the seventh, with all the theatre, majesty, and circumstance of which England was capable.</p>
Is there a setting to fix this? A bug? Thanks.
Last edited by Starson17; 05-25-2010 at 11:30 AM.
#2 |
creator of calibre
Posts: 45,168
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Sounds like a bug; open a ticket and assign it to either TXT output or RTF output.
#3 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Thanks. I'll investigate a bit more, and perhaps be able to post some code with the ticket, or at least some pointers to where I think a change should be made. I haven't looked at the conversion code much. I always prefer studying the code when it will help me fix or improve something that's affecting me.
#4 | |
creator of calibre
Posts: 45,168
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
#5 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Newlines are replaced with spaces when generating TXT output... I will look into why this is happening. And I will need a sample because I cannot recreate this behavior.
The basic idea behind text output is to take the XHTML and replace all newlines with spaces. Then start reading through the XHTML string, which is now all one line, pulling out text and adding newlines based on tags that represent paragraph breaks.
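For readers following along, the two-step idea described in this post can be sketched roughly as below. This is only an illustration under stated assumptions, not calibre's actual TXT output code: the function name `xhtml_to_text`, the `BLOCK_TAGS` list, and the regex-based tag handling are all made up for the example.

```python
import re
from html import unescape

# Tags assumed (for this sketch only) to end a paragraph.
BLOCK_TAGS = ("p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li")


def xhtml_to_text(xhtml: str) -> str:
    """Hypothetical helper: flatten XHTML, then rebuild paragraph breaks."""
    # Step 1: replace every newline (CRLF, bare CR, LF) with a space,
    # so the markup becomes one long line.
    flat = re.sub(r"\r\n|\r|\n", " ", xhtml)

    # Step 2: insert a paragraph break wherever a block tag closes
    # or a <br> appears.
    closers = "|".join(BLOCK_TAGS)
    flat = re.sub(
        rf"</(?:{closers})\s*>|<br\s*/?>", "\n\n", flat, flags=re.IGNORECASE
    )

    # Step 3: drop the remaining tags and decode entities such as &amp;.
    text = unescape(re.sub(r"<[^>]+>", "", flat))

    # Tidy up: collapse runs of spaces and discard empty paragraphs.
    paragraphs = [re.sub(r"[ \t]+", " ", p).strip() for p in text.split("\n\n")]
    return "\n\n".join(p for p in paragraphs if p)


if __name__ == "__main__":
    sample = "<p>line one\nline two</p>\r\n<p>second\rparagraph</p>"
    print(xhtml_to_text(sample))
```

With this approach, a line break inside a `<p>` should never survive into the output, because every newline is gone before any paragraph break is added back.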
#6 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I'm currently trying to recover from some sort of constant reboot problem - will probably restore yesterday's backup. I assume the sample I posted from the EPUB wasn't enough to reproduce it? Oops - got to go, it just threw the pre-reboot error - hope this gets posted .....
#7 | ||
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Could this be related?
#8 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Could you try again, but convert to RTF, not TXT? I can't reproduce the problem in TXT, but can in RTF when converting from a test zip file and from the original EPUB. The test zip is just a basic HTML file added to Calibre, with a single paragraph <p> plus <html>, <title> and <body> tags. Make sure the line breaks in the paragraph are just 0x0A.
Debug during conversion shows no problem at any of the 4 intermediate output stages, so I assume it would be in the RTF output stage.
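A test zip along the lines described here could be generated with something like the following. It is a sketch only: the file names, the paragraph text, and the helper name `build_test_zip` are placeholders, and the line breaks are placed arbitrarily rather than copied from the original EPUB.

```python
import zipfile

# A minimal HTML file with one paragraph whose internal line breaks are
# bare LF (0x0A). The text and break positions are placeholders.
HTML = (
    "<html><head><title>Newline test</title></head><body>"
    "<p>first line of the paragraph\n"
    "second line of the paragraph\n"
    "third line of the paragraph</p>"
    "</body></html>"
)


def build_test_zip(path: str = "newline_test.zip") -> bytes:
    """Hypothetical helper: write the HTML into a zip and return its bytes."""
    data = HTML.encode("utf-8")
    # Sanity check: LF breaks are present and no CR (0x0D) sneaked in.
    assert b"\n" in data and b"\r" not in data
    with zipfile.ZipFile(path, "w") as zf:
        # writestr stores the bytes exactly as given, so the platform
        # cannot translate LF into CRLF on the way in.
        zf.writestr("index.html", data)
    return data


if __name__ == "__main__":
    build_test_zip()
```

Writing the bytes directly, rather than saving from a text editor, avoids any doubt about which newline convention ended up in the file.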
#9 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
#10 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
#11 | |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
I've attached my changes. Basically all I changed was to read the XHTML into a string and replace all newline characters (Windows, Unix, and old Mac) with a space.
The conversion pipeline has some basic requirements for passing data between the various parts. But once you get it into a stage, it allows you to do pretty much whatever you want for actually processing the data. You'll notice a lot of similarities between some of the formats I've created and vast differences between other ones. Feel free to email me if you want/need a more in-depth explanation (john AT nachtimwald.com).
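The attached changes are not reproduced in this thread, but the newline replacement described above might look roughly like this. The function name `newlines_to_spaces` is an assumption for illustration and is not taken from the calibre source.

```python
def newlines_to_spaces(xhtml: str) -> str:
    """Hypothetical helper: replace Windows, Unix and old Mac newlines."""
    # Order matters: handle the two-character Windows CRLF first so it
    # becomes a single space rather than two.
    for newline in ("\r\n", "\n", "\r"):
        xhtml = xhtml.replace(newline, " ")
    return xhtml


if __name__ == "__main__":
    print(repr(newlines_to_spaces("a\r\nb\nc\rd")))
```

Replacing with a space, rather than deleting the newline, keeps the words on either side of the break from running together.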
#12 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|