Old 05-25-2010, 10:55 AM   #1
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Conversion: EPUB to TXT

I've got an EPUB that is filled with paragraphs like this:
Code:
<p>Epiphany Sunday of the new year of 1801 had been on the fourth of
January, and the next Term for King's Bench trials had, therefore,
waited to open on the seventh, with all the theatre, majesty, and
circumstance of which England was capable.</p>
Notice that there is no space between "of" and "January", only a newline (0x0A). The same thing happens after "therefore," and "and". As I understand it, a newline character counts as whitespace, which is a word separator, so the EPUB displays correctly, putting a space between those words rather than a line break. However, the conversion to .txt format (and rtf) does not seem to do that. The newline characters at the line ends are stripped but not replaced by spaces, so I see lots of word pairs run together into a single word.

Is there a setting to fix this? A bug?

Thanks.

Last edited by Starson17; 05-25-2010 at 11:30 AM.
Old 05-25-2010, 11:23 AM   #2
kovidgoyal
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Sounds like a bug. Open a ticket and assign it to either TXT output or RTF output.
Old 05-25-2010, 11:29 AM   #3
Starson17
Wizard
Quote:
Originally Posted by kovidgoyal View Post
Sounds like a bug
Thanks. I'll investigate a bit more, and perhaps be able to post some code with the ticket, or at least some pointers to where I think a change should be made. I haven't looked at the conversion code much. I always prefer studying the code when it will help me fix or improve something that's affecting me.
Old 05-25-2010, 11:34 AM   #4
kovidgoyal
creator of calibre
Quote:
Originally Posted by Starson17 View Post
Thanks. I'll investigate a bit more, and perhaps be able to post some code with the ticket, or at least some pointers to where I think a change should be made. I haven't looked at the conversion code much. I always prefer studying the code when it will help me fix or improve something that's affecting me.
Tickets that help me find and fix the problem are my favourite kind
Old 05-25-2010, 07:05 PM   #5
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Newlines are replaced with spaces when generating TXT output... I will look into why this is happening. And I will need a sample because I cannot recreate this behavior.

The basic idea behind text output is to take the XHTML and replace all newlines with spaces. Then start reading through the XHTML string, which is now all on one line, pulling out text and adding newlines based on tags that represent paragraph breaks.
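
The two steps described above can be illustrated with a minimal Python sketch. It is not calibre's actual TXT output code: the function name `xhtml_to_text`, the regular expressions, and the choice of which tags count as paragraph breaks are all assumptions made for this example, and it does not decode HTML entities.

```python
import re

# Hypothetical helpers for illustration only; not taken from calibre.
# Tags treated here as ending a paragraph (an assumption of this sketch).
PARAGRAPH_END = re.compile(r"</(?:p|div|h[1-6])\s*>|<br\s*/?>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def xhtml_to_text(xhtml: str) -> str:
    # Step 1: put the whole document on one line, turning each newline into a space.
    one_line = re.sub(r"\r\n|\r|\n", " ", xhtml)
    # Step 2: turn paragraph-ending tags into real newlines, then drop all other tags.
    marked = PARAGRAPH_END.sub("\n", one_line)
    stripped = ANY_TAG.sub("", marked)
    # Step 3: collapse runs of spaces and separate paragraphs with a blank line.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in stripped.split("\n"))
    return "\n\n".join(line for line in lines if line)
```

With this approach, a source line break between "of" and "January" becomes a single space before any tags are examined, so the two words cannot be joined in the output.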
Old 05-25-2010, 09:01 PM   #6
Starson17
Wizard
Quote:
Originally Posted by user_none View Post
Newlines are replaced with spaces when generating TXT output... I will look into why this is happening. And I will need a sample because I cannot recreate this behavior.

The basic idea behind text output is to take the XHTML and replace all newlines with spaces. Then start reading through the XHTML string, which is now all on one line, pulling out text and adding newlines based on tags that represent paragraph breaks.
You're going to deprive me of all the fun of figuring this out.
I'm currently trying to recover from some sort of constant reboot problem - will probably restore yesterday's backup. I assume the sample I posted from the EPUB wasn't enough to reproduce it? Oops - got to go, it just threw the pre-reboot error - hope this gets posted .....
Old 05-29-2010, 07:29 AM   #7
user_none
Sigil & calibre developer
Quote:
Originally Posted by Starson17 View Post
You're going to deprive me of all the fun of figuring this out.
At this point you might have to because I can't reproduce it.

Quote:
Originally Posted by Starson17 View Post
I assume the sample I posted from the EPUB wasn't enough to reproduce it?
Nope. Converted just fine for me.

Quote:
Originally Posted by Starson17 View Post
I'm currently trying to recover from some sort of constant reboot problem - will probably restore yesterday's backup. ...
Oops - got to go, it just threw the pre-reboot error - hope this gets posted .....
Could this be related?
Old 05-29-2010, 11:19 AM   #8
Starson17
Wizard
Quote:
Originally Posted by user_none View Post
At this point you might have to because I can't reproduce it.
Could you try again, but convert to rtf, not txt? I can't reproduce the problem in txt, but can in rtf when converting from a test zip file and from the original epub. The test zip is just a basic html file added to Calibre, containing a single <p> paragraph along with <html>, <title> and <body> tags. Make sure the line breaks in the paragraph are just 0x0A.
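
A test file like the one described above could be generated with a short Python script. This is only a sketch: the file names `index.html` and `newline-test.zip` and the function `write_test_zip` are made up for this example, and the paragraph text is the sample from the first post. The HTML is encoded to bytes before it is written so that no platform newline translation can turn the 0x0A breaks into 0x0D 0x0A.

```python
import zipfile

# Hypothetical file names, chosen only for this example.
HTML_NAME = "index.html"
ZIP_NAME = "newline-test.zip"

# Lines are joined with "\n" (0x0A) only, with no space before each break.
paragraph = "\n".join([
    "<p>Epiphany Sunday of the new year of 1801 had been on the fourth of",
    "January, and the next Term for King's Bench trials had, therefore,",
    "waited to open on the seventh, with all the theatre, majesty, and",
    "circumstance of which England was capable.</p>",
])
html = (
    "<html>\n<head><title>Newline test</title></head>\n"
    "<body>\n" + paragraph + "\n</body>\n</html>\n"
)


def write_test_zip(path: str = ZIP_NAME) -> bytes:
    """Write the test HTML into a zip archive and return the exact bytes stored."""
    data = html.encode("utf-8")
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr(HTML_NAME, data)
    return data
```

Calling `write_test_zip()` writes the archive to the current directory; the zip can then be added to the library and converted to RTF to check whether "of" and "January" come out joined.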

Debug during conversion shows no problem at any of the 4 intermediate output stages, so I assume it would be in the rtf output stage.

Quote:
Could this be related?
No. I spent last night doing an image restore of my entire computer to last week's status and this morning putting back some settings that had changed. The computer's stable again. (Something bad happened when upgrading AVG antivirus - it's now gone.)
Old 05-29-2010, 12:04 PM   #9
user_none
Sigil & calibre developer
Quote:
Originally Posted by Starson17 View Post
Could you try again, but convert to rtf, not txt?
Verified and fix committed to driver-dev.
Old 05-29-2010, 12:08 PM   #10
Starson17
Wizard
Quote:
Originally Posted by user_none View Post
Verified and fix committed to driver-dev.
Do you mind telling me what change you made? I'd just like to get familiar with how the conversion code works.
Old 05-29-2010, 12:18 PM   #11
user_none
Sigil & calibre developer
Quote:
Originally Posted by Starson17 View Post
Do you mind telling me what change you made? I'd just like to get familiar with how the conversion code works.
Sure.

I've attached my changes. Basically, all I changed was to read the XHTML into a string and replace all newline characters (Windows, Unix, and old Mac) with a space.
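
The attached patch is not reproduced in this thread, so the following Python is only a sketch of the kind of replacement described, not the committed change. The names `NEWLINE` and `newlines_to_spaces` are hypothetical.

```python
import re

# Matches Windows (\r\n), old Mac (\r) and Unix (\n) line endings.
# \r\n is listed first so a Windows line ending becomes one space, not two.
NEWLINE = re.compile(r"\r\n|\r|\n")


def newlines_to_spaces(xhtml: str) -> str:
    """Return the XHTML string with every line ending replaced by one space."""
    return NEWLINE.sub(" ", xhtml)
```

Ordering the alternatives this way matters: replacing `\r` and `\n` separately would leave two spaces wherever the source used Windows line endings.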

The conversion pipeline has some basic requirements for passing data between the various parts. But once you get the data into a stage, it allows you to do pretty much whatever you want to actually process it. You'll notice a lot of similarities between some of the formats I've created and vast differences between others. Feel free to email me if you want/need a more in-depth explanation (john AT nachtimwald.com).
Attached Files
File Type: txt rtf-fix.txt (1.5 KB, 258 views)
Old 05-29-2010, 12:31 PM   #12
Starson17
Wizard
Quote:
Originally Posted by user_none View Post
Sure.
Thanks (and sorry for misleading you on the txt format - I could have sworn I saw it there too ...)
All times are GMT -4.