Old 06-04-2011, 03:57 AM   #1
Hammerwell
Junior Member
Posts: 9
Karma: 10
Join Date: Jun 2011
Location: Germany
Device: PRS-505
Question large .txt file conversions [solved]

Hello,

I have some longer books as .txt files. They are quite large (1.5 MB or more). Calibre can convert them to my preferred formats (epub or lrf), but the conversion stops at some point and the remaining text is discarded, so only part of the text is converted.
Is this a limitation of the file format? I think not, since I have bought some files originally in lrf that are even larger.
Is there a setting in the Calibre options that I overlooked, or is it something else?
I would really appreciate your help, since I would like to read the whole books.

Hammerwell
PRS-505

Last edited by Hammerwell; 06-13-2011 at 09:41 AM. Reason: solved
Old 06-04-2011, 09:16 AM   #2
user_none
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by Hammerwell View Post
I have some longer books as .txt files. They are quite large (1.5 MB or more). Calibre can convert them to my preferred formats (epub or lrf), but the conversion stops at some point and the remaining text is discarded, so only part of the text is converted.
Is this a limitation of the file format?
No, this should not be happening. The only way you should get a partial conversion is if your computer runs out of memory and the conversion fails to finish. Typically in that case the output file will never be created.

Quote:
Originally Posted by Hammerwell View Post
I think not, since I have bought some files originally in lrf that are even larger.
Is there a setting in the Calibre options that I overlooked, or is it something else?
I would really appreciate your help, since I would like to read the whole books.
If someone added an option to only convert part of a document and discard the rest I will have a very unfriendly chat with them...

Please open a ticket at https://bugs.launchpad.net/calibre . Attach the file you are having trouble converting. Also, do a conversion and attach the conversion log: click Jobs at the bottom right of the window, select the job, and click details.

This way I can look into what's happening. It's easier to track issues using the bug tracker than here, and others can search for the issue if they run into it.
Old 06-04-2011, 10:17 AM   #3
theducks
Well trained by Cats
Posts: 30,939
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
You might have a stray EndOfFile mark in the file.
Can you see the missing parts of the book when you open the file in the Notepad editor?
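
One way to check for a stray end-of-file mark without relying on what an editor chooses to display is to scan the raw bytes. The sketch below assumes the mark in question is the legacy Ctrl-Z byte (0x1A), which the post does not actually specify; the function name and the file name book.txt are made up for illustration.

```python
from pathlib import Path
from typing import List

# Assumption: the "EndOfFile mark" is the old DOS/CP/M Ctrl-Z (SUB) byte.
EOF_MARK = 0x1A


def find_eof_marks(data: bytes) -> List[int]:
    """Return the byte offset of every 0x1A byte found in data."""
    return [offset for offset, byte in enumerate(data) if byte == EOF_MARK]


if __name__ == "__main__":
    path = Path("book.txt")  # hypothetical file name
    if path.exists():
        offsets = find_eof_marks(path.read_bytes())
        if offsets:
            print("Possible EOF marks at byte offsets:", offsets)
        else:
            print("No 0x1A bytes found.")
```

If the list comes back empty, a stray EOF byte of this kind is probably not what cuts the conversion short.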
Old 06-04-2011, 02:53 PM   #4
Hammerwell
Thank you all for your replies.
Theducks had an interesting suggestion, so I played a little with the files. I used Open Office to convert one text file to .odt and tried to convert this with calibre to .epub. This went without a problem. In the next step I saved the .odt as .txt using OOo. The conversion of this .txt also failed to complete, but at a different point.
The log reported an error in the original .txt: "XMLSyntaxError: error parsing attribute name, line 2310, column 12", and another error in the .txt converted via .odt:
"XMLSyntaxError: Opening and ending tag mismatch: p line 1154 and li, line 1156, column 9"

I would like to open a ticket, but unfortunately I am not free to publish the files. Would it be enough if I copied the sentences around the unexpected end into a new text file and sent these two? I believe these would not raise unwanted problems. I would also send the logs of the two conversions from .txt and the one from .odt.

Hammerwell
Old 06-04-2011, 02:58 PM   #5
theducks
You can make the files private to the Developer as part of the bug report.

Does the file contain something that 'looks like' a tag but really isn't?
Old 06-04-2011, 03:11 PM   #6
Hammerwell
Quote:
Originally Posted by theducks View Post
You can make the files private to the Developer as part of the bug report.
I don't know if this is enough. I do not want to wake something.

Quote:
Originally Posted by theducks View Post
Does the file contain something that 'looks like' a tag but really isn't
What does a tag look like?

One file ends as follows in the epub:
###########snip#######
on her thoughts...

"... Manager?"

""

"...="" pet="" then="">"
###########snip#######
The original reads as follows:
###########snip#######
played on her thoughts...
"<... Manager?>"
"<Of course. She is her manager.>"
"<She isn't.... isn't a ... pet then?>"
"<Um... no. She operates the colosseum.
###########snip#######

Does this help?

Hammerwell
Old 06-04-2011, 05:06 PM   #7
theducks
Quote:
Originally Posted by Hammerwell View Post
I don't know if this is enough. I do not want to wake something.



What does a tag look like?

One file ends as follows in the epub:
###########snip#######
on her thoughts...

"... Manager?"

""

"...="" pet="" then="">"

###########snip#######
The original reads as follows:
###########snip#######
played on her thoughts...
"<... Manager?>"
"<Of course. She is her manager.>"
"<She isn't.... isn't a ... pet then?>"
"<Um... no. She operates the colosseum.
###########snip#######

Does this help?

Hammerwell
The RED items (barely) look like a tag:
Tag opening symbol: <
Tag closing symbol: >

Hence the confusion.
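
To find such tag look-alikes before converting, a rough scan for anything between < and > on a single line can help. This is only a sketch: the pattern and the function name are assumptions made for illustration and have nothing to do with how calibre itself parses the text. The sample is the excerpt quoted in the post.

```python
import re
from typing import List, Tuple

# Deliberately crude, assumed pattern: anything between "<" and ">" on one line.
TAG_LIKE = re.compile(r"<[^<>\n]+>")


def find_tag_like(text: str) -> List[Tuple[int, str]]:
    """Return (line number, matched span) for every tag-like span in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in TAG_LIKE.finditer(line):
            hits.append((number, match.group(0)))
    return hits


sample = '''played on her thoughts...
"<... Manager?>"
"<Of course. She is her manager.>"
"<She isn't.... isn't a ... pet then?>"
"<Um... no. She operates the colosseum.
'''

for number, span in find_tag_like(sample):
    print(number, span)
```

The last line of the excerpt opens a bracket without closing it on the same line, so this simple scan does not report it; a real check would need to look across lines as well.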
Old 06-04-2011, 05:11 PM   #8
user_none
Quote:
Originally Posted by theducks View Post
You can make the files private to the Developer as part of the bug report.
Private bugs are not publicly viewable. You would open the bug (don't attach any files). Then go into the bug and on the upper right you can change the visibility to private. Then you would attach the files. As theducks said this would only allow you, me and a few other calibre developers to have access to the file.

Quote:
Originally Posted by Hammerwell
Would it be enough if I copied the sentence around the unexpected end in a new text file and send these two?
Unfortunately no. TXT files are converted to HTML. Also, TXT conversion by default enables some heuristics that are applied and manipulate the HTML. This is then used to convert to your desired output format. The issue is most likely in the heuristics. Only the entire file would allow for me to determine why it is causing malformed output if that is indeed the issue.
Old 06-05-2011, 04:51 AM   #9
Hammerwell
Quote:
Originally Posted by theducks View Post
The RED items (barely) look like a tag:
Tag opening symbol: <
Tag closing symbol: >

Hence the confusion.
OK, I replaced the angled brackets with other symbols. This did the trick! It seems that the angled brackets were misinterpreted at some point. For me this solves the problem with a little manual help.
The automatic detection in the heuristics may be a problem, since it would have to be far more sophisticated.
Could changing the TXT input formatting option from auto to plain solve this little problem automatically?

Edit: It seems not. I can change the TXT formatting options as I like, but the conversion log always says "Auto detected formatting as markdown".
Hm, this was apparently not the sole problem. In the log from the odt -> txt -> epub conversion it says textile.
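
For anyone who wants to script the manual bracket replacement rather than do it in an editor, a minimal sketch follows. The post does not say which symbols were used as substitutes, so the square brackets here are an arbitrary choice, and the function name is made up.

```python
# Arbitrary substitutes (an assumption); the post does not name the symbols used.
BRACKET_MAP = str.maketrans({"<": "[", ">": "]"})


def replace_angle_brackets(text: str) -> str:
    """Return text with every < and > swapped for the substitute symbols."""
    return text.translate(BRACKET_MAP)


print(replace_angle_brackets('"<Of course. She is her manager.>"'))
```

This changes how the book reads, of course, which is why it is a workaround rather than a fix.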


Hammerwell
Attached Files
File Type: txt conversion logs.txt (15.0 KB, 284 views)

Last edited by Hammerwell; 06-05-2011 at 05:20 AM. Reason: tried a little converting
Old 06-05-2011, 07:23 AM   #10
user_none
Quote:
Originally Posted by Hammerwell View Post
Edit: It seems not. I can change the TXT formatting options as I like, but the conversion log always says "Auto detected formatting as markdown".
Somewhere it's not being set properly. You will want to set the "Formatting style" option for TXT Input to either plain or heuristic. Be sure you're setting it in the conversion dialog. The settings in preferences are overridden by the conversion dialog settings. Once you do one conversion, the settings from the conversion dialog are saved and are then used over the global settings.
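
When converting from the command line instead of the dialog, calibre's ebook-convert tool has, as far as I know, a --formatting-type option for TXT input that corresponds to this setting. Treat the flag name as an assumption and check it against `ebook-convert book.txt .epub --help` for your version. The helper below is a hypothetical wrapper that only builds the command; the file names are placeholders.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(src: str, dst: str, formatting: str = "plain") -> List[str]:
    """Build an ebook-convert call that forces the TXT formatting type."""
    if formatting not in ("plain", "heuristic"):
        raise ValueError("use 'plain' or 'heuristic' to avoid markdown/textile detection")
    # Assumption: --formatting-type is the CLI name of the "Formatting style" option.
    return ["ebook-convert", src, dst, "--formatting-type", formatting]


if __name__ == "__main__":
    cmd = build_convert_command("book.txt", "book.epub")  # placeholder file names
    print(" ".join(cmd))
    # Only run the conversion if calibre is installed and the input exists.
    if shutil.which("ebook-convert") and Path("book.txt").exists():
        subprocess.run(cmd, check=True)
```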

Markdown and Textile formatting both allow for embedded HTML. When calibre detects the document as either of those, it keeps the < and > as is, so later they throw off the HTML parser. Plain and Heuristic formatting change < and > into entities so they are not interpreted as tags later.
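
The difference can be reproduced in a few lines. This is a sketch, not calibre's code: Python's standard xml.etree parser stands in for the parser calibre uses, html.escape stands in for the entity conversion, and the helper names are invented. The sample line is taken from the excerpt posted earlier in the thread.

```python
import html
import xml.etree.ElementTree as ET


def wrap_paragraph(line: str, escape: bool) -> str:
    """Wrap one line of text in <p>...</p>, optionally escaping < and > first."""
    body = html.escape(line, quote=False) if escape else line
    return "<p>" + body + "</p>"


def parses_cleanly(markup: str) -> bool:
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


line = '"<She isn\'t.... isn\'t a ... pet then?>"'

raw = wrap_paragraph(line, escape=False)      # what markdown/textile would pass on
escaped = wrap_paragraph(line, escape=True)   # what plain/heuristic would pass on

print(raw, parses_cleanly(raw))
print(escaped, parses_cleanly(escaped))
```

With the brackets left as is, the parser tries to read the sentence as a tag with attributes and gives up; with the brackets turned into &lt; and &gt; the same line is ordinary text.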
Old 06-13-2011, 07:20 AM   #11
Hammerwell
Sorry for the late answer, I had other things to do.

Quote:
Originally Posted by user_none View Post
Somewhere it's not being set properly. You will want to set the "Formatting style" option for TXT Input to either plain or heuristic. Be sure you're setting it in the conversion dialog. The settings in preferences are overridden by the conversion dialog settings. Once you do one conversion, the settings from the conversion dialog are saved and are then used over the global settings.
Oh, good to know. I set this only in the preferences. What is the use of preferences if they are not used as preferences?
I will try this at the next occurrence. Since I now know about the solution, thanks to your help, this should be no problem.

Quote:
Originally Posted by user_none View Post
Markdown and Textile formatting both allow for embedded HTML. When calibre detects the document as either of those, it keeps the < and > as is, so later they throw off the HTML parser. Plain and Heuristic formatting change < and > into entities so they are not interpreted as tags later.
Thanks for this clarification.

Thank you both for this help.

This problem is solved and the thread can be closed.
[edit] Hm, no "SOLVED"-Button anywhere. Did I overlook it?[/edit]

Last edited by Hammerwell; 06-13-2011 at 07:24 AM.
Old 06-13-2011, 08:43 AM   #12
user_none
When you edit the post you should be able to edit the title. Put solved in the title.
Old 06-13-2011, 09:47 AM   #13
theducks
Quote:
Originally Posted by Hammerwell View Post
Oh, good to know. I set this only in the preferences. What is the use of preferences if they are not used as preferences?
They are used for the INITIAL conversion.
Changes made on the conversion screen fine-tune the overall preferences to what works for the specific book.

Really $%^& the book's conversion settings?
There is a button at the bottom that clears the saved mess and restores the current preference settings.
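
The precedence described in this thread can be pictured as a simple merge: the global preferences are the starting point, and any settings saved from a book's earlier conversion win over them. The sketch below only models that behaviour; the function and the setting key are hypothetical and are not calibre's internals.

```python
from typing import Dict, Optional


def effective_settings(
    global_prefs: Dict[str, str],
    saved_for_book: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Merge settings so that a book's saved settings override the global ones."""
    merged = dict(global_prefs)
    if saved_for_book:
        merged.update(saved_for_book)
    return merged


prefs = {"formatting_type": "plain"}   # set in Preferences (hypothetical key)
saved = {"formatting_type": "auto"}    # left over from the book's first conversion

print(effective_settings(prefs, saved))  # the saved per-book value is used
print(effective_settings(prefs, None))   # after clearing the saved settings
```

This is why changing only the preferences had no visible effect on a book that had already been converted once.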

Quote:
I will try this at the next occurrence. Since I now know about the solution, thanks to your help, this should be no problem.

Thanks for this clarification.

Thank you both for this help.

This problem is solved and the thread can be closed.
[edit] Hm, no "SOLVED"-Button anywhere. Did I overlook it?[/edit]
No button
Old 06-13-2011, 09:48 AM   #14
Hammerwell
Quote:
Originally Posted by user_none View Post
When you edit the post you should be able to edit the title. Put solved in the title.
Too obvious.

Have a nice Pentecost. The WGT* was nice.

* http://www.wave-gotik-treffen.de/english/programm.php

Tags
angled brackets, large file, txt conversion

