Old 11-02-2010, 12:55 PM   #1
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
rtf to mobi - spurious word splits appearing

I have a long book in rtf which I am concurrently reading on Kindle & editing via Word + calibre conversion to improve paragraph structure & spelling. I guess it was originally a PDF.

Now, only after upgrading to calibre 0.7.26, I'm seeing occasional word-split strangeness, i.e.

" a sentence which looks fine in word"

will, on conversion, become

" a sente

nce that does not look fine in mobi"

i.e. the calibre conversion occasionally splits a word that is not split in the rtf, and also adds a blank line, as in the made-up example above. I am at a loss to explain this. I see it both on the Kindle & in calibre's own mobi reader, so it is not a Kindle bug. It is infrequent, say one word per 1000 is affected, but it is still annoying. I have turned off all the optional structure detection processing options for the conversion.

I have also tried select all + copy to a new blank doc + save again as .rtf in Word, in case multiple edits were affecting the source file, but that has not fixed anything. The text flows fine in Word with no strange characters, and the words that are being split do not show up as issues in Word's spell / grammar check.

If I go to the .rtf, delete and retype the broken word & reconvert, that fixes that word, but another random split will occur later on.

I've been working on & reading this for a couple of weeks now & only started seeing this issue yesterday.

Any thoughts / clues on what to look for?

Did the latest release make any changes to how rtf to mobi works?
Old 11-02-2010, 01:03 PM   #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
open a ticket and attach the RTF
Old 11-02-2010, 03:39 PM   #3
cybmole
Quote:
Originally Posted by kovidgoyal
open a ticket and attach the RTF
I'd like to confirm that it's actually a calibre problem & that I can reproduce it consistently before opening a ticket.

My hunch is that I've made so many edits/changes to the .rtf source that its internal structure has become suspect, or overcomplicated.

As an additional test, I asked calibre to convert the same file from rtf to txt, & I see some infrequent, spurious word splits in the .txt output as well (but if I use Word to save it as txt, then it looks OK).

What do calibre's rtf to txt & rtf to mobi conversions have in common? Do they use a common front end?

I don't know how the innards of rtf files work after they are edited, but this could well be a Microsoft problem. Will the rtf contain complex chains of pointers to changes, even after performing select all - copy - paste into a new file?

PS if I were to open a ticket (link please), this is a big file, over 1MB, so would I be able to attach it?

PPS summarising why it may be a calibre issue:
I began this process with a pdf source which, after conversion, had the usual issues, i.e. line breaks in annoying places, plus the odd typo. So after going pdf to epub (not with calibre), then epub to mobi, I made an rtf version from the epub, and every time I complete a chapter, I fix up the formatting for that chapter in Word & then use calibre to make a new mobi file.

I've done this repeatedly for 20+ chapters, over 2 weeks, without ever encountering the split word bug. Note that there were no split words in the initial pdf conversion.

So all that has changed recently is the version of calibre, and the fact that the total number of edits to the rtf has increased.

Now when I scroll through today's latest mobi conversion in the internal reader, I see a couple of split words in the early chapters which I'd previously "signed off" as done.
So fresh conversions are introducing word split errors in places where there were none before (but only very occasionally, about one per 50-100 pages).
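
The rtf-to-txt check described above (Word's txt export looks fine, calibre's txt export has occasional splits) could be automated rather than eyeballed. Below is a minimal sketch, assuming both exports have been read into strings; the function name `find_split_words` is hypothetical and nothing here is part of calibre. It aligns the two word sequences with the standard-library `difflib` and reports places where one word in the reference appears as several fragments in the converted text.

```python
import difflib
import re


def find_split_words(reference_text, converted_text):
    """Return (whole_word, fragments) pairs where a word in the reference
    text appears as several consecutive fragments in the converted text."""
    ref_words = re.findall(r"\S+", reference_text)
    conv_words = re.findall(r"\S+", converted_text)
    # autojunk=False keeps very common words from being ignored in long books.
    matcher = difflib.SequenceMatcher(a=ref_words, b=conv_words, autojunk=False)
    splits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        ref_chunk = ref_words[i1:i2]
        conv_chunk = conv_words[j1:j2]
        # A split word is the same run of letters spread over more tokens.
        same_letters = "".join(ref_chunk) == "".join(conv_chunk)
        if same_letters and len(conv_chunk) > len(ref_chunk):
            splits.append((" ".join(ref_chunk), conv_chunk))
    return splits


# The made-up example from the first post.
print(find_split_words(
    "a sentence which looks fine in word",
    "a sente\n\nnce that looks fine in word",
))
```

On a book-sized file the alignment may take a while, so comparing one chapter at a time is probably more practical. A list of the affected words and their positions would also make a ticket easier to act on.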

Last edited by cybmole; 11-02-2010 at 03:59 PM.
Old 11-02-2010, 03:45 PM   #4
kovidgoyal
use word to save the rtf as html and convert that. That's the easiest way of dealing with suspect rtf.
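
For anyone who prefers to script the second half of that workaround, here is a minimal sketch that calls calibre's command-line converter, `ebook-convert`, on the HTML file saved from Word. The file name `book.htm` and both helper function names are hypothetical, and the sketch assumes calibre's command-line tools are on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_format="mobi"):
    """Build an ebook-convert command that writes next to the source file."""
    source = Path(source)
    output = source.with_suffix("." + target_format)
    return ["ebook-convert", str(source), str(output)]


def convert(source, target_format="mobi"):
    """Run the conversion, failing early if ebook-convert is not installed."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH")
    subprocess.run(build_convert_command(source, target_format), check=True)


# Show the command that would be run for a hypothetical Word export.
print(build_convert_command("book.htm"))
```

Calling `convert("book.htm")` would then produce `book.mobi` in the same folder. The same helper with `target_format="txt"` gives a quick plain-text output for checking splits.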
Old 11-02-2010, 04:02 PM   #5
cybmole
Quote:
Originally Posted by kovidgoyal
use word to save the rtf as html and convert that. That's the easiest way of dealing with suspect rtf.
Tried that: saved it from Word as filtered html, imported it to calibre (which changed the htm file to zip), then converted ZIP to mobi, but all line breaks then became fixed & the text no longer flowed?
Old 11-02-2010, 04:06 PM   #6
kovidgoyal
that will be a function of how the rtf is converted to html by word.
Old 11-02-2010, 11:10 PM   #7
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by cybmole
Tried that: saved it from Word as filtered html, imported it to calibre (which changed the htm file to zip), then converted ZIP to mobi, but all line breaks then became fixed & the text no longer flowed?
The line breaks would only become fixed in this scenario if you had hard breaks (used enter to start a new line) in your rtf file before you saved it as html filtered.

You can always try checking the Preprocess input file... option under Structure Detection to see if that helps put the lines back together during an html to mobi conversion.
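
A quick way to see whether the filtered HTML really contains hard breaks is to count the break-like tags in it. The sketch below uses only the standard-library `html.parser`; the names `BreakCounter` and `count_breaks` are hypothetical. It assumes Word writes Enter as a new `<p>` and Shift+Enter as `<br>`, which is worth verifying on your own file.

```python
from html.parser import HTMLParser


class BreakCounter(HTMLParser):
    """Count <br> and <p> start tags and the amount of text between them."""

    def __init__(self):
        super().__init__()
        self.counts = {"br": 0, "p": 0, "text_chars": 0}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; <br/> also arrives here.
        if tag in ("br", "p"):
            self.counts[tag] += 1

    def handle_data(self, data):
        self.counts["text_chars"] += len(data)


def count_breaks(html_text):
    counter = BreakCounter()
    counter.feed(html_text)
    counter.close()  # flush any buffered text
    return counter.counts


print(count_breaks("<p>one<br>two</p><P>three<br/>four</P>"))
```

If `text_chars` divided by the `p` count comes out close to one screen line (roughly 60 to 80 characters), the file probably has a paragraph per line, which is the hard-break situation described above.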
Old 11-03-2010, 03:07 AM   #8
cybmole
Well, the whole htm thing is drifting off-topic from the original issue of split wor

ds

for which there seems to be no obvious explanation. I'm inclined to blame it on working with a huge + over-edited .rtf & move on.

There used to be a way (in older versions of Word) to display all the extra formatting characters but I can't find that in Word 2007. I read up on how Word tracks edits & have tried to remove all stored changes, but maybe that does not even apply in .rtf.

Something is causing calibre, during conversion, to throw in an occasional spurious line break either mid-word or mid-sentence. I'd like to track it down, out of geeky curiosity, but don't know how. If I get this scenario with any other books then I'll report back.

I didn't get an answer to whether the 0.7.26 release made any changes to the conversion code?

I don't really want the hassle of rolling back to 0.7.24, reconverting & re-inspecting to see if that makes any difference, if the code was not changed at all!
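
One way to narrow down where a spurious break comes from is to check whether the raw RTF itself has a paragraph or line break in the middle of a word. The sketch below is a rough scan, not a full RTF parser. It relies only on the standard RTF control words `\par` and `\line`, and the function name `find_midword_breaks` is hypothetical. It flags every break that sits between a letter and a lowercase letter, so an ordinary paragraph that ends without punctuation will also show up as a candidate.

```python
import re

# One token per match: control word, hex escape, other control symbol,
# group brace, or a run of plain text.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, optional number, optional space
    r"|\\'[0-9a-fA-F]{2}"       # hex-escaped character such as \'e9
    r"|\\[^a-zA-Z]"             # other control symbol such as \* or \{
    r"|[{}]"                    # group delimiters
    r"|[^\\{}]+"                # plain text
)

BREAK_WORDS = {"par", "line"}


def find_midword_breaks(rtf_source, context=10):
    """Return (control_word, text_before, text_after) for suspicious breaks."""
    hits = []
    last_text = ""
    pending_break = None
    for match in TOKEN.finditer(rtf_source):
        word = match.group(1)
        token = match.group(0)
        if word is not None:
            if word in BREAK_WORDS:
                pending_break = word
            continue
        if token[0] in "\\{}":
            continue  # control symbols and braces carry no visible text here
        # Raw CR/LF characters in RTF text are not part of the content.
        text = token.replace("\r", "").replace("\n", "")
        if not text:
            continue
        if pending_break and last_text[-1:].isalpha() and text[0].islower():
            hits.append((pending_break, last_text[-context:], text[:context]))
        pending_break = None
        last_text = text
    return hits


print(find_midword_breaks(r"{\rtf1 a sente\par nce here.\par Next one}"))
```

If the scan finds nothing near a word that comes out split, the break is probably being introduced during conversion rather than stored in the file, which is useful to mention in a ticket.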
Old 11-03-2010, 03:20 AM   #9
cybmole
Quote:
Originally Posted by dwanthny
The line breaks would only become fixed in this scenario if you had hard breaks (used enter to start a new line) in your rtf file before you saved it as html filtered.

Hmm, repeating the test: open the .rtf file, save as filtered html, and confirm that I want Office-specific tags removed.

Close Word, then reopen Word & re-open the htm. Change the window size & confirm flow still works.

Add it to calibre, convert to mobi without structure detection options.
Flow seems OK! I'm sure that's the same process as before!

Can't see any split words on a quick scroll through, so that's a workable work-around.

still does not expla

in the origi

nal issue though

PS it seems there's also a bug with merge books & ZIP that I reckon I've seen before. I decided to save the above work & merge it into my original book versions. So I do a merge with the delete option, wanting it to overwrite the mobi and add the zip to the 1st selected book.

The ZIP file apparently vanishes!

i.e. before the merge, book 1 has pdf, rtf, mobi and book 2 has zip, mobi. After the merge, the zip has been lost?
Well, actually not totally lost: if I use open path to go to the calibre folder, it's there, but it's not listed as an available format to view. If I shut down & restart calibre, then it is re-listed as an available format.

Last edited by cybmole; 11-03-2010 at 03:27 AM.
Old 11-03-2010, 04:38 PM   #10
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by cybmole
PS it seems there's also a bug with merge books & ZIP that I reckon I've seen before. I decided to save the above work & merge it into my original book versions. So I do a merge with the delete option, wanting it to overwrite the mobi and add the zip to the 1st selected book.

The ZIP file apparently vanishes!

i.e. before the merge, book 1 has pdf, rtf, mobi and book 2 has zip, mobi. After the merge, the zip has been lost?
Well, actually not totally lost: if I use open path to go to the calibre folder, it's there, but it's not listed as an available format to view. If I shut down & restart calibre, then it is re-listed as an available format.
I've just tested this and I can't reproduce it. I see the ZIP format immediately and can't make it disappear. Can anyone else reproduce it? Can you give me a sequence of steps that reproduces it?
Old 11-04-2010, 02:46 AM   #11
cybmole
My sequence of steps is as above.

Here it is again.

Take an existing multiformat book from your collection &, via Word, produce a filtered html file copy outside of calibre.
Add this copy to calibre. It will appear as a new book with incorrect metadata, and will be stored as zip.
Merge this with the original book.
I see calibre delete the 2nd book entry as instructed, then when I focus back on the 1st book I do not see ZIP in the available formats (not until I close & restart calibre).
Old 11-04-2010, 02:49 AM   #12
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
While you may have hit upon a real bug, there is a simpler process you can use.

Instead of adding the html as a whole new book and merging it, just add the html as a new format to the existing book. Edit the book and add the format, or just drag-n-drop the html file to the same spot where all the other formats are listed, either in the main view or in the edit book window.
Old 11-04-2010, 03:06 AM   #13
cybmole
Quote:
Originally Posted by ldolse
While you may have hit upon a real bug, there is a simpler process you can use.

Instead of adding the html as a whole new book and merging it, just add the html as a new format to the existing book. Edit the book and add the format, or just drag-n-drop the html file to the same spot where all the other formats are listed, either in the main view or in the edit book window.
Thanks, I was not aware that option existed!
Old 11-04-2010, 08:02 AM   #14
Starson17
Quote:
Originally Posted by cybmole
My sequence of steps is as above.

Here it is again.

Take an existing multiformat book from your collection &, via Word, produce a filtered html file copy outside of calibre.
Add this copy to calibre. It will appear as a new book with incorrect metadata, and will be stored as zip.
Merge this with the original book.
I see calibre delete the 2nd book entry as instructed, then when I focus back on the 1st book I do not see ZIP in the available formats (not until I close & restart calibre).
The steps of starting from a multiformat book in Word and saving as filtered HTML should have no effect. The existence of the ZIP is all that should matter. I wrote the merge code, and the content of the ZIP isn't even looked at. I tried a variety of tests with a ZIP file in a record being merged into another record that does not have a ZIP and could not reproduce this. If you can do this repeatedly, can you PM me the books in question in original and ZIP formats and let me try it? (I haven't tested on the latest calibre. I'll test that as soon as I finish restoring my main machine; slight hardware problem.)
Old 11-04-2010, 08:54 AM   #15
cybmole
Should probably move this testing to a new thread, but anyway.

I'll start over with a small mobi book.
1. I make an .rtf version with calibre & open it in Word.
2. Using Word, I save it as htm outside of the calibre folders.
3. I add that file to calibre as a new book.
4. I do the merge. This time it all works!

I try again with another book that already exists in 3 formats. No issue there either.

So now I go to the book which has been causing all of the trouble: Bob Woodward, The War Within... I copy the zip to the desktop, then remove it from calibre via remove book of a specific format. Then I add the zip back into calibre as a new book, & then I merge. Then I move the mouse to the original copy to view the formats list. Voila, the available format list has NOT updated! So I can reproduce the bug on this specific PC with this specific book.

Not sure how to get the files to you if you want them all: it's an 8MB pdf, a 1MB RTF, and a smaller mobi & zip.

But the zip is not lost, i.e. if I open path, then it is there, & if I click onto a different book in the main calibre window, then back onto this book, the list has updated. So it is just a small display update issue, not a lost file issue? Maybe the metadata for the 2 books has to mismatch, as it does in test 3 before the merge (as making the zip remakes the metadata and changes the title slightly)?

Last edited by cybmole; 11-04-2010 at 09:11 AM.