Old 11-14-2010, 11:11 AM   #1
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
best program for correcting typos / spelling in epub & mobi books ?

Like the title says: what's the easiest way to fix up an e-book file?
E.g. I've finished reading a book but I recall there being some annoying typos or spelling errors, and I now want to go into the source, find them & fix them, to keep my OCD at bay :-)

Should I go directly to Sigil, or are there other products that I should shortlist?

Essential features would be a Word-style find+replace or spell check, plus the ability to work within the ebook format & not have to convert to something else before & after patching.

As an aside, I'm currently puzzled by a novel which contains a lot of Japanese words, because very frequently a Japanese word and its preceding English word have merged, i.e. there is no intervening space. It's decipherable, but I wonder what could cause such a conversion artifact. The extracts below are from my source (LIT) format. It only seems to happen for English+Japanese word pairs, never for English+English word pairs.

e.g.
"The waitress brought us the twoyuzukiri. Without a sound she knelt on thetatami, placed each dish on the table, slightly repositioned them in accordance with some strict mental framework, stood, bowed, and departed."

e.g. 2
"I remembered for a moment the first time I heard the wordainoko, half-breed. It happened at school, and I asked my father about it that night. He scowled and said only, “Taishita koto nai.” It’s nothing. But pretty soon I got to hear the word while theijimekko, the school bullies, were busy trying to beat the shit out of me, and I put two and two together."

I realise I'd have to patch all of these instances by hand & in this case I doubt that I will bother, but I am curious as to the possible cause.
Old 11-14-2010, 11:30 AM   #2
theducks
Well trained by Cats
Posts: 26,213
Karma: 41365337
Join Date: Aug 2009
Location: The Central Coast of California
Device: K4NT(Fixed,New Bat.), Galaxy Tab A, Kobo Aura2
[QUOTE=cybmole;1215888]
as an aside I'm currently puzzled by a novel which contains a lot of Japanese words; because very frequently a Japanese word and it's preceding English word have merged i.e. there is no intervening space. It's decipherable but I wonder what could cause such a conversion artifact. extract below is from my source (lit) format. it only seems to happen for english+japanese word pairs, never for english+english word pairs
[/QUOTE]

I have seen this happen when one word was (originally) in italics (common practice for foreign language representation).
You may also see the trailing white space is now larger.
Changing reader programs may correct the problem if the error did not make it into the document code.
Old 11-15-2010, 04:42 AM   #3
toddos
Guru
Posts: 695
Karma: 822675
Join Date: May 2010
Device: Kobo Aura, Nokia Lumia 920 (Freda)
What was the source material for this content? I'd bet the Japanese words were italicized or otherwise had some odd formatting/markup that got lost in the conversion and took a space with it (I've seen similar things happen before with soft hyphens, for example). Without knowing where this content originated, there's no way to know why it converted poorly. Given that it apparently happens on every Japanese word inline in a sentence, it was almost certainly a conversion issue.

As for fixing it, Sigil is probably your best bet. It has find/replace across all HTML files, and works within the epub format. Another option would be to use Calibre's "Tweak ePub" functionality (hotkey: T), which expands the epub into a temp folder where you can use your favorite text or HTML editor to modify the files, and then packages everything back up when done. Another similar option would be to expand the epub yourself. There is no real "epub format" per se. It's just HTML and CSS files in a zip container renamed .epub (there are metadata files and certain file location requirements, but as long as you're re-zipping everything back up you're not going to break the format). There's really no reason to use this approach, since it's just the manual version of using Calibre's "Tweak ePub" option.
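
To make the "zip container" point concrete, here is a minimal Python sketch (not taken from Sigil or calibre) that lists what is inside an epub using only the standard library. The demo archive and its file names are invented for illustration; a real book will have its own layout. One caveat to the "just re-zip it" advice: the epub container spec expects the `mimetype` entry to come first and to be stored uncompressed, and stricter readers may check that.

```python
import io
import zipfile


def list_epub_entries(epub_bytes):
    """Return the names of the files stored inside an epub (a zip archive)."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return archive.namelist()


def build_demo_epub():
    """Build a tiny in-memory stand-in for an epub (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # 'mimetype' goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/chapter1.html",
                         "<html><body><p>Hello</p></body></html>")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_epub_entries(build_demo_epub()):
        print(name)
```

For a book on disk, `zipfile.ZipFile("foo.epub").namelist()` gives the same kind of listing.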

Personally, I'd just use Sigil to fix things and edit the file directly from the calibre file store (yes, I know, not really recommended to go mucking about in the calibre storage folders, but as long as the epubs exist on disk this is relatively safe). However if you have another editor that you prefer (notepad++, visual studio, whatever), you could expand the epub, load all of the html files into your favorite editor, and use that editor's find/replace functionality to do the cleanup.

For this specific instance, you could certainly write a script in perl, python, sed, powershell, etc that would fix each instance of lost spaces. There's no universal automated solution for cleanups like this, though, because every cleanup task is different (for example, your next cleanup might be removing newlines after -s, rather than inserting spaces before certain words).
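
As a rough idea of what such a script could look like in Python: the sketch below re-inserts a space wherever one of a known list of words is glued onto the letters before it. The word list is an assumption built from the two extracts in the first post; you would have to collect the rest from the book yourself, and `restore_missing_spaces` is just an illustrative name.

```python
import re

# Assumed word list, taken from the extracts quoted earlier in the thread.
JAPANESE_WORDS = ["yuzukiri", "tatami", "ainoko", "ijimekko"]


def restore_missing_spaces(text, words=JAPANESE_WORDS):
    """Insert a space before a listed word when it directly follows a letter."""
    # Longest words first so a longer entry wins over a shorter one it contains.
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(word) for word in ordered)
    # (?<=[A-Za-z]) only matches when the word is fused to preceding letters,
    # so words that already have a space in front are left alone.
    pattern = re.compile(r"(?<=[A-Za-z])(" + alternatives + r")\b")
    return pattern.sub(r" \1", text)


if __name__ == "__main__":
    print(restore_missing_spaces("she knelt on thetatami, placed each dish"))
```

This would also split any ordinary English word that happens to end in one of the listed words, so it is worth reviewing the changes rather than trusting them blindly.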
Old 11-15-2010, 08:49 AM   #4
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
It was a LIT source found online, 1 of 6 in a series, which may all have the same bug. The gap between words is already missing in the LIT source.
I had a quick go with Sigil. It seems only able to handle epub: both the LIT & mobi versions of the same book appear as garbage. Sigil has find/replace, but not spell check.

Notepad++ displays garbage for all 3 formats, unless I'm missing an obvious trick.

Converting to RTF + Word is what I've used in the past, then converting back again. The spell check is good but has no way to turn off flagging all capitalised words, and a typical novel has lots of those, like people's names.
I was not aware of the Tweak ePub option, so I'll check that out also, thanks.

The general issue, though, is that typos in ALL e-books annoy me, enough to want to go back & fix them if there is any likelihood of reading the book again at a future time.

Without a spell checker & a good memory, it can be hard to track them all down after finishing the book, unless I make notes while reading, & that spoils the reading experience somewhat.
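
A bare-bones sketch of the kind of check described above, a spell check that ignores capitalised words, could look like this in Python. It is only an illustration: `find_suspect_words` is a made-up name, and the `known_words` set stands in for a real dictionary file that you would have to supply.

```python
import re


def find_suspect_words(text, known_words):
    """Return words not in known_words, skipping anything capitalised.

    Suspects are listed once each, in order of first appearance.
    """
    suspects = []
    # Words are runs of letters, optionally with one inner apostrophe (e.g. "don't").
    for word in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text):
        if word[0].isupper():
            continue  # names and other capitalised words are not flagged
        if word.lower() not in known_words and word not in suspects:
            suspects.append(word)
    return suspects


if __name__ == "__main__":
    dictionary = {"saw", "the", "glacier", "and", "pigs"}  # stand-in dictionary
    print(find_suspect_words("Sydney saw teh glacier and teh pigs.", dictionary))
```

The obvious trade-off is that a typo in the first word of a sentence is skipped too, since that word is capitalised as well.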

PS as an aside on legal sources, I checked kindle store for this series & they don't have it
- is it sensible to assume that if a book is not available on Kindle, then it is very unlikely to be available in any other e-book format, in any other store , and not bother looking ?
how were .lit source books sold, back in the pre-Kindle , pre-Sony era ?

Last edited by cybmole; 11-15-2010 at 08:57 AM.
Old 11-15-2010, 09:06 AM   #5
silentguy
Connoisseur
Posts: 88
Karma: 200
Join Date: Nov 2010
Location: Dortmund, Germany
Device: Kindle Paperwhite (10. Generation)
I recently imported a lot of fan fiction, and that has the same sort of stupid typos and other problems. I ended up converting it all (back) to txt files and running some regexps for the most annoying problems. The only annoying thing is that I lose formatting that way. Luckily there was not much formatting in the text anyway, but to be on the safe side I made a feature request to export Markdown.
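
For anyone wanting to try the txt + regexp route, here is a small Python sketch. The three fixes are only examples of typical cleanup rules, not the ones actually used above, and `tidy_text` is a hypothetical name.

```python
import re


def tidy_text(text):
    """Apply a few example cleanup rules to plain text."""
    # Rejoin words split by a hyphen at the end of a line: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Drop stray spaces before common punctuation marks.
    text = re.sub(r" +([,.;:!?])", r"\1", text)
    return text


if __name__ == "__main__":
    print(tidy_text("An exam-\nple  of text , tidied ."))
```

Note that the first rule would also remove the hyphen from a genuine compound that happened to break across lines (such as "well-" / "known"), so rules like this are best checked against the actual text.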
Old 11-15-2010, 10:28 AM   #6
Manichean
Wizard
Posts: 3,130
Karma: 86098
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by cybmole View Post
I had a quick go with sigil - it seems only able to handle epub - both lit & mobi version of same book appear as garbage. sigil has find replace, but not spell check.

notepad ++ displays garbage for all 3 formats unless I'm missing an obvious trick
Sigil does only support ePub as a book format. LIT files have underlying HTML, so, unless you tried to directly open the file in Notepad++, you should see some, well, HTML. I remember using a utility called ConvertLIT to disassemble LIT files. The same is, I believe, true for Mobipocket and, of course, ePub, though you'll obviously need different disassembly tools.
Old 11-15-2010, 11:10 AM   #7
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
If I open the .lit source in Sigil, it finds no text. It just says ITOLITLS in something called section0001.xhtml.
There are empty tags for styles, images, fonts, misc.

Not an issue, as I could work with epub & find/replace tools.

I tried a completely different .lit book in Sigil & got exactly the same (lack of) output.

If I switch to code view I see a bit of HTML:
[code]
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>

<body>
<p>ITOLITLS</p>
</body>
</html>
[/code]
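
For what it's worth, ITOLITLS is the signature found at the very start of a LIT file, which suggests Sigil is showing the first raw bytes as text rather than parsing the book. A rough Python sketch of telling the three formats apart by their leading bytes follows; `sniff_ebook_format` is a hypothetical helper, and the check is only a guess based on well-known signatures (zip files start with `PK\x03\x04`, Mobipocket files carry `BOOKMOBI` at offset 60 of their Palm database header).

```python
def sniff_ebook_format(data):
    """Guess the container format from the first bytes of a file."""
    if data.startswith(b"ITOLITLS"):
        return "lit"
    if data.startswith(b"PK\x03\x04"):
        return "zip"  # epub is a zip container, so an epub lands here
    if data[60:68] == b"BOOKMOBI":
        return "mobi"
    return "unknown"


if __name__ == "__main__":
    print(sniff_ebook_format(b"ITOLITLS" + b"\x00" * 60))
```

To use it on a real file, read the first 68 bytes in binary mode, e.g. `open(path, "rb").read(68)`, and pass them in.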

Has the option of a plug-in text editor / spell checker tool for calibre ever been mooted / dismissed?
I searched the forum for "spell checker" & did not find anything interesting.
Old 11-15-2010, 12:47 PM   #8
Manichean
Wizard
Posts: 3,130
Karma: 86098
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
I think it came up when the "explode ePub"- feature got introduced and was dismissed because calibre is meant to manage books, not edit them.
Old 11-15-2010, 01:30 PM   #9
jackie_w
Grand Sorcerer
Posts: 5,740
Karma: 14352952
Join Date: Sep 2009
Location: UK
Device: Kobo: H2O, GloHD, KA1, ClaraHD, Forma
@cybmole,
Have you tried using the Calibre [Convert] - [Debug] feature to extract the HTML from your LIT? I use it all the time when the source LIT is poor.

Alternatively, ZIP is now a standard output format option in Convert. So, if you convert LIT to ZIP you can get at the HTML that way.
Old 11-15-2010, 02:43 PM   #10
toddos
Guru
Posts: 695
Karma: 822675
Join Date: May 2010
Device: Kobo Aura, Nokia Lumia 920 (Freda)
Quote:
Originally Posted by cybmole View Post
I had a quick go with sigil - it seems only able to handle epub - both lit & mobi version of same book appear as garbage. sigil has find replace, but not spell check.

notepad ++ displays garbage for all 3 formats unless I'm missing an obvious trick
My bad. I thought you asked for tools to edit epubs? Sigil is an epub editor. It doesn't edit mobi or lit books.

Notepad++ will display garbage unless you expand the epub. As I mentioned, epub files are just renamed zips. You can explode them yourself by renaming the file to .zip (that is, "foo.epub" would be renamed to "foo.zip") and opening it with your favorite zip tool (winzip, windows explorer, etc). When you're done, just zip it all back up and rename it back to .epub (zip everything up to "foo.zip" and then rename that to "foo.epub"). Or you can skip all of those steps using Calibre's Tweak ePub option, as I mentioned.
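
The rename, unzip and re-zip steps can also be scripted. Below is a minimal Python sketch using only the standard library; `explode_epub` and `repack_epub` are hypothetical helper names, not part of calibre or Sigil. When repacking it writes `mimetype` first and uncompressed, which is what the epub container spec asks for and which a plain "zip everything up" in some tools will not guarantee.

```python
import os
import zipfile


def explode_epub(epub_path, target_dir):
    """Unpack every file in the epub into target_dir."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(target_dir)


def repack_epub(source_dir, epub_path):
    """Zip source_dir back into an epub, with 'mimetype' first and stored."""
    with zipfile.ZipFile(epub_path, "w", zipfile.ZIP_DEFLATED) as archive:
        mimetype = os.path.join(source_dir, "mimetype")
        if os.path.exists(mimetype):
            archive.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for folder, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                # Zip entry names always use forward slashes.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname != "mimetype":
                    archive.write(full_path, arcname)
```

Typical use would be `explode_epub("foo.epub", "foo_work")`, edit the HTML files in `foo_work`, then `repack_epub("foo_work", "foo_fixed.epub")`. Write the new epub outside the working folder so it does not end up zipping itself.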

Since I mainly use only epub, I've not spent any time looking at mobi or lit editing tools. Luckily the three formats convert back and forth relatively easily (as long as you don't get too fancy with the css in the epub), so you can edit the epub source and convert new mobi and lit copies from the epub.
Old 11-15-2010, 04:24 PM   #11
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by toddos View Post
Luckily the three formats convert back and forth relatively easily (as long as you don't get too fancy with the css in the epub), so you can edit the epub source and convert new mobi and lit copies from the epub.
Yes, I thought that was the case, so I'm happy to just learn epub editing.

The trouble with expanding zips or going the HTML route is that you potentially get hundreds of individual HTML pages to sort through. So that option probably wins 3rd place only, with 2nd place going to "convert to .rtf and use Word."
Old 11-15-2010, 07:40 PM   #12
toddos
Guru
Posts: 695
Karma: 822675
Join Date: May 2010
Device: Kobo Aura, Nokia Lumia 920 (Freda)
Quote:
Originally Posted by cybmole View Post
yes< I thought that was the case, so I'm happy to just learn epub editing.

the trouble with expanding zips or going the html route is that you potentially get hundreds of individual html pages to sort through. so that option probably wins 3rd place only, with 2nd place going to " convert to .rtf and use word."
Why would you want to do another conversion to rtf and back? That seems like not only a waste of time but a good way to introduce more conversion errors. If you're not going to use Sigil (which does have search/replace across all HTML files in the epub), use a tabbed editor like notepad++, textpad, pspad, visual studio, etc and just open up all of the HTML files in the directory. Then you can again do a find/replace across all open files. The only "difficulty" here is understanding how your chosen text editor handles opening multiple files (that is, don't try to do this using the Windows default Notepad text editor).
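
If the number of files is the worry, the same find/replace across every HTML file in the expanded folder can be done without an editor at all. A minimal Python sketch, assuming the files are UTF-8 and that a literal (non-regex) replacement is enough; `replace_in_html_files` is an illustrative name:

```python
import pathlib

HTML_SUFFIXES = {".html", ".xhtml", ".htm"}


def replace_in_html_files(root_dir, old, new, encoding="utf-8"):
    """Replace old with new in every HTML file under root_dir.

    Returns a dict mapping each changed file path to its replacement count.
    """
    changed = {}
    for path in sorted(pathlib.Path(root_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in HTML_SUFFIXES:
            continue
        text = path.read_text(encoding=encoding)
        count = text.count(old)
        if count:
            path.write_text(text.replace(old, new), encoding=encoding)
            changed[str(path)] = count
    return changed
```

The returned counts give a quick sanity check on how many files were touched; working on a copy of the book is still a good idea since the files are overwritten in place.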
Old 11-16-2010, 01:28 AM   #13
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
OK, I get how to do epub + Sigil.

For Notepad++ I need to copy the epub, rename it as zip, then somehow open all files at once, edit, then save all files at once, then rename back to epub.

That seems much more complicated, suggesting Sigil is the better choice?

Why I'd want to go to RTF and back is for the spelling & grammar checker.

I've done some via .rtf OK, but in others calibre crops one or more characters from the left margin when it converts from epub/mobi to RTF. I've not searched for that bug, but what seems to trigger it is a fancy initial indented capital letter in the original PDF source. PDF to epub goes OK, then epub to RTF is screwed.

I'll make a new .rtf conversion bug thread.

calibre rtf output - preprocessing unticked:
" He'd always pictured the end of the world being a bit more … industrial. Loud
machines, cars crashing, people screaming, guns-a-blazing. Perhaps a world-cracking bomb that would shatter the Earth into bits.

ut here, there was nothing. Nothing at all, save for some calf-high grasses, endless rocks, and the towering white vistas of glaciers raised high on the horizon.

reenland was far from the minds of most apocalyptic visionaries. And yet here he was, the man responsible for stopping the end of the world. No cars crashing, none of that nonsense. Just a tiny virus, and some pigs.

ydney Chapman "

same text - epub - calibre reader:
"He'd always pictured the end of the world being a bit more … industrial. Loud
machines, cars crashing, people screaming, guns-a-blazing. Perhaps a world-cracking bomb that would shatter the Earth into bits.
But here, there was nothing. Nothing at all, save for some calf-high grasses, endless rocks, and the towering white vistas of glaciers raised high on the horizon.
Greenland was far from the minds of most apocalyptic visionaries. And yet here he was, the man responsible for stopping the end of the world. No cars crashing, none of that nonsense. Just a tiny virus, and some pigs.
Sydney Chapman...."

original source was pdf - the cropped characters were large font in original

Last edited by cybmole; 11-16-2010 at 02:48 AM.
Old 11-16-2010, 05:00 AM   #14
jackie_w
Grand Sorcerer
Posts: 5,740
Karma: 14352952
Join Date: Sep 2009
Location: UK
Device: Kobo: H2O, GloHD, KA1, ClaraHD, Forma
Quote:
Originally Posted by cybmole View Post
original source was pdf - the cropped characters were large font in original
Perhaps the large initial characters were images rather than text.
Old 11-16-2010, 06:06 AM   #15
itimpi
Wizard
Posts: 4,525
Karma: 950063
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Quote:
Originally Posted by jackie_w View Post
Perhaps the large initial characters were images rather than text.
I do not think that can be the problem, as the epub looks OK. One thing I did notice is that in the cases where the first character disappeared in the RTF, each paragraph in the ePub had a leading space, whereas the paragraphs without a leading space do not lose the first character. That makes it sound more like a bug in the RTF output phase.
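
If that observation holds, one way to check a book (and possibly work around the problem before converting) is to look for paragraphs whose text starts with whitespace. The Python sketch below is only a guess at a workaround, not a confirmed fix for the RTF output; the function names are made up, and the regex assumes simple `<p ...>` tags rather than parsing the HTML properly.

```python
import re

# A <p> opening tag (with any attributes) followed directly by whitespace.
LEADING_SPACE = re.compile(r"(<p\b[^>]*>)\s+")


def count_leading_space_paragraphs(html):
    """Count paragraphs whose content begins with whitespace."""
    return len(LEADING_SPACE.findall(html))


def strip_leading_paragraph_space(html):
    """Remove whitespace that directly follows a <p> opening tag."""
    return LEADING_SPACE.sub(r"\1", html)


if __name__ == "__main__":
    sample = '<p> But here, there was nothing.</p><p class="x">Greenland</p>'
    print(count_leading_space_paragraphs(sample))
    print(strip_leading_paragraph_space(sample))
```

Running the count on each HTML file of the epub would at least show whether the cropped paragraphs line up with the ones that have a leading space.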
All times are GMT -4.