Old 01-18-2010, 06:32 PM   #331
soalla
Member
Posts: 18
Karma: 10
Join Date: Apr 2008
Device: iPod Touch, Sony PRS-505
first of all I must thank some_updates for all his hard work on this!!

I tried the pack of scripts, first using the old version of convert2xml and getting errors similar to those Stewball got, and then using the new version of the script (created with help from Stew - thanks, man!!) with success, creating the xml folder. Then I used the genhtml script and got a nice html file with some minor errors, such as when a chapter starts with an image, or due (I believe) to some typos in the original topaz file, and one not-so-small error: the original italics are gone!
but it's an excellent effort nonetheless, and I now have a perfectly readable file – I'll try to use it later in Calibre and convert it to epub for my prs-505

also, I got these errors when converting from xml to html:

Quote:
Processing …
metadata0000.dat
other0000.dat
page0000.dat
Unknown region type synth_fcvr.center
Warning: skipping this region
Unknown region type synth_fcvr.center
Warning: skipping this region
Unknown region type synth_fcvr.center
Warning: skipping this region
page0001.dat
page0002.dat
page0003.dat
does anyone else get this and know what it means?
Old 01-18-2010, 07:15 PM   #332
stewball
Enthusiast
Posts: 37
Karma: 90
Join Date: Dec 2009
Device: PRS-500
Quote:
Originally Posted by soalla View Post
(...)

also, I got these errors when converting from xml to html:

(...)

does anyone else get this and know what it means?
Soalla,
I get the same thing. I was wondering if it came from the empty pages at the beginning of the book, or open space. It doesn't seem to affect the conversion, though.


Stew
Old 01-18-2010, 07:58 PM   #333
Liviu_5
Books and more books
Posts: 917
Karma: 69499
Join Date: Mar 2006
Location: White Plains, NY, USA
Device: Nook Color, Itouch, Nokia770, Sony 650, Sony 700(dead), Ebk(given)
I guess we can officially declare Topaz as done, so no major current ebook DRM is left standing so far. It is amazing, btw, to look at the structure of the Topaz files as dissected with the scripts (or in the svg image) and see how Amazon just took the book, scanned it, ran OCR with some corrections and images, put together a huge, slow package, and called it a salable ebook...

well, at least we should be happy the books are available as e-books, and from now on they can be read anywhere

Last edited by Liviu_5; 01-18-2010 at 08:02 PM.
Old 01-19-2010, 02:28 AM   #334
soalla
Member
Posts: 18
Karma: 10
Join Date: Apr 2008
Device: iPod Touch, Sony PRS-505
Quote:
Originally Posted by soalla View Post
(...)

also, I got these errors when converting from xml to html:

Quote:
Unknown region type synth_fcvr.center
Warning: skipping this region
does anyone else get this and know what it means?
well, that was solved with the latest version (v1.3), along with the errors at the start of chapters that have an image as the first letter...

now it's time for cleaning the html (reapplying the lost bold and italics formatting) and converting to epub

a very nice job was done (and I believe they're going to improve it still a little more) by a great group of people, so here is again a BIG THANKS to all of them!!

Last edited by soalla; 01-19-2010 at 02:30 AM.
Old 01-19-2010, 12:16 PM   #335
Terisa de morgan
Grand Sorcerer
Posts: 6,261
Karma: 11768331
Join Date: Jun 2009
Location: Madrid, Spain
Device: Kobo Clara/Aura One/Forma,XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2
Quote:
Originally Posted by delphidb96 View Post
Care to share it? I'd love one.

Derek
The .bat to decrypt? I've only followed the instructions, calling the three scripts. If you want, I'll post it here, but it's very simple (I'm very lazy, so I always make scripts)

Just one comment: if you convert a Topaz book downloaded with Kindle for PC, you don't need the PID.

Last edited by Terisa de morgan; 01-19-2010 at 12:18 PM.
Old 01-19-2010, 12:18 PM   #336
Terisa de morgan
Grand Sorcerer
Posts: 6,261
Karma: 11768331
Join Date: Jun 2009
Location: Madrid, Spain
Device: Kobo Clara/Aura One/Forma,XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2
Quote:
Originally Posted by soalla View Post
well, that was solved with the latest version (v1.3), along with the errors at the start of chapters that have an image as the first letter...
How do you know the version?

Quote:
Originally Posted by soalla View Post
now it's time for cleaning the html (reapplying the lost bold and italics formatting) and converting to epub
I don't understand, do you have to look through all the books?
Old 01-19-2010, 12:26 PM   #337
Pardoz
Which side are you on?
Posts: 370
Karma: 1964
Join Date: Dec 2009
Location: Variable, currently Czestochowa, Poland.
Device: Kindle 2 Int'l
Quote:
Originally Posted by Terisa de morgan View Post
I don't understand, do you have to look through all the books?
Yes - the conversion process loses italics, boldface, em-dashes, etc. Of course you can also manually correct any OCR errors in the text while you're at it.
Old 01-19-2010, 12:53 PM   #338
Terisa de morgan
Grand Sorcerer
Posts: 6,261
Karma: 11768331
Join Date: Jun 2009
Location: Madrid, Spain
Device: Kobo Clara/Aura One/Forma,XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2
Quote:
Originally Posted by Pardoz View Post
Yes - the conversion process loses italics, boldface, em-dashes, etc. Of course you can also manually correct any OCR errors in the text while you're at it.
And I've begun with a book full of italics
Old 01-19-2010, 02:12 PM   #339
phenomshel
ZCD BombShel
Posts: 4,793
Karma: 8293322
Join Date: Jan 2009
Location: The Frozen North (aka Illinois, USA)
Device: iPad, STB Kindle Oasis
Actually, version 1.3 kept dashes in the book I converted. But that's why they say to generate the svg version of the book, for proofreading purposes.
Old 01-19-2010, 02:57 PM   #340
stewball
Enthusiast
Posts: 37
Karma: 90
Join Date: Dec 2009
Device: PRS-500
I have been using a free program called Sigil that can edit html and epub files. It can also create epub files after you make corrections to the html. You can add chapter breaks, correct typos, etc.

One caveat: it sometimes stops responding and can be slow to use. A new version is due out soon.

My first two Topaz books have been pretty good in quality, and definitely better than some scans I have seen.


Stew
Old 01-19-2010, 03:00 PM   #341
clarknova
Addict
Posts: 241
Karma: 2617
Join Date: Mar 2009
Location: Greenwood, SC
Device: Kindle 2
Quote:
Originally Posted by phenomshel View Post
Actually, version 1.3 kept dashes in the book I converted. But that's why they say to generate the svg version of the book, for proofreading purposes.
The dashes should be there, but the Topaz creation software tends to OCR both en dashes (–) and em dashes (—) as plain hyphens (-).

Granted, I've read so many darknet books with glaringly bad OCR errors that the few mistakes that seem to be in Topaz books are pretty minor to me.
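
For anyone proofreading converted text by hand, here is a minimal Python sketch of one way to flag likely dash mix-ups. It is not part of the conversion scripts: the function name `suggest_dashes` and both patterns are assumptions made for illustration (a hyphen with spaces on both sides was probably an em dash, a hyphen between digits was probably an en dash), so treat the result as suggestions to check against the svg version rather than a definitive fix.

```python
import re

# Heuristic patterns (assumptions, not taken from the conversion scripts):
# a hyphen with whitespace on both sides often stood for an em dash,
# and a hyphen between two digits often stood for an en dash.
SPACED_HYPHEN = re.compile(r"(?<=\w)\s+-\s+(?=\w)")
NUMBER_RANGE = re.compile(r"(?<=\d)-(?=\d)")


def suggest_dashes(text):
    """Return a copy of text with likely dash candidates replaced, for review."""
    text = SPACED_HYPHEN.sub("\u2014", text)  # U+2014 EM DASH
    text = NUMBER_RANGE.sub("\u2013", text)   # U+2013 EN DASH
    return text


if __name__ == "__main__":
    print(suggest_dashes("He paused - then read pages 10-12 of a well-known book."))
```

Ordinary hyphenated words such as "well-known" are left alone, but anything with digits on both sides of a hyphen (dates, phone numbers) would also be changed, so the output still needs a human pass.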
Old 01-19-2010, 03:37 PM   #342
phenomshel
ZCD BombShel
Posts: 4,793
Karma: 8293322
Join Date: Jan 2009
Location: The Frozen North (aka Illinois, USA)
Device: iPad, STB Kindle Oasis
Quote:
Originally Posted by clarknova View Post
The dashes should be there, but the Topaz creation software tends to OCR both en dashes (–) and em dashes (—) as plain hyphens (-).

Granted, I've read so many darknet books with glaringly bad OCR errors that the few mistakes that seem to be in Topaz books are pretty minor to me.
Yeah, exactly. A purchased book that I'm reading right now has so many glaring OCR errors that it's painful to read. Not to mention, it's a LOT worse than the output I'm getting with these scripts. So I can definitely handle mixups with en dashes and em dashes.
Old 01-20-2010, 12:18 PM   #343
orwell2k
Addict
Posts: 357
Karma: 1112
Join Date: Oct 2008
Location: Euroland
Device: PocketBook 360°, BeBook (Hanlin V3), iRex DR1000S, iPad
Quote:
Originally Posted by phenomshel View Post
Yeah, exactly. A purchased book that I'm reading right now has so many glaring OCR errors that it's painful to read. Not to mention, it's a LOT worse than the output I'm getting with these scripts. So I can definitely handle mixups with en dashes and em dashes.
Make sure you submit a complaint to the publisher/distributor where you bought the book.

I have had numerous instances where there have been terrible OCR errors throughout the book. I have complained, submitted various screenshots of the errors, and demanded a corrected copy or a refund (e.g. open the ePub in ADE and take a few screen caps - I use ePub here because they seem to be the main culprits, as I don't recall so many badly formatted Mobi books). Naturally, a refund is always forthcoming (rather than a better copy - 'cos there ain't one!). I also point out that if I bought a paper book with these kinds of errors I would return it to the publisher immediately.

The only distributor so far who is dragging their feet is Waterstones (surprise, surprise) - but so far the others seem to be playing the game (Fictionwise, BooksOnBoard, Diesel, Penguin UK, WH Smith (UK)).

Seriously, this kind of shoddy product is just not on, especially given the still-inflated eBook prices of most places... we should be getting pristine eBook editions for those prices, but instead we're getting a lot of crappy scan-and-OCR versions full of typos and recognition errors. The only way to get these people in line is to exercise what limited consumer power we have, and return every purchased book that is not up to scratch for a refund... maybe then they'll get the message?
Old 01-20-2010, 12:47 PM   #344
omk3
Wizard
Posts: 1,454
Karma: 37243
Join Date: Dec 2009
Location: Europe
Device: pocketbook 360, kindle 4
Orwell2k, I had exactly the same experience! I am now on my third email to Waterstones (about the same book), and only getting generic "we care about you" responses. I bought a book with a lot of French names and phrases inside, and they were all completely unreadable. I sent them a very characteristic screenshot - I actually would have refunded double the book price if I were in their place and saw the state that book was in...
Anyway, I regret not having complained before, because of course it was not the first book with errors, just the worst.
If it were a paper book, we could of course have returned it at once, but it seems that returning an ebook is another of those rights that are not self-evident in the digital world
Old 01-20-2010, 04:49 PM   #345
nothatkind
Member
Posts: 17
Karma: 66
Join Date: Jan 2010
Location: Trieste, Italy
Device: Sony PRS-600, Kindle 4, Sony PRS-T1

Thanks to all of you!!!
Special thanks to phenomshel, who helped me a lot to understand how the scripts work...
Now I can read my topaz file on my Sony!!!
Since I'm outside the US I can't buy ebooks from Sony... Amazon is getting a new customer...