07-25-2008, 10:10 AM | #1 |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
lit2oeb -- calibre LIT extraction/conversion without ConvertLIT
Kovid pushed out a new version of calibre last night (0.4.80) which packs an old feature in new clothes: I've ported (most of) ConvertLIT to Python and calibre is now able to extract the contents of LIT files directly, without having a copy of ConvertLIT installed. Edit: As of version 0.4.83, the calibre-native code is the default, and may be accessed on the command-line as 'lit2oeb' (for just explosion) or as part of LRF conversion with 'lit2lrf'.
The calibre-native code fixes the following bugs in ConvertLIT:
"Ah!" you ask, "but what bugs does your new code introduce, other than being rather slow right now?" Well, that's where you, the savvy early adopter, come in: we need to find them! If you (a) have a fair number of LIT e-books and (b) can run a command from the command line, please download the attached Python script and run it against your library. The arguments are the filename of a logfile to write to and the directory to search for LIT files in. For example:
Code:
python stress-lit2oeb.py log.txt library/
If you instead (or also) just use 'lit2oeb' or 'lit2lrf --lit2oeb' on individual files and find individual bugs, please use the calibre issue tracker as usual: check whether anyone else has already posted the same bug, and if not, post a new defect issue.
Thanks, and I hope you find this useful!
-Marshall
P.S. In case it isn't obvious, the calibre LIT code does not include DRM removal. You'll still need ConvertLIT for that if you want to do such things, but there are no known bugs there.
Last edited by llasram; 08-09-2008 at 09:51 PM. Reason: Updated status of code in 'lit2lrf'
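The attached stress-lit2oeb.py itself isn't reproduced in this thread. As a rough idea of what such a stress test does, here is a minimal Python sketch under stated assumptions: it walks a directory for .lit files, runs a converter on each one, and logs any exception together with its traceback. The `convert` callable and the `stress`/`find_lit_files` names are hypothetical placeholders, not calibre's API (the real script imports `LitReader` from `calibre.ebooks.lit.reader`, whose interface isn't shown here).

```python
import os
import traceback


def find_lit_files(root):
    """Yield the path of every .lit file under root (extension matched case-insensitively)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith('.lit'):
                yield os.path.join(dirpath, name)


def stress(logfile, root, convert):
    """Run convert(path) on every LIT file under root and log each failure.

    `convert` is a hypothetical stand-in for whatever performs the
    extraction; it is NOT calibre's actual API. Returns (ok, failed).
    """
    ok = failed = 0
    with open(logfile, 'w') as log:
        for path in find_lit_files(root):
            try:
                convert(path)
                ok += 1
            except Exception:
                # Record which book broke and the full traceback for the bug report.
                failed += 1
                log.write('FAILED: %s\n' % path)
                log.write(traceback.format_exc())
                log.write('\n')
    return ok, failed
```

For instance, `stress('log.txt', 'library/', my_convert)` would mirror the command-line example above, with `my_convert` being whatever (hypothetical) function opens a single LIT file.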
07-29-2008, 12:44 PM | #2 |
Connoisseur
Posts: 91
Karma: 1133066
Join Date: Sep 2007
Device: ipaq
Which version of ConvertLIT is the Python code based on?
Is it possible for you to back-port your fixes into ConvertLIT? Granted, getting them into the "official" version might be difficult, but what about posting (here) a diff against the latest sources?
jmurphy
07-29-2008, 02:16 PM | #3 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
ConvertLIT 1.8, the most recent version available from the official site.
Quote:
Is there something stopping you from being able to just migrate to calibre for all your LIT-extraction needs?
07-29-2008, 02:53 PM | #4 |
reader
Posts: 6,975
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
Not to mention the risk of going straight to jail if the new maintainer ever visits the US. There is a similar risk in posting a diff against the original source code: the changes are not DRM-related, but they update a DRM-cracking program and so risk falling foul of the DMCA.
08-03-2008, 09:12 AM | #5 | |
Linux User
Posts: 323
Karma: 13682
Join Date: Aug 2007
Location: Germany
Device: Kindle 3
Quote:
However, the downside of your change is that the resulting HTML file often has very long lines and is hard to read. Two suggestions:
1. Automatically replace "> <" with ">\n<". Notice the space between > and <. (\n = line break) I suggested this for mobi2oeb too and it has been accepted.
2. Make line breaks where it's safe to do them, e.g. after "</p>" and "</h1>".
This is true for the resulting OPF as well, by the way.
Nice work so far; I'll use your script to hunt down bugs.
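Both suggestions can be sketched with plain string and regex substitution. This is a minimal Python illustration, not what calibre or mobi2oeb actually do; the set of "safe" closing tags in `BLOCK_CLOSERS` and the function name `add_line_breaks` are assumptions made for the example.

```python
import re

# Closing tags after which a line break is assumed to be safe.
# This list is illustrative only, not calibre's.
BLOCK_CLOSERS = ('p', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li')


def add_line_breaks(html):
    # Suggestion 1: turn "> <" (exactly one space between tags) into ">\n<".
    html = html.replace('> <', '>\n<')
    # Suggestion 2: add a newline after closing block tags such as </p> or
    # </h1>, unless one already follows.
    pattern = r'(</(?:%s)>)(?!\n)' % '|'.join(BLOCK_CLOSERS)
    return re.sub(pattern, r'\1\n', html, flags=re.IGNORECASE)
```

Swapping a single space for a newline is harmless in normal HTML because both collapse to one space when rendered, and breaking after block-level closers only adds whitespace between blocks. Inside a `<pre>` element either change would be visible, and this sketch does not special-case that.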
08-03-2008, 02:49 PM | #6 | |
Connoisseur
Posts: 91
Karma: 1133066
Join Date: Sep 2007
Device: ipaq
Quote:
How do you run this on Windows? I've got Python installed. When I run the script I get:
Code:
Traceback (most recent call last):
  File "stress-lit2oeb.py", line 8, in <module>
    from calibre.ebooks.lit.reader import LitReader
ImportError: No module named calibre.ebooks.lit.reader
08-03-2008, 08:12 PM | #7 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
How would you feel about an option to run the markup through a pretty-printer on output? |
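For the OPF side, one way such a pretty-printing option could work is sketched below with Python's standard-library `xml.etree.ElementTree.indent` (Python 3.9+). It is an illustration only, not calibre's implementation, and `pretty_print_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def pretty_print_xml(xml_text, space='  '):
    """Re-indent a data-like XML document such as an OPF file.

    ET.indent only replaces whitespace-only (or empty) text and tails,
    so element content is preserved, but new whitespace is inserted
    between adjacent elements.
    """
    root = ET.fromstring(xml_text)
    ET.indent(root, space=space)
    return ET.tostring(root, encoding='unicode')
```

Two caveats apply. ElementTree rewrites namespace prefixes (e.g. to `ns0`) unless they are registered with `ET.register_namespace`. Because it also inserts whitespace between elements that had none, it can change how mixed-content XHTML renders, so for the HTML itself a tidy-style reformatter is the safer tool.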
08-03-2008, 08:22 PM | #8 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
08-03-2008, 09:33 PM | #9 |
creator of calibre
Posts: 43,881
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
On Windows, you can try something like the following: run calibre-debug, then enter the remaining lines at the Python prompt it opens.
Code:
calibre-debug
__name__ = 'int'
execfile('stress-lit2oeb.py', globals())
main(['stress', 'log.txt', 'path to directory with lit files'])
08-04-2008, 06:24 AM | #10 |
Linux User
Posts: 323
Karma: 13682
Join Date: Aug 2007
Location: Germany
Device: Kindle 3
08-04-2008, 09:49 AM | #11 |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
The re-formatting part of 'tidy', yep, just not the markup-cleaning part. Which is probably obvious. Just being pedantic over here. Mmm.... Pedantic.
08-07-2008, 10:16 PM | #12 |
creator of calibre
Posts: 43,881
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
As of version 0.4.83, lit2oeb powers lit2lrf.
08-08-2008, 09:23 PM | #13 |
Addict
Posts: 277
Karma: 1004969
Join Date: Mar 2007
Device: Sony Reader
The only downside of using lit2oeb instead of convertlit is that with convertlit you didn't have to go through multiple steps to load a .lit format book. Convertlit would work with Calibre to do everything in one step. (For those people who wanted to buy DRM'ed ebooks to load - strictly in theory, of course)
08-09-2008, 01:53 PM | #14 |
creator of calibre
Posts: 43,881
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Calibre has a policy of not removing DRM. If it didn't, adding DRM stripping to lit2oeb would be trivial.
08-09-2008, 04:15 PM | #15 | |
Addict
Posts: 277
Karma: 1004969
Join Date: Mar 2007
Device: Sony Reader
Quote:
The last thing anyone wants is for Calibre to run into anything that might get it shut down. That certainly means that DRM stripping can't be a direct part of the application. Your application is WAY too useful to put at risk!