03-30-2011, 01:35 AM #25
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
Quote:
Originally Posted by kiwidude
Thx for the extra info; I had obviously edited my post while you were typing.

I claim no credit for the regex or that loop of code applying the matches - as per my very first post I took this code from bazbar's script that people were using. I figured they had in turn built it based on all of the earlier versions so just assumed it was "proven". Since you have questioned it I will take a look.
Some new hobby time to spend! But if you have questions, please mail me; you have my address.

Four new problems have come up:
1. I could not get the ISBN read in this case (each # stands for a digit, \d, never an X). This is the complete line, so a line feed follows the last digit:
ISBN 978 ## ### #### #
2. I've seen some really strange ISBNs. I don't know whether you even want to support them, but in a few cases the check digit was a letter (not X). The strangest part is that where the check digit should be 4, they used D. As a programmer you know the problem of whether counting starts at 0 or 1 (D only means 4 if A means 1). I don't know how common this type of ISBN is; in my collection of 3000+ books I have seen it only 4 times, but at the moment 2000 books still have no ISBN (many are from Project Gutenberg, so they will never get one).
3. I also have some books with only an ISBN and no indication that it is an ISBN. In all these cases it is on one of the first 3 pages or one of the last 2. This is a case for using the regex without match 0 (that is, without requiring the "ISBN" label). I could imagine a (drop-down/submenu) option to use the regex this way. By default you would use the standard approach, as it is now or will be. If there is still no ISBN, you could use the second option. In that case I would not overwrite existing ISBNs, because it is less certain that the number really is an ISBN, or you would have to check the ISBN's validity before inserting it.
4. Some ISBNs were not converted very well by OCR, so I got letters or symbols in place of digits. To correct this, you have to replace the letters and then check the validity of the resulting ISBN, which makes the script much slower. I would like to have it, but I will build it myself if I am the only user.
Characters I got:
i, I (capital i), !, l (lowercase L), | (pipe), {, } (all for the digit 1)
o, O, B, 8 (for a badly printed 0, but also for an 8)
b (for a 6, or for 10)
d (for 01)
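A minimal sketch for problems 1 and 3, in Python (the language Calibre plugins are written in), assuming the book text has already been extracted: a regex that tolerates single spaces or hyphens between digits, where each candidate is kept only if its ISBN-10 or ISBN-13 check digit is valid. The names `CANDIDATE`, `is_valid_isbn` and `find_isbns` are hypothetical, not part of the plugin or of Calibre's API.

```python
import re

# Hypothetical candidate pattern: 10 to 13 digits, optionally separated by single
# spaces or hyphens, the last one possibly an X (the ISBN-10 check digit).
# The lookarounds stop a match from starting or ending inside a longer number.
CANDIDATE = re.compile(
    r"(?<![\dX])\d(?:[ -]?\d){8,11}[ -]?[\dX](?![\dX])",
    re.ASCII | re.IGNORECASE,
)


def is_valid_isbn(s: str) -> bool:
    """Checksum test for an ISBN-10 or ISBN-13 with separators already removed."""
    if re.fullmatch(r"\d{9}[\dX]", s, re.ASCII):
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # ISBN-10: weights 10..1, the total must be divisible by 11.
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if re.fullmatch(r"97[89]\d{10}", s, re.ASCII):
        # ISBN-13: alternating weights 1 and 3, the total must be divisible by 10.
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False


def find_isbns(text: str) -> list:
    """Return every checksum-valid ISBN in text, normalized to bare digits."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group()).upper()
        if is_valid_isbn(digits) and digits not in found:
            found.append(digits)
    return found


# Problem 1: a spaced-out ISBN-13 on a line of its own.
print(find_isbns("ISBN 978 03 064 0615 7\n"))  # ['9780306406157']
# An ISBN-10 with an X check digit.
print(find_isbns("0-8044-2957-X"))             # ['080442957X']
# A 10-digit number that fails the checksum is rejected.
print(find_isbns("call 1234567890"))           # []
```

The checksum is what makes an unlabeled match (problem 3) reasonably safe to accept: a random run of 10 digits passes the ISBN-10 test only about once in 11 tries.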

So some new regexes need to be made. The question is what the best approach is. Catching all the cases above will make the plugin slower (in bad cases), and the parser may also pick up ISBNs in books that do not mention one (I expect there will be such cases...).
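One possible way to handle the OCR misreads from problem 4, sketched under assumptions: every ambiguous character is expanded into each digit string it might stand for (using the substitutions listed in the post), and only readings that pass the ISBN checksum are kept. The table `OCR_ALTERNATIVES` and both function names are hypothetical. Because the number of readings doubles with every two-way ambiguous character, this also shows where the slowdown and the false-positive risk come from, so it is probably best run only on a number already suspected to be an ISBN.

```python
import itertools
import re

# Hypothetical substitution table built from the misreads listed in the post.
# Each OCR character maps to every digit string it might represent.
OCR_ALTERNATIVES = {
    **{ch: ["1"] for ch in "iIl!|{}"},
    **{ch: ["0", "8"] for ch in "oOB"},
    "8": ["8", "0"],
    "b": ["6", "10"],
    "d": ["01"],
}


def is_valid_isbn(s: str) -> bool:
    """Checksum test for an ISBN-10 or ISBN-13 with separators already removed."""
    if re.fullmatch(r"\d{9}[\dX]", s, re.ASCII):
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if re.fullmatch(r"97[89]\d{10}", s, re.ASCII):
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False


def ocr_readings(token: str) -> list:
    """Return every checksum-valid ISBN that an OCR'd token could stand for."""
    token = re.sub(r"[ -]", "", token)
    # Characters not in the table (plain digits, X) are kept as they are.
    choices = [OCR_ALTERNATIVES.get(ch, [ch]) for ch in token]
    valid = []
    # Grows as 2**k readings for k two-way ambiguous characters.
    for combo in itertools.product(*choices):
        reading = "".join(combo).upper()
        if is_valid_isbn(reading) and reading not in valid:
            valid.append(reading)
    return valid


print(ocr_readings("978o3o64o6l57"))  # ['9780306406157']
print(ocr_readings("O3O64O6l52"))     # ['0306406152']
```

If more than one reading survives the checksum, the result is ambiguous and should not be written automatically.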

Quote:
Originally Posted by kiwidude
As for your question about searching from the end of the book: I assume this is a performance thing, and my answer remains the same as before. I am at the mercy of the current implementation of the Calibre input converters. They do not stream the results to me, and I cannot control their direction. I give them a path, and when they are "done" they give me back a bunch of stuff representing the converted EPUB.
If it is possible and only a matter of performance, you could possibly add it as a submenu option. It should not be the default, since that would cost performance, but anyone who wants it could use it.
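Even if the converter only hands back the finished result, the order in which that result is searched can still be chosen afterwards. A minimal sketch of the opt-in fallback from problem 3, assuming (hypothetically) that the converted book is available as a list of page strings: first look for a labeled "ISBN" anywhere, and only when the option is enabled accept a bare number, restricted to the first 3 and last 2 pages and required to pass the checksum. All names here are hypothetical, and Calibre's converters do not necessarily expose "pages" in this form.

```python
import re

# Hypothetical digit pattern: 10-13 digits, single spaces/hyphens allowed.
BARE = r"(?<![\dX])\d(?:[ -]?\d){8,11}[ -]?[\dX](?![\dX])"
BARE_RE = re.compile(BARE, re.ASCII | re.IGNORECASE)
# Labeled form: "ISBN", "ISBN-10" or "ISBN-13", optional colon, then the number.
LABELED_RE = re.compile(r"ISBN(?:-1[03])?:?\s*(" + BARE + ")", re.ASCII | re.IGNORECASE)


def is_valid_isbn(s: str) -> bool:
    """Checksum test for an ISBN-10 or ISBN-13 with separators already removed."""
    if re.fullmatch(r"\d{9}[\dX]", s, re.ASCII):
        values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if re.fullmatch(r"97[89]\d{10}", s, re.ASCII):
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False


def normalize(raw: str) -> str:
    return re.sub(r"[ -]", "", raw).upper()


def candidate_pages(pages, head=3, tail=2):
    """Indices of the first `head` and last `tail` pages, without duplicates."""
    n = len(pages)
    indices = list(range(min(head, n))) + list(range(max(n - tail, 0), n))
    return list(dict.fromkeys(indices))  # de-duplicate, keep order


def find_isbn(pages, allow_unlabeled=False):
    """Return (isbn, labeled) for the first valid ISBN found, or None."""
    # Default pass: only numbers preceded by an "ISBN" label, on any page.
    for page in pages:
        for m in LABELED_RE.finditer(page):
            isbn = normalize(m.group(1))
            if is_valid_isbn(isbn):
                return isbn, True
    # Opt-in fallback: bare numbers, only on the first 3 and last 2 pages.
    if allow_unlabeled:
        for i in candidate_pages(pages):
            for m in BARE_RE.finditer(pages[i]):
                isbn = normalize(m.group())
                if is_valid_isbn(isbn):
                    return isbn, False
    return None


pages = ["Title", "Author", "Contents", "Chapter 1", "See 0306406152",
         "Chapter 3", "Chapter 4", "Printed 9780306406157"]
print(find_isbn(pages))                        # None
print(find_isbn(pages, allow_unlabeled=True))  # ('9780306406157', False)
```

The `labeled` flag in the result lets the caller follow the suggestion above: an unlabeled hit could fill an empty ISBN field but never overwrite an existing one.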

Thank you for your time, and if you want (even more) info, just reply here or mail me.