MobileRead Forums - View Single Post

Sabardeyn · 05-25-2009, 05:01 PM

Ok, I'm taking back my "Eureka!!" now... Ouch! Looking at your regex and testing it against the formula from the other topic, I've discovered neither one is perfect.

So far all of the formulas depend on " - " being used as a field delimiter only. It cannot appear in hyphenated author names, nor as part of the series (unlikely) or the title (where it is likely to occur). When an extra " - " occurs, the automatic import fails because the parts of the filename are separated incorrectly.

So, for instance, this example fails to import correctly with every formula:

Code:

John D. Smith - Jones - Bibliographic Perfection 1 - The Perfect Book - A Bedtime Story.pdf

This can obviously be corrected manually before (change to "Smith-Jones", etc.) or after (edit the book's record). Let me be the first to admit that I would rather have things entered accurately the first time, automagically. Editing is a hassle and easily forgotten. Luckily for me I'm already using "Smith-Jones" (no space) for hyphenated names. However, there is no good way around the potential problem in the title.
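To make the failure concrete, here is a rough Python sketch of what any fixed-position pattern effectively does with that filename. The helper name `naive_fields` is made up for illustration and is not one of the formulas from either topic; it only shows how many pieces " - " produces.

```python
import os

def naive_fields(filename):
    """Split a filename on ' - ' the way a fixed-position pattern effectively does.

    Hypothetical helper for illustration; assumes the name ends in an extension.
    """
    stem = os.path.splitext(filename)[0]  # drop ".pdf"
    return stem.split(" - ")

fields = naive_fields(
    "John D. Smith - Jones - Bibliographic Perfection 1"
    " - The Perfect Book - A Bedtime Story.pdf"
)
# Five pieces come back, but an author/series/title pattern only has three slots,
# so "Jones" lands in the series and the real series lands in the title.
print(len(fields), fields)
```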

For my purposes I would prefer a regex that resolves all of the following correctly:

John D. Smith - The Perfect Book.pdf
John D. Smith - Bibliographic Perfection - The Perfect Book.pdf
John D. Smith - Bibliographic Perfection 1 - The Perfect Book.pdf
John D. Smith - Bibliographic Perfection 189 - The Perfect Book.pdf
John D. Smith - Bibliographic Perfection 1 - The Perfect Book - A Bedtime Story.pdf
John D. Smith - Jones - Bibliographic Perfection 1 - The Perfect Book - A Bedtime Story.pdf
John D. Smith-Jones & Somebody Else - Bibliographic Perfection 1 - The Perfect Book - A Bedtime Story.pdf
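One possible approach, sketched in Python under my own assumptions (this is not a tested import formula): treat the numbered series field as the landmark. Everything before " - Series N - " is the author, everything after it is the title, so both may contain " - ". When there is no series number, the sketch falls back to guessing: exactly three pieces means author, series, title; otherwise the first piece is the author and the rest is the title. The names `NUMBERED` and `parse_filename` are made up for this example.

```python
import os
import re

# Landmark pattern: "<author> - <series> <number> - <title>".
# The series part may not contain " - ", so it stays a single field.
NUMBERED = re.compile(
    r"^(?P<author>.+?) - (?P<series>(?:(?! - ).)+?) (?P<index>\d+) - (?P<title>.+)$"
)

def parse_filename(filename):
    """Return author/series/index/title guessed from a filename (a sketch)."""
    stem = os.path.splitext(filename)[0]  # assumes an extension such as ".pdf"
    m = NUMBERED.match(stem)
    if m:
        return {"author": m.group("author"), "series": m.group("series"),
                "index": int(m.group("index")), "title": m.group("title")}
    parts = stem.split(" - ")
    if len(parts) == 3:
        # Assumption: three pieces without a number means an unnumbered series.
        return {"author": parts[0], "series": parts[1],
                "index": None, "title": parts[2]}
    # Fallback guess: first piece is the author, everything else is the title.
    author, _, title = stem.partition(" - ")
    return {"author": author, "series": None, "index": None, "title": title}

for name in [
    "John D. Smith - The Perfect Book.pdf",
    "John D. Smith - Bibliographic Perfection 189 - The Perfect Book.pdf",
    "John D. Smith - Jones - Bibliographic Perfection 1"
    " - The Perfect Book - A Bedtime Story.pdf",
]:
    print(parse_filename(name))
```

With this sketch the "Smith - Jones" example comes out with author "John D. Smith - Jones", series "Bibliographic Perfection", index 1 and title "The Perfect Book - A Bedtime Story". The weak spots are the guesses: an unnumbered series cannot really be told apart from a title with a subtitle, and a title piece that happens to end in a number would be mistaken for a series.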

The regex should also handle author names written in 133t5p34|< (leetspeak) or made up of numbers and/or symbols. Why? Because we're already headed down that path and I might as well get a jump on things. Unicode import and export would be good too - more books are being sold internationally and this trend will only grow. (I don't want much, do I?)
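As a small illustration of why this matters for the pattern itself, here is a Python sketch comparing two made-up author patterns: one that whitelists ASCII letters, and one that accepts anything up to the first " - ". Both patterns and the sample names are my own inventions for the example, not formulas from this thread.

```python
import re

# Hypothetical patterns, for illustration only.
STRICT = re.compile(r"^(?P<author>[A-Za-z. ]+) - (?P<title>.+)$")  # ASCII letters only
LOOSE = re.compile(r"^(?P<author>.+?) - (?P<title>.+)$")          # anything before " - "

# Made-up sample names: leetspeak, Cyrillic, Japanese.
names = [
    "133t5p34|< - The Perfect Book",
    "Иван Петров - Идеальная книга",
    "村上春樹 - 完璧な本",
]

for name in names:
    strict_ok = STRICT.match(name) is not None  # the whitelist rejects all three
    m = LOOSE.match(name)                        # the delimiter-based pattern accepts them
    print(strict_ok, m.group("author"), "|", m.group("title"))
```

The takeaway is only that a pattern keyed on the delimiter rather than on an allowed character set does not care what the author name is made of. Whether the import and export steps cope with the resulting Unicode text is a separate question.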

Post #7. Last edited by Sabardeyn; 05-25-2009 at 05:10 PM. Reason: Added to the resolution list - large series number example.