01-09-2011, 01:41 AM   #1
GlennMaples
How should file names be parsed and prepared for calibre import? Use cases requested

I am cleaning up my library of e-books and wrote a program to do most of the grunt work. I am loading the file and folder names into a SQL database and running a cleaning program over them. Then I will use a GUI app to "touch up" the mal-contents before reorganizing the files. Finally, I will import everything into calibre.

Of course this will require some manual work, but I am trying to minimize this with general rules.

I would like to get your opinions on the four questions below:

1) Suppose I have two copies of the same book in the same format. Is there a good way to name them so that both can be imported into calibre (perhaps one copy is better than the other, but I do not yet know which)?

2) Should I store compressed books as compressed archives (zip, rar) in the final folder to be imported into calibre? Similarly, what is the best way to treat multi-file books (html, jpeg, txt all part of the same book): compressed in a zip file, or just left together in the same parent folder?

3) Should the authors be "John Smith" or "Smith, John" in the file names? I am leaning toward the second, as it will make it easier for calibre to recognize the last name of someone like Boris van Welke Jr.

4) What are your thoughts on parentheses in book file names? Right now I am leaning toward parsing them out and rewriting them using dashes.

I am planning on storing each book in a separate folder named after the file name (Smith, John & Davis, Eddie - XXX series - Other note - Title.ext) in preparation for importing into calibre.

Any other recommendations or thoughts?

If anyone would like the program, I could package it up, but it might take a little time. Right now it is in C# and uses a SQL Server DB; darned if I could get it going with the CE edition.

Here are some of the rules:


1) A.ext
A = title

2) A-B.ext
A = authors
B = title

3) A-B-C.ext
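To illustrate the dash-count rules above, here is a minimal Python sketch (it is not the C# program described in this post). The function name `split_filename` is made up for the example. The handling of three or more parts is also an assumption, since the third pattern is not spelled out above: the sketch takes the first part as authors and the last as title, matching the target form "Authors - series - note - Title", and keeps everything in between unlabeled.

```python
import os


def split_filename(filename):
    """Map a file name onto fields by counting its dash-separated parts."""
    stem, _ext = os.path.splitext(filename)
    parts = [p.strip() for p in stem.split("-") if p.strip()]

    if not parts:
        return {}
    if len(parts) == 1:
        # Pattern 1: A.ext, where A = title
        return {"title": parts[0]}
    if len(parts) == 2:
        # Pattern 2: A-B.ext, where A = authors and B = title
        return {"authors": parts[0], "title": parts[1]}

    # Pattern 3 and longer: ASSUMPTION, not stated in the post.
    # First part = authors, last part = title, the rest kept as-is.
    return {"authors": parts[0], "title": parts[-1], "middle": parts[1:-1]}


print(split_filename("Smith, Adam - The Wealth of Nations.txt"))
```

A title that itself contains a dash would be split incorrectly by this sketch, which is one reason a manual touch-up pass is still needed.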


I was going to look for long number strings and store them as ISBNs, but none of my files have one, so I commented this out :-)
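For anyone whose files do carry ISBNs, the "long number string" idea might look roughly like the Python sketch below. The name `find_isbn` is hypothetical, and the check-digit validation is an addition of this sketch rather than something the original program is known to do. It only considers unhyphenated runs of 10 or 13 characters.

```python
import re

# A run of exactly 13 digits, or 9 digits followed by a digit or X (ISBN-10).
_CANDIDATE = re.compile(r"(?<!\d)(?:\d{13}|\d{9}[\dXx])(?!\d)")


def _valid_isbn13(s):
    # Weights alternate 1, 3, 1, 3, ...; the total must be divisible by 10.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0


def _valid_isbn10(s):
    # Weights run 10 down to 1; a final X counts as 10; total divisible by 11.
    total = sum((10 - i) * (10 if c in "Xx" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def find_isbn(name):
    """Return the first number in name that passes an ISBN checksum, else None."""
    for match in _CANDIDATE.finditer(name):
        s = match.group()
        if len(s) == 13 and _valid_isbn13(s):
            return s
        if len(s) == 10 and _valid_isbn10(s):
            return s.upper()
    return None
```

Checking the checksum keeps other long numbers, such as dates or catalogue numbers, from being stored as ISBNs by mistake.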


Cleanup rules for the file name:

1) If there are no dashes, look for an excess of periods and convert them to dashes.

2) If there are no spaces, look for capital letters (making sure the name is not all caps) and add a space before each capital (except the leading letter of the file name).

AdamSmith.TheWealthOfNations.txt --> Smith, Adam - The Wealth of Nations

3) Trim all strings.

4) Eliminate multiple dashes and spaces.

5) Convert underscores to spaces if there are dashes in the file name; otherwise convert them to dashes.
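The cleanup steps above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the name `clean_stem` is invented, "excess of periods" is read as "at least `min_periods` periods left after removing the extension", and rule 5 is applied before the trimming and collapsing of rules 3 and 4 so that the final pass normalizes everything in one go.

```python
import os
import re


def clean_stem(filename, min_periods=1):
    """Normalize separators in a file name and return 'part - part - ...'."""
    stem, _ext = os.path.splitext(filename)

    # Rule 1: no dashes but periods present -> treat periods as separators.
    if "-" not in stem and stem.count(".") >= min_periods:
        stem = stem.replace(".", "-")

    # Rule 2: no spaces and not all caps -> add a space before each capital
    # that follows a lowercase letter or digit (so never the leading letter).
    if not re.search(r"\s", stem) and not stem.isupper():
        stem = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", stem)

    # Rule 5: underscores become spaces if dashes exist, otherwise dashes.
    stem = stem.replace("_", " " if "-" in stem else "-")

    # Rules 3 and 4: trim each part, collapse repeated spaces and dashes.
    parts = [" ".join(p.split()) for p in stem.split("-")]
    return " - ".join(p for p in parts if p)


print(clean_stem("AdamSmith.TheWealthOfNations.txt"))
# prints: Adam Smith - The Wealth Of Nations
```

Note that this stage stops at "Adam Smith - The Wealth Of Nations"; turning the author into "Smith, Adam" belongs to the author-name rules, and lowercasing "Of" would need a separate title-casing step.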


Rules for author names:

1) Look for "shorties" (e.g., von, van) and treat them as part of the last name.
2) Look for suffixes (e.g., jr.) and treat them as part of the last name.
3) Use and, AND, And, & as name separators.

Common multiple name forms to be parsed correctly:

john smith and Jane doe
Smith,john and Doe, jane
Tom Brady, john smith, and paul bunyon
Tom Brady, john smith & paul bunyon
Tom Brady & john smith & paul bunyon...
George and Martha Washington
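The name forms above are ambiguous because a comma can separate two authors or a last name from a first name. One possible way to handle them, sketched in Python under stated assumptions: all function names are hypothetical, the particle and suffix lists beyond "von", "van" and "jr" are guesses, and a two-piece comma group is read as "Last, First" only when the first piece is a bare last name. A lone first name such as "George" borrows the last name of the author that follows it.

```python
import re

PARTICLES = {"von", "van", "de"}   # "de" is an assumption beyond the post
SUFFIXES = {"jr", "sr"}            # "sr" is an assumption beyond the post


def _cap(word):
    if word.lower() in PARTICLES:
        return word.lower()
    # Only touch all-lowercase words, so "McDonald" or "Jr." stay as typed.
    return word.capitalize() if word.islower() else word


def _last_name_start(words):
    """Index where the last name starts in a 'First ... Last' word list."""
    end = len(words)
    while end > 1 and words[end - 1].rstrip(".").lower() in SUFFIXES:
        end -= 1
    start = end - 1
    while start > 0 and words[start - 1].lower() in PARTICLES:
        start -= 1
    return start


def parse_authors(text):
    """Return a list of 'Last, First' strings for the authors in text."""
    authors = []  # (first_words, last_words) pairs
    for chunk in re.split(r"\s*(?:&|\band\b)\s*", text, flags=re.IGNORECASE):
        pieces = [p.split() for p in chunk.split(",") if p.strip()]
        if len(pieces) == 2 and _last_name_start(pieces[0]) == 0:
            authors.append((pieces[1], pieces[0]))       # "Last, First"
        else:
            for words in pieces:                         # "First Last", ...
                i = _last_name_start(words)
                authors.append((words[:i], words[i:]))

    # "George and Martha Washington": a lone name borrows the next last name.
    for k in range(len(authors) - 2, -1, -1):
        first, last = authors[k]
        if not first and authors[k + 1][0]:
            authors[k] = (last, authors[k + 1][1])

    result = []
    for first, last in authors:
        last_s = " ".join(_cap(w) for w in last)
        first_s = " ".join(_cap(w) for w in first)
        result.append(f"{last_s}, {first_s}" if first_s else last_s)
    return result


print(parse_authors("Tom Brady, john smith, and paul bunyon"))
# prints: ['Brady, Tom', 'Smith, John', 'Bunyon, Paul']
```

With these heuristics, "Boris van Welke Jr." comes out as "van Welke Jr., Boris". Forms such as "Smith, John, Jr." would still be misread as three authors, so they make good test cases.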

If you are interested I would be glad to see some test cases from you!
Thanks again.
