Thread: What a regex is
05-06-2010, 03:01 AM   #13
Disfrutalavida
Fairly happy old fart
Posts: 10
Karma: 10
Join Date: May 2010
Location: Mexico, China, Ecuador, Philippines
Device: Palm T3, iPAQ 211
Hello all,
Don't mean to butt in, but I have been reading for a while and finally felt ready to post.

Kovid, some suggestions for you at the end.
I do NOT assume my ideas will be useful to you, but I created, modified and maintained databases for 13 years, so maybe there is something there you can use.
If you ever want to bounce ideas around I'd be happy to do so.

Years ago I worked with large databases (more than 30 million records), and we would have considered RegEx an input filter, a programming extension or a language, so take your pick. Though I am far from an expert in RegEx, I have programmed in many languages over the past 37 years, including SQL, like Pepak.
Anyone old enough to remember DOS batch files?
They were considered a form of non-compiled programming language.
Seems everyone is right. Maybe even the troll. LOL Sorry, couldn't resist.

I think Starson17 hit the nail on the head with his suggestion about RegEx.
His was also one of the few responses in that thread written in a polite, reasoned and dignified manner. You've got nothing to apologize for, Starson17.
Most of the rest were little or no better than the supposed troll, no matter how justified their authors felt.
Have you had a lot of problems with trolls here?
Just curious, since what I saw was someone frustrated with a program that almost did what they wanted, but who expressed themselves badly.
But, I have been told I tend to think the best of people.
On soapbox. What has happened to courtesy and respect for others?
I must be getting too old, or have lived outside the US for too long.
If you disagree with someone's statements, fine, but please respond in a courteous manner.
Otherwise the responder looks as ignorant, rude, stupid or crazy as the original.
As my parents always said, "Two wrongs don't make a right." Steps off soapbox.

I respectfully disagree with Pepak, but only partially. Sorry Pepak.
I think a small list of the most commonly used expressions would be easily usable by the average user. Agreed, a large list would be too confusing.

I'm looking to import more than 8,000 ebooks. My collection, gathered from many sources, seems to use the following 7 file name formats:

Author - Series Series # - Title.ext
Author - Title - Series Series#.ext
Author - Title.ext
Last name, first name - Series Series # - Title.ext
Last name, first name - Title - Series Series #.ext
Last name, first name - Title.ext
These last 3 could be a b***h, but the comma delimiter should make it easy with RegEx.

Some have only the title, but those can easily be group edited once they are in Calibre to add the author and other info, and then the rest of the metadata can be downloaded off the net.
Seven expressions would import 99% of my collection.
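To make the idea concrete, here is a rough sketch (Python, since that's what Calibre's regexes use) of expressions covering the formats above, tried in order, with a fallback for title-only names. It is only an illustration under assumptions: the group names `author`, `title`, `series` and `series_index` are my guess at what Calibre's "read metadata from file name" option expects, and `parse_filename` is a made-up helper, not anything in Calibre. Because the author group simply allows a comma, the three "Last, first name" formats need no separate patterns.

```python
import re
from pathlib import Path

# Hypothetical building blocks. The group names are ASSUMED to match the
# named groups Calibre's "read metadata from file name" regex accepts.
AUTHOR = r"(?P<author>.+?)"   # also captures "Last, First" (comma kept)
TITLE = r"(?P<title>.+?)"
SERIES = r"(?P<series>.+?)\s*(?P<series_index>\d+(?:\.\d+)?)"  # "Series 3" or "Series3"

# Order matters: the three-part patterns must be tried before "Author - Title".
PATTERNS = [
    re.compile(rf"{AUTHOR} - {SERIES} - {TITLE}"),  # Author - Series # - Title
    re.compile(rf"{AUTHOR} - {TITLE} - {SERIES}"),  # Author - Title - Series #
    re.compile(rf"{AUTHOR} - {TITLE}"),             # Author - Title
]


def parse_filename(filename: str) -> dict[str, str]:
    """Return the metadata fields found in an ebook file name."""
    stem = Path(filename).stem  # drop the .ext
    for pattern in PATTERNS:
        match = pattern.fullmatch(stem)  # the whole name must match
        if match:
            return {key: value.strip() for key, value in match.groupdict().items()}
    return {"title": stem.strip()}  # title-only names: add the author later


for name in ["Isaac Asimov - Foundation 1 - Foundation.epub",
             "Asimov, Isaac - Foundation - Foundation 1.epub",
             "Asimov, Isaac - Foundation.epub"]:
    print(parse_filename(name))
```

One catch I can see: a title that itself ends in a number (think "Catch 22") can be mistaken for a series number by the first pattern, so a quick look at the results before importing would be wise. As far as I know Calibre takes one expression at a time, so in practice you would run one pattern per batch of files.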

The others can usually be bulk renamed, easily, with free tools I found on nonags.com so they conform to one of these basic file patterns, and then added to Calibre.
Search for "multi rename" or "bulk rename files". Lots of good ones there.

A small number of simple expressions might be really, really useful to most people and save Kovid, and others, some work.

Maybe a list of the ones above and a few others posted in a sticky would satisfy 90+% of the needs of the average user? It sure would save the people here, like yourselves, a lot of time.

Nice to meet you all, and I hope to make use of Calibre in the future.
At present the database/library functions don't do what I need, mainly because of the output structure when saving and the inability to create the metadata.db without reading and saving the input files. But there is certainly hope for the future, and it's better than anything else I've seen.

Kovid, it looks like you are saving all the input files in the structure you decided on but referencing them with the incrementing (xxx) as part of the {title} folder name?
Wouldn't that limit the database to 1000 folders or are you allowing for more in another fashion?
It also seems to continue to increment from where it left off even when records are deleted. Do you plan on reusing the (xxx) pointers?
On the bright side, since you are using the (xxx) as the location pointer, it should be easy to implement different output folder structures and adding without saving, so we can all have our cake and eat it too.

Would it be better to simply index existing folders and add the (xxx) to the folder name so Calibre could find the files?
Pros and cons below.
This would allow building the metadata.db without saving the files to a new location.
It would also mean fast library updates, more flexibility, and the ability to quickly and easily update the metadata from the file data when the metadata is incorrect, as it is in many of my files.

You could always give us a choice to structure and save when adding or just add to the database.
The present way has the advantage of correcting metadata within the ebook files that support it and for small collections makes a lot of sense.

If just adding, a user could later do a save to bring the errant files into the same folder/file structure as everything else and add/correct metadata within the ebook files and/or out to an OPF file.
With a large collection this might be the better way since the "download metadata and covers" function could be run first to update the library.
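Here is a rough sketch of the "index in place" idea: walk the existing folders, give each folder that holds an ebook an id, and work out the folder name with "(id)" appended, without copying anything. Everything here is an assumption for illustration: `index_in_place`, the one-table `books` schema and the extension list are hypothetical and are NOT Calibre's real metadata.db layout. It is a dry run; it returns the planned renames instead of renaming anything. The id is an ordinary integer here, so in this sketch nothing caps it at three digits.

```python
import sqlite3
from pathlib import Path

# ASSUMED set of ebook extensions; extend as needed.
EBOOK_EXTS = {".epub", ".mobi", ".pdf", ".lrf", ".lit", ".txt"}


def index_in_place(root: Path, conn: sqlite3.Connection) -> list[tuple[Path, Path]]:
    """Index every folder under root that holds an ebook, without copying files.

    Uses a hypothetical one-table schema (not Calibre's metadata.db) and
    returns planned (old, new) folder names with "(id)" appended.
    Nothing on disk is renamed.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS books "
                 "(id INTEGER PRIMARY KEY, folder TEXT UNIQUE)")
    # One row per folder, so an .epub and a .mobi of the same book share an id.
    folders = sorted({f.parent for f in root.rglob("*")
                      if f.is_file() and f.suffix.lower() in EBOOK_EXTS})
    plan = []
    for folder in folders:
        # Re-running is safe: an already indexed folder keeps its existing id.
        conn.execute("INSERT OR IGNORE INTO books (folder) VALUES (?)", (str(folder),))
        (book_id,) = conn.execute("SELECT id FROM books WHERE folder = ?",
                                  (str(folder),)).fetchone()
        plan.append((folder, folder.with_name(f"{folder.name} ({book_id})")))
    conn.commit()
    return plan
```

Usage would be something like `index_in_place(Path("/my/ebooks"), sqlite3.connect("index.db"))`, then reviewing the plan before doing any renaming. Keeping the rename as a separate step is the point: the index can be rebuilt quickly, and the files only move (or get renamed) when the user asks for it.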

Is the Author's name only one field? Bet it is.
How difficult would it be to make it 2 fields so the output folders could be changed?
Currently {Author}, etc. gives First Last.
Two fields would allow Last, First, which is how most databases are organized.
That would make data exchange with other programs much easier.
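To show why two fields beat deriving one form from the other, here is a naive conversion between "First Last" and "Last, First". Both functions are hypothetical helpers for illustration, not anything in Calibre, and they assume the surname is the last word.

```python
def to_last_first(name: str) -> str:
    """'Isaac Asimov' -> 'Asimov, Isaac' (naive: last word is the surname)."""
    name = name.strip()
    if "," in name:          # already in "Last, First" form
        return name
    parts = name.split()
    if len(parts) < 2:       # single names like "Homer" stay as-is
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def to_first_last(name: str) -> str:
    """'Asimov, Isaac' -> 'Isaac Asimov'; names without a comma are unchanged."""
    if "," not in name:
        return name.strip()
    last, first = (part.strip() for part in name.split(",", 1))
    return f"{first} {last}".strip()
```

The "last word is the surname" rule breaks on names like "Ursula K. Le Guin", which it turns into "Guin, Ursula K. Le". Storing the two parts separately avoids guessing altogether.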

Please feel free to ignore or not implement any of these ideas.
I can easily do them in Access or even in Excel, but have no idea how to do them in Calibre, so all I can do is make suggestions.

Sorry for the long post. I just had to get it all out.

Last edited by Disfrutalavida; 05-06-2010 at 03:55 AM. Reason: Spelling and grammar.