Old 06-16-2012, 11:23 PM   #1
dFGJByjm4898IssG
Member
dFGJByjm4898IssG began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Jun 2012
Device: None
Loading 5000 Technical Papers in PDF format Advice

I need some advice on how to load my technical directory into Calibre. I have over 5000 technical papers in PDF format. The files are all named by paper number, for example SPE000123456. Calibre loads all the files, extracts the paper title and the authors from the metadata, and creates the Calibre library.

The problems I am trying to resolve are:

Calibre no longer keeps the original file name, which contains the paper number, a major reference that I need to find papers. So how can I incorporate the paper number when I load the files into Calibre?

Calibre generates the authors from the metadata, but the result contains non-author information, for example:

Y. Cheng, SPE, West Virginia University
K.H. Coats, Coats Engineering; C.H. Whit - authors truncated

So how can I clean this up or import the files in a "clean" state.

Any advice would be appreciated.
Old 06-17-2012, 01:37 AM   #2
Divingduck
Fanatic
Posts: 540
Karma: 36672
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Build a new library. Open the settings for the Add books dialog (pic 1) and set the correct entries: leave "Read metadata from file contents rather than file name" unchecked (pic 2) and enter a matching regular expression. You can test it by entering a live example under File name, e.g. "Dirty, And Quick - This is an example.pdf", and pressing Test. You will then see what happens with your metadata.
Attached thumbnails: Aufzeichnen1.JPG, Aufzeichnen2.JPG
Old 06-17-2012, 04:19 AM   #3
dFGJByjm4898IssG
Divingduck,

Thank you for replying.

But aren't these mutually exclusive? You can get the data either from the metadata or from the file name, yes?

What I want is to pick up the Title and Authors data from the metadata and, at the same time, pick up the paper number from the file name, so that I can store the paper number in Calibre as a sequence or as a file name. Is there a way to do this?

Again, thanks for replying.
Old 06-17-2012, 05:26 AM   #4
Divingduck
No, you can do both. The trick is to use another field for your file name.

Let's look at an example. Your file name is "10201155.pdf".
In the PDF you have defined a title and an author. For this example I use the series field. You should create a custom column in your library to store the document number.

Go to the import dialog and enter (?P<series>.+) as the regular expression, check that "Read metadata from file contents rather than file name" is selected, and apply the change. When you test it in the same window with the file name "10201155.pdf", you will see "10201155" in the series field. Now add one book as a test (before running, check that the metadata in the file is in the correct fields).
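
To see what the named group does outside Calibre, here is a minimal Python sketch. It is not Calibre's own code: the helper name fields_from_filename is hypothetical, and dropping the extension before matching is an assumption based on what the test box shows for "10201155.pdf".

```python
import os
import re

# Pattern from the post: capture the whole file name into the "series" group.
PATTERN = re.compile(r"(?P<series>.+)")


def fields_from_filename(filename):
    """Return the named groups matched in a file name (hypothetical helper)."""
    # Assumption: the extension is removed before the pattern is applied.
    stem, _ext = os.path.splitext(filename)
    match = PATTERN.search(stem)
    return match.groupdict() if match else {}


print(fields_from_filename("10201155.pdf"))  # {'series': '10201155'}
```

The group name decides which field receives the captured text, so changing "series" to another supported field name changes the destination.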
Old 06-17-2012, 09:29 AM   #5
dFGJByjm4898IssG
That is very nice, and works like a charm for series. However, when I created a custom column called "reference" and used (?P<reference>^.{12}) to get the paper number, it did not work.

Last edited by dFGJByjm4898IssG; 06-17-2012 at 10:21 AM.
Old 06-17-2012, 10:56 AM   #6
theducks
Grand Sorcerer
Posts: 14,619
Karma: 5628865
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by dFGJByjm4898IssG View Post
That is very nice, and works like a charm for series. However, when I created a custom column called "reference" and use (?P<reference>^.{12}) to get the paper number it does not work.
Custom columns (when referenced) all start with a hash mark (#). Hover the mouse pointer over the column title to see it.
Old 06-17-2012, 02:11 PM   #7
Divingduck
Oh, I forgot to mention this.
@ theducks, thank you for completing this explanation.
Old 06-19-2012, 10:35 AM   #8
dFGJByjm4898IssG
Still not working. The test file is

spe00095428 The Application of Cutoffs in Integrated Reservoir Studies.pdf

If I use (?P<title>.{8}) I get 00095428 in the title field which is what I want.

I then create a Custom Column called Reference with a look up name of #reference

I then change regular expression to (?P<#reference>.{8}) and I get

calibre, version 0.8.56
ERROR: Unhandled exception: error: bad character in group name

Traceback (most recent call last):
File "site-packages\calibre\gui2\preferences\main.py", line 324, in commit
File "site-packages\calibre\gui2\preferences\adding.py", line 124, in commit
File "site-packages\calibre\gui2\widgets.py", line 149, in commit
File "site-packages\calibre\gui2\widgets.py", line 146, in pattern
File "re.py", line 190, in compile
File "re.py", line 242, in _compile
error: bad character in group name

What am I doing wrong?

I appreciate very much your help.
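
The traceback ends in Python's re module, and the error can be reproduced there without Calibre: a named group such as (?P<name>...) needs a name that is a valid Python identifier, and "#" is not allowed in one. A small sketch follows; the helper name compiles is only for illustration.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r"(?P<title>.{8})"))       # True
print(compiles(r"(?P<#reference>.{8})"))  # False: '#' is not valid in a group name
```

So the failure happens when the expression is compiled, before Calibre gets a chance to look up any column.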
Old 06-19-2012, 10:41 AM   #9
theducks
Quote:
Originally Posted by dFGJByjm4898IssG View Post

I then create a Custom Column called Reference with a look up name of #reference
Just checking: the lookup name used during column creation does NOT start with a #.
The # is used when YOU refer to a custom column name.

BTW I don't use more than the basic import template, so I am not much help there.
Old 06-19-2012, 12:33 PM   #10
chaley
"chaley", not "charley"
Posts: 5,412
Karma: 821648
Join Date: Jan 2010
Location: France
Device: Many android devices
Calibre does not support using custom columns in the metadata extraction template regular expression. You must use one of the supported fields listed in the test box. Perhaps publisher is one you don't otherwise need so can use. After importing, you would use bulk metadata search/replace to copy the value to your custom column.
Old 06-19-2012, 12:49 PM   #11
Divingduck
Yes, this is the way. I thought it was possible to do it in one step, but that didn't work. As chaley mentioned, you need a second step to move the data into the right position.
When doing this, you may want to add a second custom column that indicates the completeness of your metadata, so that you know which books you have already finished. I do this with a yes/no column.

chaley, thanks for helping out.

Edit: Here is an example of how to make a quick replacement from one field to another.

I use your regex (?P<publisher>.{8}) to put the extracted information into the Publisher metadata field.
File name: spe00095428 calibre User Manual — calibre User Manual.pdf
Publisher will become "spe00095428".
In my pic below I imported the file two times. Then mark the imported books, click on "Edit metadata", select the Search and replace tab, and select publisher in 'Search field' and your custom field in 'Destination field' (here mine is named '#alt_title'). After performing the change you will find the data in the right place. You can then do a second edit to clean up the publisher field.

Be careful and test it on an example first. When you do it with a larger number of books, be sure you have selected only the books you want to change. You can't reverse this action.
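
The copy-then-clean-up idea can be sketched in plain Python. This only simulates the search and replace on a list of dictionaries; it is not Calibre's API. The function copy_field, the "#reference" column name, and the sample record are assumptions for illustration.

```python
import re


def copy_field(books, search_field, dest_field,
               pattern=r"(.*)", replacement=r"\1", clear_source=False):
    """Copy a regex-transformed value from one field to another (simulation)."""
    for book in books:
        source = book.get(search_field) or ""
        # count=1 keeps this to a single substitution per value.
        book[dest_field] = re.sub(pattern, replacement, source, count=1)
        if clear_source:
            # Stands in for the second edit that empties the temporary field.
            book[search_field] = None
    return books


# Hypothetical record as it might look right after import.
books = [{"title": "Example paper", "publisher": "spe00095428"}]
copy_field(books, "publisher", "#reference", clear_source=True)
print(books[0]["#reference"])  # spe00095428
print(books[0]["publisher"])   # None
```

Running it on a one-record list first mirrors the advice to test on a single example before touching the whole selection.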
Attached thumbnail: Aufzeichnen.JPG

Last edited by Divingduck; 06-19-2012 at 02:15 PM.
Old 06-24-2012, 05:08 AM   #12
dFGJByjm4898IssG
Sorry for the late reply, I have been experimenting. :-)

I have loaded about one third of the papers, and I used the series and series index fields to keep the paper number. Just a few questions:

1) I wanted to keep the leading zeros, but Calibre deleted them. Is there a way to keep them?

2) After the load, the Published date field contained the current date and not the date from the PDF. Any suggestions?

3) I was thinking of using the "Search Internet" plugin to load the abstracts, DOI numbers, etc. from the OnePetro website:

http://www.onepetro.org/mslib/app/Pr...ocietyCode=SPE

But I have not got it to work. I do know that I need the leading zeros, though. What I would like to do is just right-click, select OnePetro, and load the information. Any pointers on this would be really appreciated.
Old 06-24-2012, 11:03 AM   #13
theducks
If you want leading zeros kept, you must use a text field. Numeric fields drop meaningless leading zeros: 001 is still equal to 1.
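
The same behaviour is easy to see in Python. This sketch assumes a numeric field behaves like an ordinary Python number and that the paper number is always 8 digits wide, as in the sample file name; neither is taken from Calibre itself.

```python
paper_number = "00095428"

as_number = float(paper_number)  # numeric storage: the leading zeros are gone
as_text = paper_number           # text storage: kept exactly as entered

print(as_number)  # 95428.0
print(as_text)    # 00095428

# The zeros can be put back only if the fixed width is known in advance.
restored = f"{int(as_number):08d}"
print(restored)   # 00095428
```

Padding works here only because the width is fixed; a text field avoids having to know it.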
Old 06-25-2012, 10:58 AM   #14
dFGJByjm4898IssG
Okay, thanks.

I can format this another way, and I have used the "Search Internet" plugin to load the correct page for a given paper. The site address is:

http://www.onepetro.org/mslib/app/newSearch.do

And a manual address to get a paper is:

http://www.onepetro.org/mslib/app/Pr...ocietyCode=SPE

And the plugin address using the series_index is:

http://www.onepetro.org/mslib/app/Preview.do?paperNumber=SPE-{series_index:re(0$,)}-PA&societyCode=SPE

And this works.
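
For anyone who wants to build the same address outside the plugin, here is a rough Python sketch. The function name preview_url is hypothetical, the "-PA" suffix is copied from the template above, and the sketch assumes the paper number is already available as a plain integer or digit string. Whether the site still answers at this address has not been checked.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.onepetro.org/mslib/app/Preview.do"


def preview_url(paper_number, society="SPE", suffix="PA"):
    """Build a preview URL in the shape of the plugin template (sketch)."""
    # int() drops any leading zeros, matching the template's unpadded number.
    paper_id = f"{society}-{int(paper_number)}-{suffix}"
    query = urlencode({"paperNumber": paper_id, "societyCode": society})
    return f"{BASE_URL}?{query}"


print(preview_url(95428))
# http://www.onepetro.org/mslib/app/Preview.do?paperNumber=SPE-95428-PA&societyCode=SPE
```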

My question is: how do I extract the Title and other fields from the web page and place them in Calibre?

I have also posted this question on the "Search Internet" plugin forum as well.

Many thanks for your help.