Frequently Asked Questions - Page 24

eboyhan · 05-09-2010, 07:24 AM

Quote:

Originally Posted by itimpi

Would this meet your requirements?

It might, I'll have to delve deeper into Calibre. Since I have over 1000 print books, is there any way to bulk import ISBNs? I can easily get some or all of the metadata maintained by Gurulib exported into Excel, CSV, or XML formats.

Okay, since my original answer I have delved more deeply into the calibre user manual and played around with calibre's "add empty book" facility, "edit metadata in bulk", calibredb.exe, etc.

One problem that I found right off the bat is that the calibre user manual predates the add empty book capability. I could find only a few posts (after googling around) related to the add empty book facility, and they weren't particularly helpful in describing how one might add empty books in bulk. I also found no documentation at all on how one might use calibredb to add empty books.

@itimpi, you said in one of your answers to a post elsewhere that you no longer have to create dummy files. However, it seems to me that if one has, say, 1000 empty books to create and does not want to enter the metadata for all of them manually, then one approach would be to create 1000 dummy files whose filenames encode a unique metadata signature. Ideally, if this is the approach I must take, I would like the files to have the ISBN as their file name, and to write a script using calibredb that adds the 1000 files as empty books, each with its appropriate ISBN. Unfortunately, from the limited documentation I have been able to find, I cannot see any easy way to link a file name with the ISBN metadata field.

Any help that anyone could give would be appreciated. By the way, I'm not averse to writing a script to accomplish this; I just need some pointers on how to get started. An example or two of a command line that adds an empty book and gives it an ISBN metadata entry would be really helpful here.
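For the dummy-file approach described above, here is a minimal Python sketch (Python only because calibre itself is written in it). It assumes, hypothetically, that the Gurulib export has been saved as a CSV file with a header column named `ISBN`, and that a tiny `.txt` placeholder is acceptable as a dummy book; the function name `make_dummy_files` is made up for illustration. It writes one file per ISBN, named `<ISBN>.txt`, so each file name carries its ISBN. It does not call calibredb and does not validate ISBN check digits.

```python
import csv
import re
from pathlib import Path


def make_dummy_files(csv_path, out_dir, isbn_column="ISBN", ext=".txt"):
    """Create one placeholder file per ISBN, named <ISBN><ext>, and return the names.

    Hypothetical assumptions: the CSV has a header row containing `isbn_column`,
    and a tiny plain-text file is acceptable as a dummy book.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            # Drop hyphens, spaces etc.; keep digits and X (the ISBN-10 check character).
            isbn = re.sub(r"[^0-9Xx]", "", row.get(isbn_column) or "").upper()
            if len(isbn) not in (10, 13):
                continue  # skip blank entries and ones of the wrong length
            path = out / f"{isbn}{ext}"
            if not path.exists():  # don't overwrite an existing placeholder
                path.write_text("placeholder\n", encoding="utf-8")
            created.append(path.name)
    return created
```

Called as `make_dummy_files("gurulib_export.csv", "dummy_books")` (both names hypothetical), it leaves a folder of ISBN-named files that can then be added in one go.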

Anyhow, thanks for your prompt response.

jeanniespc · 05-13-2010, 04:18 PM

After you change a book's metadata, do you have to convert it? What do you do to get it back on the Kindle?

Jeannie

Jedai · 05-15-2010, 04:33 AM

@eboyhan: A suggestion that needs no script (or at most one to create the dummy files from a list of ISBNs) would be to fiddle with the import regex in Preferences > Add/Save > Adding, then import all those books at once; you can then bulk import the metadata.
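To illustrate the regex side of this suggestion: calibre's filename pattern uses Python-style named groups such as `(?P<title>...)`. The sketch below assumes, and this should be checked for your calibre version before relying on it, that an `isbn` named group is recognised. The pattern treats the whole base name as a 13-digit ISBN or a 10-character ISBN; `isbn_from_filename` is a hypothetical helper for trying the pattern outside calibre.

```python
import re

# Hypothetical pattern: the whole base name is a 13-digit ISBN or a 10-character
# ISBN (9 digits plus a digit or X check character); an extension is optional.
ISBN_FILENAME = re.compile(r"^(?P<isbn>\d{13}|\d{9}[\dXx])(?:\.\w+)?$")


def isbn_from_filename(name):
    """Return the ISBN encoded in a dummy file's name, or None if it doesn't fit."""
    m = ISBN_FILENAME.match(name)
    return m.group("isbn").upper() if m else None


for name in ["9780141439566.txt", "014143956x.txt", "Tom Sawyer.txt"]:
    print(name, "->", isbn_from_filename(name))
```

The extension is made optional so the same pattern works whether or not the importer strips it before matching.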

Yeti · 05-22-2010, 06:53 AM

Hi all,

I am new to the wonderful world of e-readers and have just bought a Kindle 2i. I am having a problem converting PDF to MOBI using Calibre. I have spent a few hours trying to find a solution, but surprisingly have not seen it mentioned in any of the FAQs, the Calibre user manual, or the web site. Am I the only one who's having this problem?

What is happening is that in the converted MOBI document every four or five pages the text is interrupted by the following text:
"Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"
Very disruptive! I have followed the link to ABC Amber's web site, and its FAQ tells me "The registered version removes all our banners, labels and ads." I would be happy to purchase and register their software, but I use a Mac and the software appears to be for PCs.

I must be missing something, can anyone help?

Thanks in advance, and thanks to Kovid and all who have contributed to Calibre.

Yeti.

Starson17 · 05-22-2010, 08:33 AM

Quote:

Originally Posted by Yeti

What is happening is that in the converted MOBI document every four or five pages the text is interrupted by the following text:
"Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"

Your pdf was previously converted from another format into a pdf. ABC Amber LIT Converter was used to do that job, and it put the objectionable text into your pdf as a header. Calibre is just converting all of your pdf. As to how to fix it, you need to tell Calibre you don't want that text.

Try here.

DoctorOhh · 05-22-2010, 04:40 PM

Quote:

Originally Posted by Yeti

What is happening is that in the converted MOBI document every four or five pages the text is interrupted by the following text:
"Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"

Quote:

Originally Posted by Starson17

Your pdf was previously converted from another format into a pdf. ABC Amber LIT Converter was used to do that job, and it put the objectionable text into your pdf as a header. Calibre is just converting all of your pdf. As to how to fix it, you need to tell Calibre you don't want that text.

One way I might do it is to specify a debug output directory during conversion. After conversion I would grab the original HTML out of that folder, open it in Notepad++ or another editor, then find and replace "Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html".
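If the debug folder holds several HTML files, the find-and-replace can also be scripted. A minimal Python sketch, assuming the files are UTF-8 encoded and end in `.html`, `.htm` or `.xhtml`; `strip_banner` is a hypothetical name, and the folder is whatever you chose as the debug directory. By default it removes the banner text exactly as quoted above (escaped so it is matched literally); pass a different regex as `pattern` if the banner turns out to be split by tags.

```python
import re
from pathlib import Path

# The banner quoted in this thread, escaped so every character is matched literally.
BANNER = re.escape("Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html")


def strip_banner(folder, pattern=BANNER, suffixes=(".html", ".htm", ".xhtml")):
    """Delete every match of `pattern` from the HTML files under `folder`.

    Returns {file name: number of matches removed} for the files that changed.
    """
    rx = re.compile(pattern)
    counts = {}
    for path in Path(folder).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        new_text, n = rx.subn("", text)
        if n:
            path.write_text(new_text, encoding="utf-8")
            counts[path.name] = n
    return counts
```

The returned counts make it easy to see which files actually contained the banner.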

Since you might have many other books with this problem, learning the method Starson17 linked to will benefit you in the long run.

Good Luck.

Yeti · 05-22-2010, 09:55 PM

Aha... I am so pleased it wasn't something really dumb and obvious. I feel better now.

Thank you, Starson. After some trial and error I managed to get rid of the offending text. I still seem to have extra page breaks where the text was, but I can live with that. Great!

dwanthny, thank you too. I get the gist of what you're saying, although I don't follow it completely. Notepad++ must be a PC application? Anyway, I will go with your advice and use the method that Starson provided the link for; I have made a note of it for future reference.

All this trial and error with Calibre has made me realize there is a lot of power hidden underneath its uncluttered-looking bonnet. Very nice software.

Yeti.

Starson17 · 05-23-2010, 09:14 AM

Quote:

Originally Posted by Yeti

Thank you Starson. After some trial and error I managed to get rid of the offending text. I still seem to have extra page-breaks where the text was, but I can live with that. Great!

IIRC, there is a pair of breaks around the offending text. They should show up in the wizard. What regular expression did you use to get rid of the junk? You may just have to add to the beginning and/or end to get rid of the page break. IIRC, after the thread I sent you to was written, Calibre was revised to work better in multipage settings.

Yeti · 05-23-2010, 06:13 PM

IIRC?

Starson, I am assuming your question is directed at me, Yeti? I'll answer anyway. Not having any idea about regex or programming or anything like that, I simply followed the instructions I found in the thread. I tried some of mshneour's expressions, but they didn't highlight anything in the wizard, so I then tried Kovid's suggestion from his first reply (#2) in the thread:
Generated by.*abclit.html
That seemed to highlight most of the offending text I was trying to get rid of, so I used it, and the result is quite satisfactory. I can live with the extra page breaks. Thanks again.

Yeti.

Starson17 · 05-23-2010, 06:33 PM

Quote:

Originally Posted by Yeti

IIRC?

It stands for If I Recall Correctly.

Quote:

Starson, I am assuming your question is directed at me, Yeti? I'll answer anyway. Not having any idea about regex or programming or anything like that, I simply followed the instructions I found in the thread. I tried some of mshneour's expressions, but they didn't highlight anything in the wizard, so I then tried Kovid's suggestion from his first reply (#2) in the thread:
Generated by.*abclit.html
That seemed to highlight most of the offending text I was trying to get rid of, so I used it, and the result is quite satisfactory. I can live with the extra page breaks. Thanks again.

Yeti.

If you looked at the "highlighted text", you should see the near it - probably just before or just after the stuff you are trying to remove. That's the code for a "break" and if you adjust your regex to get it highlighted, the extra page break will disappear. From your post above, it's clear that the "regex" you are currently using is:
Generated by.*abclit.html

A "regex" is a regular expression that defines text to be matched. Your regex has the following meaning: match any text that starts with the phrase "Generated by", followed by zero or more characters of any kind (the .* part), followed by "abclit", followed by any single character (the . part; an unescaped dot matches any character, a literal period included), followed by "html".
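The effect of that unescaped dot can be checked in Python, whose `re` module uses the same basic syntax (calibre is written in Python; that its wizard behaves identically for these simple patterns is an assumption). The second test string is made up to show that `.` matches any character, whereas `\.` matches only a literal period:

```python
import re

loose = re.compile(r"Generated by.*abclit.html")    # unescaped '.': any character
strict = re.compile(r"Generated by.*abclit\.html")  # '\.': a literal period only

banner = "Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"
odd = "Generated by X abclitXhtml"  # made-up string with 'X' where the period would be

print(bool(loose.search(banner)), bool(strict.search(banner)))  # True True
print(bool(loose.search(odd)), bool(strict.search(odd)))        # True False
```

In practice the loose version is harmless here, since text like "abclitXhtml" is unlikely to turn up in a novel.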

I was suggesting that it's not that hard to get rid of the extra page break by changing the regex slightly so that it also matches (and therefore highlights) the part (assuming it's there). It's just a matter of adding and maybe another character or two to your regex. If you don't care about the extra page break, ignore this, but if you want to get rid of it, post the text that surrounds your highlighted text (it will probably include as discussed above) and someone will help you get a better regex. It will look something like this:

 Generated by.*abclit.html 

but perhaps not exactly like that, depending on what text is in your book. I was just pointing out that it's a tiny change and easy to make.

Yeti · 05-24-2010, 12:20 AM

Quote:

Originally Posted by Starson17

It stands for If I Recall Correctly.

Doh! Fifteen-plus years on the internet and I can't remember seeing that one before, even on the good old BBSs.

Ok, bit of a learning curve here, I am trying. Hopefully I will get to read the book eventually ...

Interesting to notice how some things - like the offending text we are talking about here - don't show up in the PDF before conversion and then suddenly appear in the MOBI afterwards ...

I just noticed also that neither the PDF before conversion nor the MOBI afterwards has any italic print. I have the paper version of this book and, like all books, it uses italics for emphasis, to indicate someone's train of thought, for foreign language and so on. This is quite important for a better understanding of the story, and it would be nice to correct if possible too. But quite likely it was lost in creating the original PDF version?

Now, trying to get rid of the extra page breaks:

I tried using the expression Generated by.*abclit.html , but it does not highlight anything in the wizard. I also tried leaving off the , first at the start, then at the end - no luck, it does not highlight anything. Here is a copy-and-paste of a section of the text from the wizard after using the expression Generated by.*abclit.html :

... Central Intelligence Agency. He Generated by ABC Amber LIT Conv<a href="http://www.processtext.com/abclit.html">erter, http://www.processtext.com/abclit.html</a>
was also at this moment ...

and this is the part that gets highlighted by the wizard:

Generated by ABC Amber LIT Conv<a href="http://www.processtext.com/abclit.html">erter, http://www.processtext.com/abclit.html
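Why the highlight stops exactly where it does: `.*` is greedy, so it runs as far as it can and then backs off to the last "abclit" + any character + "html" on the line, which sits just before `</a>`. The Python snippet below reproduces that with the text pasted above (assuming the wizard matches the way Python's `re` does), and shows the non-greedy `.*?` variant, which stops at the first `abclit.html` instead.

```python
import re

# The line Yeti pasted from the wizard (the line break after </a> is left out).
line = ('... Central Intelligence Agency. He Generated by ABC Amber LIT Conv'
        '<a href="http://www.processtext.com/abclit.html">erter, '
        'http://www.processtext.com/abclit.html</a>')

greedy = re.search(r"Generated by.*abclit.html", line).group()  # runs to the LAST abclit.html
lazy = re.search(r"Generated by.*?abclit.html", line).group()   # stops at the FIRST one

print(greedy)  # Generated by ABC Amber LIT Conv<a href="http://www.processtext.com/abclit.html">erter, http://www.processtext.com/abclit.html
print(lazy)    # Generated by ABC Amber LIT Conv<a href="http://www.processtext.com/abclit.html
```

Removing the greedy match leaves a stray `</a>` behind, which most readers simply ignore.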

As I have said, I can live with the extra page breaks, and even the lack of italics, but if anyone still feels like playing, I am open for other suggestions. Thanks again.

Yeti.

Starson17 · 05-24-2010, 09:29 AM

Quote:

Originally Posted by Yeti

Interesting to notice how some things - like the offending text we are talking about here - don't show up in the PDF before conversion and then suddenly appear in the MOBI afterwards ...

I believe that means it's been hidden in the pdf. It's there, but not being displayed until conversion makes it reappear.

Quote:

I just noticed also that neither the PDF before conversion, nor the MOBI afterwards have any italic print. I have the paper version of this book and, like all books it uses italics for emphasis, to indicate someone's train of thought, for foreign language and so on. This is quite important for a better understanding of the story, and would be nice to correct if possible too. But quite likely it was lost in creating the original PDF version?

Yes, it was probably stripped during conversion. I don't know why, as a good conversion wouldn't have done that.

Quote:

Now, trying to get rid of the extra page breaks:
I tried using the expression Generated by.*abclit.html , but it does not highlight anything in the wizard.

I didn't think it would. Without seeing the text you want removed, and the codes around it, that was just a guess.

Quote:

I also tried leaving off the , first at the start, then at the end - no luck, it does not highlight anything.

That's also not surprising - you don't have any codes

Quote:

Here is a copy-and-paste of a section of the text from the wizard after using the expression Generated by.*abclit.html :

... Central Intelligence Agency. He Generated by ABC Amber LIT Conv<a href="http://www.processtext.com/abclit.html">erter, http://www.processtext.com/abclit.html</a>
was also at this moment ...

Try this:

Code:

<b>Generated by.*abclit.*<p>

That may not do it, as I don't see the part causing the break. I think that will just remove some empty bold tags and an empty paragraph (an extra line). The part causing the page break may be in a part of the text you didn't post. If it's not bothering you, you don't need to go any further, but learning a bit about basic regex use can be helpful if you are going to use Calibre over the long term.

Starson17 · 05-24-2010, 10:15 AM

Quote:

Originally Posted by Starson17

I don't see the part causing the break. ... The part causing the page break may be in a part of the text you didn't post. If it's not bothering you, you don't need to go any further

I didn't see the part causing the break because there is no break code, and I was looking at your post in the forum reply editor (where there are already lots of extra breaks, so I couldn't see the one in your text). Your page break problem is solvable, but I think it would require a multiline match, and that's probably more than you want to take on in your first regex attempt. I looked at similar "Converted by" text in one of my books and it had tags in it, which is why I initially thought the match would be easy for you.
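A sketch of what a multiline match means, in Python, using the two lines Yeti pasted earlier (the newline between `</a>` and "was" is the important part). By default `.` does not match a newline, so a pattern cannot run from one line into the next; the inline flag `(?s)` changes that. Whether calibre's wizard accepts `(?s)` is an assumption to test first. With `(?s)` the lazy `.*?` matters: a greedy `.*` could swallow everything between the first and last banner in a whole book.

```python
import re

# Yeti's pasted text: the banner ends one line, the story resumes on the next.
text = ('He Generated by ABC Amber LIT Conv<a href="http://www.processtext.com/abclit.html">'
        'erter, http://www.processtext.com/abclit.html</a>\nwas also at this moment')

single = re.compile(r"Generated by.*abclit\.html</a>.*was")     # '.' stops at the newline
multi = re.compile(r"(?s)Generated by.*abclit\.html</a>.*?was")  # (?s): '.' also matches '\n'

print(single.search(text))             # None
print(multi.search(text) is not None)  # True

# Lazy .*? plus \s* removes the banner AND the line break that follows it.
cleaned = re.sub(r"(?s)Generated by.*?abclit\.html</a>\s*", "", text)
print(cleaned)  # He was also at this moment
```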

LateAdopter · 05-27-2010, 02:13 PM

I just upgraded to Calibre 0.6.54 (from 0.6.51). Now, when I send a .TXT file to my device (Sony PRS-600), the title and author change at the next device reset: the author becomes blank and the title becomes title-author_XXX. A concrete example: file "Adventures of Tom Sawyer.txt"; title "Adventures of Tom Sawyer"; author "Twain, Mark". I copy this to the PRS-600 main memory successfully, and it shows in the display correctly. After ejecting the device and allowing it to reset, the book has the title "Adventures of Tom Sawyer - Twain, Mark_139" and the author "Unknown".

Since this worked in previous versions, all the other formats I tried (EPUB, LIT, RTF) load as expected, and I saw no other mention of the issue, I am guessing that I have set some variable incorrectly. Thoughts, anyone?

Steve

chaley · 05-27-2010, 05:43 PM

Quote:

Originally Posted by LateAdopter

Since this did work on previous versions, it loads all other types I tried as expected (EPUB, LIT, RTF) and I saw no other mention of the issue, I am guessing that I have set some variable incorrectly. Thoughts, anyone?

Are you sure it worked for text files? The reason I ask is that a Sony rebuilds its private database when you disconnect from the computer, cleaning up the metadata in ways that it thinks necessary. For example, on my 300, multiple authors always get truncated to one author. It seems to do this cleanup by looking for metadata in the files, and because text files have no metadata, the author field is cleared. On my 300, it becomes empty, not 'Unknown'.

