#256 |
creator of calibre
Posts: 45,377
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
CALIBRE_DEVELOP_FROM will affect ebook-convert as well. If you want to understand the sequence of operations involved in a conversion, look at the run method in the file plumber.py
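For anyone scripting this, here is a minimal sketch of setting CALIBRE_DEVELOP_FROM for a single ebook-convert run instead of system-wide. The helper names are made up for illustration, and the `c:\calibre\src` path is only the example location used later in this thread. The command is built but not executed.

```python
import os


def develop_env(src_dir, base_env=None):
    """Return a copy of the environment with CALIBRE_DEVELOP_FROM set.

    base_env defaults to the current process environment; it is only a
    parameter so the function can be tried without touching os.environ.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CALIBRE_DEVELOP_FROM"] = src_dir
    return env


def convert_command(input_path, output_path):
    """Build the argument list for an ebook-convert run (not executed here)."""
    return ["ebook-convert", input_path, output_path]


env = develop_env(r"c:\calibre\src", base_env={})
cmd = convert_command("book.epub", "book.mobi")

# To actually run the conversion against the source checkout:
#   import subprocess
#   subprocess.run(cmd, env=develop_env(r"c:\calibre\src"), check=True)
```

Passing the environment per call keeps a normal calibre install untouched for everything else.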
|
#257 |
Zealot
Posts: 128
Karma: 278
Join Date: Jun 2008
Device: Kindle; PRS-500; MobiPocket on Windows Mobile
|
|
#258 |
Connoisseur
Posts: 53
Karma: 52
Join Date: Apr 2008
Device: Kindle
|
Google Book PDF to MOBI
>I have a 636 page pdf google book that I added to the calibre library, and now am trying to add to my Kindle, converted to mobi. The job seems to be stuck, but I'm not sure how long the conversion should take. I'm not sure whether to stop the process and start over, or whether I've done something wrong.
The original Google PDF consists of 636 bitmap scans of the pages of the original book -- that is what you are looking at if you open the PDF in Adobe Reader, for example. The file also contains the OCR text for each page so that you can search the contents of the book. If you ask calibre to convert this Google PDF to MOBI, calibre tries to convert it ALL -- meaning that it has 636 huge bitmap page images to convert into 636 huge bitmap images in the MOBI file. Is this really what you intended to do?

The other option, which may be more sensible depending on your needs, is to download the Google EPUB version of the book instead of the PDF version. The EPUB contains just the Google OCR results, and calibre can quickly and easily convert it to MOBI. However, if the OCR contains scan errors -- which it will -- the results may or may not be usable for your needs. Recently I've found the Google OCR efforts to be pretty good.

I have a Kindle DX, so I can read the Google PDF files directly, looking at the original bitmap page scans, or I can use calibre to convert the EPUB format to MOBI and read that. For my purposes both approaches work pretty well, and both have their advantages and disadvantages. Reading the PDF retains nuances of the original text -- including the occasional scanner's thumb and decades of students' scribble marks -- but the scan process tends to render text a little on the heavy and blurry side. Converting the EPUB results in a "real" e-book, where the fonts are clear and can be resized, reflowed, etc. -- but it now contains some scannos which one must "read around".

Also, even if you were successful in converting those 636 bitmap page images to MOBI, on a smaller Kindle such as the International you may find the pages have shrunk to a size small enough to make reading uncomfortable -- depending on the strength of your eyes and/or your reading glasses. Cheers! |
#259 |
Zealot
Posts: 128
Karma: 278
Join Date: Jun 2008
Device: Kindle; PRS-500; MobiPocket on Windows Mobile
|
OK, thanks to Kovid's help above, I was able to figure out why converting a book from OEB to Mobipocket strips out all of the <reference> links in the <guide> section of the OPF file except one Table of Contents link and a cover link. It turns out the OEB input contains a module named guide.py, in the directory src/calibre/ebooks/oeb/transform, which actually has the specific function of stripping everything but one cover and one TOC link out of the <guide> section of the input OPF file.
Commenting out the last three lines of the guide.py file (and of course setting the CALIBRE_DEVELOP_FROM variable to the c:\calibre\src directory, if that's where the calibre source is) fixes the problem:
Code:
#if x.lower() not in ('cover', 'titlepage', 'masthead', 'toc',
#        'title-page', 'copyright-page', 'start'):
#    self.oeb.guide.remove(x)
Last edited by chorpler; 01-03-2010 at 11:19 PM. Reason: Forgot you had to comment out the last THREE lines, not just the last line, or it complains about indentation and whatnot |
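To make the effect of those three lines concrete, here is a rough sketch of the filtering they perform. The guide is modelled as a plain dict of reference type to href, which is an assumption for illustration and not calibre's real guide object. The `keep_all` flag is hypothetical and stands in for commenting the check out.

```python
# Reference types that survive the clean-up, copied from the quoted condition.
KEPT_TYPES = ('cover', 'titlepage', 'masthead', 'toc',
              'title-page', 'copyright-page', 'start')


def clean_guide(guide, keep_all=False):
    """Return a copy of `guide` (type -> href) without unrecognised types.

    keep_all=True mimics the patched guide.py, where nothing is removed.
    """
    if keep_all:
        return dict(guide)
    return {ref_type: href for ref_type, href in guide.items()
            if ref_type.lower() in KEPT_TYPES}


guide = {
    'cover': 'cover.html',
    'toc': 'toc.html',
    'text': 'chapter1.html',
    'other.notes': 'notes.html',
}

stock = clean_guide(guide)                    # only 'cover' and 'toc' remain
patched = clean_guide(guide, keep_all=True)   # every <reference> is kept
```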
#260 |
Guru
Posts: 765
Karma: 2825929
Join Date: Feb 2007
Location: Fresno
Device: Kindle 1; iPad Air; iPhone 7; Kobo Libra; Kindle Oasis 3
|
I've read all 18 pages of this thread and the user manual and haven't found a discussion of the problem I'm having converting prc documents to mobi. If a paragraph begins with an italicized word, it's not indented. The same applies to any quoted poetry that is italicized, which would normally be indented a few spaces from the left margin. Any way of getting around this?
Jim |
#261 |
onlinenewsreader.net
Posts: 328
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
MOBI output from news feed
I'm seeing a TOC problem with MOBI output from a news feed. If an article consists of a headline, author, picture (jpeg) and then the article text, the MOBI TOC entry for that article takes you to a MOBI page that starts with the picture (this is true when viewing in Calibre, with MOBIpocket reader and on Kindle). However, if you advance to this article from the previous article via "next page" you get the headline, subhead and byline followed by the picture. If you advance to this article via "next article" you get the same behaviour as from the TOC. So, for some reason, the top of this article is set to the picture, not the headline.
Articles that don't start with a picture don't have this issue. Accessing them from the TOC or via "next article" gives you the headline, etc. Interestingly, when I get HTML output from the news feed, it is structured correctly. Any ideas? |
#262 |
creator of calibre
Posts: 45,377
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The TOC and next article links are generated from the .ncx file. Use the --debug-pipeline option and check if the NCX file is linking to the correct place. |
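One way to do that check without reading the XML by eye is to list where each NCX entry points. This is only a sketch: it assumes you have already located the NCX file in the debug output and read it into a string, and the sample document and function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NCX_NS = {"ncx": NCX_URI}


def ncx_targets(ncx_text):
    """Return (label, src) pairs for every navPoint, in document order."""
    root = ET.fromstring(ncx_text)
    pairs = []
    for point in root.iter("{%s}navPoint" % NCX_URI):
        label = point.findtext("ncx:navLabel/ncx:text", default="",
                               namespaces=NCX_NS)
        content = point.find("ncx:content", NCX_NS)
        src = content.get("src") if content is not None else None
        pairs.append((label.strip(), src))
    return pairs


# Invented sample; a real NCX from a news download will be much longer.
SAMPLE = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="a0" playOrder="1">
      <navLabel><text>Article one</text></navLabel>
      <content src="feed_0/article_0/index.html"/>
    </navPoint>
    <navPoint id="a1" playOrder="2">
      <navLabel><text>Article two</text></navLabel>
      <content src="feed_0/article_1/index.html#top"/>
    </navPoint>
  </navMap>
</ncx>"""

for label, src in ncx_targets(SAMPLE):
    print(label, "->", src)
```

If an entry's `src` carries a fragment (the part after `#`), that fragment is the anchor to look for in the article HTML.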
#263 | |
onlinenewsreader.net
Posts: 328
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
|
|
#264 |
creator of calibre
Posts: 45,377
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The code to generate the MOBI TOC is in the file calibre/ebooks/mobi/writer.py. Have a look and ask if you have more questions. |
#265 |
onlinenewsreader.net
Posts: 328
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
I have determined that in the case of an article that is indexed correctly, the MOBI file has a bookmark placed in a <DIV id="filepos...."> tag before the navigation bar and article contents, whereas for an article with a picture between the headline and body, the bookmark is placed in a <a id="filepos..."> tag immediately before the <p class=...><img> tags representing the picture. That is why the TOC links to the picture, not the nav bar and headline. I'm not having any luck figuring out from the source writer.py why this is. Any suggestions?
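For anyone who wants to repeat this comparison, a small sketch that lists which element carries each filepos bookmark. It assumes you already have the MOBI markup as a string (how you extract it is up to you), and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class FileposFinder(HTMLParser):
    """Collect (tag, id) for every element whose id starts with 'filepos'."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        for name, value in attrs:
            if name == "id" and value and value.startswith("filepos"):
                self.found.append((tag, value))


def filepos_anchors(markup):
    """Return the bookmark-carrying elements in document order."""
    finder = FileposFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


sample = ('<div id="filepos6685" class="calibre1"><p>nav bar</p></div>'
          '<p>headline</p><a id="filepos970"></a>'
          '<p><img src="images/00006.jpg"/></p>')

for tag, anchor in filepos_anchors(sample):
    print(anchor, "is on a <%s> tag" % tag)
```

A bookmark on an `<a>` tag in the middle of an article, rather than on the `<div>` that opens it, is the pattern described above.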
|
#266 |
creator of calibre
Posts: 45,377
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Look at the HTML in the input subdirectory produced by --debug-pipeline. Is there a difference between the two cases that corresponds to the difference in the MOBI files?
|
#267 |
onlinenewsreader.net
Posts: 328
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
There is no difference except for the photo. Here is the input html for the article that is not indexed properly:
Code:
<div class="navbar calibre_rescale_70" style="text-align: center;">
  | <a href="../article_1/index.html">Next</a>
  | <a href="../index.html#article_0">Section menu</a>
  | <a href="../../index.html#feed_0">Main menu</a>
  | <hr /></div>
<div id="storyheader">
  <div class="headline">
    <h1>Symphony Splash seeks sponsor for Victoria's most popular public event</h1>
  </div>
  <div class="subheadline">
    <h2>$75,000 needed for free outdoor concert; players to take salary cut</h2>
  </div>
  <div class="byline">
    <span class="name">By Jim Gibson, Times Colonist</span><span class="timestamp">January 19, 2010</span></div>
</div>
<div id="storycontent" class="para18">
  <div id="imageBox">
    <div class="wrapper_0_10_0_0">
      <div class="storyimage" id="">
        <a href="javascript:void(0);" onclick="tabClick(' - Photos Tab',false,'storypage','story_photo_content',true,true);">
        <img id="storyphoto" class="thumbnail" border="0" alt="The 2009 event of Symphony Splash drew an estimated 40,000 people to the Inner Harbour on Aug. 2." src="images/img2.bin.jpg" /></a></div>
      <div class="imagetext">
        <h1 id="photocaption">The 2009 event of Symphony Splash drew an estimated 40,000 people to the Inner Harbour on Aug. 2.</h1>
        <h2 id="photocredit"><b>Photograph by: </b>Adrian Lam, Times Colonist</h2>
      </div>
    </div>
  </div>
  <div id="page1">
    <p>Symphony Splash, Victoria's most popular public event, is looking for a new sponsor.</p>
Code:
<hr class="calibre5"/>
<p class="calibre6">
  <span class="calibre3">
    <span class="bold">Symphony Splash seeks sponsor for Victoria's most popular public event</span>
  </span>
</p>
<p class="calibre6">
  <span class="calibre3">
    <span class="bold">$75,000 needed for free outdoor concert; players to take salary cut</span>
  </span>
</p>
<p class="calibre6">By Jim Gibson, Times Colonist</p>
<p class="calibre7">January 19, 2010</p>
<a></a>
<a id="filepos970"></a>
<p class="calibre7">
  <img src="images/00006.jpg" class="calibre8"/>
</p>
<p class="calibre9">
  <span class="calibre3">
    <span class="bold">The 2009 event of Symphony Splash drew an estimated 40,000 people to the Inner Harbour on Aug. 2.</span>
  </span>
</p>
<a></a>
<p class="calibre10">
  <span class="calibre3">
    <span class="bold">Photograph by: Adrian Lam, Times Colonist</span>
  </span>
</p>
<a></a>
<p class="calibre11">Symphony Splash, Victoria's most popular public event, is looking for a new sponsor.</p>
<p class="calibre11">The Victoria Symphony's free outdoor concert, which drew an estimated 40,000 people to the Inner Harbour last Aug. 2, needs a replacement for Bayview Residences, the title sponsor for the last three years.</p>
<p class="calibre11">Bayview says it will continue to make "a significant contribution" to Splash, but not as title sponsor.</p>
Code:
<div class="navbar calibre_rescale_70" style="text-align: center;">
  | <a href="../article_2/index.html">Next</a>
  | <a href="../index.html#article_1">Section menu</a>
  | <a href="../../index.html#feed_0">Main menu</a>
  | <a href="../article_0/index.html">Previous</a>
  | <hr /></div>
<div id="storyheader">
  <div class="headline">
    <h1>Handling of domestic violence overhauled</h1>
  </div>
  <div class="subheadline">
    <h2>B.C. pressured to act after gaps in services cited in murder-suicide</h2>
  </div>
  <div class="byline">
    <span class="name">By Rob Shaw and Lindsay Kines, Times Colonist</span><span class="timestamp">January 19, 2010</span></div>
</div>
<div id="storycontent" class="para18">
  <div id="page1">
    <p>The B.C. government unveiled changes yesterday to the way police and Crown prosecutors handle domestic violence cases, but critics say it's not enough to plug holes in the system.</p>
    <p>The province will help pay for a Greater Victoria regional domestic violence unit, launch a B.C. Coroners Service panel to review domestic violence homicides and try to better co-ordinate policies between Crown and police in the wake
Code:
<div id="filepos6685" class="calibre1">
  <p class="calibre2">
    <span class="calibre3"> <tt class="calibre4"> | </tt> </span>
    <a href="#filepos13231"> <span class="calibre3"> <tt class="calibre4">Next</tt> </span> </a>
    <span class="calibre3"> <tt class="calibre4"> | </tt> </span>
    <a href="../index.html#article_1"> <span class="calibre3"> <tt class="calibre4">Section menu</tt> </span> </a>
    <span class="calibre3"> <tt class="calibre4"> | </tt> </span>
    <a href="../../index.html#feed_0"> <span class="calibre3"> <tt class="calibre4">Main menu</tt> </span> </a>
    <span class="calibre3"> <tt class="calibre4"> | </tt> </span>
    <a href="#filepos970"> <span class="calibre3"> <tt class="calibre4">Previous</tt> </span> </a>
    <span class="calibre3"> <tt class="calibre4"> | </tt> </span>
  </p>
  <hr class="calibre5"/>
  <p class="calibre6">
    <span class="calibre3">
      <span class="bold">Handling of domestic violence overhauled</span>
    </span>
  </p>
  <p class="calibre6">
    <span class="calibre3">
      <span class="bold">B.C. pressured to act after gaps in services cited in murder-suicide</span>
    </span>
  </p>
  <p class="calibre6">By Rob Shaw and Lindsay Kines, Times Colonist</p>
  <p class="calibre7">January 19, 2010</p>
  <a></a>
  <p class="calibre11">The B.C. government unveiled changes yesterday to the way police and Crown prosecutors handle domestic violence cases, but critics say it's not enough to plug holes in the system.</p>
  <p class="calibre11">The province will help pay for a Greater Victoria regional domestic violence unit, launch a B.C. Coroners Service panel to review domestic violence homicides and try to better co-ordinate policies between Crown and police in the wake of a tragic 2007 murder-suicide in Oak Bay, said B.C. Solicitor General Kash Heed.</p>
|
#268 |
onlinenewsreader.net
Posts: 328
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
|
MOBI output from news feed--workaround
I have discovered what causes this problem with MOBI output from news feeds. It happens when there is a DIV tag with an empty string for an id, i.e.
Code:
<DIV id=""> ... </DIV>
The workaround is to use preprocess_html to find DIVs with id="" and delete the id attribute.
I have been unable to figure out where in the Calibre code this is happening and, unfortunately, I'm giving up on this. I've looked in all of the obvious places, and short of exhaustively going through every single source file in Calibre I don't see any way for me to track it down. Perhaps someone who is familiar with the deep-down internals can take up this cause and find the error in the code. |
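The clean-up described in this post can be sketched roughly as follows. This is a plain string-based version so it can be tried on its own. In a recipe, preprocess_html receives a parsed soup rather than a string, so the same idea would be applied to the matching tags there instead. The function name is made up for illustration.

```python
import re

# Matches an id attribute whose value is the empty string: id="" or id=''.
EMPTY_ID = re.compile(r'''\s+id\s*=\s*(""|'')''', re.IGNORECASE)

# Matches a <div ...> start tag (any letter case), attributes included.
DIV_START = re.compile(r"<div\b[^>]*>", re.IGNORECASE)


def strip_empty_div_ids(markup):
    """Remove id="" from <div> start tags and leave everything else alone."""
    def fix_tag(match):
        return EMPTY_ID.sub("", match.group(0))
    return DIV_START.sub(fix_tag, markup)


before = '<div class="storyimage" id=""><img src="images/img2.bin.jpg" /></div>'
after = strip_empty_div_ids(before)
print(after)
```

DIVs with a real id, such as `<div id="page1">`, pass through unchanged, so existing link targets are not affected.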
#269 |
creator of calibre
Posts: 45,377
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
GRiker (who wrote the MOBI toc code) will take a look at it when he has time.
|
#270 |
Wizard
Posts: 2,409
Karma: 4132096
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
|
I am having a lot of trouble with my mobi conversions. The conversion to LRF was much easier and more reliable. Specifically:

1) Books downloaded from here which are already in mobi do not always load onto the Kindle with the correct metadata. If I download one and it says something like 'author unknown', and I use the 'edit metadata' command to fix this and add the author, it won't use that as the author when transferred to the Kindle. This has happened with maybe 4 out of 50 books.

2) It cannot justify the text when converting an LRF file. I badly want the text to be justified. I can't stand reading it with the ragged edges. I have tried ticking and unticking every box in there to no avail. I wish there was a checkbox to override whatever the file says and force it to always justify the text when it converts.

3) My 'liberated' eReader files have all been converted to HTML using the exact same process, yet some of them convert with no glitches and some don't. Some have the 'do not justify' box pre-checked when I go into the options and some don't. Some will not justify at all no matter what I do. I am baffled. These files were all created the same way, so why should they have different options, and why do some convert properly and some not?

What I am doing is running the decoder script, taking the resulting HTML file and opening it in a web browser. I select all, copy and paste into a Neo Office document. Then I save that as HTML, open THAT in a web browser and select all/copy again (to get HTML that is free from the funky coding you get from an Office suite), then paste that into Kompozer, an HTML program. I save the file, run Text Wrangler to search for extra line breaks and remove them, then do a last save. These are all clean HTML files with no extra frills, and all converted to LRF beautifully. But to mobi, it is hit or miss and I am just baffled as to why.

Can anyone help me? I just want plain, simple book files where everything is justified. Why is this so hard? What am I doing wrong? I am on a mac fwiw. |
Thread | Thread Starter | Forum | Replies | Last Post |
main menu, section menu, css for calibre mobipocket output | naisren | Calibre | 2 | 08-23-2010 11:42 PM |
Trying to get consistent look to all output | daveps | Calibre | 0 | 03-08-2010 02:18 PM |
Anyone Have mobipocket desktop? Mobipocket server is down. | Ireadfreely | Kindle Formats | 3 | 10-27-2008 10:29 AM |
convert from 'new' mobipocket to 'old' mobipocket? | Indigo Ink | Kindle Formats | 11 | 06-22-2008 01:43 AM |
Mobipocket Reader 4.8 and Mobipocket eNews Creator | Mobipocket | Reading and Management | 1 | 01-29-2004 08:03 AM |