MobileRead Forums > E-Book Formats > ePub

Old 09-30-2008, 12:58 PM   #16
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by wallcraft View Post
Under the hood, this is a two step process and sometimes you might want to edit the intermediate HTML. MobiPocket Creator should leave behind its intermediate OEB ebook files, but you can also just use the command line version pdf2xml, see Mobipocket convert in mass?.
Nice to know, but I *do* like the complete run through to get the .prc actually produced. I'm a lazy bugger...

Dumb question time: What good is the .xml if I already have the .prc and .opf with .html from the Mobipocket Creator Import?

I've never used .xml files (my Word is 'stuck' at Word 2003). Which program converts these and/or reads them for further processing?

Would you say they are better for storage, portability or something else? I'm not in the know here. Any info would be appreciated!

Thanks!
Old 09-30-2008, 01:43 PM   #17
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
You can add XML import and export to Word 2003 (actually, all the way back to Word 2000) via a free download from Microsoft. They are pushing their new docx format. See the wiki.

Dale
Old 10-03-2008, 09:20 AM   #18
sasilk
Connoisseur
Posts: 75
Karma: 14
Join Date: Jun 2008
Location: Australia
Device: iPad Pro 12"; Kindle Paperwhite
For the iLiad users...

I found a nice, easy way of outputting PDF files so that they're readable on the iLiad. You can create your own PDF format styles for the PDF printer, so I created one with a paper size and margins that fit my iLiad and a font that I like to read. Then all you have to do is print whatever format you have to the PDF printer using that style, and it will create a file that works on your iLiad.
Old 05-04-2009, 12:59 PM   #19
chrisophus
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
Complex PDF to HTML

I wrote a Python script which converts the output of pdf2xml to HTML and attempts to maintain the formatting of complex PDFs. I then use calibre to generate the ebook format (mobi in my case). It seems to work pretty well. You can read more about it on my blog at http://talkings.org/2009/05/03/complex-pdf-html/.
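
For the last step of that workflow (HTML to mobi with calibre), a minimal sketch might look like the following. It assumes a recent calibre release, where the command-line converter is `ebook-convert`; older releases may have used different command names. The helper name `ebook_convert_command` is made up for illustration, and the pdf2xml and script steps are left out because their exact invocations are not given here.

```python
from pathlib import Path


def ebook_convert_command(html_path, out_format="mobi"):
    """Build the argument list for calibre's ebook-convert tool.

    Hypothetical helper: it only assembles the command and does not run it.
    The output file sits next to the input, with the extension swapped.
    """
    src = Path(html_path)
    dst = src.with_suffix("." + out_format)
    return ["ebook-convert", str(src), str(dst)]


if __name__ == "__main__":
    # Print the command that would be run for a converted HTML file.
    print(" ".join(ebook_convert_command("book.html")))
```

With calibre installed, the returned list could be handed to `subprocess.run(cmd, check=True)` to perform the conversion.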
Old 05-04-2009, 06:38 PM   #20
kovidgoyal
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Cool, it was always in the back of my mind to write a script to implement column detection and a few other goodies from the output of pdf2xml, but I never found the time/motivation.

I'll be willing to integrate this into calibre (after the 0.6 release), so open a ticket and attach your script. Integration will depend on how easy it is to compile pdf2xml on various platforms.
Old 05-04-2009, 10:35 PM   #21
chrisophus
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
That sounds good. What time frame are you looking at? I still need to do some work on it to automate detection of more aspects of the content.
Old 05-04-2009, 10:54 PM   #22
kovidgoyal
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
0.6 will take another couple of months, so there's no rush.
Old 05-06-2009, 01:21 PM   #23
chrisophus
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
I am pretty happy with the progress I've made in the last couple of days. It seems to be working with almost anything I throw at it. I am adding a lot of options to customize how it handles the formatting. I'll post again when I have a new version up. I wish I had a better name than cxpdfhtml.py...
Old 05-06-2009, 01:28 PM   #24
kovidgoyal
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The name I used for my abortive attempt was pdfreflow.py.
Old 05-08-2009, 03:39 AM   #25
tlc
Zealot
Posts: 140
Karma: 50288
Join Date: Feb 2009
Device: KK 3G, iPad
My interest is just getting better reflowable paragraphs on fiction. I tried cxpdfhtml.py on a novel and was surprised at how well the "break on short lines" approach worked, although I haven't read in depth to find the not-short-enough lines.

I was wondering if you are considering (or anyone else has implemented) detection of paragraphs based on indentation?
Old 05-10-2009, 12:01 AM   #26
chrisophus
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
Actually, it does use indentation to detect paragraphs. Basically, if a line is indented and the next line is not, it is considered the beginning of a paragraph block. A short-line break is detected if no other type of block/code is detected and the line is indented and doesn't quite reach the end of the line (a 10-pixel threshold), although that could easily be made a configuration option as well.
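
For anyone wanting to experiment, here is a minimal sketch of the two rules described above. It is not the actual cxpdfhtml.py code. The `Line` class, the `group_paragraphs` function, the margin arguments and the 5-pixel indent tolerance are all assumptions made for illustration; only the 10-pixel short-line threshold comes from the description. It also simplifies the short-line rule to "the line ends short of the right margin" and does nothing about hyphenated words.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One text line with its horizontal extent in pixels (hypothetical type)."""
    text: str
    left: float
    right: float


def group_paragraphs(lines, left_margin, right_margin,
                     indent_tol=5.0, short_tol=10.0):
    """Join lines into paragraphs using indentation and short-line breaks."""
    paragraphs = []
    current = []
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        indented = line.left > left_margin + indent_tol
        next_flush = nxt is not None and nxt.left <= left_margin + indent_tol
        # Rule 1: an indented line followed by a non-indented one
        # starts a new paragraph, so close whatever is open.
        if indented and next_flush and current:
            paragraphs.append(" ".join(current))
            current = []
        current.append(line.text)
        # Rule 2: a line ending well short of the right margin
        # closes the paragraph it belongs to.
        if line.right < right_margin - short_tol:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    page = [
        Line("It was a dark", 30, 400),
        Line("and stormy night.", 10, 200),
        Line("The rain fell", 30, 400),
        Line("in torrents.", 10, 150),
    ]
    for paragraph in group_paragraphs(page, left_margin=10, right_margin=400):
        print(paragraph)
```

The indentation rule matters most when the last line of a paragraph happens to run the full width, since the short-line rule alone would miss that break.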

Thanks for the feedback. I hope it proves useful.
Old 05-10-2009, 08:43 PM   #27
JSWolf
Resident Curmudgeon
Posts: 73,897
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by chrisophus View Post
Actually, it does use indentation to detect paragraphs. Basically, if a line is indented and the next line is not, it is considered the beginning of a paragraph block. A short-line break is detected if no other type of block/code is detected and the line is indented and doesn't quite reach the end of the line (a 10-pixel threshold), although that could easily be made a configuration option as well.

Thanks for the feedback. I hope it proves useful.
And what is it, and where do we get it? Thanks!
Old 05-10-2009, 09:06 PM   #28
chrisophus
Junior Member
Posts: 9
Karma: 10
Join Date: Aug 2008
Device: Kindle
It is cxpdfhtml. See my earlier post and my blog for details and download links: http://talkings.org/2009/05/07/cxpdfhtml/
Old 05-11-2009, 10:48 AM   #29
JSWolf
Resident Curmudgeon
Posts: 73,897
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Thank you. I'll give it a go later when I have a chance and find a PDF I want to convert.
Old 09-10-2009, 06:03 AM   #30
stisev
Austrian Economist
Posts: 20
Karma: 16
Join Date: Jun 2009
Device: X51v
Hi all,
Like you guys, I have a lot of purchased PDF files, none of which are DRMed (I refuse to purchase from any store that DRMs anything).

All I can say is that it is virtually impossible to convert everything successfully.

Like two previous posters who recommended it, I like Nuance PDF Converter Pro. Nuance PDF Converter 6.0 Pro just came out, and its converter is the most accurate so far, but it still chokes on some books.
All times are GMT -4.

