Old 03-23-2011, 06:16 PM   #1
Sidetrack
Enthusiast
Posts: 39
Karma: 10
Join Date: Jan 2009
Location: South Pacific
Device: Kindle DX
Changing Format Without Parsing

For convenience, there are some format changes I'd like to make, preferably without dumping everything out of Calibre and re-importing it. Has anybody had any luck with this kind of thing:

PRC-->MOBI (where the PRC file is actually just a MOBI file)
PDB-->Various (output the actual file the pdb contains)
TXT,RTF-->ZIP,RAR,TXTZ (convert uncompressed into compressed format)

I've specifically been trying to reduce the size of my library by compressing text files TXT-->TXTZ in batch, but this results in formatting issues and needs to be babysat on a per-file basis. I'd be just as happy with TXT-->ZIP.

Conversion seems to engage file parsing so that it actually goes TXT-->HTML-->TXTZ and does annoying little things like mashing together snippets of quoted verse/poetry. It seems like there are a few "conversions" of this nature that could skip the parsing process safely.
Old 03-24-2011, 10:08 AM   #2
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Sidetrack View Post
Conversion seems to engage file parsing so that it actually goes TXT-->HTML-->TXTZ and does annoying little things like mashing together snippets of quoted verse/poetry. It seems like there are a few "conversions" of this nature that could skip the parsing process safely.
All conversions go through the intermediate XHTML conversion stage. You are right that this specific conversion might be better done without that intermediate. You might consider an enhancement request in the new bug tracker here.

You can also export, compress on your own, change extension to txtz and add to Calibre. Alternatively, with care, you could compress in the library (outside Calibre control), then run the library checker and it should find the new txtz files and the missing txt files.
Old 03-24-2011, 10:32 AM   #3
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by Sidetrack View Post
It seems like there are a few "conversions" of this nature that could skip the parsing process safely.
In these cases you would be better off using a dedicated tool for modifying those formats. There are a number of tools for extracting MOBI Html, text from PDBs. TXTZ is just a ZIP archive with TXT, images and a metadata OPF file in it.
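For the PRC-->MOBI case from the first post, here is a minimal Python sketch. It assumes the usual Palm database layout, in which a MOBI book carries the signature BOOKMOBI at byte offset 60 of the file. The function names are made up for illustration and are not part of calibre. It copies rather than renames, so the original file stays in place.

```python
import shutil
from pathlib import Path

# Palm database header: a 4-byte type and a 4-byte creator sit at byte offset 60.
# MOBI books carry the combined signature b"BOOKMOBI" there.
PDB_TYPE_OFFSET = 60
MOBI_SIGNATURE = b"BOOKMOBI"


def is_mobi(path):
    """Return True if the file's Palm database header identifies it as MOBI."""
    with open(path, "rb") as f:
        f.seek(PDB_TYPE_OFFSET)
        return f.read(len(MOBI_SIGNATURE)) == MOBI_SIGNATURE


def copy_prc_as_mobi(prc_path):
    """Copy a .prc that is really MOBI to a sibling .mobi file.

    Returns the new path, or None if the file is not MOBI (hypothetical helper).
    """
    prc_path = Path(prc_path)
    if not is_mobi(prc_path):
        return None
    mobi_path = prc_path.with_suffix(".mobi")
    shutil.copyfile(prc_path, mobi_path)  # byte-for-byte copy, no parsing
    return mobi_path
```

The copy would still need to be added to the calibre entry afterwards, for example through the metadata editor.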

As stated, all conversion is: Input - OEB - Output. All input plugins create OEB and all output plugins require OEB. This makes converting between formats much simpler. Previous versions (<= 0.4) used per input / output conversion code. It was unmanageable and in many cases redundant, so the conversion process was changed to what you see now.
Old 03-24-2011, 11:13 AM   #4
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by user_none View Post
TXTZ is just a ZIP archive with TXT, images and a metadata OPF file in it.
If I can go slightly off topic - what readers/programs use this format? I saw it appear in the list, and I assume the Calibre reader reads it, but where did it come from? I also was going by some quick Googling that indicated it was just a compressed txt file.

If it's also got a copy of the image and opf file, I wonder how much saving of space he'll get? That was the OP's purpose, but since Calibre already has the image in the library as cover.jpg, would converting a txt to txtz create a second stored copy of the image in the txtz file that duplicated the cover.jpg image already stored?

The jpg format is already compressed, so it won't get smaller inside the zip. It just strikes me as a lot of work to save a few pennies worth of storage capacity.

Edit: I realized I could answer the latter question myself. Yes, it duplicates/adds the cover/opf, but I still got about 50% file size reduction as compared to the uncompressed txt format. For my entire library, with many pdfs and epubs, it wouldn't reduce the size by much. For a txt only library, it might be worth it if your reader read that format.

Last edited by Starson17; 03-24-2011 at 11:32 AM.
Old 03-24-2011, 11:34 AM   #5
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by Starson17 View Post
If I can go slightly off topic - what readers/programs use this format? I saw it appear in the list, and I assume the Calibre reader reads it, but where did it come from?
I created the TXTZ format because calibre requires ebooks be single files. No reader other than calibre supports it. Markdown and Textile formatted files can reference images, so TXTZ allows everything to be packaged into one file. A side effect of TXTZ is that it allows for robust and standardized metadata due to the included OPF.
Old 03-24-2011, 11:52 AM   #6
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by user_none View Post
I created the TXTZ format
Thanks for the explanation. I can see the need.

I have a very old reader program (uBook) that handles zip files as a directory. It also reads txt format, and the reader device (an IPAQ) has limited space on its SD card. I'd actually created a few zipped txt files with covers - the near equivalent of the txtz format - for use on that reader. Calibre seemed happy with a txt file inside the zip with a cover inside, and the reader program just sees it as a folder with a txt and a cover, which it automatically displays.
Old 03-25-2011, 07:07 PM   #7
Sidetrack
Enthusiast
Posts: 39
Karma: 10
Join Date: Jan 2009
Location: South Pacific
Device: Kindle DX
OK guys, sounds like the sort of fiddling I intend is best done outside of Calibre. Still, if you've got something whose structure you're happy with, it'd be nice to be able to convert it without picking through and re-structuring.

About TXTZ though, a conversion yields this:

\The Complete Works of William Shakespeare (11709)
..\cover.jpg
..\metadata.opf
..\The Complete Works of William Shakespear - William Shakespeare.txt 5.3mb
..\The Complete Works of William Shakespear - William Shakespeare.txtz
..\..\cover.jpg
..\..\index.txt 1990kb
..\..\metadata.opf

Good with the commentary, but does interesting things to the iambic pentameter... What would be the consequences of zipping text files myself en masse and ending up with:

\The Complete Works of William Shakespeare (11079)
..\cover.jpg
..\metadata.opf
..\The Complete Works of William Shakespear - William Shakespeare.txtz
..\..\The Complete Works of William Shakespear - William Shakespeare.txt 1990kb

And then doing a database restore to pick up the txtz files.

As an aside, any chance of getting windows mime code for txtz that would let windows explorer open the archive, or launch a text editor to open the text file?
Old 03-25-2011, 07:12 PM   #8
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
You can make your own TXTZ and use the metadata editor dialog to add the file to the entry. The TXTZ just needs the TXT file, any images (cover.jpg) and optionally a file called metadata.opf with the metadata in it.
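A minimal sketch of doing this by hand in Python with the standard zipfile module. `make_txtz` and its parameters are hypothetical names, not a calibre API. It keeps the text file's own name inside the archive, whereas the converted file listed earlier in the thread used index.txt.

```python
import zipfile
from pathlib import Path


def make_txtz(txt_path, txtz_path, extras=()):
    """Pack a text file, plus optional cover/OPF files, into a ZIP named .txtz."""
    txt_path = Path(txt_path)
    with zipfile.ZipFile(txtz_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The text is stored byte for byte, so line breaks in verse are untouched.
        zf.write(txt_path, arcname=txt_path.name)
        for extra in extras:  # e.g. cover.jpg, metadata.opf
            extra = Path(extra)
            zf.write(extra, arcname=extra.name)
    return Path(txtz_path)
```

Since the archive is built without going through conversion, the text inside is identical to the source file.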

Quote:
Originally Posted by Sidetrack View Post
As an aside, any chance of getting windows mime code for txtz that would let windows explorer open the archive, or launch a text editor to open the text file?
I'll ping Perkin to respond. I believe he uses Windows and has it set up like this.
Old 03-26-2011, 03:32 AM   #9
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
With Windows XP you could set entire folders to be stored compressed; Windows would then compress / decompress individual files as needed. So you'd have your entire library in a compressed folder, under the hood, yet calibre would work as normal and you'd have no need to mess with individual file formats.
In Win 7 you right-click the calibre library folder, go to Properties ... Advanced ... tick the option to compress to save space ... then select apply to subfolders. I suspect it will not save much space though, unless you have a lot of .txt or .rtf format books.
the epub, mobi, zip, pdf formats are already compressed, so compressing them again is not going to help much.
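To check how much a given file would actually shrink before changing anything, here is a rough Python sketch using zlib, which implements the deflate compression that ZIP normally uses. `deflate_ratio` is a hypothetical helper name. Windows folder compression works differently, so treat the number as an estimate only.

```python
import zlib
from pathlib import Path


def deflate_ratio(path):
    """Return compressed size / original size for one file (1.0 means no saving)."""
    data = Path(path).read_bytes()
    if not data:
        return 1.0  # nothing to compress
    return len(zlib.compress(data, 6)) / len(data)
```

Plain text should come out well below 1.0, while files that are already compressed should stay close to 1.0.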
Old 03-26-2011, 04:11 AM   #10
Perkin
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
Quote:
Originally Posted by user_none View Post
You can make your own TXTZ and use the metadata editor dialog to add the file to the entry. The TXTZ just needs the TXT file, any images (cover.jpg) and optionally a file called metadata.opf with the metadata in it.

I'll ping Perkin to respond. I believe he uses Windows and has it set up like this.
I'm not too sure how you'd do it for Windows explorer...probably similar to below.

I use 7Zip but you should be able to do it with any archive handler.
To get it to open the TXTZ, I opened the folder one was in, right-clicked on the *.txtz file and selected 'Open with...', then browsed to where 7Zip was installed and selected '7zFM.exe', and used the option 'Always use the selected program...'

Then, when I click the 'TXTZ' link in the 'Book details' pane or window, the archive automatically opens in 7Zip. I can then double-click on any entries to open them, or use drag'n'drop to add stuff to the archive.

I've been using Textile mark-up text a lot and have set any *.text files to open in EditPadPro (which I made a textile styler for), and can then double-click any *.text file and have it open in EPP and automatically style the Textile markup as well.

Any use?
Old 04-01-2011, 12:47 AM   #11
Sidetrack
Enthusiast
Posts: 39
Karma: 10
Join Date: Jan 2009
Location: South Pacific
Device: Kindle DX
Excellent stuff guys, thanks. This ought to let me compress my library size temporarily until I can get around to converting everything to a uniform format.

That's what I was looking for Perkin, pretty much mimics what explorer does.

Sorry I'm only able to check back sporadically... cyclone season is over and I've been jonesing to get out of the harbor and do some sailing.