Old 02-20-2008, 12:21 AM   #301
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by DaleDe View Post
Generally PalmDOC files (the ones you process) are expected to be wrapped by the reader and only contain returns at paragraph boundaries thus there is no line end. Why would you want line endings?
I misspoke, I said 'Line endings' but meant 'paragraph boundaries'.

I'm looking to 'convert' .pdb text to .html code very simplistically, i.e.
Code:
Chapter One\nThe Sky was...\n
to become
Code:
<html><body><p>Chapter One</p>\n<p>The Sky was...</p>\n<p></p></body></html>
What I'm getting is:
Code:
<html><body><p>Chapter One The Sky was... </p></body></html>
Somewhere in the bowels of 'Mobiperl', the '\n' gets converted to a space BEFORE I'm able to replace it with </p>\n<p>.

-Nick
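
For anyone following along, here is a minimal sketch of the conversion described above, assuming it is applied to the raw PalmDOC text before anything else parses it. The helper names `escape_html` and `text_to_html` are made up for illustration and are not part of Mobiperl. Unlike the target output quoted in the post, this sketch drops empty paragraphs rather than emitting a trailing `<p></p>`.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mobiperl): escape the characters that
# would otherwise be read as markup.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical helper: treat every newline as a paragraph boundary, as
# PalmDOC text conventionally does, and wrap each paragraph in <p>...</p>.
sub text_to_html {
    my ($text) = @_;
    # Split on LF or CRLF and skip paragraphs that contain only whitespace.
    my @paras = grep { /\S/ } split /\r?\n/, $text;
    my $body  = join "\n", map { '<p>' . escape_html($_) . '</p>' } @paras;
    return "<html><body>\n$body\n</body></html>\n";
}

print text_to_html("Chapter One\nThe Sky was...\n");
```

Because the paragraph tags are added while the newlines are still present, it should not matter if a later HTML parsing stage folds the remaining whitespace.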
Old 02-20-2008, 03:26 AM   #302
DMcCunney
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
Quote:
Originally Posted by DaleDe View Post
Generally PalmDOC files (the ones you process) are expected to be wrapped by the reader and only contain returns at paragraph boundaries thus there is no line end. Why would you want line endings?
Yep. Text files converted to PalmDOC files really need *nix line endings (LF) vs DOS/Windows (CRLF). The embedded CRs in PC text files are treated as hard breaks and screw up formatting.
______
Dennis
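
A small sketch of the line-ending cleanup mentioned above, assuming the text is already in memory as a string. The helper name `to_unix_newlines` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: convert DOS/Windows (CRLF) and bare CR line endings
# to Unix LF before the text is packed as PalmDOC.
sub to_unix_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

my $dos  = "First paragraph.\r\nSecond paragraph.\r\n";
my $unix = to_unix_newlines($dos);
print $unix =~ /\r/ ? "CR still present\n" : "LF only\n";
```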
Old 02-20-2008, 09:20 AM   #303
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by nrapallo View Post
tompe:

I recently processed a .pdb (TEXt/REAd) and got a long series of words with no line breaks.

In 'mobi2html' I tried using '--rawhtml' and saw that there were <CR><LF> line endings in the text, but they seem to disappear when processed.

I couldn't find where the line endings were being stripped and replaced with spaces. Since the text fed to HTML::TreeBuilder had no HTML tags, would that be the culprit?

I tried using substitutions on the raw text to produce basic HTML code, but it didn't work.
Code:
my $book = $text;
$book =~ s/\cM//g;                   # Strip CRs, leaving Unix (LF) line endings
$book =~ s/\n/\x01/g;                # Mark every newline with a placeholder
$book =~ s/\x01\x01/<\/p>\n\n<p>/g;  # Two newlines in a row: paragraph boundary
$book =~ s/\x01/ /g;                 # Remaining single newline: whitespace

$text = "<html><body><p>" . $book . "</p></body></html>";
-Nick
Here is a test file to see what can be done to convert .pdb properly (text to HTML code internally). All I get is one long line of words in the resulting .html with no form feeds/para boundaries.

-Nick
Attached Files
File Type: pdb Riddles_for_Kid.pdb (1.5 KB, 330 views)
Old 02-20-2008, 10:52 AM   #304
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by nrapallo View Post
tompe:
I couldn't find where the line endings were being stripped and replaced with spaces. Since the text fed to HTML::TreeBuilder had no HTML tags, would that be the culprit?
Yes, the output from TreeBuilder does not contain line breaks, but that is as it should be. The concept of a line break does not exist in HTML source: newlines are treated as ordinary whitespace. If you want a line break, you should add <br/>.
Old 02-20-2008, 10:56 AM   #305
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by nrapallo View Post
Here is a test file to see what can be done to convert .pdb properly (text to HTML code internally). All I get is one long line of words in the resulting .html with no form feeds/para boundaries.
mobi2html assumes that you have packed correct HTML code. Your example file does not contain correct HTML code. If you put the code inside <pre> it will work better at least.
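
A rough sketch of the `<pre>` suggestion, assuming the plain text is wrapped before it is packed or parsed. The helper name `wrap_in_pre` and the sample riddle are made up for illustration. The escaping step is there so that a stray `<` or `&` in the text is not read as markup.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap plain text in <pre> so that an HTML renderer
# keeps its line breaks instead of folding them into spaces.
sub wrap_in_pre {
    my ($text) = @_;
    for ($text) {            # alias $_ to the local copy of the text
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
    }
    return "<html><body><pre>\n$text</pre></body></html>\n";
}

# Made-up sample text, not the contents of the attached test file.
print wrap_in_pre("Q: What has hands but cannot clap?\nA: A clock.\n");
```

The trade-off is that `<pre>` content is usually shown in a monospaced font and is not re-wrapped to the screen width, which may not suit a small reader display.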
Old 02-20-2008, 11:16 AM   #306
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by tompe View Post
mobi2html assumes that you have packed correct HTML code. Your example file does not contain correct HTML code. If you put the code inside <pre> it will work better at least.
What I wanted to do was, in 'mobi2html', recognize that the .pdb file was just text (i.e. TEXt/REAd), add the correct HTML code (via substitution) internally and then feed it to HTML::TreeBuilder as fixed text (really now simple HTML code).

However, I will try the <pre>...</pre> to see if I get the desired results.

The test .pdb looked fine in ubook (with para breaks and all) but 'mobi2html' somehow loses these para breaks. I want to find a way to 'correct' this internally.

-Nick
Old 02-20-2008, 11:31 AM   #307
JSWolf
Resident Curmudgeon
Posts: 79,792
Karma: 146391129
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Nick, mobi2imp is losing the line spaces that are supposed to be there and are in the PRC. I'm using The Heretic by Jason K. Chapman as chapter 4 has line spaces in the PRC that get lost. It's posted here in the eBooks section someplace.
Old 02-20-2008, 01:08 PM   #308
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by nrapallo View Post
What I wanted to do was, in 'mobi2html', recognize that the .pdb file was just text (i.e. TEXt/REAd), add the correct HTML code (via substitution) internally and then feed it to HTML::TreeBuilder as fixed text (really now simple HTML code).

However, I will try the <pre>...</pre> to see if I get the desired results.

The test .pdb looked fine in ubook (with para breaks and all) but 'mobi2html' somehow loses these para breaks. I want to find a way to 'correct' this internally.
I see. The problem is that TEXt/REAd can also contain HTML, and it will work perfectly well. So you cannot use this to indicate that the content is plain text. Why not just treat it as text and use the --rawhtml flag to get just the text? Why do you want mobi2html to wrap the text in HTML?
Old 02-20-2008, 01:09 PM   #309
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by JSWolf View Post
Nick, mobi2imp is losing the line spaces that are supposed to be there and are in the PRC. I'm using The Heretic by Jason K. Chapman as chapter 4 has line spaces in the PRC that get lost. It's posted here in the eBooks section someplace.
What is line space? How is it specified in the MobiPocket HTML variant?
Old 02-20-2008, 01:18 PM   #310
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by tompe View Post
I see. The problem is that TEXt/REAd can also contain HTML, and it will work perfectly well. So you cannot use this to indicate that the content is plain text. Why not just treat it as text and use the --rawhtml flag to get just the text? Why do you want mobi2html to wrap the text in HTML?
TEXt/REAd should not contain html. Any application doing this should have a different type. This is the PalmDOC format and should never contain any extra stuff or many PalmDOC readers would be broken. Have you seen an example of this and do you know what application created it?

Dale
Old 02-20-2008, 01:29 PM   #311
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by DaleDe View Post
TEXt/REAd should not contain html. Any application doing this should have a different type. This is the PalmDOC format and should never contain any extra stuff or many PalmDOC readers would be broken. Have you seen an example of this and do you know what application created it?
Maybe I misremembered (I will look for it...). What I noticed was that when I created such files by mistake, they worked on the Cybook and they worked in the Mobipocket reader.
Old 02-20-2008, 01:50 PM   #312
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by tompe View Post
Maybe I misremembered (I will look for it...). What I noticed was that when I created such files by mistake, they worked on the Cybook and they worked in the Mobipocket reader.
That is true. Almost all programs that have any roots in Palm will read PalmDOC files. It is practically a given. The only exception I know about is that Amazon seems to have disabled this, even though they are otherwise using MobiPocket-developed code.

Dale
Old 02-20-2008, 02:05 PM   #313
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by tompe View Post
What is line space? How is it specified in the MobiPocket HTML variant?
This is a .IMP-related issue with the eBook Publisher interface. It has to do with the fact that '<br />' is sometimes 'ignored' AFTER the mobipocket .html is created properly.

I am trying to fix this behaviour in 'mobi2imp', but still testing.

-Nick
Old 02-20-2008, 02:10 PM   #314
HarryT
eBook Enthusiast
Posts: 85,548
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by DaleDe View Post
TEXt/REAd should not contain html. Any application doing this should have a different type. This is the PalmDOC format and should never contain any extra stuff or many PalmDOC readers would be broken. Have you seen an example of this and do you know what application created it?

Dale
Hi Dale,

The "BookDesigner" program, which many people here use, creates "MobiPocket" books using the TEXt/REAd descriptors in the PRC header. You'll find hundreds of such books on this site.
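
Since the TEXt/REAd type alone evidently does not say whether the payload is plain text or HTML, one option is to sniff the content. The following is only a heuristic sketch: the helper name `looks_like_html`, the 2048-character window and the tag list are all assumptions made for illustration, not anything Mobiperl is known to do.

```perl
use strict;
use warnings;

# Hypothetical heuristic: guess that the text is HTML if a common
# structural tag shows up near the start of it.
sub looks_like_html {
    my ($text) = @_;
    my $head = substr($text, 0, 2048);   # only inspect the beginning
    return $head =~ /<\s*(?:html|head|body|pre|p|div|br|h[1-6])\b[^>]*>/i ? 1 : 0;
}

for my $sample ("<html><body><p>Hi</p></body></html>",
                "Chapter One\nThe Sky was...\n") {
    print looks_like_html($sample) ? "html\n" : "plain text\n";
}
```

A check like this can misfire on plain text that happens to discuss HTML tags, so it is better suited to choosing a default than to overriding an explicit user option.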
Old 02-20-2008, 02:14 PM   #315
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by tompe View Post
I see. The problem is that TEXt/REAd can also contain HTML, and it will work perfectly well. So you cannot use this to indicate that the content is plain text. Why not just treat it as text and use the --rawhtml flag to get just the text? Why do you want mobi2html to wrap the text in HTML?
That's it!

Is there a way to use '--rawhtml' and NOT print $text to STDOUT, but rather re-direct (re-open) it internally so that my substitution code can work on it?

I think what is happening here is that the printf to STDOUT in binmode 'actually' generates the para boundaries I want to use. They don't seem to be there when the text is first used, i.e. my $text = $pdb->text;

-Nick