calibre ereader output testers needed


user_none
04-25-2009, 05:31 PM
I've more or less implemented ereader output in Calibre for 0.6. There is still some more work to do on it, but the base work is done. That is, provided the files can be read properly in eReader Pro. I've attached a pdb of Newsweek which was downloaded with Calibre. Can someone try it out in an ereader app and tell me if it even opens?

=X=
04-25-2009, 06:28 PM
This file did not work on my BlackBerry Storm.
I also tested my own files to make sure it was not just my device.
They worked.

I'm not sure why this file does not work.

I gave you the pml for the test file; can you build it with your tool? If so, upload it here so I can test it.

user_none
04-25-2009, 07:37 PM
Okay, try this one. I don't know what some of the record0 locations are supposed to contain so I'm just guessing based upon what I see in the test files. This one also guarantees images are PNG and not larger than the maximum.

=X=
04-26-2009, 01:19 AM
Okay I've looked at these books both with the eReader Desktop and the eReader for BB

The BB does not work with either file

The eReader crashes with the file in post #1 after you turn page 1
The eReader crashes on page 6

=X=

user_none
04-26-2009, 09:29 AM
Okay, try these three.

I gave you the pml for the test file, can you build them with your tool?
mytest.pdb is the output. I can't load the pml directly. The file is your test.pdb converted to epub then converted back to ereader pdb. The sidebar and footnotes should be text at the end of the file and not embedded like before.

=X=
04-26-2009, 01:15 PM
Okay, try these three.


mytest.pdb is the output. I can't load the pml directly. The file is your test.pdb converted to epub then converted back to ereader pdb. The sidebar and footnotes should be text at the end of the file and not embedded like before.

newsweek.pdb
hc.pdb
mytest.pdb


None of the above files work on the eReader desktop or the Blackberry Storm eReader.

I'm not sure how this "newsweek.pdb" differs from that of post #3, but this newsweek crashes the eReader on load. The latter crashed the eReader only on the last page.

None have worked on the eReader for BlackBerry

user_none
04-26-2009, 04:58 PM
3 more to try. Here is what's going on with these files.

ereader files are made up of 3 basic parts: the pdb header, the ereader header and the data. The pdb header specifies the internal format type and the layout of the data sections. The first section is the ereader format header. It specifies a lot of information, such as the format version and the compression. Most importantly, it specifies what type of data is in each of the remaining sections. I know what 10 of the sections are and what kind of value they should have. However, there are 27 sections.
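As a rough illustration of the outer container described above, here is a minimal Python sketch that reads the standard 78-byte Palm database (pdb) header. The function name and the returned dictionary are made up for illustration and are not Calibre's code. The "PNRd"/"PPrs" type and creator values used in the usage note are the identifiers commonly reported for eReader files and should be treated as an assumption here.

```python
import struct

# Standard Palm database header: 78 bytes, big-endian.
# name(32) attributes(2) version(2) creation(4) modification(4) backup(4)
# modification number(4) appinfo offset(4) sortinfo offset(4) type(4)
# creator(4) unique id seed(4) next record list id(4) record count(2)
PDB_HEADER = struct.Struct(">32sHHIIIIII4s4sIIH")


def parse_pdb_header(data):
    """Return a few fields of interest from the start of a pdb file."""
    fields = PDB_HEADER.unpack_from(data, 0)
    name, type_, creator, num_records = fields[0], fields[9], fields[10], fields[13]
    # The name is zero terminated; keep only the bytes before the first NUL.
    title = name.split(b"\x00", 1)[0].decode("latin-1")
    return {
        "title": title,
        "type": type_,
        "creator": creator,
        "num_records": num_records,
    }
```

For a real eReader file one would expect something like `parse_pdb_header(open("book.pdb", "rb").read())` to report a type of `b"PNRd"` and a creator of `b"PPrs"`, but that is worth verifying against known good files.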

What I'm doing is inspecting the data in the 17 unknown sections in various files from different sources and I'm looking for patterns on the differences in their values. Some are easy to spot, they have the same value no matter what file or application built them. Some are harder. As I analyze the values I'm adjusting the output of my writer and hoping I get it right.
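The comparison approach described above could be sketched roughly as follows in Python. This is not the code actually used; it assumes every header value is a big-endian 16-bit word (some values may really be 32 bits wide), and the function names are hypothetical.

```python
import struct


def split_words(header):
    """Split a header into big-endian unsigned 16-bit words."""
    count = len(header) // 2
    return struct.unpack(">%dH" % count, header[:count * 2])


def classify_words(headers):
    """Compare several headers word by word.

    Returns (constant, varying): `constant` maps a byte offset to the single
    value seen in every header, `varying` maps a byte offset to the list of
    values seen, in the order the headers were given.
    """
    rows = [split_words(h) for h in headers]
    constant, varying = {}, {}
    for index, values in enumerate(zip(*rows)):
        offset = index * 2
        if len(set(values)) == 1:
            constant[offset] = values[0]
        else:
            varying[offset] = list(values)
    return constant, varying
```

Feeding it the record 0 headers of files built by different tools would separate the offsets that always hold the same value from the ones that need further study.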

=X=
04-26-2009, 10:12 PM
Both mytest.pdb and newsweek.pdb hang the desktop eReader. hc.pdb generates an error message "This file is a format not recognized by eReader"

I did not try these on the BB storm

=X=

wallcraft
04-26-2009, 11:24 PM
3 more to try. All the eReader files I have seen start with the title of the ebook. These examples don't seem to do so.

DaleDe
04-27-2009, 11:39 AM
All the eReader files I have seen start with the title of the ebook. These examples don't seem to do so.

That is true of all PDB Palm files. The first 32 bytes hold a zero terminated title for the database. All of this is documented in our wiki.

Dale

user_none
04-27-2009, 05:21 PM
All the eReader files I have seen start with the title of the ebook. These examples don't seem to do so.
That is true of all PDB Palm files. The first 32 bytes hold a zero terminated title for the database. All of this is documented in our wiki.
The files all have a 32 byte zero terminated title string at the beginning of the file. The string is empty because I haven't implemented writing metadata to the file yet.
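For illustration, a 32 byte zero terminated title field could be built and read back along these lines in Python. The helper names are hypothetical, and the latin-1 encoding is an assumption; the thread does not say which encoding the title uses.

```python
TITLE_FIELD_SIZE = 32


def encode_pdb_title(title):
    """Return a 32-byte field: the title, truncated so that at least one
    terminating zero byte always fits, padded with zero bytes."""
    raw = title.encode("latin-1", "replace")[:TITLE_FIELD_SIZE - 1]
    return raw.ljust(TITLE_FIELD_SIZE, b"\x00")


def decode_pdb_title(field):
    """Return the text before the first zero byte of a 32-byte title field."""
    return field[:TITLE_FIELD_SIZE].split(b"\x00", 1)[0].decode("latin-1")
```

An empty title, as in the test files discussed here, simply comes out as 32 zero bytes.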

JSWolf
04-28-2009, 08:45 AM
Maybe the 32 byte title string is not allowed to be empty.

user_none
04-28-2009, 05:26 PM
Maybe the 32 byte title string is not allowed to be empty.

It is. I've also tried it with text in it. For now, due to the lack of information on the format, eReader output is being put on hold.

DaleDe
04-28-2009, 07:38 PM
It is. I've also tried it with text in it. As of right now due to the lack of information on the format eReader output is being put on hold for the time being.

Did you look in the wiki? It is documented there.

Dale

user_none
04-28-2009, 07:43 PM
Did you look in the wiki? It is documented there.

The pdb header is documented on the wiki. Within the pdb container is a header for the ereader format. It is record 0 of the pdb file. The ereader header is a 132 byte package that defines certain values relating to the ereader format within the pdb container. The 132 byte ereader header is not defined in the wiki and I have not been able to find it defined fully anywhere.
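To make the "record 0" part concrete, here is a small Python sketch that pulls a record out of a pdb file using the standard Palm record list (8 byte entries starting right after the 78 byte pdb header). The function names are hypothetical and this is only a sketch, not Calibre's reader.

```python
import struct

PDB_HEADER_SIZE = 78
# Each record list entry: data offset (4 bytes), attributes (1), unique id (3).
RECORD_ENTRY = struct.Struct(">IB3s")


def record_offsets(data):
    """Return the file offset of every record, in order."""
    # The record count is the last two bytes of the 78-byte pdb header.
    (num_records,) = struct.unpack_from(">H", data, PDB_HEADER_SIZE - 2)
    return [
        RECORD_ENTRY.unpack_from(data, PDB_HEADER_SIZE + i * RECORD_ENTRY.size)[0]
        for i in range(num_records)
    ]


def read_record(data, index):
    """Return the raw bytes of one record; the last record runs to end of file."""
    offsets = record_offsets(data)
    start = offsets[index]
    end = offsets[index + 1] if index + 1 < len(offsets) else len(data)
    return data[start:end]
```

With this, `len(read_record(data, 0))` should come out as 132 for a file whose ereader header matches the description above.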

DaleDe
04-28-2009, 08:51 PM
The pdb header is documented on the wiki. Within the pdb container is a header for the ereader format. It is record 0 of the pdb file. The ereader header is a 132 byte package that defines certain values relating to the ereader format within the pdb container. The 132 byte ereader header is not defined in the wiki and I have not been able to find it defined fully anywhere.

I think it starts out as a PalmDOC file and that header is defined in the wiki. Perhaps that will get you started. Almost all Palm reader programs have their roots in PalmDOC.

Once you figure it out please add it to the wiki.

Dale

nrapallo
04-28-2009, 09:26 PM
The pdb header is documented on the wiki. Within the pdb container is a header for the ereader format. It is record 0 of the pdb file. The ereader header is a 132 byte package that defines certain values relating to the ereader format within the pdb container. The 132 byte ereader header is not defined in the wiki and I have not been able to find it defined fully anywhere.

If you know how to read Perl code, this may help http://cpansearch.perl.org/src/AZED/EBook-Tools-0.4.4/lib/EBook/Tools/EReader.pm . It is part of EBook-Tools (http://www.mobileread.com/forums/showthread.php?t=31142) by Zed Pobre.

Here is an excerpt of the ParseRecord0 code:

my $version; # EReader version
# Expected values are:
# 02 - PalmDoc Compression
# 10 - Inflate Compression
# >255 - data is in Record 1
my $headerdata; # used for holding temporary data segments
my $offset;
my %header;
my @list;

debug(1,"DEBUG: EReader Record 0 is ",length($data)," bytes");
$headerdata = substr($data,0,16);
@list = unpack('nnNnnnn',$headerdata);
$header{version} = $list[0]; # Bytes 0-1
$header{unknown2} = $list[1]; # Bytes 2-3
$header{unknown4} = $list[2]; # Bytes 4-7
$header{unknown8} = $list[3]; # Bytes 8-9
$header{unknown10} = $list[4]; # Bytes 10-11
$header{nontextoffset} = $list[5]; # Bytes 12-13
$header{nontextoffset2} = $list[6]; # Bytes 14-15

$headerdata = substr($data,16,16);
@list = unpack('nnNnnnn',$headerdata);
$header{unknown16} = $list[0];
$header{unknown18} = $list[1];
$header{unknown20} = $list[2];
$header{unknown22} = $list[3];
$header{unknown24} = $list[4];
$header{footnoterecs} = $list[5];
$header{sidebarrecs} = $list[6];

$headerdata = substr($data,32,24);
@list = unpack('nnnnnnnnnnnn',$headerdata);
$header{bookmarkoffset} = $list[0];
$header{unknown34} = $list[1];
$header{nontextoffset3} = $list[2];
$header{unknown38} = $list[3];
$header{imagedataoffset} = $list[4];
$header{imagedataoffset2} = $list[5];
$header{metadataoffset} = $list[6];
$header{metadataoffset2} = $list[7];
$header{footnoteoffset} = $list[8];
$header{sidebaroffset} = $list[9];
$header{lastdataoffset} = $list[10];
$header{unknown54} = $list[11];
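For readers more comfortable with Python than Perl, the first 16 bytes of the excerpt translate roughly as follows. Perl's `n` is an unsigned big-endian 16-bit value and `N` an unsigned big-endian 32-bit value, which correspond to `H` and `I` with a `>` prefix in Python's struct module. The field names are copied from the excerpt; the function name is made up, and the seventh value is taken to be bytes 14-15 as the excerpt's comment states.

```python
import struct

# Same layout as the Perl template 'nnNnnnn': 16 bytes, big-endian.
FIRST_16 = struct.Struct(">HHIHHHH")


def parse_record0_start(record0):
    """Decode bytes 0-15 of the ereader header (record 0)."""
    (version, unknown2, unknown4, unknown8,
     unknown10, nontextoffset, nontextoffset2) = FIRST_16.unpack_from(record0, 0)
    return {
        "version": version,                # bytes 0-1
        "unknown2": unknown2,              # bytes 2-3
        "unknown4": unknown4,              # bytes 4-7
        "unknown8": unknown8,              # bytes 8-9
        "unknown10": unknown10,            # bytes 10-11
        "nontextoffset": nontextoffset,    # bytes 12-13
        "nontextoffset2": nontextoffset2,  # bytes 14-15
    }
```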

user_none
04-29-2009, 06:36 PM
If you know how to read Perl code, this may help http://cpansearch.perl.org/src/AZED/EBook-Tools-0.4.4/lib/EBook/Tools/EReader.pm . It is part of EBook-Tools (http://www.mobileread.com/forums/showthread.php?t=31142) by Zed Pobre.

Here is an excerpt of the ParseRecord0 code:
...
I have seen this. The issue is that it only goes up to 54 bytes. There are records in the header after 54; the record 0 header totals 132 bytes. While most of the bytes after 54 are 0, two are not. Also, a number of those unknown sections have to have the right value. My first try at a writer only implemented the marked sections (they are all that's needed for reading) and it didn't work. Some of the unknown sections look to have the same value across files, but a number have unique values. Thanks for your help though.
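One quick way to spot the non-zero bytes mentioned above is to scan the tail of the 132 byte header. This is only a sketch: the function name is hypothetical, and the default start of 56 assumes the last field in the Perl excerpt (unknown54) covers bytes 54-55.

```python
def nonzero_tail_bytes(record0, start=56):
    """Map byte offset -> value for every non-zero byte from `start` onward."""
    return {
        offset: value
        for offset, value in enumerate(record0[start:], start)
        if value != 0
    }
```

Running it over the record 0 of several known good files shows which offsets in the undocumented tail carry data.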

wallcraft
04-30-2009, 08:00 AM
It won't work for Linux users, unless the limitations of MakeBook are acceptable (or can Wine be used with DropBook?), but a simple workaround for eReader output is to write out PML and let DropBook (http://www.ereader.com/ereader/help/dropbook/download.htm) do the rest. This is the approach taken in ereader2ereader in two steps (http://www.mobileread.com/forums/showthread.php?t=43690). There are other advantages of doing this, such as simplifying debugging of conversion issues (experts on PML can see the document source). Also, what Calibre's eReader output should look like is either identical to PML -> DropBook or that plus some extra frills. So developing direct eReader output will be simplified if the PML option is in place.

user_none
04-30-2009, 03:48 PM
It won't work for Linux users, unless the limitations of MakeBook are acceptable (or can Wine be used with DropBook?), but a simple workaround for eReader output is to write out PML and let DropBook (http://www.ereader.com/ereader/help/dropbook/download.htm) do the rest. This is the approach taken in ereader2ereader in two steps (http://www.mobileread.com/forums/showthread.php?t=43690). There are other advantages of doing this, such as simplifying debugging of conversion issues (experts on PML can see the document source). Also, what Calibre's eReader output should look like is either identical to PML -> DropBook or that plus some extra frills. So developing direct eReader output will be simplified if the PML option is in place.

Bundling MakeBook would not be acceptable. It is written in Java and would require Calibre to bundle the JVM in addition to the Python VM. DropBook would be even worse because it would require testing against Wine, and bundling Wine in addition to the Windows JVM.

The conversion framework for 0.6 is Input -> OEB -> output. The eReader output would be Input -> OEB -> (html -> pml -> compiled into an eReader formatted pdb file). Everything except the last step of the output is working. Since the OEB -> pml is working I'm going to modify it to be an output format (at some point before the 0.6 release). So you will be able to create an eReader file using DropBook from another input format with the help of Calibre. However, you will have to do that step manually.

wallcraft
04-30-2009, 04:16 PM
Since the OEB -> pml is working I'm going to modify it to be an output format (at some point before the 0.6 release). So you will be able to create an eReader file using DropBook from another input format with the help of Calibre. However, you will have to do that step manually.

Thanks - this was what I was asking for. I understand that it isn't a good idea to bundle MakeBook or DropBook into Calibre.

Having already imposed on you for PML output, recall that =X= asked for PML input: On an aside, it would be nice if the tool could convert a zip file containing the (PML source and images) to any format. It isn't your job, necessarily, to make life easier for DRM circumvention, but if DRM-free eReader .pdb's are added as input to Calibre the process for making an ePub will be DRM-ed eReader -> PML (ereader2pml.py) -> DRM-free eReader (DropBook) -> ePub (Calibre). If Calibre supported PML input then the DropBook step could be skipped for input.

user_none
04-30-2009, 04:34 PM
Having already imposed on you for PML output, recall that =X= asked for PML input...

I haven't forgotten, and it is on my todo list once I finish turning all of the device drivers into plugins. The main thing I need to look into for PML input is how best to support images. I'm thinking of making it a sub-type of the zip input where images and PML files will be required to be in a zip file. While this is a bit more cumbersome than just feeding it the PML file, it would allow for more flexibility (multiple pml files making up one book, for instance).

user_none
05-10-2009, 10:02 AM
I just wanted to let everyone know about the progress I've made with pml and ereader output.

PML input and output are both supported in 0.6. A zip file with the extension pmlz with pml and images is produced with output. A zip file with the extension pmlz can be used for input as well as a stand alone pml file. However, a stand alone pml file will have referenced images ignored. Use pmlz if you want images included.
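As a rough idea of what a pmlz is (a zip archive holding the pml plus its images), here is a small Python sketch using the standard zipfile module. The entry names `index.pml` and `index_img/`, the cp1252 text encoding, and the function name are assumptions made for illustration; the post does not specify the layout inside the archive, so check a pmlz produced by Calibre before relying on them.

```python
import io
import zipfile


def build_pmlz(pml_text, images, pml_name="index.pml", image_dir="index_img"):
    """Return the bytes of a zip archive holding one pml file and its images.

    `images` maps an image file name to its raw bytes. The default entry
    names are illustrative guesses, not a documented layout.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(pml_name, pml_text.encode("cp1252"))
        for name, payload in images.items():
            zf.writestr(f"{image_dir}/{name}", payload)
    return buf.getvalue()
```

Writing the returned bytes to a file with the pmlz extension would give something to experiment with as input.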

The big news is, I can create working text based ereader books. Images are not yet working but I hope to have that fixed by the end of today. Once I get everything working I will be adding the format information I've figured out to the wiki.

The plan is to get a beta for 0.6 out later this month. Once it's released, testing would be much appreciated.

user_none
05-10-2009, 01:15 PM
eReader output is finished and working. I've put the information regarding the file format on the wiki under the eReader section.

=X=
05-11-2009, 12:00 AM
Very cool, thank you for the update. Let me know when/if you need some testers.

=X=