04-04-2007, 02:09 AM | #16 |
Addict
Posts: 303
Karma: 187
Join Date: Dec 2006
Device: Sony Reader
|
Ashkulz, the most important thing right now is to create, or at least plan for, a framework that can be easily extended or repurposed for new things and by new applications. Anyway, all it'd be is three executables, and they'd be doing the same things the monolithic thing is doing now. There'll just be a bit more work talking through a defined interface, but it'll pay off through much greater flexibility. Flexibility to change pieces
How about we discuss a spec? OK, so the rasterizer exe would expose some of the things ghostscript, etc. should be able to do:
-- input - PDF file or list of files
-- output - output folder and filename
-- output size in pixels and format (8bit, gray, color)
-- autocropping, explicit cropbox
-- (opt) output file type (png, jpg, bmp, raw)
-- (opt) rotation
-- (opt) device-specific features (eg ghostscript's font-rendering modes)
This exe prints out the names of the files it processes so that these could be piped or saved to a variable (or to a file). The other exes should be able to accept input filenames piped in (and maybe from a file).
The processing exe would be:
-- input/output filenames
-- output resolution, format
-- (opt) fit (centered, upper-left, stretched)
-- (impl-specific, opt) dilate factor
-- (impl-specific, opt) eg sharpen or other filter parameters
The collating exe would just take a list of files and bind them into a format for some specific device. It would also accept a TOC as a file or something. (People could write new .exe's to add support for new/old devices and file formats.)
Misc ideas:
-- overcropping... an option to crop not at the first black pixel but only after, say, a few dozen (so dust, dots, or lines don't mess up autocropping)
-- output filenames... ImageMagick etc. can take an output filename such as "fileA%02d.png" and produce fileA01.png, fileA02.png
I think a standalone app would be used more than an integrated one. Personally, I just use SD cards and never Sony Connect. Also, a standalone app can focus better on adding support for all the things that could give the best results. Maybe doing it in Qt will make it more difficult to do something fancy that lets you preview, crop, rotate, etc. I don't know, but I do know that manually cropping in Acrobat is very, very helpful. However, I've never found a free alternative for manual cropping. |
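The overcropping idea from the post above can be sketched in a few lines. This is only one possible reading of it, not code from either tool: a row or column counts as content only when it holds at least `min_dark` dark pixels, so an isolated speck does not stop the crop. The function name `overcrop_box`, both parameters, and the plain list-of-lists image format are assumptions made for the illustration.

```python
from typing import List, Optional, Tuple


def overcrop_box(
    pixels: List[List[int]],
    dark_threshold: int = 128,
    min_dark: int = 3,
) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of the page content, or None.

    `pixels` is a grayscale image as rows of 0..255 values (hypothetical
    format chosen for this sketch). Rows and columns with fewer than
    `min_dark` dark pixels are treated as noise and cropped away.
    """
    if not pixels or not pixels[0]:
        return None
    height, width = len(pixels), len(pixels[0])

    # Count dark pixels in every row and every column.
    row_counts = [sum(1 for v in row if v < dark_threshold) for row in pixels]
    col_counts = [
        sum(1 for y in range(height) if pixels[y][x] < dark_threshold)
        for x in range(width)
    ]

    # Keep only rows/columns dense enough to count as real content.
    rows = [y for y, count in enumerate(row_counts) if count >= min_dark]
    cols = [x for x, count in enumerate(col_counts) if count >= min_dark]
    if not rows or not cols:
        return None

    # Right and bottom are exclusive, as in most imaging libraries.
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1
```

With `min_dark=1` this degenerates into plain autocropping at the first dark pixel, which is the behaviour the overcropping option is meant to avoid on scanned pages.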
04-04-2007, 12:25 PM | #17 |
Junior Member
Posts: 9
Karma: 10
Join Date: Jul 2006
|
Sounds like good stuff is happening. I'm swamped closing out my last semester of school, so I won't be able to contribute for a bit.
Just wanted to point out two bits of code from my work that may be the most useful:
1) Overcropping is already implemented - check the trimNoise function. Big help for scanned PDFs (such as Google Books).
2) Proper centering of images. Related code is found in trimNoise as well as the main processing function. |
04-04-2007, 03:42 PM | #18 | |||
Addict
Posts: 350
Karma: 705
Join Date: Dec 2006
Location: Mumbai, India
Device: Kindle 1/REB 1200
|
Quote:
So #4 will be essentially a replacement for what we currently have. Apps that want to use a part (or any combination) of the pipeline will essentially use #3 directly. So ideally, we should have very "thin" glue code in #3 and #4, with most of the logic being in #1 and #2. Also, trying out new approaches is very painless, as it is easy to add a new plugin and introduce it into the pipeline via #3. As an example, the current process can be represented as Code:
filesrc location=input.pdf ! pdftops ! gsrasterize dpi=300 ! autocrop ! dilate ! resize width=565 height=784 ! makelrf author=XYZ title=foo | libprs500-send
I don't know if you're familiar with electronics/IC design, but that's essentially what you do there. It would make development MUCH easier and make the whole process much easier to tweak for everyone (once the initial bump is past, of course). So let's say I want to use xpdf for rasterizing (it's much smaller than gs on win32): I replace gsrasterize with xpdfrasterize (which is the only thing I need to write) and then recreate/rerun the pipeline. Quote:
Quote:
As an aside, we should call it something other than PDFRead or PDFRasterFarian: the above is not merely a tool, it is an ebook conversion framework. I mean, I can imagine html being a source plugin sometime in the future, so this could be a standard way of interacting with ebook formats, devices and whatnot. |
|||
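To make the "!" pipeline notation from the post above concrete, here is a minimal sketch of how such a description could be parsed and run against a plugin registry. It is an illustration under assumptions, not the PDFRead implementation: `parse_pipeline`, `run_pipeline`, the `PLUGINS` registry and the plugin call signature are all hypothetical, and the trailing `| libprs500-send` shell pipe is deliberately left out.

```python
import shlex
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Dict[str, str]]

# Hypothetical registry: stage name -> function(data, params) -> new data.
PLUGINS: Dict[str, Callable[[list, Dict[str, str]], list]] = {}


def plugin(name: str):
    """Decorator that registers a function under a stage name."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


def parse_pipeline(text: str) -> List[Stage]:
    """Split 'name key=value ... ! name ...' into (name, params) stages.

    Limitation of this sketch: a literal '!' inside a value is not supported.
    """
    stages: List[Stage] = []
    for chunk in text.split("!"):
        tokens = shlex.split(chunk)
        if not tokens:
            raise ValueError("empty stage in pipeline")
        name, params = tokens[0], {}
        for token in tokens[1:]:
            key, sep, value = token.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {token!r}")
            params[key] = value
        stages.append((name, params))
    return stages


def run_pipeline(text: str, data: list = None) -> list:
    """Feed data through each stage in order, looking plugins up by name."""
    data = [] if data is None else data
    for name, params in parse_pipeline(text):
        if name not in PLUGINS:
            raise KeyError(f"no plugin registered for stage {name!r}")
        data = PLUGINS[name](data, params)
    return data
```

Under this scheme, swapping gsrasterize for xpdfrasterize means registering one new function and changing one word in the pipeline string; the other stages stay untouched.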
04-04-2007, 11:17 PM | #19 |
Addict
Posts: 303
Karma: 187
Join Date: Dec 2006
Device: Sony Reader
|
What exactly do you mean by plugins? Do you mean the "rasterizer" and "post-processing" components that I'm talking about would themselves be composed of smaller pieces?
"the above is not merely a tool, it is an ebook conversion framework. I mean, I can imagine html being a source plugin sometime in the future, so this could be a standard way of interacting with ebook formats, devices and whatnot."
Right now I was just thinking about a framework that handled image-based ebooks. For html, and indeed for a larger audience, you would need to support native-text formats (although I dunno... native text would never look as good as dilated and processed images). To handle native text you would need to create an intermediary text format with formatting and embedded links that could carry HTML, pdf, rtf, etc. and then be reprocessed into lrf, pdf, starebook, etc. That is... ambitious. And it'd have to work perfectly (ie just as well as a direct html->lrf conversion).
If we just stick to working with images (and even claim that's the superior way to do things), I think it makes things much simpler (and much easier to get right). We can omit things like sophisticated pads that keep track of their own dependencies. Simply moving images from one folder to another would be fine and would even make it easier for other developers to hook in. (It's still the same spirit as the pads, but just a simpler implementation.)
However, let's ask the question: if, say, we only work with images, what things could/would/would-want-to be done by others? Are there things that can't be done by a 3-layer framework of Create images, Reprocess images, Bind images (provided each layer exposes enough features)? What are the usage scenarios? |
04-05-2007, 12:28 AM | #20 | |||||
Addict
Posts: 350
Karma: 705
Join Date: Dec 2006
Location: Mumbai, India
Device: Kindle 1/REB 1200
|
Quote:
From the point of view of the calling tool, there would be only one executable which would allow one to choose and setup the pipeline. All the plugins and other low-level details will be in code, and not exposed to the user. Quote:
Quote:
I disagree about the folder-to-folder thing -- that's a poor solution, as it means we have to create and maintain that many folders. Why communicate over the filesystem when you can communicate much more clearly via code? Also, you get around that in PDFRasterFarian by fixing the stages upfront and pre-creating folders in the installation directory. That is not feasible on other platforms, plus it implicitly means you can run only one instance of PDFRasterFarian at a time. PDFRead has no such limitation, and I think that supporting (simultaneous) batch processing is very important. Quote:
Usage scenarios are simple:
On the whole, I think the most compelling argument would be the transparency and simplicity from the user/tool writer point of view. It will also make the code much more modular and easier to maintain. |
|||||
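On the one-instance limitation raised in the post above: if stages did exchange files through folders, one way around it would be to give every run its own freshly created working directory instead of fixed folders in the installation directory. The sketch below only illustrates that point and does not describe how either tool works; the stage names and both function names are made up.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Sequence

# Hypothetical stage names, one subfolder per layer of the conversion.
STAGE_NAMES = ("rasterized", "processed", "bound")


def make_run_dirs(stage_names: Sequence[str] = STAGE_NAMES) -> Path:
    """Create a unique working directory with one subfolder per stage.

    tempfile.mkdtemp picks a name that does not exist yet, so two
    conversions started at the same time never share folders.
    """
    run_root = Path(tempfile.mkdtemp(prefix="convert-"))
    for name in stage_names:
        (run_root / name).mkdir()
    return run_root


def cleanup_run(run_root: Path) -> None:
    """Remove the working directory once the final output has been copied out."""
    shutil.rmtree(run_root, ignore_errors=True)
```

This addresses the concurrency objection but not the other one: the folders still have to be created and cleaned up for every run, which passing data between stages in code avoids entirely.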
04-09-2007, 06:09 AM | #21 |
Addict
Posts: 303
Karma: 187
Join Date: Dec 2006
Device: Sony Reader
|
"Why communicate over the filesystem when you can communicate much more clearly via code?"
Using folders as pads is a bit dirty (especially for concurrent conversions... although those should really be batched and run sequentially anyway) but it is _somewhat_ elegant and, above all, _very_ easy to hook into and extend. Say I have a program that can be told from the command line to accept some input files and create some output files. How would I integrate it into your framework?
"PDFRead has no such limitation, and I think that supporting (simultaneous) batch processing is very important."
Actually, I think batching serially rather than concurrently makes more sense. You get your first output quicker and there is no problem if you want to convert an obscene number of files. (Even a few dozen concurrent conversions would kill the RAM.)
"If each layer exposes enough features to turn on/off features individually, the command line options for it will grow quite a bit (see PDFRead)."
Well, the command line options wouldn't be for the user to use but for the developer writing a wrapper. Surely it'll be much easier on (and give more freedom to) a developer to code a long command line in his script than to output a custom pipeline file?
In the end, though, there are two questions. Can the sophisticated framework of which you speak be implemented in theory (ie is the concept compatible with being very flexible and easy to extend)? And will such a framework actually be implemented by us (ie will it be too much work)? The folders approach, I think, has both points going for it. I must say, however, I like the cut of your jib. Last edited by alex_d; 04-09-2007 at 06:19 AM. |
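The integration question in the post above (a program that takes input files and writes output files from the command line) could be handled by a thin wrapper that turns such a program into a callable stage. This is a sketch of one possible answer, not the approach either author settled on; `command_stage` and the `{input}`/`{output}` placeholder convention are assumptions of the example.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def command_stage(
    argv_template: Sequence[str],
) -> Callable[[List[Path], Path], List[Path]]:
    """Wrap an external command-line program as a pipeline stage.

    `argv_template` is the command line as a list of arguments; the
    placeholders "{input}" and "{output}" are substituted for each file.
    """
    def stage(inputs: List[Path], out_dir: Path) -> List[Path]:
        out_dir.mkdir(parents=True, exist_ok=True)
        outputs: List[Path] = []
        # Files are processed one after another, not concurrently.
        for src in inputs:
            dst = out_dir / src.name
            argv = [
                arg.replace("{input}", str(src)).replace("{output}", str(dst))
                for arg in argv_template
            ]
            # check=True raises CalledProcessError if the tool fails.
            subprocess.run(argv, check=True)
            outputs.append(dst)
        return outputs

    return stage
```

A stage built this way returns the list of files it produced, which matches the earlier suggestion that each executable report the filenames it processed so the next step can pick them up.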
04-25-2007, 08:08 AM | #22 |
Addict
Posts: 350
Karma: 705
Join Date: Dec 2006
Location: Mumbai, India
Device: Kindle 1/REB 1200
|
Okay, I've implemented the ideas which I mentioned here in the 1.6 release. You can look at the code at
http://pdfread.svn.sourceforge.net/v...pdfread/trunk/ Please see the PDFRead 1.6 thread for other features added in this release. |
06-02-2007, 10:40 AM | #23 |
Member
Posts: 20
Karma: 10
Join Date: Dec 2006
|
Any progress? Can I try something?
|