doc to epub


Laura81
01-31-2010, 09:46 AM
Hi all,

Does anyone know of a free program that can convert doc to epub? Calibre will only do so if I save the doc file as an rtf first and I have hundreds of doc files so do not want to have to do this. Someone here helped me with a program that will convert them to mobi and now I'd like one that will do epub so I can compare. Thanks!

Catire
01-31-2010, 02:00 PM
You could try Atlantis Word Processor (http://www.mobileread.com/forums/showthread.php?t=48775)

jackie_w
01-31-2010, 02:39 PM
Hi Laura,

Can't help you with a FREE doc to epub program, but I believe the Atlantis Word Processor will read DOC and write EPUB for a reasonable price ($35 I think). I haven't used it myself but I believe you can try it for free.

http://www.mobileread.com/forums/showthread.php?t=48775


(Sorry, cross-post ...)


If you decide to stick with the long-winded-but-free option of MSWord I would recommend you convert your DOCs to HTML(Webpage-Filtered) rather than RTF before importing to Calibre. I think you will find more of the formatting in your source DOC will be correctly converted to the EPUB e.g. centre, right-align, hard-line-breaks to name but a few.

frabjous
01-31-2010, 06:14 PM
What platform are you on?

On linux, I'd just use something like AbiWord from the command line (http://opensource.weblogsinc.com/2005/06/29/use-abiword-to-convert-filetypes-on-the-command-line/) to convert .doc to .html (the link has instructions on batch conversion) and then to convert to ePub with calibre. It probably works on other platforms too. I've never tried.

This definitely can be done for free, in a variety of ways.

Laura81
01-31-2010, 11:38 PM
Thanks for the info all. I am on Windows so can't do that. Also I have hundreds of doc files so I really want to avoid having to convert them to something else first and then to epub. I just want to directly convert the docs to epub without anything else in between.

I tried the free version of Atlantis but it only seems to convert half the document.

DoctorOhh
01-31-2010, 11:43 PM
Does anyone know of a free program that can convert doc to epub? Calibre will only do so if I save the doc file as an rtf first and I have hundreds of doc files so do not want to have to do this. Someone here helped me with a program that will convert them to mobi and now I'd like one that will do epub so I can compare. Thanks!

If you have converted them to Mobi then just use the mobi files for the input and convert them to epub using Calibre.

But the best way to go from doc to epub is via html(filtered html if you save from Word). More info here (http://www.web-books.com/Publishing/Word2EPUB.htm).

frabjous
02-01-2010, 01:16 AM
Thanks for the info all. I am on Windows so can't do that.

Why not? There are Windows versions of both programs I mentioned.

Laura81
02-01-2010, 03:11 AM
If you have converted them to Mobi then just use the mobi files for the input and convert them to epub using Calibre.

But the best way to go from doc to epub is via html(filtered html if you save from Word). More info here (http://www.web-books.com/Publishing/Word2EPUB.htm).

I've only converted a few to mobi. My PP can read html so maybe I'll just change them to html and leave it at that. I really don't want to have to save into one format first and then convert. As I said I have hundreds of documents and it would take forever doing it that way.

Laura81
02-01-2010, 03:12 AM
Why not? There are Windows versions of both programs I mentioned.

Oh right. Will have a look then but again if it's changing it to html first and then to epub I may as well just leave it as html and be done with it.

awp
02-01-2010, 05:57 AM
I tried the free version of Atlantis but it only seems to convert half the document.

Atlantis Word Processor always converts the entire document to EPUB. Please send your DOC file to support@AtlantisWordProcessor.com, or attach to your forum post. Or post the corresponding screen captures.

Laura81
02-01-2010, 06:25 AM
Atlantis Word Processor always converts the entire document to EPUB. Please send your DOC file to support@AtlantisWordProcessor.com, or attach to your forum post. Or post the corresponding screen captures.

Is there a way I can view my epub document in Atlantis? It shows up only as half the document on my reader but maybe it's something wrong with the reader and not the conversion.

awp
02-01-2010, 07:44 AM
I cannot say if the EPUB file is OK without having the original DOC file. As I understand, you cannot email or attach it.

Sorry, but the present version of Atlantis Word Processor cannot be used for reading EPUB books.

jackie_w
02-01-2010, 08:13 AM
Can't you view the EPUB using the Ebook-Viewer in Calibre?

Laura81
02-01-2010, 08:26 AM
Jackie - I hadn't thought of that so I installed Adobe Digital Edition to view it and it is all there from the conversion so it must have been the transfer or something that went poorly. I transferred it again and it shows up great.

So AWP, thanks for your help but all is well and Atlantis worked perfectly. Atlantis doesn't do batch convert by chance does it?

awp
02-01-2010, 09:01 AM
You could also test your EPUBs under your Windows in Sony Reader (http://ebookstore.sony.com/download/) and Mobipocket Reader (http://www.mobipocket.com/en/DownloadSoft/ProductDetailsReader.asp).

awp
02-01-2010, 09:03 AM
Atlantis Word Processor does not offer batch conversion. But converting multiple documents to EPUB in Atlantis is fairly easy.

First, assign a hot key to the "Save as eBook" command of Atlantis. Choose the "Tools | Hot Keys..." menu command of Atlantis, then:

1) Click the "File" category.

2) Click the "Save as eBook" item from the "Commands" box.

3) Click in the "New hot key" box, and press the desired key combination (F11, for example).

4) Click the "Assign" button.

5) Click the "OK" button.

awp
02-01-2010, 09:05 AM
You can open up to 100 documents in Atlantis simultaneously. In the "File | Open..." window of Atlantis you can use Ctrl+Click or Shift+Click to select multiple documents. Or select the desired documents in Windows Explorer, and open them in Atlantis with drag & drop (capture the selected files in Windows Explorer with the mouse, then drag to the Atlantis window, and drop onto the Atlantis toolbars, document bar, or status bar).

When you have a document open in Atlantis, you can press the following keys and key combinations to convert it to EPUB, and switch to another open document:

1) Press F11 (or another hot key that you assigned to the "Save as eBook" command).
2) Press Enter to confirm the filename for your EPUB.
3) Press Enter to "click" the "OK" button in the "Save as eBook" window.
4) Press Ctrl+W to close the active document of Atlantis.

So only 4 key presses are needed to save the active document as EPUB, and close it:

F11
Enter
Enter
Ctrl+W

Just repeat the above 4 steps (4 key combinations), and you could convert hundreds of documents in minutes.

frabjous
02-01-2010, 09:29 AM
My God, what is with all this awp spamming their for-pay product which doesn't even do the thing the OP wants to do?

You can batch convert with a simple script using calibre alongside any number of Word to html converters, such as that provided by abiword. If you don't want to use abiword, I'm sure there are other options.

If you want help writing the .bat file, perhaps I'll write it when I have a spare moment to boot into my Windows partition to test.

awp
02-01-2010, 10:09 AM
1) I am not spamming. I am suggesting a solution. It is up to Laura to decide if it is OK with her.

2) Not everyone is a programmer. Your "simple solutions" might not be so simple to ordinary users.

3) Developing commercial software never was a crime. If not paying for software makes you happier, I am glad for you. But not everyone thinks in the same way as you do.

frabjous
02-01-2010, 11:04 AM
If someone posts to a list asking for free software that will batch convert doc to epub, and you respond by promoting software that is not free and that does not batch convert doc to epub, you are spamming.

I am not a programmer either. No programming is necessary unless you consider writing a two line batch file a "program".

I have no objection to paying for software in any circumstance. The original poster asked for free software. It is therefore the only thing which it is relevant to discuss.

But I'd rather not have a flame war. I'll be back with the necessary batch file soon.

awp
02-01-2010, 11:35 AM
Please reread this entire thread. I never suggested Atlantis Word Processor as a "free solution".

Posts #2, #3, and #5 of this thread all mention Atlantis Word Processor, and they are not mine. My first post to this thread is #10. When someone says that Atlantis Word Processor generates an invalid EPUB file, why can't I ask for details? When Laura said that "Atlantis worked perfectly", and asked for batch conversion "in Atlantis", what's wrong with replying with MY solution?

Laura81
02-01-2010, 11:41 AM
awp, thanks for your suggestions but honestly, to me, I'm looking for a simpler solution. Maybe nothing exists to simply batch my doc files to epub.

frabjous, thanks for your suggestions as well but as far as I understand what you've said I would first have to save my doc files in html format and then convert the html format to epub. I simply just want a way to directly convert my doc files to epub with no middle process. Just like calibre can do with any other format bar doc. But as I said above, maybe nothing like this exists.

frabjous
02-01-2010, 11:50 AM
Laura,

No, what I'm suggesting is a single batch file that will both batch convert your .doc files to .html and then batch convert the .html files to .epub files.

As I noted, the batch file is two lines long:


for %%I in (*.doc) do "C:\Program Files (x86)\AbiWord\bin\AbiWord.exe" --to=html "%%I"
for %%I in (*.html) do "C:\Program Files (x86)\Calibre2\ebook-convert.exe" "%%I" "%%~nI.epub"


Here's what you need to do:

1. Download and install AbiWord (http://abisource.com). It's completely free.
2. Download and install calibre (http://calibre-ebook.com). It's completely free.
3. Copy and paste the two lines above into Notepad, and save a file named "doc2epub.bat" or anything else whose name you'll remember ending in .bat. Save it in the same folder where your .doc files are.

The lines assume you're using a 64-bit operating system. If not, then use this instead:


for %%I in (*.doc) do "C:\Program Files\AbiWord\bin\AbiWord.exe" --to=html "%%I"
for %%I in (*.html) do "C:\Program Files\Calibre2\ebook-convert.exe" "%%I" "%%~nI.epub"


The only difference is where the .exe files are located. You'll probably want to double check that the programs got installed in the same directories that mine were. I cannot test on a 32-bit version of windows, since I don't have access to one.


4. Navigate in Windows explorer to the folder where your .doc files are, and double click on "doc2epub".

The batch file will convert all the .doc files in the folder to html, and then convert all the .html files to .epub.

There's a lot of tweaking that can be done there to make the script better, especially with regard to how it handles filenames, metadata, etc., but it's hard for me to predict without knowing what your files are like. You might just want to cut and paste the FIRST line, and run that. That will batch convert your files to HTML. You then might want to use the graphical user interface in calibre to batch convert the HTML to ePub. That's really up to you.
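
If you'd rather use Python than a batch file, the same two-step flow could be sketched roughly as below. Treat it as an illustration only: the install paths are copied from the 32-bit batch lines above and may well differ on your machine, the function names are invented for this example, and it assumes (as the batch file does) that AbiWord writes the .html file next to the .doc. One deliberate difference from the batch file: a document whose .epub already exists is skipped rather than overwritten.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed install locations, copied from the batch lines above; adjust to your machine.
ABIWORD = r"C:\Program Files\AbiWord\bin\AbiWord.exe"
EBOOK_CONVERT = r"C:\Program Files\Calibre2\ebook-convert.exe"


def abiword_command(doc: Path) -> List[str]:
    """Command that asks AbiWord to write an .html copy of one .doc file."""
    return [ABIWORD, "--to=html", str(doc)]


def calibre_command(html: Path) -> List[str]:
    """Command that asks calibre to turn one .html file into an .epub of the same base name."""
    return [EBOOK_CONVERT, str(html), str(html.with_suffix(".epub"))]


def convert_folder(folder: Path) -> None:
    """Convert every .doc in the folder to .html, then that .html to .epub."""
    for doc in sorted(folder.glob("*.doc")):
        epub = doc.with_suffix(".epub")
        if epub.exists():
            # Do not overwrite an existing e-book with the same base name.
            print(f"Skipping {doc.name}: {epub.name} already exists")
            continue
        subprocess.run(abiword_command(doc), check=True)
        subprocess.run(calibre_command(doc.with_suffix(".html")), check=True)


if __name__ == "__main__":
    # Works on the folder the script is started from, like the batch file.
    convert_folder(Path.cwd())
```

Because each .html is converted right after it is produced, other .html files that happen to sit in the folder are left alone.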

Laura81
02-01-2010, 12:20 PM
Wow frabjous! Thanks a lot! I've created the .bat file and am ready to try it... I have other files in that folder that are not .doc files... is that ok? Or should only .doc files be in that folder?

frabjous
02-01-2010, 12:50 PM
It doesn't matter whether other files are in there, although make sure you don't have any .epub or .html files with the same names as the .doc files, or they'll get overwritten. (I think... actually, I'm not sure what will happen, but it won't be good.)

Also, any other .html files in the folder will get converted to .epub along with the .doc files, but I doubt that's a problem.

P.S. Sorry, didn't see your note about the characters not converting. I doubt using calibre's GUI would help. The thing would be to try to find out where the problem is caused. Open the intermediate .html files and see if they're OK inside there. If they're OK, the problem is calibre's. If not, the problem is AbiWord's.

Laura81
02-01-2010, 12:50 PM
Ok, went ahead and did it, worked great! Thank you so much! The only problem I've found so far is that some of the characters don't seem to be turning out in the epub versions so maybe just using the first line and then using calibre directly would solve that.

But thanks so much! Being able to batch convert all the doc files to html first makes it a hell of a lot easier!

Laura81
02-01-2010, 12:52 PM
Oh, I meant to say the only issue I have is that now I have a heap of files appearing where there weren't any before... like dll files showing up... I'll try to get a screen shot of it.

Laura81
02-01-2010, 01:01 PM
Here's an attachment with what is showing up on my hard drive... are these from the conversions? They weren't there until I did them.

Laura81
02-01-2010, 01:07 PM
P.S. Sorry, didn't see your note about the characters not converting. I doubt using calibre's GUI would help. The thing would be to try to find out where the problem is caused. Open the intermediate .html files and see if they're OK inside there. If they're OK, the problem is calibre's. If not, the problem is AbiWord's.

I checked... the problem was on the original word doc so the script you wrote seems to have worked perfectly as I checked another one and compared all 3 versions and the html and epub were both perfect!

So again, thanks so much! I imagine I can just paste that dat file anywhere I like and use it the same way.

Now I just have to figure out where the weird .dll files are coming from and if I can delete them or not.

EDIT: I am wondering if all those files could have actually been put there when I installed AbiWord... maybe I should just delete them and then see if AbiWord still works. For now though they can stay and I'm going to bed. Will deal with them in the afternoon when I get up.

frabjous
02-01-2010, 01:25 PM
Here's an attachment with what is showing up on my hard drive... are these from the conversions? They weren't there until I did them.

Those .dll and .txt files aren't the result of the script, though they may be the result of installing either Abiword or calibre. I looked at my Windows partition and they were there too. If I had to guess, I'd say they're a byproduct from using the .msi installation routine, and/or using Internet Explorer to download the programs to install, since the EULA files are Microsoft-related. Typical microsoft messiness.

I went ahead and deleted them, and everything still worked just fine, including my conversion script.

Yeah, you can copy or move the .bat file from one folder to another, and run it wherever.

DoctorOhh
02-01-2010, 09:30 PM
Please reread this entire thread. I never suggested Atlantis Word Processor as a "free solution".

Posts #2, #3, and #5 of this thread all mention Atlantis Word Processor, and they are not mine. My first post to this thread is #10. When someone says that Atlantis Word Processor generates an invalid EPUB file, why can't I ask for details? When Laura said that "Atlantis worked perfectly", and asked for batch conversion "in Atlantis", what's wrong with replying with MY solution?

Thank you for answering direct queries from users on this forum. It is nice to know someone from AWP is watching and ready to assist as needed.

It is also worthy of note that you stepped up to help even though you knew she was using the free version of your software. :thumbsup:


As I noted, the batch file is two lines long:

Here's what you need to do:

1. Download and install AbiWord (http://abisource.com). It's completely free.
2. Download and install calibre (http://calibre-ebook.com). It's completely free.
3. Copy and paste the two lines above into Notepad, and save a file named "doc2epub.bat" or anything else whose name you'll remember ending in .bat. Save it in the same folder where your .doc files are.

The lines assume you're using a 64-bit operating system. If not, then use this instead:


for %%I in (*.doc) do "C:\Program Files\AbiWord\bin\AbiWord.exe" --to=html "%%I"
for %%I in (*.html) do "C:\Program Files\Calibre2\ebook-convert.exe" "%%I" "%%~nI.epub"


This is a very nice solution. I'm glad you stopped whining about another forum member answering questions and trying to help someone long enough to provide a working solution. :)

I'll have to save this post for future use. :thumbsup:

Karma to both of you.

Laura81
02-02-2010, 01:04 AM
Those .dll and .txt files aren't the result of the script, though they may be the result of installing either Abiword or calibre. I looked at my Windows partition and they were there too. If I had to guess, I'd say they're a byproduct from using the .msi installation routine, and/or using Internet Explorer to download the programs to install, since the EULA files are Microsoft-related. Typical microsoft messiness.

I went ahead and deleted them, and everything still worked just fine, including my conversion script.

Yeah, you can copy or move the .bat file from one folder to another, and run it wherever.

Thanks for your help in this too! I will just delete it all then. And now for some massive conversion time today!

Oh and I actually prefer using your bat command versus even Calibre because it annoys me that Calibre creates a new area to put my ebooks when I want them arranged a certain way and I can't do that with Calibre. Ever think about writing more command lines for converting other things? Like PDF and LIT files?

frabjous
02-02-2010, 01:28 AM
Thanks.

Yeah I pretty much do all my calibre conversions using scripts or the command line, partly to avoid Calibre's GUI messing with my file structure. Calibre is still doing the conversion, though, so let's give credit where it's due.

Actually, the modifications you'd need to do batch converting with .PDF or .LIT should be pretty straightforward. If you want, e.g., to convert all the .pdf's in a folder to *.epub, using calibre alone, just use my second line above, but change (*.html) to (*.pdf). Or change it to (*.lit) to convert .lit files to .epub. If you want .lit files as output, just change the "%%~nI.epub" at the end to "%%~nI.lit"

Similarly, if you want to use it for converting files that AbiWord handles but calibre doesn't (WordPerfect, or Works, or Word 2007), keep the first line, and change (*.doc) to (*.docx) or (*.wpd) or (*.wps), or whatever. Keep --to=html at the end, and (*.html) in the second line.

If you want to get serious about this, though, you should read up on calibre's ebook-convert command line program (http://calibre-ebook.com/user_manual/cli/ebook-convert.html) and the many options it offers, as well as a batch guide for Windows such as this one (http://commandwindows.com/batch.htm), or if you use another OS, try a bash script (http://tldp.org/LDP/Bash-Beginners-Guide/html/index.html).

Laura81
02-02-2010, 01:49 AM
Thanks frabjous, I doubt I'd get fully serious about it though as I'm not sure I would fully understand it all. Changing your first command line to use Calibre that way though seems pretty straight-forward so I think I'll try that.

Laura81
02-02-2010, 04:11 AM
Hey frabjous, I tried changing the .html to a .lit as you said above to convert those to epubs but it didn't work for me. The file I saved just saved as a .dat and didn't change into a windows batch file like the other one did.

DoctorOhh
02-02-2010, 04:25 AM
Hey frabjous, I tried changing the .html to a .lit as you said above to convert those to epubs but it didn't work for me. The file I saved just saved as a .dat and didn't change into a windows batch file like the other one did.

The other one didn't change into a batch file; you named it that way. If you opened the previous bat file in Notepad and made the changes frabjous suggested:

for %%I in (*.html) do "C:\Program Files\Calibre2\ebook-convert.exe" "%%I" "%%~nI.lit"

then just save the file as html2lit.bat and it will be a batch file.

Good Luck.

Laura81
02-02-2010, 04:34 AM
This is the change I'm trying to make:

for %%I in (*.lit) do "C:\Program Files\Calibre2\ebook-convert.exe" "%%I" "%%~nI.epub"

I then saved the above as lit2epub.dat but it does not turn into a windows batch file as the other one did. It just stays the dat file.

DoctorOhh
02-02-2010, 04:41 AM
This is the change I'm trying to make:

for %%I in (*.lit) do "C:\Program Files\Calibre2\ebook-convert.exe" "%%I" "%%~nI.epub"

I then saved the above as lit2epub.dat but it does not turn into a windows batch file as the other one did. It just stays the dat file.

Change the D to a B. It is called a batch file because we name it such. Change .dat to .bat

Save it as lit2epub.bat

Laura81
02-02-2010, 05:17 AM
Change the D to a B. It is called a batch file because we name it such. Change .dat to .bat

Save it as lit2epub.bat

Oi vey, do I feel like an idiot! How much more obvious can it get! Excuse my idiocy and thanks!

DoctorOhh
02-02-2010, 05:20 AM
Oi vey, do I feel like an idiot! How much more obvious can it get! Excuse my idiocy and thanks!

ABCD is the extent of my knowledge, glad I could help.

Laura81
02-02-2010, 06:03 AM
Well I tried it and while it did create the batch file properly it didn't work. Not sure why. I pasted it in the folder with the lit files but the black command screen just flashed up really quickly and then closed again.

Solicitous
02-02-2010, 06:38 AM
Well I tried it and while it did create the batch file properly it didn't work. Not sure why. I pasted it in the folder with the lit files but the black command screen just flashed up really quickly and then closed again.

You could add a third line to the batch file
sleep 30
This will keep the window open for 30 seconds (replace 30 with the number of preferred seconds). Will allow you to see any error messages.

Laura81
02-02-2010, 08:19 AM
You could add a third line to the batch file
sleep 30
This will keep the window open for 30 seconds (replace 30 with the number of preferred seconds). Will allow you to see any error messages.

Thanks for the suggestion but I would have no clue how to do that. I don't know how to write any code at all.


EDIT: I worked out what I was doing wrong with the code... using the 32bit line instead of the 64bit. Another smack upside my head needed. All working perfectly now.

frabjous
02-02-2010, 03:16 PM
Glad to hear it!

gastan
03-02-2010, 01:16 PM
Thank you, frabjous, for the script you supplied. I came here looking for a solution to convert an html file to epub. I used only the one html2epub line and it worked perfectly.

Needless to say, I'm going to save the bat file and bookmark this thread.

:thanks: :thanks: :thanks: :thanks: :thanks: :thanks:

frabjous
03-02-2010, 03:02 PM
No problem, but if you're just converting HTML to ePub, you could run calibre and drag the file(s) into the calibre window, and use the rather more user-friendly conversion menus in there. Not that there's anything wrong with using my script... it's that most people prefer graphical user interfaces to command-line shell scripts.

adikira
03-04-2010, 04:58 PM
frabjous, thanks, this is a useful method - i don't need to use sony's reader library for this anymore (I still need to check if abiword supports docx though).

by the way, the content of the doc2epub.bat file can be replaced with:

for %%I in (*.doc) do "%PROGRAMFILES%\AbiWord\bin\AbiWord.exe" --to=html "%%I"
for %%I in (*.html) do "%PROGRAMFILES%\Calibre2\ebook-convert.exe" "%%I" "%%~nI.epub"

so that one doesn't need to worry if it's a 64 or 32 system

frabjous
03-04-2010, 06:22 PM
That's a nice tip, thanks. I don't usually use Windows, so I was trying to figure out how to do that on the fly. (I'm used to executables being put in my "path".) Anyway, thanks.

AbiWord supports .docx -- at least any recent version does. (Of course, you'll need to change (*.doc) to (*.docx) or (*.doc *.docx) in the batch file.)

If installing on Windows, be sure to do a custom install and check to install the extra import and export plugins. I think you get .docx even without installing the extras, but that way you even get relatively arcane things, like latex.

adikira
03-05-2010, 11:43 AM
Welcome. I use a lot of Unix shell scripting at work but at home I still use Windows. When it comes to command line scripting Windows completely sucks! I may give powershell a try though.

Anyway, I updated the batch file to give more flexibility. It can take two arguments: the file extension that you are converting from and the extension you are converting to (which defaults to epub if missing). If the extension you are converting from is doc or docx, it runs AbiWord to convert to html first; otherwise it just uses calibre's converter.

It should be saved to a batch file such as "convert.bat." It's recommended that this file (convert.bat) is saved in the Windows directory so that it's in the PATH variable and can be called up from any folder. Otherwise the full path to convert.bat needs to be specified or it needs to be in the current folder (of course).


@ECHO OFF
if "%1"=="" GOTO USAGE
if "%1"=="doc" GOTO ABI
if "%1"=="docx" GOTO ABI
SET doAbi=no

:CALIBRE
if "%2"=="" SET ext2=epub
if not "%2"=="" SET ext2=%2
if "%doAbi%"=="yes" SET ext1=html
if "%doAbi%"=="no" SET ext1=%1
echo Converting "%ext1%" to "%ext2%" using Calibre
for %%I in (*.%ext1%) do "%PROGRAMFILES%\Calibre2\ebook-convert.exe" "%%I" "%%~nI.%ext2%"
GOTO END

:ABI
SET doAbi=yes
echo Converting "%1" to "html" using AbiWord
for %%I in (*.%1) do "%PROGRAMFILES%\AbiWord\bin\AbiWord.exe" --to=html "%%I"
GOTO CALIBRE

:USAGE
echo "Usage %0 <first_extension> <second_extension>"
echo "Example: %0 doc epub"
echo "If <second_extension> is not specified it defaults to 'epub'"

:END
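
The decision logic in the batch file above (default the target to epub, and route doc/docx through AbiWord first) can also be written out as a small Python sketch. It only plans the steps and runs nothing; `plan_conversion` and the tool labels are hypothetical names for illustration, not part of AbiWord or calibre.

```python
from typing import List, Optional, Tuple

# Extensions that, following the batch file above, go through AbiWord first.
ABIWORD_FIRST = {"doc", "docx"}


def plan_conversion(source_ext: str, target_ext: Optional[str] = None) -> List[Tuple[str, str, str]]:
    """Return the conversion steps as (tool, from_extension, to_extension) tuples."""
    if not source_ext:
        raise ValueError("usage: <first_extension> [<second_extension>]")
    target = target_ext or "epub"  # the second argument defaults to epub
    steps = []
    calibre_input = source_ext
    if source_ext in ABIWORD_FIRST:
        # Word documents are turned into HTML before calibre sees them.
        steps.append(("abiword", source_ext, "html"))
        calibre_input = "html"
    steps.append(("calibre", calibre_input, target))
    return steps


if __name__ == "__main__":
    for step in plan_conversion("doc"):
        print(step)
```

Run as a script, this prints ('abiword', 'doc', 'html') followed by ('calibre', 'html', 'epub').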

frabjous
03-05-2010, 12:26 PM
Excellent.

I'm learning stuff about Windows batch files by looking at it. Since, like you, I normally just write Unix/linux shell scripts (e.g. bash), writing batch files for Windows usually only comes up for me when I want to help someone here and need a way to translate something I already know how to do in linux to Windows. This will be very helpful.

I guess the only downside to your method, however, is that since it needs to be passed arguments, you can't tell someone just to cut and paste into Notepad, save and double click the icon -- the people actually have to go to the command prompt and navigate to the appropriate folder. Not really a difficult concept for those of us who remember DOS, or use UNIX/linux where this is relatively commonplace, but a number of Windows users who come here looking for help are often intimidated by doing anything through a command line interface, unfortunately.

(Though I suppose you could just make two batch files, with one calling the other if need be.)

awp
03-05-2010, 06:08 PM
Atlantis doesn't do batch convert by chance does it?

The latest beta version of Atlantis Word Processor has the "batch conversion" command:

http://www.mobileread.com/forums/showthread.php?p=817673#post817673

DoctorOhh
03-05-2010, 11:38 PM
The latest beta version of Atlantis Word Processor has the "batch conversion" command:

http://www.mobileread.com/forums/showthread.php?p=817673#post817673

Thanks for letting us know. It's nice to have a company read the threads on here and adjust their product according to people's needs.

Fith
03-09-2010, 01:16 PM
Glad I found this as it's proving most useful, Thanks!

Any idea how to get the author and title from a file name like "Author - Title.doc"?

awp
03-09-2010, 02:10 PM
Did you mean how to get "author" and "title" from a DOC filename when using the Batch Conversion (http://atlantiswordprocessor.blogspot.com/2010/03/batch-conversion.html) command of Atlantis Word Processor? If so, the "Batch conversion" command of the latest version of Atlantis retrieves this information from the source document in the following way.

It checks the document Properties (you can view them in Atlantis if you open a document in Atlantis, then choose the "File | Properties..." menu command of Atlantis).
If the "Title" property is not blank, Atlantis uses it as the "Title" EPUB metadata item. If the "Title" property is blank, Atlantis uses the document filename without the file path and extension.
If the "Author" property of the source document is not blank, Atlantis uses it as the "Author" EPUB metadata item. Otherwise the "User name" from the "General" tab of the "Tools | Options..." window of Atlantis is used.

So Atlantis currently does not retrieve the Author name from the filename of the source document. If everyone named document files in "your way" (I mean "<author name> - <book title>.<file extension>"), it would be fairly easy to adjust the "Batch conversion" feature of Atlantis to retrieve this information from names of source document files.

Fith
03-09-2010, 05:26 PM
Thanks for that response AWP:thumbsup:

How about frabjous and adikira? I know that batch files can parse the file's name (but can't remember how:chinscratch:), but can you pass that to Calibre to fill in the Author and Title fields automatically?:2thumbsup

frabjous
03-09-2010, 06:35 PM
My knowledge of Windows batch files is very limited -- hopefully adikira will chime in, but I *think* something along these lines should work to parse Author-File.doc to the metadata:

(For 64 bit Windows, just converting .doc to .epub)


for %%I in (*.doc) do (
for /f "tokens=1,2 delims=-" %%a in ("%%~nI") do (
"C:\Program Files (x86)\AbiWord\bin\AbiWord.exe" --to=html "%%I"
"C:\Program Files (x86)\Calibre2\ebook-convert.exe" "%%~nI.html" "%%~nI.epub" --authors="%%a" --title="%%b"
)
)


For 32 bit Windows you can use Adikira's trick of %PROGRAMFILES%:


for %%I in (*.doc) do (
for /f "tokens=1,2 delims=-" %%a in ("%%~nI") do (
"%PROGRAMFILES%\AbiWord\bin\AbiWord.exe" --to=html "%%I"
"%PROGRAMFILES%\Calibre2\ebook-convert.exe" "%%~nI.html" "%%~nI.epub" --authors="%%a" --title="%%b"
)
)


This did not bypass the need for a different file for 32 bit and 64 bit, since I found that %PROGRAMFILES% expands to "C:\Program Files\" even on 64 bit systems, and hence, this wouldn't work if your executables are under "C:\Program Files (x86)\".
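
One way around guessing the folder is to check both Program Files locations and use whichever one actually contains the program. A minimal Python sketch of that idea follows; it assumes the standard Windows environment variables ProgramFiles and ProgramFiles(x86) (the latter exists only on 64-bit Windows), and `find_program` is a made-up helper name, not something AbiWord or calibre provides.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_program(relative_path: str, env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Look for a program under each Program Files folder that is defined.

    Returns the first path that exists as a file, or None if none of them does.
    """
    for variable in ("ProgramFiles", "ProgramFiles(x86)"):
        base = env.get(variable)
        if base:
            candidate = Path(base) / relative_path
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    # Relative paths taken from the batch lines in this thread.
    print(find_program(r"Calibre2\ebook-convert.exe"))
    print(find_program(r"AbiWord\bin\AbiWord.exe"))
```

If either line prints None, the program is installed somewhere else and the path has to be filled in by hand.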

No clue what that will do with files that don't have dashes in them. Too lazy to check.
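
What happens to names without a dash is easier to pin down when the parsing is written out explicitly. Below is a rough Python sketch of one possible rule: split on the first " - " (or, failing that, the first bare "-"), and if there is no dash, leave the author out and use the whole name as the title. The helper names are hypothetical; the --authors and --title options are the same ebook-convert options used in the batch lines above.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def split_author_title(filename: str) -> Tuple[Optional[str], str]:
    """Split a name like "Author - Title.doc" into (author, title).

    Only the first separator counts, so the title may contain dashes itself.
    If there is no usable separator, the author is None and the whole
    base name becomes the title.
    """
    stem = Path(filename).stem
    # Prefer " - " so hyphenated author names stay in one piece.
    separator = " - " if " - " in stem else "-"
    author, found, title = stem.partition(separator)
    if not found or not author.strip() or not title.strip():
        return None, stem.strip()
    return author.strip(), title.strip()


def metadata_options(filename: str) -> List[str]:
    """Build the metadata options to append to an ebook-convert command."""
    author, title = split_author_title(filename)
    options = [f"--title={title}"]
    if author:
        options.insert(0, f"--authors={author}")
    return options


if __name__ == "__main__":
    for name in ("Jane Austen - Emma.doc", "Emma.doc"):
        print(name, metadata_options(name))
```

For example, "Jane Austen - Emma.doc" gives the author Jane Austen and the title Emma, while "Emma.doc" gives no author and the title Emma.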

This would be very easy on linux. :)

Fith
03-10-2010, 04:21 AM
:thanks:
That worked great. Saved me a lot of (immediate) reading

I use batch/dos so rarely any more that I have to learn all over again when I need it. It's so useful and quick, when you know how, e.g. concatenating text files...

I should now be able to adapt your scripts to my needs. Always easier from a working example. :thumbsup:

abjdiat
04-22-2010, 04:16 PM
thanx frabjous, batching make life much easier
i tried first using the one line command but it didn't work,
adding sleep 30 did the magic,
much appreciated

moon face
07-05-2010, 05:20 AM
frabjous,

I did the same as what you told Laura81:

Laura,

No, what I'm suggesting is a single batch file that will both batch convert your .doc files to .html and then batch convert the .html files to .epub files.

As I noted, the batch file is two lines long:


Code:
for %%I in (*.doc) do "C:\Program Files (x86)\AbiWord\bin\AbiWord.exe" --to=html "%%I"
for %%I in (*.html) do "C:\Program Files (x86)\Calibre2\ebook-convert.exe" "%%I" "%%~nI.epub"

Here's what you need to do:

1. Download and install AbiWord. It's completely free.
2. Download and install calibre. It's completely free.
3. Copy and paste the two lines above into Notepad, and save a file named "doc2epub.bat" or anything else whose name you'll remember ending in .bat. Save it in the same folder where your .doc files are.

The lines assume you're using a 64-bit operating system. If not, then use this instead:


Code:
for %%I in (*.doc) do "C:\Program Files\AbiWord\bin\AbiWord.exe" --to=html "%%I"
for %%I in (*.html) do "C:\Program Files\Calibre2\ebook-convert.exe" "%%I" "%%~nI.epub"

The only difference is where the .exe files are located. You'll probably want to double check that the programs got installed in the same directories that mine were. I cannot test on a 32-bit version of windows, since I don't have access to one.


4. Navigate in Windows explorer to the folder where your .doc files are, and double click on "doc2epub".

The batch file will convert all the .doc files in the folder to html, and then convert all the .html files to .epub.

There's a lot of tweaking that can be done there to make the script better, especially with regard to how it handles filenames, metadata, etc., but it's hard for me to predict without knowing what your files are like. You might just want to cut and paste the FIRST line, and run that. That will batch convert your files to HTML. You then might want to use the graphical user interface in calibre to batch convert the HTML to ePub. That's really up to you.
Last edited by frabjous; 02-01-2010 at 03:57 PM.

I left it running for more than 2 days, and only 4 files out of 128 were converted to epub. :(

Laura81,


How many days did it take you to convert your docs?

frabjous
07-05-2010, 11:59 AM
frabjous,

How many days did it take you to convert your docs?

What? Two days? Why on Earth would you let it run for two days? It shouldn't take more than 15 minutes, unless these are super-long documents, or your computer is positively ancient.

Obviously something has gone wrong. It's impossible for me to tell what's gone wrong, because you didn't provide any information about what happened.

Were the html files created for all the docs?

moon face
07-18-2010, 07:12 PM
It takes 5 to 10 minutes to go from the docs to HTML.

After that it takes so much time to convert to epub :(

Ummmm, they are Arabic docs.

Please help me.

frabjous
07-18-2010, 08:47 PM
It's hard to help when I have so little information to go on.

I'm also not an expert. I've never actually done this myself; I was just describing how I would do it if I ever had a reason to try.

Have you tried opening the HTML files in a web browser? If so, do they look OK?

Have you tried dragging the HTML files into Calibre's GUI and proceeding from there? Maybe that'll give you an idea what's going wrong. (You can batch convert through there too.)

If you still have trouble, try asking in the calibre forum, since if the HTML conversions went fine, the issue is with calibre.

Futfanatico
03-26-2013, 11:04 AM
When I convert from DOC to EPUB via Calibre, I first save the DOC as an ODT. I find that the ODT-to-EPUB conversion keeps the formatting much better. This is especially true if you're anal about justifying text and paragraphs like me.