11-01-2010, 02:02 AM | #1 |
Member
Posts: 10
Karma: 10
Join Date: Oct 2010
Device: sony reader
|
PDF to WORD/HTML conversion, "special characters and marks" errors
Hi all,
When I convert a PDF to a Word or HTML file, some characters come out wrong. For example, “Sanskritization” becomes ⋚Sanskritization&& in the Word or HTML output. Why do double quotation marks show up as special marks like "⋚" and "&&"? I am looking for a fix, or for a better conversion tool that can handle this kind of special-character problem when converting a PDF to another format. Thanks so much to anyone who can help. |
11-01-2010, 01:09 PM | #2 |
Enthusiast
Posts: 30
Karma: 42
Join Date: Oct 2010
Location: Finland
Device: iRiver Story, iPad 2
|
Without knowing all the details, I would say it is because of the encoding of your PDF.
It differs from the encoding of the produced HTML, for example BIG-5 vs. UTF-8 or something like that. Check what the encoding of the PDF is and set the HTML to the same encoding. Can you copy that word with its quotes and paste it into a word processor or plain text editor? Does it look weird there, too? By the way, those characters are so distinctive that a simple "replace string" function in any text editor would do the trick far faster than figuring out the problem and a possible solution. Also check the font used. If possible, use the same font. |
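The "replace string" workaround and the copy-and-paste check suggested in post #2 could also be scripted. The following is only a minimal sketch in Python: the mapping is an assumption taken from the single example in post #1 (⋚ for the opening quote, && for the closing quote), and the function names `fix_quotes` and `describe_chars` are made up for illustration. Note that "&&" could legitimately appear elsewhere in a document, so a blind replacement may need review.

```python
import unicodedata

# Assumed mapping, based only on the one example given in post #1.
# Other documents may garble quotes differently.
REPLACEMENTS = {
    "⋚": "\u201c",   # garbled opening quote -> left double quotation mark
    "&&": "\u201d",  # garbled closing quote -> right double quotation mark
}


def fix_quotes(text):
    """Replace the garbled marks with proper double quotation marks."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text


def describe_chars(text):
    """List the code point and Unicode name of each character.

    Useful for checking what a 'weird' pasted character really is.
    """
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


if __name__ == "__main__":
    print(fix_quotes("⋚Sanskritization&&"))
    for line in describe_chars("⋚"):
        print(line)
```

To apply it to a converted HTML file, read the file as text with the encoding it was written in, pass the contents through `fix_quotes`, and write the result back out.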
11-01-2010, 11:04 PM | #3 |
Member
Posts: 10
Karma: 10
Join Date: Oct 2010
Device: sony reader
|
Dear Hernep, thank you so much.
I am a newcomer to this field. Can you tell me how I can find the "encoding" of a PDF? I would really appreciate your help, because I really want to learn this. Thanks again, |
11-06-2010, 12:43 AM | #4 |
Reading and reading
Posts: 582
Karma: 8250144
Join Date: Oct 2010
Device: Infibeam Pi, iPod Touch 4G, iPad Air 2, iPad mini 2, Oneplus One
|
I downloaded, used, and then uninstalled a program that converted PDF to HTML without line breaks. I found it through MobileRead but have forgotten where, and I need it again now. It says "X pages parsed" at the end of the conversion and creates the HTML file in the same folder as the PDF.
Could someone please point me to it? |
Tags |
conversion, marks, pdf, special characters |