Old 11-01-2010, 02:02 AM   #1
chengyibo
Member
chengyibo began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Oct 2010
Device: sony reader
PDF to WORD/HTML conversion, "special characters and marks" errors

Hi all,

When I convert a PDF to a Word or HTML file, some errors occur.

For example, “Sanskritization” becomes ⋚Sanskritization&& when converting the PDF to Word or HTML format. Why do the double quotation marks show up as special marks like "⋚" and "&&"?

I am really hoping for a solution, or a better conversion tool that can handle this kind of special-character problem when converting a PDF to another format.

To anyone who can help me, thanks so much.
Old 11-01-2010, 01:09 PM   #2
hernep
Enthusiast
hernep began at the beginning.
 
Posts: 30
Karma: 42
Join Date: Oct 2010
Location: Finland
Device: iRiver Story, iPad 2
Without knowing all the details, I would say it is because of the encoding of your PDF.
It is different from the encoding of the produced HTML, for example BIG-5 vs. UTF-8 or something like that. Check what the encoding of the PDF is and set the HTML to the same encoding.
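
To illustrate the kind of mismatch I mean, here is a minimal Python sketch. It only shows what happens when bytes written in one encoding are read as another; it is not a claim about what your converter actually does. The helper name `decode_as` and the choice of cp1252 and Big5 as the "wrong" encodings are just assumptions for the example.

```python
# Hypothetical illustration: the same bytes read with the wrong encoding.
original = "“Sanskritization”"  # the word with curly double quotes

# Bytes as a UTF-8 producer would write them.
raw = original.encode("utf-8")


def decode_as(data: bytes, encoding: str) -> str:
    """Decode bytes with the given encoding, substituting U+FFFD for invalid sequences."""
    return data.decode(encoding, errors="replace")


right = decode_as(raw, "utf-8")          # matches the original text
wrong_cp1252 = decode_as(raw, "cp1252")  # quotes turn into other symbols
wrong_big5 = decode_as(raw, "big5")      # quotes turn into other symbols

if __name__ == "__main__":
    for label, text in [("utf-8", right), ("cp1252", wrong_cp1252), ("big5", wrong_big5)]:
        print(f"{label:>7}: {text!r}")
```

The plain letters survive because they are ASCII; only the quotation marks, which need more than one byte in UTF-8, come out as odd symbols.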

Can you copy that word with its quotes and paste it into a word processor or plain text editor?
Does it look weird there, too?

By the way, those characters look so distinctive that a simple "replace string" function in any text editor would do the trick, and much faster than figuring out the problem and a possible solution.
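
If a text editor is not handy, the same "replace string" idea can be sketched in Python. The mapping below is an assumption taken from the single example in the first post (⋚ for the opening quote, && for the closing one). `REPLACEMENTS` and `fix_quotes` are hypothetical names, and a blind replace of && could also hit legitimate text, so check the result afterwards.

```python
# Assumed mapping, based only on the one example given in this thread.
REPLACEMENTS = {
    "⋚": "“",   # garbled opening double quote
    "&&": "”",  # garbled closing double quote
}


def fix_quotes(text: str) -> str:
    """Replace each garbled sequence with the quotation mark it stands for."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text


if __name__ == "__main__":
    print(fix_quotes("⋚Sanskritization&&"))
```

For a converted HTML file you would read the text, pass it through `fix_quotes`, and write it back out with the same encoding you read it with.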

Note: also check the font used. If possible, use the same font.
Old 11-01-2010, 11:04 PM   #3
chengyibo
Member
chengyibo began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Oct 2010
Device: sony reader
Dear Hernep, thank you so much.

I am a beginner in this field. Can you tell me how I can find the "encoding" of a PDF?
I would appreciate your help very much, for I really want to learn this. Thanks again.
Old 11-06-2010, 12:43 AM   #4
Nexutix
Reading and reading
Nexutix ought to be getting tired of karma fortunes by now.
 
Posts: 582
Karma: 8250144
Join Date: Oct 2010
Device: Infibeam Pi, iPod Touch 4G, iPad Air 2, iPad mini 2, Oneplus One
I downloaded, used and uninstalled a piece of software that converted PDF to HTML without line breaks. I forget where on MobileRead I got it, and I need it now. It says "X pages parsed" at the end of conversion and creates the HTML in the same folder as the PDF.

Could someone please point me to it?
Tags
conversion, marks, pdf, special characters



All times are GMT -4.

