Character Incompatibilities

Twisted_Woody

This is from The_Lizard in the newbies, so credit to him for this work. He tried posting it but got an internal server error, so he asked me to try.

Postings to the blog, like this one:

https://www.redcafe.net/blog/2009/08/where-are-the-goals-going-to-come-from-part-2/

appear also on the forum pages:

https://www.redcafe.net/f6/where-goals-going-come-part-2-a-263083/#post6705030

But I noticed the apostrophes had all turned to question marks. Clearly,
there's some problem with the coding of the text that isn't compatible
between the blog and the forum.

The source text for the RedCafe page alerts the web browser to the proper
character set, like this:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

http://en.wikipedia.org/wiki/ISO/IEC_8859-1

But the blog page is encoded differently:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

http://en.wikipedia.org/wiki/UTF-8
 
UTF-8 is on its way to becoming the Internet standard, since it can encode
almost every language there is, but not everyone uses it yet, and therein
lies a problem.

Among the many possible incompatibilities between the blog and the forum
encodings is the ever-loving apostrophe. There are other ever-loving
characters that don't make it through the transfer between character sets,
but somehow the blessed apostrophe is the one we see most obviously and most
often.

Explanation:

http://www.sitepoint.com/article/guide-web-character-encoding/2/


"Many editors under Windows will use Windows-1252 as the default (or only!)
encoding. If you save files as Windows-1252 and declare the encoding to be
ISO 8859-1, it usually works. This is because the two are very similar.

But if you use certain literal characters, like typographically correct
quotation marks, dashes, ellipses, and so on, you'll run into trouble. These
characters are not part of ISO 8859-1. In Windows-1252, they're located in
the range that the ISO encoding reserves for C1 control characters -- in
other words, those code points are invalid in ISO 8859-1. Copying from
another Windows application, like Word, is a particularly likely cause of
problems."
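
The overlap described in that quote can be checked with a few lines of Python. This is only a sketch using the standard codecs; the sample string is made up for illustration and is not taken from the blog.

```python
# Typographic punctuation of the kind Word-style editors produce:
# left/right single quotes, an en dash and an ellipsis.
sample = "\u2018quoted\u2019 \u2013 \u2026"

# In Windows-1252 each of these characters is a single byte in 0x80-0x9F.
cp1252_bytes = sample.encode("cp1252")
high_bytes = [hex(b) for b in cp1252_bytes if b >= 0x80]
print(high_bytes)

# Reading the same bytes as ISO 8859-1 does not fail, but every one of
# those bytes turns into a C1 control character instead of punctuation.
mislabeled = cp1252_bytes.decode("iso-8859-1")
c1_controls = [c for c in mislabeled if 0x80 <= ord(c) <= 0x9F]
print(len(c1_controls))

# Plain ASCII text, including the straight apostrophe, is byte-for-byte
# identical in both encodings, which is why mislabeling "usually works".
same_ascii = "it's".encode("cp1252") == "it's".encode("iso-8859-1")
print(same_ascii)
```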

So my reading of the situation is this: the blog entries are being
composed in a Windows text editor that most probably uses character set
Windows-1252. The text is copied and pasted into a text-entry field on the
blog's authoring page. The blog then publishes that text to the world,
alerting the world's browsers to use the character set UTF-8.

The blog entry then appears on the RedCafe forum, where the pages tell
the world's browsers to use the character set ISO 8859-1. The basic text
survives this transfer, but some of the special characters, like the curly
apostrophe that Word-style editors produce, do not survive. (A plain
straight apostrophe is the same in all three character sets, so it would
come through fine.) When the browser can't understand the character
it is seeing, it substitutes a question mark for it. So what appear as
proper apostrophes in the blog appear as question marks in the forum.
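
Here is a rough Python sketch of how a curly apostrophe can end up as a question mark. It assumes the substitution happens when UTF-8 text is converted to ISO 8859-1 with a "replace" policy; where that step really happens on RedCafe (blog software, forum software or browser) is not known, and the sample sentence is invented.

```python
# Text as the blog might hold it, with curly apostrophes (U+2019).
blog_text = "It\u2019s anyone\u2019s guess."

# In UTF-8 the curly apostrophe is a three-byte sequence.
apostrophe_utf8 = "\u2019".encode("utf-8")
print(apostrophe_utf8)

# ISO 8859-1 has no slot for U+2019, so a converter that replaces
# unrepresentable characters emits "?" in its place.
forum_bytes = blog_text.encode("iso-8859-1", errors="replace")
forum_text = forum_bytes.decode("iso-8859-1")
print(forum_text)
```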

Fixing this will take someone with administrative rights to both the
blog and the forums, who can set them to use compatible character sets.
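
If the forum has to stay on ISO 8859-1, one possible workaround (not something the post proposes) is to turn any character the forum can't represent into an HTML numeric character reference before the text is handed over. A minimal Python sketch, where the function name to_latin1_html is hypothetical and the input is assumed to be HTML text already:

```python
def to_latin1_html(text: str) -> bytes:
    """Encode text for an ISO 8859-1 page (hypothetical helper).

    Characters that ISO 8859-1 cannot represent become numeric
    character references such as &#8217;, which browsers render
    correctly whatever charset the page declares.
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# The curly apostrophe becomes a reference instead of a question mark.
converted = to_latin1_html("It\u2019s anyone\u2019s guess.")
print(converted)

# Characters that ISO 8859-1 does have are encoded as a single byte.
accented = to_latin1_html("caf\u00e9")
print(accented)
```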

Any wonk out there who does XML in his/her sleep is welcome to rip
into me for this, but only after you've ripped into the original problem
first!
 
That took 3 tries to post due to "internal server errors" that I received.

Anyway, as I said, credit for this post goes to "The_Lizard".
 
I thought this was gonna be a thread about Bellamy, Adebayor, Robinho, Santa Cruz and Tevez being a recipe for ego-disaster..