Unicode (UTF-8) browser display

Original Article
Last edited and revised: 23 July, 2008 20:41

The formal DTD entities were converted to readable tables by Oscar van Vlijmen, then extracted and amended by Terry Whetstone Harmon, M.D. Amendments are indicated by cells highlighted in yellow; my preferred coding is in bold.


For a discussion of the Romanian (limba română) letters Ă, ă, Ş, ş, Ţ, ţ, see the Wikipedia article Discuţie Wikipedia:Diacritice; see also Turkish. Wikipedia also has an article on UTF-8. The most accepted coding is in bold type.

WARNING: Be aware that Internet Explorer (I.E.) mangles many of the codes: it may not recognize a code at all, or it may substitute a “?”.
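
The tables that follow list each character both by name and by number. As a rough illustration of how the two forms relate, here is a minimal Python sketch using the standard `html` module. The sample character (é, eacute) is chosen only as an example, and the hexadecimal form `&#xE9;` does not appear in the tables; it is shown on the assumption that the reader's parser accepts hexadecimal references as well.

```python
import html

# Three references to the same character, U+00E9 (eacute in the Latin 1 table).
by_name = "&eacute;"
by_decimal = "&#233;"
by_hex = "&#xE9;"  # hexadecimal form, not listed in the tables

# Decode each reference back to the character it stands for.
decoded = [html.unescape(ref) for ref in (by_name, by_decimal, by_hex)]

# In a UTF-8 encoded page the character itself is stored as two bytes.
utf8_bytes = "\u00e9".encode("utf-8")

print(decoded)
print(utf8_bytes)
```

A numeric reference depends only on the code point, so it is a reasonable fallback when a browser does not recognize a name.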


Portions © International Organization for Standardization 1986: Permission to copy in any form is granted for use with conforming SGML systems and applications as defined in ISO 8879, provided this notice is included in all copies.

Latin 1 characters

by name char by number char Unicode description standard
nbsp   &#160;   U+00A0 no-break space = non-breaking space ISOnum
iexcl ¡ &#161; ¡ U+00A1 inverted exclamation mark ISOnum
cent ¢ &#162; ¢ U+00A2 cent sign ISOnum
pound £ &#163; £ U+00A3 pound sign ISOnum
curren ¤ &#164; ¤ U+00A4 currency sign ISOnum
yen ¥ &#165; ¥ U+00A5 yen sign = yuan sign ISOnum
brvbar ¦ &#166; ¦ U+00A6 broken bar = broken vertical bar ISOnum
sect § &#167; § U+00A7 section sign ISOnum
uml ¨ &#168; ¨ U+00A8 diaeresis = spacing diaeresis ISOdia
copy © &#169; © U+00A9 copyright sign ISOnum
ordf ª &#170; ª U+00AA feminine ordinal indicator ISOnum
laquo « &#171; « U+00AB left-pointing double angle quotation mark = left pointing guillemet ISOnum
not ¬ &#172; ¬ U+00AC not sign ISOnum
shy ­ &#173; ­ U+00AD soft hyphen = discretionary hyphen ISOnum
reg ® &#174; ® U+00AE registered sign = registered trade mark sign ISOnum
macr ¯ &#175; ¯ U+00AF macron = spacing macron = overline = APL overbar ISOdia
deg ° &#176; ° U+00B0 degree sign ISOnum
plusmn ± &#177; ± U+00B1 plus-minus sign = plus-or-minus sign ISOnum
sup2 ² &#178; ² U+00B2 superscript two = superscript digit two = squared ISOnum
sup3 ³ &#179; ³ U+00B3 superscript three = superscript digit three = cubed ISOnum
acute ´ &#180; ´ U+00B4 acute accent = spacing acute ISOdia
micro µ &#181; µ U+00B5 micro sign ISOnum
para ¶ &#182; ¶ U+00B6 pilcrow sign = paragraph sign ISOnum
middot · &#183; · U+00B7 middle dot = Georgian comma = Greek middle dot ISOnum
cedil ¸ &#184; ¸ U+00B8 cedilla = spacing cedilla or comma ISOdia
sup1 ¹ &#185; ¹ U+00B9 superscript one = superscript digit one ISOnum
ordm º &#186; º U+00BA masculine ordinal indicator ISOnum
raquo » &#187; » U+00BB right-pointing double angle quotation mark = right pointing guillemet ISOnum
frac14 ¼ &#188; ¼ U+00BC vulgar fraction one quarter = fraction one quarter ISOnum
frac12 ½ &#189; ½ U+00BD vulgar fraction one half = fraction one half ISOnum
frac34 ¾ &#190; ¾ U+00BE vulgar fraction three quarters = fraction three quarters ISOnum
iquest ¿ &#191; ¿ U+00BF inverted question mark = turned question mark ISOnum
Agrave À &#192; À U+00C0 latin capital letter A with grave = latin capital letter A grave ISOlat1
Aacute Á &#193; Á U+00C1 latin capital letter A with acute ISOlat1
Abreve Ă &#258; Ă U+0102 latin capital letter A with breve
Acirc Â &#194; Â U+00C2 latin capital letter A with circumflex ISOlat1
Atilde Ã &#195; Ã U+00C3 latin capital letter A with tilde ISOlat1
Auml Ä &#196; Ä U+00C4 latin capital letter A with diaeresis ISOlat1
Aring Å &#197; Å U+00C5 latin capital letter A with ring above = latin capital letter A ring ISOlat1
AElig Æ &#198; Æ U+00C6 latin capital letter AE = latin capital ligature AE ISOlat1
Ccedil Ç &#199; Ç U+00C7 latin capital letter C with cedilla or comma ISOlat1
Egrave È &#200; È U+00C8 latin capital letter E with grave ISOlat1
Eacute É &#201; É U+00C9 latin capital letter E with acute ISOlat1
Ecirc Ê &#202; Ê U+00CA latin capital letter E with circumflex ISOlat1
Euml Ë &#203; Ë U+00CB latin capital letter E with diaeresis ISOlat1
Gbreve Ğ &#286; or &#x11E; Ğ U+011E Turkish uppercase G with breve accent
Idot İ &#304; or &#x130; İ U+0130 Turkish uppercase dotted I
Igrave Ì &#204; Ì U+00CC latin capital letter I with grave ISOlat1
Iacute Í &#205; Í U+00CD latin capital letter I with acute ISOlat1
Icirc Î &#206; Î U+00CE latin capital letter I with circumflex ISOlat1
Iuml Ï &#207; Ï U+00CF latin capital letter I with diaeresis ISOlat1
ETH Ð &#208; Ð U+00D0 latin capital letter ETH ISOlat1
Ntilde Ñ &#209; Ñ U+00D1 latin capital letter N with tilde ISOlat1
Ograve Ò &#210; Ò U+00D2 latin capital letter O with grave ISOlat1
Oacute Ó &#211; Ó U+00D3 latin capital letter O with acute ISOlat1
Ocirc Ô &#212; Ô U+00D4 latin capital letter O with circumflex ISOlat1
Otilde Õ &#213; Õ U+00D5 latin capital letter O with tilde ISOlat1
Ouml Ö &#214; Ö U+00D6 latin capital letter O with diaeresis ISOlat1
times × &#215; × U+00D7 multiplication sign ISOnum
Oslash Ø &#216; Ø U+00D8 latin capital letter O with stroke = latin capital letter O slash ISOlat1
Scedil Ș or Ş &#536; or &#350; Ș U+0218 or U+015E latin capital letter S with cedilla or comma
Tcedil Ț or Ţ &#538; or &#354; Ț U+021A or U+0162 latin capital letter T with cedilla or comma
Ugrave Ù &#217; Ù U+00D9 latin capital letter U with grave ISOlat1
Uacute Ú &#218; Ú U+00DA latin capital letter U with acute ISOlat1
Ucirc Û &#219; Û U+00DB latin capital letter U with circumflex ISOlat1
Uuml Ü &#220; Ü U+00DC latin capital letter U with diaeresis ISOlat1
Yacute Ý &#221; Ý U+00DD latin capital letter Y with acute ISOlat1
THORN Þ &#222; Þ U+00DE latin capital letter THORN ISOlat1
szlig ß &#223; ß U+00DF latin small letter sharp s = ess-zed ISOlat1
agrave à &#224; à U+00E0 latin small letter a with grave = latin small letter a grave ISOlat1
aacute á &#225; á U+00E1 latin small letter a with acute ISOlat1
abreve ă &#259; ă U+0103 latin small letter a with breve
acirc â &#226; â U+00E2 latin small letter a with circumflex ISOlat1
atilde ã &#227; ã U+00E3 latin small letter a with tilde ISOlat1
auml ä &#228; ä U+00E4 latin small letter a with diaeresis ISOlat1
aring å &#229; å U+00E5 latin small letter a with ring above = latin small letter a ring ISOlat1
aelig æ &#230; æ U+00E6 latin small letter ae = latin small ligature ae ISOlat1
ccedil ç &#231; ç U+00E7 latin small letter c with cedilla or comma ISOlat1
egrave è &#232; è U+00E8 latin small letter e with grave ISOlat1
eacute é &#233; é U+00E9 latin small letter e with acute ISOlat1
ecirc ê &#234; ê U+00EA latin small letter e with circumflex ISOlat1
euml ë &#235; ë U+00EB latin small letter e with diaeresis ISOlat1
gbreve ğ &#287; or &#x11F; ğ U+011F Turkish lowercase “g” with a breve accent
‘idotless’ ı &#305; or &#x131; ı U+0131 Turkish lowercase dotless “i”
igrave ì &#236; ì U+00EC latin small letter i with grave ISOlat1
iacute í &#237; í U+00ED latin small letter i with acute ISOlat1
icirc î &#238; î U+00EE latin small letter i with circumflex ISOlat1
iuml ï &#239; ï U+00EF latin small letter i with diaeresis ISOlat1
eth ð &#240; ð U+00F0 latin small letter eth ISOlat1
ntilde ñ &#241; ñ U+00F1 latin small letter n with tilde ISOlat1
ograve ò &#242; ò U+00F2 latin small letter o with grave ISOlat1
oacute ó &#243; ó U+00F3 latin small letter o with acute ISOlat1
ocirc ô &#244; ô U+00F4 latin small letter o with circumflex ISOlat1
otilde õ &#245; õ U+00F5 latin small letter o with tilde ISOlat1
ouml ö &#246; ö U+00F6 latin small letter o with diaeresis ISOlat1
divide ÷ &#247; ÷ U+00F7 division sign ISOnum
oslash ø &#248; ø U+00F8 latin small letter o with stroke = latin small letter o slash ISOlat1
scedil ș or ş &#537; or &#351; ș U+0219 or U+015F latin small letter s with cedilla or comma
tcedil ț or ţ &#539; or &#355; ț U+021B or U+0163 latin small letter t with cedilla or comma
ugrave ù &#249; ù U+00F9 latin small letter u with grave ISOlat1
uacute ú &#250; ú U+00FA latin small letter u with acute ISOlat1
ucirc û &#251; û U+00FB latin small letter u with circumflex ISOlat1
uuml ü &#252; ü U+00FC latin small letter u with diaeresis ISOlat1
yacute ý &#253; ý U+00FD latin small letter y with acute ISOlat1
thorn þ &#254; þ U+00FE latin small letter thorn ISOlat1
yuml ÿ &#255; ÿ U+00FF latin small letter y with diaeresis ISOlat1
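
The S and T rows above each list two characters because Unicode encodes the cedilla form and the comma-below form as separate code points. The following Python sketch, offered only as an illustration, prints the decimal reference and the official Unicode name for each. The helper `numeric_reference` is a hypothetical name introduced here, not part of any library.

```python
import unicodedata

# Cedilla form paired with the comma-below form, written as escapes
# so the source survives any editor encoding.
pairs = [
    ("\u015e", "\u0218"),  # capital S
    ("\u015f", "\u0219"),  # small s
    ("\u0162", "\u021a"),  # capital T
    ("\u0163", "\u021b"),  # small t
]

def numeric_reference(ch):
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"

for cedilla, comma in pairs:
    print(numeric_reference(cedilla), unicodedata.name(cedilla))
    print(numeric_reference(comma), unicodedata.name(comma))
```

Which form a page should use depends on the language and on font support, as the Wikipedia discussion cited earlier describes.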



C0 Controls and Basic Latin

The apostrophe entity (apos) is not defined in the HTML 4 entities; it appears for the first time in XHTML 1.0.

by name char by number char Unicode description standard
quot " &#34; " U+0022 quotation mark = APL quote ISOnum
amp & &#38; & U+0026 ampersand ISOnum
apos ' &#39; ' U+0027 apostrophe mark ISOnum
lt < &#60; < U+003C less-than sign ISOnum
gt > &#62; > U+003E greater-than sign ISOnum
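
Because apos is missing from the HTML 4 entity set, a numeric reference is the safer way to write an apostrophe for older parsers. As a small check of that point, the sketch below uses Python's standard `html.entities` tables: `name2codepoint` covers the HTML 4 names, while `html5` is a later and larger set. Python is used here purely as a convenient way to inspect the two lists.

```python
import html
from html.entities import html5, name2codepoint

# name2codepoint holds the HTML 4 entity names only.
apos_in_html4 = "apos" in name2codepoint

# The html5 table stores names with their trailing semicolon.
apos_in_html5 = "apos;" in html5

# A numeric reference does not depend on any named-entity list.
portable_apostrophe = "&#39;"
decoded = html.unescape(portable_apostrophe)

print(apos_in_html4, apos_in_html5, decoded)
```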



Latin Extended-A

* oelig: ligature is a misnomer, this is a separate character in some languages

by name char by number char Unicode description standard
OElig Œ &#338; Œ U+0152 latin capital ligature OE ISOlat2
oelig œ &#339; œ U+0153 latin small ligature oe ISOlat2
Scaron Š &#352; Š U+0160 latin capital letter S with caron ISOlat2
scaron š &#353; š U+0161 latin small letter s with caron ISOlat2
Yuml Ÿ &#376; Ÿ U+0178 latin capital letter Y with diaeresis ISOlat2




Spacing Modifier Letters

by name char by number char Unicode description standard
circ ˆ &#710; ˆ U+02C6 modifier letter circumflex accent ISOpub
tilde ˜ &#732; ˜ U+02DC small tilde ISOdia




General Punctuation

* lsaquo is proposed but not yet ISO standardized
* rsaquo is proposed but not yet ISO standardized

by name char by number char Unicode description standard
ensp &#8194; U+2002 en space ISOpub
emsp &#8195; U+2003 em space ISOpub
thinsp &#8201; U+2009 thin space ISOpub
zwnj &#8204; U+200C zero width non-joiner NEW RFC 2070
zwj &#8205; U+200D zero width joiner NEW RFC 2070
lrm &#8206; U+200E left-to-right mark NEW RFC 2070
rlm &#8207; U+200F right-to-left mark NEW RFC 2070
ndash – &#8211; – U+2013 en dash ISOpub
mdash — &#8212; — U+2014 em dash ISOpub
lsquo ‘ &#8216; ‘ U+2018 left single quotation mark ISOnum
rsquo ’ &#8217; ’ U+2019 right single quotation mark ISOnum
sbquo ‚ &#8218; ‚ U+201A single low-9 quotation mark NEW
ldquo “ &#8220; “ U+201C left double quotation mark ISOnum
rdquo ” &#8221; ” U+201D right double quotation mark ISOnum
bdquo „ &#8222; „ U+201E double low-9 quotation mark NEW
dagger † &#8224; † U+2020 dagger ISOpub
Dagger ‡ &#8225; ‡ U+2021 double dagger ISOpub
permil ‰ &#8240; ‰ U+2030 per mille sign ISOtech
lsaquo ‹ &#8249; ‹ U+2039 single left-pointing angle quotation mark ISOproposed
rsaquo › &#8250; › U+203A single right-pointing angle quotation mark ISOproposed
euro € &#8364; € U+20AC euro sign NEW

Mathematical, Greek and Symbolic characters for HTML

Latin Extended-B

by name char by number char Unicode description standard
fnof ƒ &#402; ƒ U+0192 latin small f with hook = function = florin ISOtech



Greek

* there is no Sigmaf, and no U+03A2 character either

by name char by number char Unicode description standard
Alpha Α &#913; Α U+0391 greek capital letter alpha
Beta Β &#914; Β U+0392 greek capital letter beta
Gamma Γ &#915; Γ U+0393 greek capital letter gamma ISOgrk3
Delta Δ &#916; Δ U+0394 greek capital letter delta ISOgrk3
Epsilon Ε &#917; Ε U+0395 greek capital letter epsilon
Zeta Ζ &#918; Ζ U+0396 greek capital letter zeta
Eta Η &#919; Η U+0397 greek capital letter eta
Theta Θ &#920; Θ U+0398 greek capital letter theta ISOgrk3
Iota Ι &#921; Ι U+0399 greek capital letter iota
Kappa Κ &#922; Κ U+039A greek capital letter kappa
Lambda Λ &#923; Λ U+039B greek capital letter lambda ISOgrk3
Mu Μ &#924; Μ U+039C greek capital letter mu
Nu Ν &#925; Ν U+039D greek capital letter nu
Xi Ξ &#926; Ξ U+039E greek capital letter xi ISOgrk3
Omicron Ο &#927; Ο U+039F greek capital letter omicron
Pi Π &#928; Π U+03A0 greek capital letter pi ISOgrk3
Rho Ρ &#929; Ρ U+03A1 greek capital letter rho
Sigma Σ &#931; Σ U+03A3 greek capital letter sigma ISOgrk3
Tau Τ &#932; Τ U+03A4 greek capital letter tau
Upsilon Υ &#933; Υ U+03A5 greek capital letter upsilon ISOgrk3
Phi Φ &#934; Φ U+03A6 greek capital letter phi ISOgrk3
Chi Χ &#935; Χ U+03A7 greek capital letter chi
Psi Ψ &#936; Ψ U+03A8 greek capital letter psi ISOgrk3
Omega Ω &#937; Ω U+03A9 greek capital letter omega ISOgrk3
alpha α &#945; α U+03B1 greek small letter alpha ISOgrk3
alpha acute ά &#940; ά U+03AC greek small letter alpha with acute accent
beta β &#946; β U+03B2 greek small letter beta ISOgrk3
gamma γ &#947; γ U+03B3 greek small letter gamma ISOgrk3
delta δ &#948; δ U+03B4 greek small letter delta ISOgrk3
epsilon ε &#949; ε U+03B5 greek small letter epsilon ISOgrk3
zeta ζ &#950; ζ U+03B6 greek small letter zeta ISOgrk3
eta η &#951; η U+03B7 greek small letter eta ISOgrk3
theta θ &#952; θ U+03B8 greek small letter theta ISOgrk3
iota ι &#953; ι U+03B9 greek small letter iota ISOgrk3
kappa κ &#954; κ U+03BA greek small letter kappa ISOgrk3
lambda λ &#955; λ U+03BB greek small letter lambda ISOgrk3
mu μ &#956; μ U+03BC greek small letter mu ISOgrk3
nu ν &#957; ν U+03BD greek small letter nu ISOgrk3
xi ξ &#958; ξ U+03BE greek small letter xi ISOgrk3
omicron ο &#959; ο U+03BF greek small letter omicron NEW
pi π &#960; π U+03C0 greek small letter pi ISOgrk3
rho ρ &#961; ρ U+03C1 greek small letter rho ISOgrk3
sigmaf ς &#962; ς U+03C2 greek small letter final sigma ISOgrk3
sigma σ &#963; σ U+03C3 greek small letter sigma ISOgrk3
tau τ &#964; τ U+03C4 greek small letter tau ISOgrk3
upsilon υ &#965; υ U+03C5 greek small letter upsilon ISOgrk3
phi φ &#966; φ U+03C6 greek small letter phi ISOgrk3
chi χ &#967; χ U+03C7 greek small letter chi ISOgrk3
psi ψ &#968; ψ U+03C8 greek small letter psi ISOgrk3
omega ω &#969; ω U+03C9 greek small letter omega ISOgrk3
thetasym ϑ &#977; ϑ U+03D1 greek small letter theta symbol NEW
upsih ϒ &#978; ϒ U+03D2 greek upsilon with hook symbol NEW
piv ϖ &#982; ϖ U+03D6 greek pi symbol ISOgrk3
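
The note above the Greek table says there is no Sigmaf and no U+03A2. One way to see this gap, sketched here with Python's standard `unicodedata` module, is to scan the capital-letter range and report any unassigned code point. The helper `is_assigned` is a hypothetical name used only for this illustration.

```python
import unicodedata

def is_assigned(code_point):
    """True if Unicode assigns a character to this code point."""
    return unicodedata.category(chr(code_point)) != "Cn"

# Capital letters run from Alpha (U+0391) to Omega (U+03A9).
gaps = [hex(cp) for cp in range(0x0391, 0x03AA) if not is_assigned(cp)]

# The small final sigma (sigmaf) has no capital of its own:
# uppercasing it yields the ordinary capital Sigma.
final_sigma_name = unicodedata.name(chr(0x03C2))
final_sigma_upper = chr(0x03C2).upper()

print(gaps)
print(final_sigma_name, hex(ord(final_sigma_upper)))
```

The single gap sits where a capital final sigma would fall if the capital and small letters were numbered in parallel.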



General Punctuation

* bullet is NOT the same as bullet operator, U+2219

by name char by number char Unicode description standard
bull • &#8226; • U+2022 bullet = black small circle ISOpub
hellip … &#8230; … U+2026 horizontal ellipsis = three dot leader ISOpub
prime ′ &#8242; ′ U+2032 prime = minutes = feet ISOtech
Prime ″ &#8243; ″ U+2033 double prime = seconds = inches ISOtech
oline ‾ &#8254; ‾ U+203E overline = spacing overscore NEW
frasl ⁄ &#8260; ⁄ U+2044 fraction slash NEW



Letterlike Symbols

* alef symbol is NOT the same as hebrew letter alef, U+05D0 although the same glyph could be used to depict both characters

by name char by number char Unicode description standard
weierp ℘ &#8472; ℘ U+2118 script capital P = power set = Weierstrass p ISOamso
image ℑ &#8465; ℑ U+2111 blackletter capital I = imaginary part ISOamso
real ℜ &#8476; ℜ U+211C blackletter capital R = real part symbol ISOamso
trade ™ &#8482; ™ U+2122 trade mark sign ISOnum
alefsym ℵ &#8501; ℵ U+2135 alef symbol = first transfinite cardinal NEW



Arrows

* Unicode does not say that lArr is the same as the 'is implied by' arrow, but it also does not have any other character for that function. So lArr can be used for 'is implied by', as ISOtech suggests
* Unicode does not say that rArr is the 'implies' character, but it does not have another character with this function, so rArr can be used for 'implies', as ISOtech suggests

by name char by number char Unicode description standard
larr ← &#8592; ← U+2190 leftwards arrow ISOnum
uarr ↑ &#8593; ↑ U+2191 upwards arrow ISOnum
rarr → &#8594; → U+2192 rightwards arrow ISOnum
darr ↓ &#8595; ↓ U+2193 downwards arrow ISOnum
harr ↔ &#8596; ↔ U+2194 left right arrow ISOamsa
crarr ↵ &#8629; ↵ U+21B5 downwards arrow with corner leftwards = carriage return NEW
lArr ⇐ &#8656; ⇐ U+21D0 leftwards double arrow ISOtech
uArr ⇑ &#8657; ⇑ U+21D1 upwards double arrow ISOamsa
rArr ⇒ &#8658; ⇒ U+21D2 rightwards double arrow ISOtech
dArr ⇓ &#8659; ⇓ U+21D3 downwards double arrow ISOamsa
hArr ⇔ &#8660; ⇔ U+21D4 left right double arrow ISOamsa



Mathematical Operators

* should there be a more memorable name than 'ni'?
* prod is NOT the same character as U+03A0 'greek capital letter pi' though the same glyph might be used for both
* sum is NOT the same character as U+03A3 'greek capital letter sigma' though the same glyph might be used for both
* sim: tilde operator is NOT the same character as the tilde, U+007E, although the same glyph might be used to represent both
* note that nsup, 'not a superset of', U+2285, is not covered by the Symbol font encoding and is not included. Should it be, for symmetry? It is in ISOamsn
* sdot: dot operator is NOT the same character as U+00B7 middle dot

by name char by number char Unicode description standard
forall ∀ &#8704; ∀ U+2200 for all ISOtech
part ∂ &#8706; ∂ U+2202 partial differential ISOtech
exist ∃ &#8707; ∃ U+2203 there exists ISOtech
empty ∅ &#8709; ∅ U+2205 empty set = null set = diameter ISOamso
nabla ∇ &#8711; ∇ U+2207 nabla = backward difference ISOtech
isin ∈ &#8712; ∈ U+2208 element of ISOtech
notin ∉ &#8713; ∉ U+2209 not an element of ISOtech
ni ∋ &#8715; ∋ U+220B contains as member ISOtech
prod ∏ &#8719; ∏ U+220F n-ary product = product sign ISOamsb
sum ∑ &#8721; ∑ U+2211 n-ary summation ISOamsb
minus − &#8722; − U+2212 minus sign ISOtech
lowast ∗ &#8727; ∗ U+2217 asterisk operator ISOtech
radic √ &#8730; √ U+221A square root = radical sign ISOtech
prop ∝ &#8733; ∝ U+221D proportional to ISOtech
infin ∞ &#8734; ∞ U+221E infinity ISOtech
ang ∠ &#8736; ∠ U+2220 angle ISOamso
and ∧ &#8743; ∧ U+2227 logical and = wedge ISOtech
or ∨ &#8744; ∨ U+2228 logical or = vee ISOtech
cap ∩ &#8745; ∩ U+2229 intersection = cap ISOtech
cup ∪ &#8746; ∪ U+222A union = cup ISOtech
int ∫ &#8747; ∫ U+222B integral ISOtech
there4 ∴ &#8756; ∴ U+2234 therefore ISOtech
sim ∼ &#8764; ∼ U+223C tilde operator = varies with = similar to ISOtech
cong ≅ &#8773; ≅ U+2245 approximately equal to ISOtech
asymp ≈ &#8776; ≈ U+2248 almost equal to = asymptotic to ISOamsr
ne ≠ &#8800; ≠ U+2260 not equal to ISOtech
equiv ≡ &#8801; ≡ U+2261 identical to ISOtech
le ≤ &#8804; ≤ U+2264 less-than or equal to ISOtech
ge ≥ &#8805; ≥ U+2265 greater-than or equal to ISOtech
sub ⊂ &#8834; ⊂ U+2282 subset of ISOtech
sup ⊃ &#8835; ⊃ U+2283 superset of ISOtech
nsub ⊄ &#8836; ⊄ U+2284 not a subset of ISOamsn
sube ⊆ &#8838; ⊆ U+2286 subset of or equal to ISOtech
supe ⊇ &#8839; ⊇ U+2287 superset of or equal to ISOtech
oplus ⊕ &#8853; ⊕ U+2295 circled plus = direct sum ISOamsb
otimes ⊗ &#8855; ⊗ U+2297 circled times = vector product ISOamsb
perp ⊥ &#8869; ⊥ U+22A5 up tack = orthogonal to = perpendicular ISOtech
sdot ⋅ &#8901; ⋅ U+22C5 dot operator ISOamsb
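
Several notes above this table stress that an operator is not the same character as a similar-looking letter or punctuation mark. As an illustration, the Python sketch below looks up the official Unicode name of each pair mentioned in those notes; the list name `lookalikes` is chosen here for the example only.

```python
import unicodedata

# (operator, similar-looking character) pairs taken from the notes above.
lookalikes = [
    ("\u220f", "\u03a0"),  # prod vs greek capital letter pi
    ("\u2211", "\u03a3"),  # sum vs greek capital letter sigma
    ("\u223c", "\u007e"),  # sim vs tilde
    ("\u22c5", "\u00b7"),  # sdot vs middle dot
]

# Official names show that each pair is two distinct characters.
names = [(unicodedata.name(a), unicodedata.name(b)) for a, b in lookalikes]
all_distinct = all(a != b for a, b in lookalikes)

for operator_name, other_name in names:
    print(operator_name, "/", other_name)
```

Text search, sorting, and screen readers treat the two members of each pair differently, which is why the correct entity matters even when the glyphs look alike.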



Miscellaneous Technical

* lang is NOT the same character as U+003C 'less than' or U+2039 'single left-pointing angle quotation mark'
* rang is NOT the same character as U+003E 'greater than' or U+203A 'single right-pointing angle quotation mark'

by name char by number char Unicode description standard
lceil ⌈ &#8968; ⌈ U+2308 left ceiling = apl upstile ISOamsc
rceil ⌉ &#8969; ⌉ U+2309 right ceiling ISOamsc
lfloor ⌊ &#8970; ⌊ U+230A left floor = apl downstile ISOamsc
rfloor ⌋ &#8971; ⌋ U+230B right floor ISOamsc
lang 〈 &#9001; 〈 U+2329 left-pointing angle bracket = bra ISOtech
rang 〉 &#9002; 〉 U+232A right-pointing angle bracket = ket ISOtech



Geometric Shapes

by name char by number char Unicode description standard
loz ◊ &#9674; ◊ U+25CA lozenge ISOpub

Miscellaneous Symbols

* black here seems to mean filled as opposed to hollow

by name char by number char Unicode description standard
spades ♠ &#9824; ♠ U+2660 black spade suit ISOpub
clubs ♣ &#9827; ♣ U+2663 black club suit = shamrock ISOpub
hearts ♥ &#9829; ♥ U+2665 black heart suit = valentine ISOpub
diams ♦ &#9830; ♦ U+2666 black diamond suit ISOpub



Converted to tables by Oscar van Vlijmen, 1999-11-21
URL-alias of this page: http://ovv.club.tip.nl/EntitiesXHTML1.html
Last modification date: 2000-12-22