unicode – Giant Geek Blog

Remove language packs from Windows 7

I often have to install different languages/locales on Windows 7 to perform testing in different languages. Unfortunately, adding all of them to a single installation can take a lot of space, particularly when using a virtual machine.

Using the usual method to ‘remove installed software’ will remove updates but leaves the languages in place. To completely remove them you must open a command prompt and execute the following:

LPKSETUP

Select the languages you wish to remove, and click continue… it will take a while, but the languages will be removed one at a time.

Sample Tomcat7 setup

There are a few steps that I generally take to set up a new Tomcat server instance. These enable the following:

The manager console
HTTP compression
UTF-8 encoding

Steps:

tomcat-users.xml – add to bottom:
<role rolename="manager-gui"/> <user username="tomcat" password="s3cr3t" roles="manager-gui"/>
server.xml – add compression and URIEncoding, change port if desired:
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" compression="on" URIEncoding="UTF-8" />
server.xml – relocate webapps by adding ../ to appBase
<Host name="localhost" appBase="../webapps" unpackWARs="true" autoDeploy="true">
Restart your server; on Ubuntu use:
sudo service tomcat7 restart

Browser performance impact of charset/codepage assignment

Most developers (myself included) are often unaware of the performance impact of the Content-Type / charset of a web page. Ideally you should set this as an HTTP header rather than using META http-equiv. It’s often thought that this only helps with the transport and display of data; however, the browser also makes use of it when parsing CSS and JS assets. The tags that load those assets provide an optional ‘charset’ attribute should they ever need to vary from your content.

General guidance is to set this at the very top of the <head>, before <title>, and within the first 1024 bytes, though there are reports that Firefox will look at the first 2048 bytes of the page for this META information.

Not doing so may cause the browser to do a codepage restart to re-parse assets that were interpreted in the potentially incorrect codepage.

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
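For the HTTP header route, here is a minimal sketch assuming a plain Node.js server. The handler name `handlePage` and the sample markup are made up for illustration; the point is only that the charset is declared in the response header before any markup is sent.

```javascript
// Hypothetical request handler: declares the charset in the HTTP header,
// so the browser knows the encoding before it parses any markup.
function handlePage(req, res) {
  res.setHeader('Content-Type', 'text/html; charset=UTF-8');
  res.end(
    '<!DOCTYPE html><html><head><title>Example</title></head>' +
    '<body>caf\u00E9</body></html>'
  );
}

// Usage (Node.js), assuming port 8080 is free:
//   require('http').createServer(handlePage).listen(8080);
```

With the header in place, the META tag becomes a fallback for cases where the page is opened without HTTP headers, such as from disk.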

&nbsp; and other common entities do not validate as HTML5!

The only built-in entities in XML are &amp;, &lt;, &gt;, &quot; and &apos;. XHTML added the others via a DTD that is not a part of HTML5. As such, validators can report them as errors, notably when the page uses the XML (XHTML5) syntax.

Safe replacements are the decimal notation: &#160; or the character itself (U+00A0).

Quite a few other common symbols are not available without similar changes.

&lt; = &#60;
&gt; = &#62;
&amp; = &#38;
&apos; = &#39;
&quot; = &#34;
&nbsp; = &#160;
&copy; = &#169;
&reg; = &#174;
&trade; = &#8482;
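If existing markup needs converting in bulk, the substitution can be sketched as below. This is an illustration, not a complete converter: the function name `toNumericEntities` is made up, and the lookup table covers only the non-XML entities from the list above (the five XML built-ins are left untouched).

```javascript
// Named entity -> decimal code point, for the non-XML entities listed above.
const ENTITY_CODES = { nbsp: 160, copy: 169, reg: 174, trade: 8482 };

// Replace known named entities with their decimal numeric form.
// Unknown names, and the five XML built-ins, pass through unchanged.
function toNumericEntities(markup) {
  return markup.replace(/&([a-z]+);/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(ENTITY_CODES, name)
      ? '&#' + ENTITY_CODES[name] + ';'
      : match
  );
}

console.log(toNumericEntities('&lt;b&gt;&nbsp;&copy; 2011'));
// prints: &lt;b&gt;&#160;&#169; 2011
```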

Defining word-break and word-wrap in CSS

I recently found a case where WebKit (Chromium and Safari) was acting as if ‘overflow-x: visible;’ was set when text could not be wrapped inside a DIV because it had no spaces or hyphenation (it was a Java stack trace). In this case I had to explicitly set the ‘word-wrap: break-word;’ property on the problematic DIV.

.breakword { word-wrap: break-word; }

Also, for languages (such as Chinese, Japanese and Korean) whose line-breaking rules are too complex to describe here…
.wordbreak { word-break: keep-all; }

Rupee

I’ve done a lot of internationalization (I18N) and localization (L10N) work in my various development positions. One particularly troubling area is currency support. Number formats are generally well supported (or can be handled with some trivial input translation). The tricky part is currency symbols: western currencies such as USD (US$), CAD (C$) and the Euro (EUR or €) are well supported across character sets and fonts, but some others are not. One particular case is the Indian Rupee (INR). Ubuntu 10.10 is the first operating system to ship with a font that supports this character: ₹

Unicode = U+20B9 (decimal &#8377;) = ₹
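When checking whether a symbol survives a given toolchain, it helps to look at the code point and its UTF-8 bytes rather than the rendered glyph. A small sketch follows; the helper name `describeChar` is made up, and it assumes a runtime with TextEncoder (current browsers and Node.js).

```javascript
// Report the code point, decimal value and UTF-8 bytes of a single character.
function describeChar(ch) {
  const codePoint = ch.codePointAt(0);
  const hex = codePoint.toString(16).toUpperCase().padStart(4, '0');
  const utf8 = Array.from(new TextEncoder().encode(ch), (b) =>
    b.toString(16).toUpperCase().padStart(2, '0')
  ).join(' ');
  return { codePoint: 'U+' + hex, decimal: codePoint, utf8: utf8 };
}

// Written as an escape so the source file's own encoding does not matter.
const rupee = '\u20B9';
console.log(describeChar(rupee));
// { codePoint: 'U+20B9', decimal: 8377, utf8: 'E2 82 B9' }
```

Whether the glyph then displays still depends on the fonts installed on the viewing machine.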

JavaScript TextNode for special characters

It can be difficult to create or output some characters as JavaScript TextNodes. Typically you might try the ampersand (entity) notation; unfortunately the ampersand itself gets escaped to &amp;, so &nbsp; becomes &amp;nbsp;. Using the Unicode escape notation easily resolves this issue.

Example for NBSP:
var nsbptextnode = document.createTextNode('\u00A0');
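If the text arrives with numeric entities already in it, one option is to decode them to real characters before creating the node. The sketch below assumes only decimal and hexadecimal numeric references need handling; `decodeNumericEntities` is a made-up helper name, and named entities are deliberately left alone.

```javascript
// Turn numeric character references (&#160; or &#xA0;) into real characters.
// Out-of-range values are left as-is rather than throwing.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === 'x' || body[0] === 'X';
    const code = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
  });
}

// Usage in a browser:
//   var node = document.createTextNode(decodeNumericEntities('10&#160;km'));
```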

Eclipse ResourceBundle Editor

I typically use the open-source Eclipse IDE for most of my Java and PHP work. For my corporate work, this means that I use IBM‘s packaged RAD and WSAD offerings that are based on various versions of the Eclipse framework.

When working on internationalized (I18N) applications, most experienced Java architects rely on ResourceBundles to store the text needed for each language. The problem is that editing these files is awkward, especially when dealing with the multi-byte characters used by languages outside Latin-1 (ISO-8859-1).
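To illustrate why hand-editing is awkward: .properties files have traditionally been read as ISO-8859-1, so anything outside that range is written as \uXXXX escapes. The sketch below shows that escaping step; `escapeForProperties` is a made-up name, and it escapes everything above ASCII for simplicity (it does not handle backslashes or other special characters).

```javascript
// Escape every non-ASCII UTF-16 code unit as \uXXXX, the form used in
// Java .properties files. Characters outside the BMP become two escapes
// (a surrogate pair), which is what Java expects.
function escapeForProperties(value) {
  let out = '';
  for (let i = 0; i < value.length; i++) {
    const unit = value.charCodeAt(i);
    out += unit > 0x7F
      ? '\\u' + unit.toString(16).toUpperCase().padStart(4, '0')
      : value[i];
  }
  return out;
}

console.log('greeting=' + escapeForProperties('Gr\u00FC\u00DFe'));
// prints: greeting=Gr\u00FC\u00DFe
```

A dedicated editor performs this conversion in both directions, so the file can be read and edited as normal text.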

The best editor I’ve found for this case is, as you may have guessed, free for download.

Here’s the links:

Cheers!

Downloadable WebFonts

To maintain accessibility and SEO (Search Engine Optimization), there’s often a need to be creative with fonts. This is sometimes due to aesthetics, but often to meet technical needs like foreign non-Latin languages that have unique characters/glyphs not normally installed on workstations. Producing images for each character would be very time consuming, bandwidth intensive and destroy search engine rankings.

Create embedded fonts using one of 2 available formats:

1. Portable Font Resources (.pfr): TrueDoc technology was developed by Bitstream and licensed by Netscape. It can be viewed by Navigator 4.0+ and Explorer 4.0+ on Windows, Mac, and Unix platforms.

<link rel="fontdef" src="myfont.pfr" />

2. Embedded OpenType (.eot): Compatible only with Explorer 4.0+ on the Windows platform. Create .eot files using Microsoft’s free Web Embedding Fonts Tool (WEFT).

<style type="text/css">
<!--
@font-face {
font-family: "MyFont"; /* hypothetical family name; use it in font-family rules */
src: url(/fonts/myfont.eot);
}
-->
</style>

Cheers!

UTF-8 (BOM) prevents java compilation

I had an adventure tracking this one down lately. It seems that if your IDE saves files as UTF-8 with a Byte Order Mark (BOM), the Java compiler can’t always read them.

Here are the errors from the console output:

[INFO] ------------------------------------------------------------------------
[ERROR] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Compilation failure

C:\Sandbox\Jars\example.jar\src\main\java\com\giantgeek\Example.java:[1,0] 'class' or 'interface' expected

C:\Sandbox\Jars\example.jar\src\main\java\com\giantgeek\Example.java:[1,1] illegal character: \187

C:\Sandbox\Jars\example.jar\src\main\java\com\giantgeek\Example.java:[1,2] illegal character: \191

Those character codes (\187 \191) may look a little familiar to some people: together with a leading \239 they form the three-byte Byte Order Mark (BOM), EF BB BF in hex, that prefixes some UTF-8 files. If you look at them in a file editor (or a text editor that doesn’t interpret UTF-8) they will look odd.

They look like ï»¿, that is, “an i with two dots over it, a double right arrow, and an upside-down question mark”.

The simple solution is to re-edit and save the file as ISO-8859-1 (or as UTF-8 without a BOM, if your editor offers that).

An alternate approach that is available in some instances is to tell javac the file encoding with its -encoding argument.
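If many files are affected, the BOM can also be stripped by a script. A minimal sketch of the check is below; the function name `stripUtf8Bom` is made up, and it works on raw bytes so nothing else in the file is re-encoded.

```javascript
// Remove a leading UTF-8 BOM (EF BB BF, i.e. 239 187 191) if present.
// Accepts a Uint8Array or Node.js Buffer and returns a view without the BOM.
function stripUtf8Bom(bytes) {
  const hasBom =
    bytes.length >= 3 &&
    bytes[0] === 0xEF &&
    bytes[1] === 0xBB &&
    bytes[2] === 0xBF;
  return hasBom ? bytes.subarray(3) : bytes;
}

// Usage (Node.js), with a hypothetical file name:
//   const fs = require('fs');
//   fs.writeFileSync('Example.java', stripUtf8Bom(fs.readFileSync('Example.java')));
```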

Cheers!
