OW2 Consortium

OW2 Mail Archive: ops-users list, September 2012


[ops-users] Specify character encoding for URL-encoded UTF-8 query strings / "Illegal HTML character: decimal 128"

We're encountering a character encoding issue when reading a UTF-8 query 
string. A separate, external application constructs links to our Orbeon 
application, such as:

- http://localhost:8080/ops/encoding-test/?message=hello%20world
- http://localhost:8080/ops/encoding-test/?message=it%E2%80%99s%20a%20message
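For reference, links of this shape can be produced as in the sketch below, 
assuming the outside application percent-encodes the UTF-8 bytes of the value 
(the post does not say what language or library it uses). `EncodeLink` and 
`encodeQueryValue` are hypothetical names; the Charset overload of 
URLEncoder.encode requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeLink {
    // Percent-encode a query-parameter value from its UTF-8 bytes.
    // URLEncoder implements application/x-www-form-urlencoded, which writes
    // spaces as "+"; a literal "+" in the input becomes "%2B", so every "+"
    // in the output is a space and can safely be rewritten as "%20".
    static String encodeQueryValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encodeQueryValue("hello world"));         // hello%20world
        System.out.println(encodeQueryValue("it\u2019s a message")); // it%E2%80%99s%20a%20message
    }
}
```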

Our application's model reads the query string with the oxf:request 
processor and then displays the string in a view. In the first case above, 
the application displays "hello world" correctly. In the second case, 
"%E2%80%99" is the URL encoding of the UTF-8 bytes for a typographic 
apostrophe (U+2019, RIGHT SINGLE QUOTATION MARK), and it causes the 
application to fail with:

> 2012-09-13 12:21:43,383 ERROR XSLTTransformer  - Error at line 174 of 
> oxf:/config/theme-examples.xsl:
> Illegal HTML character: decimal 128
> 2012-09-13 12:21:43,384 ERROR ProcessorService  - Exception at line 174 of 
> oxf:/config/theme-examples.xsl
> ; SystemID: oxf:/config/theme-examples.xsl; Line#: 174; Column#: -1
> org.orbeon.saxon.trans.XPathException: Illegal HTML character: decimal 128

- Full log output: https://gist.github.com/3716033
- Application test-case source: https://gist.github.com/3716159 (also 
attached as encoding-test.zip)

The error refers to the %80, the second byte of the multi-byte UTF-8 
encoding of the apostrophe. Note that in the log, not only does the theme 
raise an exception, but the XForms inspector does as well.
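To make the "decimal 128" concrete, the sketch below prints the UTF-8 bytes 
of U+2019 and the code points obtained when those bytes are read one 
character per byte, as ISO-8859-1 does. The second character comes out as 
U+0080, decimal 128. My assumption, not verified against this Orbeon/Saxon 
version, is that Saxon's HTML output method rejects code points 128 to 159 
(the C1 control range), which would explain why 128 is flagged while 226 is 
not. `ApostropheBytes` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class ApostropheBytes {
    // Return the UTF-8 encoding of a string as unsigned byte values.
    static int[] utf8Bytes(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // Java bytes are signed; mask to 0..255
        }
        return out;
    }

    // Decode those same bytes as ISO-8859-1, which maps each byte to the code
    // point with the same value, and return the resulting code points.
    static int[] latin1CodePoints(String s) {
        String misread = new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        return misread.codePoints().toArray();
    }

    public static void main(String[] args) {
        String apostrophe = "\u2019"; // RIGHT SINGLE QUOTATION MARK
        for (int b : utf8Bytes(apostrophe)) System.out.printf("%%%02X ", b);
        System.out.println(); // %E2 %80 %99
        for (int cp : latin1CodePoints(apostrophe)) System.out.print(cp + " ");
        System.out.println(); // 226 128 153
    }
}
```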

It appears that the URL is being decoded as Latin-1 (ISO-8859-1) instead of 
UTF-8, since the debug processor lists "it???s a message", with three 
characters in place of the apostrophe. From my research so far, HTTP does not 
appear to provide a way to specify the encoding of the query string itself.
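The two candidate decodings can be compared directly with a short Java 
sketch (Java 10+ for the Charset overload of URLDecoder.decode). It decodes 
the query value from the second test link once as UTF-8 and once as 
ISO-8859-1; `DecodeComparison` and `decodeAs` are hypothetical names, and 
this only illustrates the hypothesis, not what oxf:request does internally.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeComparison {
    // The query-string value from the second test link, still percent-encoded.
    static final String RAW = "it%E2%80%99s%20a%20message";

    // Percent-decode the input, interpreting the resulting bytes with the given charset.
    static String decodeAs(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) {
        String utf8 = decodeAs(RAW, StandardCharsets.UTF_8);
        String latin1 = decodeAs(RAW, StandardCharsets.ISO_8859_1);
        // UTF-8 turns %E2%80%99 into one character (U+2019); ISO-8859-1 turns it
        // into three (U+00E2, U+0080, U+0099), consistent with the "???" above.
        System.out.println(utf8.equals("it\u2019s a message")); // true
        System.out.println(utf8.length());   // 14
        System.out.println(latin1.length()); // 16
    }
}
```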

1. Is there a way to specify the encoding of a query string when it is read 
with oxf:request? I didn't see a configuration property for the processor, 
or anything relevant in properties-local.xml, that would set a default.
2. If not, is there a way to force the encoding associated with the string? 
I suspect this could be done with XSLT, but I was unable to find an example. 
I believe I want something equivalent to Ruby's String#force_encoding.
3. If not, is there any other suggested workaround for the error? My current 
worst-case hack is to strip out any offending characters with mod_rewrite 
before the request reaches the servlet.
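For question 2, a minimal Java sketch of a force_encoding-style repair, 
assuming the value has already been decoded as ISO-8859-1. Because Java 
strings are UTF-16 rather than tagged byte sequences, the equivalent is to 
recover the original bytes and decode them again as UTF-8. This is not an 
oxf:request or XSLT feature, `Reinterpret.latin1ToUtf8` is a hypothetical 
helper, and how to call it from the pipeline is not shown.

```java
import java.nio.charset.StandardCharsets;

public class Reinterpret {
    // Undo a Latin-1 mis-decode: every character of such a string is in
    // U+0000..U+00FF, so encoding it back to ISO-8859-1 recovers the original
    // bytes exactly, and decoding those bytes as UTF-8 yields the intended text.
    static String latin1ToUtf8(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String misdecoded = "it\u00E2\u0080\u0099s a message"; // what a Latin-1 decode produces
        String fixed = latin1ToUtf8(misdecoded);
        System.out.println(fixed.equals("it\u2019s a message")); // true
    }
}
```

Apply it only to values known to be mis-decoded: on text that was already 
decoded correctly, any character above U+00FF cannot be encoded in 
ISO-8859-1 and is replaced with "?", so the repair would corrupt it. If the 
servlet container is Tomcat (port 8080 hints at it, but the post does not 
say), the URIEncoding attribute on its HTTP Connector controls the charset 
used to decode the query string, which may avoid the need for any repair.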

Any guidance and assistance is appreciated!

Attachment: encoding-test.zip
Description: Zip archive
