alt.apache.configuration: Apache web server configuration issues.
#1

Hi,

I'm having problems with a RewriteRule that's applied to URLs with the % character in them; hope someone can help. The % character is a result of url-encoding non-ASCII words, as in the example below:

1. The word "sécurité" comes out of my db.

2. I construct the following link, using the PHP urlencode function:
   <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>

3. The URL created should be interpreted by a RewriteRule:
   RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]

However, the RewriteRule doesn't match on my URL, and I see this in the RewriteLog:

   init rewrite engine with requested uri /search/sÃ©curitÃ©

So it seems like some kind of decoding is going on so that the RewriteRule never even sees the % character. I have set everything I can think of (MySQL SET NAMES, Apache AddDefaultCharset) to utf-8.

Any ideas?

TIA,

JON

#2

"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote in message
news:eq5kcj$9q9$1@aioe.org...
> I'm having problems with a RewriteRule that's applied to url's with the %
> character in them, hope someone can help. The % character is a result of
> url-encoding non-ASCII words, as in the example below:
>
> 1. the word "sécurité" comes out of my db
>
> 2. I construct the following link, using the php urlencode function:
> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
>
> 3. the url created should be interpreted by a RewriteRule:
> RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1
> [QSA,L]
>
> However the RewriteRule doesn't match on my url, and I see this in the
> RewriteLog:
>
> init rewrite engine with requested uri /search/sÃ©curitÃ©
>
> So it seems like some kind of decoding is going on so that the RewriteRule
> never even sees the % character. I have set everything I can think of
> (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.

So PHP has encoded the URL to some ISO 8859 variant and Apache is decoding those to some UTF ... the next thing to wonder about is the charset used by your OS to store the file name ...

In general, just forget diacritical, language-specific, fancy characters and use 'securite' for the filename. It keeps you out of dozens of cross-platform and cross-language traps, easing migration of a website tenfold.

http://czyborra.com/charsets/iso8859.html
'The ISO 8859 Alphabet Soup'

HansH

#3

Hi Hans,

Thanks for your answer. I guess I'm best off just avoiding the whole thing.

What got me wondering was the fact that my PHP application can cope fine when this encoded word is passed in the query string:

   /pages/search.php?word=s%C3%A9curit%C3%A9

But perhaps it's simply that different rules apply to a URL and a query string parameter?

Thanks,

JON

#4

On 5 Feb, 00:31, "Jon Maz" <pparker.removet...@gmx.removethistoo.net> wrote:
> Hi Hans,
>
> Thanks for your answer. I guess I'm best off just avoiding the whole thing.
>
> What got me wondering was the fact that my php application can cope fine
> when this encoded word is passed in the query string:
>
> /pages/search.php?word=s%C3%A9curit%C3%A9
>
> But perhaps it's simply that different rules apply to a url and a query
> string parameter?
>
> Thanks,
>
> JON

(I'm using Google Groups to submit to alt.apache.configuration, so who knows what the character representation will be when you see it. However, Google's hot at this stuff!)

IMHO your rewrite is working, and it's not the fault of the UTF-8 URL encoding; it's just that the rewrite is matching only the characters you've specified.

If you use

   RewriteRule ^search/(.+) /pages/search.php?word=$1 [QSA,L]

(which obviously will need tweaking for your use) it will work.

   RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]

doesn't quite work because you are only matching the characters a-z, A-Z, 0-9, -, + and %, whereas Apache internally "knows" what the characters actually are (that é is %C3%A9), so Apache expects you to match against the actual characters rather than the URL you see in the browser (I hope that's clear).

In other words, by the time the matching is done, Apache is looking for you to match against the UTF-8 characters, not their urlencoded representation. (IMHO)

So when I tested your rewrite (Apache 2.2.4) it worked but only caught the s at the start.

I know what you are trying to do: match in UTF-8. For that, though, I guess you would have to use a regular expression that matches ranges of hex values rather than each ASCII value in turn. That would work, if you could be bothered to look up the hex equivalents for the characters you allow!

The rewrite log is a "gotcha". Apache logs in your OS as 8-bit ISO-8859-15, so you were seeing:

   init rewrite engine with requested uri /search/sÃ©curitÃ©

where sÃ©curitÃ© is the 8-bit representation of the UTF-8 encoded word sécurité. So I believe Apache is seeing your UTF-8 URL, but logging it to a file in the other format.

Now if I'm wrong, I apologise. This one took me a while to figure out (and test all the cases; I used Ubuntu and XP Pro), so thanks for putting it here.

Looking at the Apache docs, they appear to have pretty rigorous encoding functions for all the different transformations. Impressive, if opaque!

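The logging point in this post can be checked outside Apache. What follows is a minimal Python sketch, not Apache's own code: it percent-decodes the path from the thread into raw bytes and then reads those bytes once as UTF-8 and once as ISO-8859-15. Using Python's urllib as a stand-in for the server's decoding step is an assumption, and the variable names are illustrative only.

```python
from urllib.parse import unquote_to_bytes

# Percent-decode the request path into raw bytes. Assumption: this
# approximates what the server does before the rewrite engine runs.
raw = unquote_to_bytes("/search/s%C3%A9curit%C3%A9")

# The same bytes interpreted under two different encodings.
as_utf8 = raw.decode("utf-8")         # the text the link author intended
as_latin = raw.decode("iso-8859-15")  # what an 8-bit log viewer would display

if __name__ == "__main__":
    print(raw)
    print(as_utf8)
    print(as_latin)
```

Read as UTF-8 the bytes spell sécurité; read as ISO-8859-15 each é shows up as the pair Ã©. The bytes are identical in both cases, which is consistent with the view that only the display of the log differs.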
#5

"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote in message news:eq5kcj$9q9$1@aioe.org...
> Hi,
>
> I'm having problems with a RewriteRule that's applied to url's with the %
> character in them, hope someone can help. The % character is a result of
> url-encoding non-ASCII words, as in the example below:
>
> 1. the word "sécurité" comes out of my db
>
> 2. I construct the following link, using the php urlencode function:
> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>

How do you get s%C3%A9curit%C3%A9 from sécurité?

sécurité, url encoded, is s%E9curit%E9

s%C3%A9curit%C3%A9 decoded is sÃ©curitÃ©, as is correctly reported in your rewrite log.

> 3. the url created should be interpreted by a RewriteRule:
> RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]

A hyphen in a character class specifies a range unless it's the first or last character in the class.

What range are you looking for with 9-+?

> However the RewriteRule doesn't match on my url, and I see this in the
> RewriteLog:
>
> init rewrite engine with requested uri /search/sÃ©curitÃ©

The rewrite rule works correctly: the URI contains Ã and ©. The regex doesn't allow for these.

> So it seems like some kind of decoding is going on so that the RewriteRule
> never even sees the % character. I have set everything I can think of
> (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.

The URI is decoded before the server tries to resolve it; why would it not be?

Why are you trying to do the heavy lifting with mod_rewrite? Just pass the search term to the script and validate it there. You should validate all user input in your scripts.

   RewriteRule ^search/(.+)$ /pages/search.php?word=$1 [QSA,L]

Rich

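Two claims in this post lend themselves to a quick check: that the percent-encoded form depends on the byte encoding of the word, and that the strict character class rejects the decoded path while (.+) accepts it. The sketch below is a rough Python approximation under stated assumptions: Python's re module is not the regex engine Apache uses, the patterns are applied to bytes, and decoded_path is a hypothetical helper invented for the illustration.

```python
import re
from urllib.parse import quote, unquote_to_bytes

word = "sécurité"

# The same word percent-encoded from two different byte encodings.
latin1_form = quote(word, encoding="iso-8859-1")  # one byte per é
utf8_form = quote(word, encoding="utf-8")         # two bytes per é

# Byte-oriented stand-ins for the two RewriteRule patterns in the thread.
STRICT = re.compile(rb"^search/([a-zA-Z0-9-+%]+)$")
LOOSE = re.compile(rb"^search/(.+)$")


def decoded_path(encoded_word: str) -> bytes:
    """Hypothetical helper: the path as bytes after percent-decoding."""
    return unquote_to_bytes("search/" + encoded_word)


if __name__ == "__main__":
    print(latin1_form, utf8_form)
    print(STRICT.match(decoded_path(utf8_form)))  # strict class: no match
    print(LOOSE.match(decoded_path(utf8_form)))   # (.+): match
```

In this Python approximation the hyphen that follows the 0-9 range is taken as a literal hyphen, so the class accepts letters, digits, -, + and %. It contains no bytes above 0x7F, which is why the decoded UTF-8 path falls outside it.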
#6

On Feb 5, 6:15 am, "rh" <disposable12...@cableone.net> wrote:
> "Jon Maz" <pparker.removet...@gmx.removethistoo.net> wrote in message
> news:eq5kcj$9q9$1@aioe.org...
>
> > Hi,
> >
> > I'm having problems with a RewriteRule that's applied to url's with the %
> > character in them, hope someone can help. The % character is a result of
> > url-encoding non-ASCII words, as in the example below:
> >
> > 1. the word "sécurité" comes out of my db
> >
> > 2. I construct the following link, using the php urlencode function:
> > <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
>
> How do you get s%C3%A9curit%C3%A9 from sécurité
>
> sécurité, url encoded, is s%E9curit%E9

sécurité as ISO-8859-15, when encoded, is indeed s%E9curit%E9, but Jon is using UTF-8.

> s%C3%A9curit%C3%A9 decoded is sÃ©curitÃ© as is correctly reported in your rewrite log.

The urlencoded UTF-8 version of sécurité, when the decoded bytes are read as an 8-bit encoding, is sÃ©curitÃ©, but this is just a freak of logging; it isn't used anywhere else. If the logging were to occur in UTF-8, and we looked at the logs in a UTF-8 aware viewer, we would see sécurité.

> > 3. the url created should be interpreted by a RewriteRule:
> > RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]

You're right, and it is, but it depends what you mean. The actual UTF-8 version is what is matched internally by Apache, since it is UTF-8 aware. The URL is encoded for transport, and then Apache correctly uses the "real" encoding.

> a hyphen in a character class specifies a range unless it's the first or last character in
> the class
>
> what range are you looking for with 9-+

From the context Apache knows that a range is not intended, and so - and + are matched; of course + comes back into the app as a space. % is slightly different: it cannot be present by itself, but when it is present in conjunction with characters which result in a UTF-8 character being represented, Apache sees the three characters as one UTF-8 character and so doesn't match it, as it isn't present in the character set of the rewrite rule as given by Jon.

> The rewrite rule works correctly, the uri contains Ã and ©. The regex doesn't allow for
> these.

Actually those characters aren't present in the URI as those characters; they are present as the 8-bit equivalent of the UTF-8 character é. This means that if you want to specify your rewrite rules, you can do so in either of the following two ways:

   ([a-zA-Z0-9é]+)
   ([a-zA-Z0-9Ã©]+)

However, if you use the first you have to make sure that you do it in a UTF-8 encoded conf file. If you usually use ASCII (8-bit) conf files, then use the second rule; then, to see what I mean, change the encoding of your editor to UTF-8 and you will see the first rule. Both are equivalent, but whatever editor you use, you must select the correct encoding.

> > So it seems like some kind of decoding is going on so that the RewriteRule
> > never even sees the % character. I have set everything I can think of
> > (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.
>
> The uri is decoded before the server tries to resolve it, why would it not?

This is right, but when the thing is decoded, the é is the only char matched against.

> Why are you trying to do the heavy lifting with mod rewrite? just pass the search term to
> the script and validate it there, you should validate all user input in your scripts.
>
> RewriteRule ^search/(.+)$ /pages/search.php?word=$1 [QSA,L]
>
> Rich

Some people, like me, use Apache's powerful rewriting capabilities as an "application filtering" proxy, so that before the app even sees the URI, it has been parsed by Apache's regexps. If your objective is simply to make the URLs look nice, then you might as well just use PHP; take a look at the mb string functions when you do, as they are UTF-8 aware.

Finally, it's my understanding that it will not be until PHP 6 that PHP-MySQL data can travel on the wire as UTF-8. At the moment PHP has to be forced to "consider" the data to be in UTF-8, so make sure your database is indeed storing in the right format.

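The point about the two spellings of the character class comes down to bytes on disk. The following Python sketch illustrates it under the assumption that the pattern is compared byte by byte; it does not reproduce how Apache reads its configuration, and the names are illustrative.

```python
import re

# "é" saved in a UTF-8 file and "Ã©" saved in an ISO-8859-1 file
# end up as the same two bytes.
utf8_bytes = "é".encode("utf-8")
latin1_bytes = "Ã©".encode("iso-8859-1")

# A byte-wise character class built from those bytes. Each of the two
# bytes is a separate member of the class, not one combined character.
CLASS = re.compile(rb"^[a-zA-Z0-9" + utf8_bytes + rb"]+$")

if __name__ == "__main__":
    print(utf8_bytes == latin1_bytes)
    print(CLASS.match("sécurité".encode("utf-8")) is not None)
```

Because a byte-wise class treats 0xC3 and 0xA9 as independent members, it also accepts them in the wrong order, and it rejects other accented letters such as è whose second byte is not listed. A class written this way is therefore looser in one direction and narrower in another than it looks in an editor.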
#7

On Sun, 4 Feb 2007 21:49:08 -0000
"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote:

> So it seems like some kind of decoding is going on so that the
> RewriteRule never even sees the % character. I have set everything I
> can think of (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.

No you haven't. The expression in your RewriteRule is firmly in ASCII, so it fails to match the non-ASCII characters in the URL.

> Any ideas?

Don't faff about with mod_rewrite like that. Or if you really must, fix your regexp. Or as someone else said, stick to ASCII.

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

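One way to read "fix your regexp", combined with the earlier suggestion of matching ranges of hex values, is a pattern that allows the ASCII characters plus the byte ranges UTF-8 uses for multi-byte characters. The Python sketch below shows that idea only; it is a loose shape check rather than a full UTF-8 validator, captured_word is a hypothetical helper, and whether the same escapes carry over unchanged to a RewriteRule pattern is not verified here.

```python
import re
from typing import Optional

# ASCII letters, digits, '+' and '-', or a UTF-8 multi-byte sequence:
# one lead byte (0xC2-0xF4) followed by one to three continuation
# bytes (0x80-0xBF).
SEARCH = re.compile(
    rb"^search/((?:[a-zA-Z0-9+-]|[\xc2-\xf4][\x80-\xbf]{1,3})+)$"
)


def captured_word(path: bytes) -> Optional[str]:
    """Hypothetical helper: return the captured search word, or None."""
    m = SEARCH.match(path)
    return m.group(1).decode("utf-8", errors="replace") if m else None


if __name__ == "__main__":
    print(captured_word(b"search/s\xc3\xa9curit\xc3\xa9"))
    print(captured_word(b"search/s\xe9curit\xe9"))  # Latin-1 bytes: rejected
```

The design choice is to describe byte shapes rather than list individual accented letters, so any UTF-8 word passes while slashes, percent signs and stray 8-bit bytes do not. The script behind the rule would still need to validate the word it receives.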
#8

Thanks to everybody for their help on this one!