alt.apache.configuration Apache web server configuration issues.

Problem with RewriteRule when url contains percent character

04/02/2007, 21:49   #1
Jon Maz
Problem with RewriteRule when url contains percent character

Hi,

I'm having problems with a RewriteRule that's applied to URLs with the %
character in them, hope someone can help. The % character is a result of
url-encoding non-ASCII words, as in the example below:

1. the word "sécurité" comes out of my db

2. I construct the following link, using the php urlencode function:
<a href="/search/s%C3%A9curit%C3%A9">sécurité</a>

3. the url created should be interpreted by a RewriteRule:
RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]

However the RewriteRule doesn't match on my url, and I see this in the
RewriteLog:

init rewrite engine with requested uri /search/sÃ©curitÃ©

So it seems like some kind of decoding is going on so that the RewriteRule
never even sees the % character. I have set everything I can think of
(MySql SET NAMES, Apache AddDefaultCharset) to utf-8.
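The decoding Jon describes can be reproduced outside Apache. The sketch below is a Python approximation, not mod_rewrite itself: it assumes mod_rewrite matches the path after %-decoding, as raw bytes, in per-directory (.htaccess) context where the leading slash is already stripped, and it uses Python's `re` on bytes as a stand-in for Apache's regex engine.

```python
import re
from urllib.parse import quote, unquote_to_bytes

# Jon's character class, applied to bytes (stand-in for mod_rewrite's matching).
RULE = re.compile(rb"^search/([a-zA-Z0-9-+%]+)$")

encoded = "search/" + quote("sécurité")  # what the link contains: search/s%C3%A9curit%C3%A9
decoded = unquote_to_bytes(encoded)       # what the rule sees: b"search/s\xc3\xa9curit\xc3\xa9"

# The %-escaped form fits the class; the decoded UTF-8 bytes 0xC3/0xA9 do not.
print(bool(RULE.match(encoded.encode("ascii"))))
print(bool(RULE.match(decoded)))
```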

Any ideas?

TIA,

JON


04/02/2007, 22:34   #2
HansH
Re: Problem with RewriteRule when url contains percent character

"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote in message
news:eq5kcj$9q9$1@aioe.org...
> I'm having problems with a RewriteRule that's applied to URLs with the %
> character in them, hope someone can help. The % character is a result of
> url-encoding non-ASCII words, as in the example below:
>
> 1. the word "sécurité" comes out of my db
>
> 2. I construct the following link, using the php urlencode function:
> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
>
> 3. the url created should be interpreted by a RewriteRule:
> RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]
>
> However the RewriteRule doesn't match on my url, and I see this in the
> RewriteLog:
>
> init rewrite engine with requested uri /search/sÃ©curitÃ©
>
> So it seems like some kind of decoding is going on so that the RewriteRule
> never even sees the % character. I have set everything I can think of
> (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.
>

So PHP has encoded the URL in some ISO 8859 variant and Apache is decoding
it as some UTF variant... The next thing to wonder about is the charset your
OS uses to store the file name...

In general, just forget diacritical, language-specific, fancy characters and
just use 'securite' for the filename.
It spares you dozens of cross-platform and cross-language traps and makes
migrating a website ten times easier.

http://czyborra.com/charsets/iso8859.html 'The ISO 8859 Alphabet Soup'
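HansH's advice to fall back to plain 'securite' can be automated when building links. A minimal sketch, assuming Unicode NFKD decomposition is an acceptable way to fold accents (it handles é, but letters such as ø or ß have no decomposition and are simply dropped); `to_ascii_slug` is a hypothetical helper name.

```python
import unicodedata

def to_ascii_slug(word: str) -> str:
    """Fold accented Latin letters to ASCII, e.g. 'sécurité' -> 'securite'."""
    # NFKD splits 'é' into 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", word)
    # Encoding to ASCII with 'ignore' drops the combining marks
    # (and any other character that has no ASCII form).
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii_slug("sécurité"))  # securite
```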

HansH



05/02/2007, 00:31   #3
Jon Maz
Re: Problem with RewriteRule when url contains percent character

Hi Hans,

Thanks for your answer. I guess I'm best off just avoiding the whole thing.

What got me wondering was the fact that my php application can cope fine
when this encoded word is passed in the query string:

/pages/search.php?word=s%C3%A9curit%C3%A9

But perhaps it's simply that different rules apply to a url and a query
string parameter?
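One way to see the difference: in the query-string case no RewriteRule pattern has to match the word; the `%C3%A9` escapes arrive at the application still encoded and are decoded there (PHP does this when it fills `$_GET`). The Python sketch below uses `urllib.parse.parse_qs` as a stand-in for PHP's query-string parsing; it is an analogy, not PHP's code.

```python
from urllib.parse import parse_qs, urlsplit

url = "/pages/search.php?word=s%C3%A9curit%C3%A9"
query = urlsplit(url).query  # still percent-encoded: "word=s%C3%A9curit%C3%A9"
params = parse_qs(query)     # decoded by the application, as UTF-8 by default
print(query)
print(params["word"][0])     # sécurité
```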

Thanks,

JON


05/02/2007, 05:52   #4
shimmyshack
Re: Problem with RewriteRule when url contains percent character

On 5 Feb, 00:31, "Jon Maz" <pparker.removet...@gmx.removethistoo.net>
wrote:
> Hi Hans,
>
> Thanks for your answer. I guess I'm best off just avoiding the whole thing.
>
> What got me wondering was the fact that my php application can cope fine
> when this encoded word is passed in the query string:
>
> /pages/search.php?word=s%C3%A9curit%C3%A9
>
> But perhaps it's simply that different rules apply to a url and a query
> string parameter?
>
> Thanks,
>
> JON


(I'm using google groups to submit to alt.apache.configuration so who
knows what the character representation will be when you see it -
however google's hot at this stuff!)

IMHO your rewrite is working, and it's not the fault of the UTF-8 URL
encoding; it's just that the rewrite matches only the characters you've
specified.

If you use
RewriteRule ^search/(.+) /pages/search.php?word=$1 [QSA,L]

(which obviously will need tweaking for your use) it will work
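A quick check of why `(.+)` works, again using Python's `re` on the decoded path as a byte-level stand-in for mod_rewrite (assuming per-directory context, so no leading slash): `.` accepts any byte except a newline, so the whole UTF-8 word is captured.

```python
import re
from urllib.parse import unquote_to_bytes

decoded = unquote_to_bytes("search/s%C3%A9curit%C3%A9")
m = re.match(rb"^search/(.+)", decoded)
captured = m.group(1)
print(captured)                  # b's\xc3\xa9curit\xc3\xa9'
print(captured.decode("utf-8"))  # sécurité
```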

RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]
doesn't quite work because you are only matching the characters a-z, A-Z,
0-9, -, + and %,
whereas Apache internally "knows" what the characters actually are (that
é is %C3%A9), and so Apache expects you to match against the actual
characters rather than the URL you see in the browser (I hope that's
clear). In other words, by the time the matching is done, Apache
is looking for you to match against the UTF-8 characters, not their
urlencoded representation. (IMHO)

so when I tested your rewrite (Apache 2.2.4) it worked but only caught the s
at the start.
I know what you are trying to do: match in UTF-8. For that, though, I guess
you would have to use a regular expression that matched ranges of hex
values rather than each ASCII value in turn. That would work, if you
could be bothered to look up the hex equivalents for the characters
you allow!

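The "ranges of hex values" idea can be sketched as a class that admits every byte from 0x80 to 0xFF, which covers all the bytes of multi-byte UTF-8 characters (it does not check that they form valid UTF-8). In a RewriteRule this would presumably be written with `\x80-\xff`, assuming Apache's regex library (PCRE) treats the pattern as bytes; the Python version below is only a stand-in, and it drops `%` from the class because the decoded path no longer contains escapes.

```python
import re
from urllib.parse import unquote_to_bytes

# ASCII letters/digits, '+', '-', and any byte in 0x80-0xFF (UTF-8 lead/continuation bytes).
RULE = re.compile(rb"^search/([a-zA-Z0-9+\x80-\xff-]+)$")

decoded = unquote_to_bytes("search/s%C3%A9curit%C3%A9")
m = RULE.match(decoded)
print(m.group(1).decode("utf-8"))  # sécurité
```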
The rewrite log is a "gotcha": Apache logs in your OS's 8-bit
ISO-8859-15, so you were seeing:

init rewrite engine with requested uri /search/sÃ©curitÃ©

where
sÃ©curitÃ©
is the 8-bit representation of the UTF-8 encoded word
sécurité
so I believe Apache is seeing your UTF-8 URL, but logging it to a file
in the other format.

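The logging "gotcha" is easy to reproduce: take the UTF-8 bytes of the word and read them as ISO-8859-15, one character per byte, as a viewer using that charset would.

```python
raw = "sécurité".encode("utf-8")       # b's\xc3\xa9curit\xc3\xa9'
as_latin9 = raw.decode("iso-8859-15")  # 0xC3 -> 'Ã', 0xA9 -> '©'
print(as_latin9)                       # sÃ©curitÃ©
# Nothing is lost: re-encoding and reading as UTF-8 gives the word back.
print(as_latin9.encode("iso-8859-15").decode("utf-8"))  # sécurité
```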
Now if I'm wrong, I apologise, this one took me a while to figure out
(and test all the cases - I used Ubuntu and XP pro), so thanks for
putting it here. Looking at the apache docs, they appear to have
pretty rigorous encoding functions for all the different
transformations, impressive if opaque!

05/02/2007, 06:15   #5
rh
Re: Problem with RewriteRule when url contains percent character


"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote in message
news:eq5kcj$9q9$1@aioe.org...
> Hi,
>
> I'm having problems with a RewriteRule that's applied to URLs with the %
> character in them, hope someone can help. The % character is a result of
> url-encoding non-ASCII words, as in the example below:
>
> 1. the word "sécurité" comes out of my db
>
> 2. I construct the following link, using the php urlencode function:
> <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>


How do you get s%C3%A9curit%C3%A9 from sécurité?

sécurité, url encoded, is s%E9curit%E9

s%C3%A9curit%C3%A9 decoded is sÃ©curitÃ©, as is correctly reported in your rewrite log.

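Both spellings in this exchange are correct for different charsets. Python's `urllib.parse.quote` takes an `encoding` parameter that picks the charset used before %-escaping; PHP's `urlencode` has no such parameter and simply escapes whatever bytes the string already holds, so a UTF-8 string from the database yields the second form.

```python
from urllib.parse import quote

latin9 = quote("sécurité", encoding="iso-8859-15")  # one byte per é
utf8 = quote("sécurité", encoding="utf-8")          # two bytes per é
print(latin9)  # s%E9curit%E9
print(utf8)    # s%C3%A9curit%C3%A9
```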
>
> 3. the url created should be interpreted by a RewriteRule:
> RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]


a hyphen in a character class specifies a range unless it's the first or last character in
the class

what range are you looking for with 9-+?

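Whether the `-` in `0-9-+` starts a range can be checked with Python's `re`, which, like Perl-style engines, treats a hyphen that directly follows a completed range as a literal. That Apache's engine behaves the same way is an assumption here; only Python is exercised.

```python
import re

# Jon's character class, anchored to the whole string.
CLS = re.compile(r"^[a-zA-Z0-9-+%]+$")

def accepted(s: str) -> bool:
    return CLS.match(s) is not None

# '-', '+' and '%' each match as literal characters; ',' is not in the class.
for sample in ["a-b", "a+b", "50%", "a,b"]:
    print(sample, accepted(sample))
```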
>
> However the RewriteRule doesn't match on my url, and I see this in the
> RewriteLog:
>
> init rewrite engine with requested uri /search/sÃ©curitÃ©


The rewrite rule works correctly; the URI contains Ã and ©, and the regex doesn't allow for
these.

>
> So it seems like some kind of decoding is going on so that the RewriteRule
> never even sees the % character. I have set everything I can think of
> (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.
>


The uri is decoded before the server tries to resolve it, why would it not?

Why are you trying to do the heavy lifting with mod_rewrite? Just pass the search term to
the script and validate it there; you should validate all user input in your scripts.

RewriteRule ^search/(.+)$ /pages/search.php?word=$1 [QSA,L]
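Validating in the script, as rh suggests, might look roughly like this. It is a Python sketch standing in for the PHP script; the allowed set (Unicode word characters, which include the underscore, plus '-', at most 64 characters) is an assumption rather than anything specified in the thread, and `is_valid_search_term` is a hypothetical name.

```python
import re

# str patterns are Unicode-aware, so \w accepts 'é' as well as ASCII letters.
TERM = re.compile(r"[\w-]{1,64}")

def is_valid_search_term(term: str) -> bool:
    return TERM.fullmatch(term) is not None

print(is_valid_search_term("sécurité"))  # True
print(is_valid_search_term("<script>"))  # False
```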


Rich


05/02/2007, 07:30   #6
shimmyshack
Re: Problem with RewriteRule when url contains percent character

On Feb 5, 6:15 am, "rh" <disposable12...@cableone.net> wrote:
> "Jon Maz" <pparker.removet...@gmx.removethistoo.net> wrote in message
> news:eq5kcj$9q9$1@aioe.org...
>
> > Hi,
> >
> > I'm having problems with a RewriteRule that's applied to URLs with the %
> > character in them, hope someone can help. The % character is a result of
> > url-encoding non-ASCII words, as in the example below:
> >
> > 1. the word "sécurité" comes out of my db
> >
> > 2. I construct the following link, using the php urlencode function:
> > <a href="/search/s%C3%A9curit%C3%A9">sécurité</a>
>
> How do you get s%C3%A9curit%C3%A9 from sécurité?
>
> sécurité, url encoded, is s%E9curit%E9


sécurité as ISO-8859-15 when encoded is indeed s%E9curit%E9, but Jon
is using utf8

>
> s%C3%A9curit%C3%A9 decoded is sÃ©curitÃ©, as is correctly reported in your rewrite log.


The urlencoded UTF-8 version of sécurité, when decoded and displayed as 8-bit
ISO-8859-15, is sÃ©curitÃ©,
but this is just a freak of logging; it isn't used anywhere else.
If the logging were done in UTF-8, and we looked at the logs in a
UTF-8-aware viewer, we would see
sécurité

>
>
> > 3. the url created should be interpreted by a RewriteRule:
> > RewriteRule ^search/([a-zA-Z0-9-+%]+)$ /pages/search.php?word=$1 [QSA,L]


You're right, and it is, but it depends on what you mean. The actual UTF-8
version is what Apache matches internally, since it is UTF-8
aware. The URL is encoded for transport, and then Apache correctly
uses the "real" encoding.


> a hyphen in a character class specifies a range unless it's the first or last character in
> the class
>
> what range are you looking for with 9-+


From the context Apache knows that a range is not intended, and so - and +
are matched literally; of course + comes back into the app as a space. %
is slightly different: it cannot be present by itself, and when it is
present together with the hex digits that encode a UTF-8 character,
Apache decodes the escapes into that one UTF-8 character, which doesn't
match because it isn't in the rewrite rule's character set as given by Jon.

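The remark that + "comes back into the app as a space" follows from form-style query-string decoding, where + stands for a space. Because the rule copies `$1` into `?word=$1`, a + in the path lands in the query string. Python's `parse_qs` applies the same convention as PHP's `$_GET` parsing (an analogy, not PHP itself); the request `/search/foo+bar` is a hypothetical example.

```python
from urllib.parse import parse_qs

# Hypothetical request /search/foo+bar rewritten to /pages/search.php?word=foo+bar
params = parse_qs("word=foo+bar")
print(params["word"][0])  # foo bar
```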
> The rewrite rule works correctly; the URI contains Ã and ©, and the regex doesn't allow for
> these.


Actually those characters aren't present in the URI as those
characters; they are the two bytes of the UTF-8 character é,
displayed as 8-bit characters.

This means that if you want to spell out your rewrite rules, you can do
so in either of the following two ways:
([a-zA-Z0-9é]+)
([a-zA-Z0-9Ã©]+)

However, if you use the first, you have to make sure you write it in a
UTF-8 encoded conf file.
If you usually use 8-bit (ISO-8859) conf files, then use the second rule; to
see what I mean, change the encoding of your editor to UTF-8 and you
will see the first rule. Both are equivalent, but in whatever editor
you use, you must select the correct encoding.

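The equivalence is about the bytes that end up on disk: é saved as UTF-8 is the same two bytes as Ã© saved in ISO-8859-15. A quick check:

```python
utf8_file_bytes = "([a-zA-Z0-9é]+)".encode("utf-8")
latin9_file_bytes = "([a-zA-Z0-9Ã©]+)".encode("iso-8859-15")
print(utf8_file_bytes == latin9_file_bytes)  # True
```

One caveat, assuming Apache's regex works on bytes: inside a character class the two bytes 0xC3 and 0xA9 are separate members, so the class also accepts either byte on its own, not only the pair that forms é.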

>
> > So it seems like some kind of decoding is going on so that the RewriteRule
> > never even sees the % character. I have set everything I can think of
> > (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.

>
> The uri is decoded before the server tries to resolve it, why would it not?


This is right, but once the URI is decoded, é (not %C3%A9) is the
character that gets matched against the pattern.

>
> Why are you trying to do the heavy lifting with mod rewrite? just pass the search term to
> the script and validate it there, you should validate all user input in your scripts.
>
> RewriteRule ^search/(.+)$ /pages/search.php?word=$1 [QSA,L]
>
> Rich


Some people, like me, use Apache's powerful rewriting capabilities as an
"application filtering" proxy, so that before the app even sees the
URI, it has been parsed by Apache's regexps. If your objective
is simply to make the URLs look nice, then you might as well just use
PHP; take a look at the mbstring functions when you do, as they are
UTF-8 aware.

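Why UTF-8-aware string functions matter: byte-oriented functions count the bytes of the encoded word, not its characters. Python separates `bytes` and `str` explicitly; PHP's `strlen` versus `mb_strlen($word, 'UTF-8')` behave analogously.

```python
word = "sécurité"
raw = word.encode("utf-8")
print(len(raw))   # 10 bytes, as PHP's strlen would report
print(len(word))  # 8 characters, as mb_strlen with UTF-8 would report
print(raw[:2])    # b's\xc3': slicing bytes can cut a character in half
```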
Finally, it's my understanding that it will not be until PHP 6 that PHP-MySQL
data can travel on the wire as UTF-8; at the moment PHP has to be
forced to "consider" the data to be UTF-8, so make sure your
database is indeed storing it in the right format.


05/02/2007, 13:07   #7
Nick Kew
Re: Problem with RewriteRule when url contains percent character

On Sun, 4 Feb 2007 21:49:08 -0000
"Jon Maz" <pparker.removethis@gmx.removethistoo.net> wrote:

> So it seems like some kind of decoding is going on so that the
> RewriteRule never even sees the % character. I have set everything I
> can think of (MySql SET NAMES, Apache AddDefaultCharset) to utf-8.


No you haven't. The expression in your RewriteRule is firmly in
ASCII, so it fails to match the non-ASCII characters in the URL.

> Any ideas?


Don't faff about with mod_rewrite like that. Or if you
really must, fix your regexp. Or as someone else said,
stick to ASCII.

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
06/02/2007, 23:56   #8
Jon Maz
Re: Problem with RewriteRule when url contains percent character

Thanks to everybody for their help on this one!


All times are GMT +1.