PHWinfo banniere

Titres
PORTAIL ANNUAIRE ARTICLES COMPARATEUR HÉBERGEURS DEVIS FORUMS RÉDUCTEUR D'URL
Précédent   PHWinfo > Autres forums > Forum Programmation & Conception > comp.databases.mysql > *finding* and replacing bad characters
S'inscrire FAQ Membres Recherche Messages du jour Marquer les forums comme lus
*finding* and replacing bad characters

Réponse
 
LinkBack Outils de la discussion
Vieux 11/02/2008, 20h56   #1
lawpoop@gmail.com
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut *finding* and replacing bad characters

Hello all -

I have a database with a text field that has bad characters in it. I
think what happened was that a user pasted 'curly-quotes' from MS Word
into an HTML text form. Anyway, all I know is that there are bad
characters in our database, and I need to find and replace them.

The problem is, because they are so weird, my terminal program doesn't
copy them properly, so I can't properly search for them. Here's the
bad characters encapsulating the word 'good':

ââ'¬Å"goodââ'¬Â

But when I paste those girls into the putty terminal, I get something
else. Even though the characters on the screen are the same, they're
something else internally. For instance, this query

SELECT field FROM table WHERE field LIKE '%ââ'¬Â%'

returns results, whereas this one

SELECT field FROM table WHERE field LIKE '%ââ,¬Â%'

doesn't. the only differences is that the second one has been pasted
into Windows notepad.

So is there a way I can crawl through each character in a field and
find out exactly what it is and its encoding?

  Réponse avec citation
Vieux 11/02/2008, 21h37   #2
Rik Wasmus
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: *finding* and replacing bad characters

On Mon, 11 Feb 2008 21:56:47 +0100, <lawpoop@gmail.com> wrote:

> Hello all -
>
> I have a database with a text field that has bad characters in it. I
> think what happened was that a user pasted 'curly-quotes' from MS Word
> into an HTML text form. Anyway, all I know is that there are bad
> characters in our database, and I need to find and replace them.
>
> The problem is, because they are so weird, my terminal program doesn't
> copy them properly, so I can't properly search for them. Here's the
> bad characters encapsulating the word 'good':
>
> ââ'¬Å"goodââ'¬Â
>
> But when I paste those girls into the putty terminal, I get something
> else. Even though the characters on the screen are the same, they're
> something else internally. For instance, this query
>
> SELECT field FROM table WHERE field LIKE '%ââ'¬Â%'
>
> returns results, whereas this one
>
> SELECT field FROM table WHERE field LIKE '%ââ,¬Â%'
>
> doesn't. the only differences is that the second one has been pasted
> into Windows notepad.


Find out what the hexadecimal value is, and replace that?

Something like:
UPDATE tablename SET field = REPLACE(field,0x123456,'"');
.... where 0x123456 should be altered in the real value (don't know it
myself, maybe this page will give you some insight if you can't deduce it
directly from the database(value):
<http://64.233.183.104/search?q=cache:8MG9btDGDGEJ:weblogs.asp.net/sbehera/archive/2006/02/28/439299.aspx>).
--
Rik Wasmus
  Réponse avec citation
Vieux 11/02/2008, 21h44   #3
Peter H. Coffin
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: *finding* and replacing bad characters

On Mon, 11 Feb 2008 12:56:47 -0800 (PST), lawpoop@gmail.com wrote:
> Hello all -
>
> I have a database with a text field that has bad characters in it. I
> think what happened was that a user pasted 'curly-quotes' from MS Word
> into an HTML text form. Anyway, all I know is that there are bad
> characters in our database, and I need to find and replace them.
>
> The problem is, because they are so weird, my terminal program doesn't
> copy them properly, so I can't properly search for them. Here's the
> bad characters encapsulating the word 'good':
>
> ââ'¬Å"goodââ'¬Â
>
> But when I paste those girls into the putty terminal, I get something
> else. Even though the characters on the screen are the same, they're
> something else internally. For instance, this query
>
> SELECT field FROM table WHERE field LIKE '%ââ'¬Â%'
>
> returns results, whereas this one
>
> SELECT field FROM table WHERE field LIKE '%ââ,¬Â%'
>
> doesn't. the only differences is that the second one has been pasted
> into Windows notepad.
>
> So is there a way I can crawl through each character in a field and
> find out exactly what it is and its encoding?


Find out what it is? Probably not in an automated fashion. You can,
however, make an expereinced guess that this is actually UTF-8 data in
whatever character set your column is. That's what Windows typically
does when pasting multibyte characters. Unfortuantely, my terminal
ain't the same as your terminal, so I can't use my favorite gizmo[1] to
determine EXACTLY what contents are.

The first step for a fix for this is to make a copy of the table
structure, copy the affected records off, then use ALTER to change the
column first to binary, then to UTF-8.

[1]http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder

--
42. When I capture the hero, I will make sure I also get his dog, monkey,
ferret, or whatever sickeningly cute little animal capable of untying
ropes and filching keys happens to follow him around.
--Peter Anspach's list of things to do as an Evil Overlord
  Réponse avec citation
Vieux 11/02/2008, 22h02   #4
lawpoop@gmail.com
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: *finding* and replacing bad characters

On Feb 11, 3:37 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:

>
> Find out what the hexadecimal value is, and replace that?


Is there a way that I can explode the field into characters to see the
hex values of each character? Right now, I have a hard time even
figuring out what the characters are. They look different in
mysqldumps versus phpMyAdmin.

>
> Something like:
> UPDATE tablename SET field = REPLACE(field,0x123456,'"');
> ... where 0x123456 should be altered in the real value (don't know it
> myself, maybe this page will give you some insight if you can't deduce it
> directly from the database(value):
> <http://64.233.183.104/search?q=cache:8MG9btDGDGEJ:weblogs.asp.net/sbe...>).
> --
> Rik Wasmus


  Réponse avec citation
Vieux 11/02/2008, 22h09   #5
Rik Wasmus
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: *finding* and replacing bad characters

On Mon, 11 Feb 2008 23:02:59 +0100, <lawpoop@gmail.com> wrote:
> On Feb 11, 3:37 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
>> Find out what the hexadecimal value is, and replace that?

>
> Is there a way that I can explode the field into characters to see the
> hex values of each character? Right now, I have a hard time even
> figuring out what the characters are. They look different in
> mysqldumps versus phpMyAdmin.


Silly idea: can't just simply enter such a weird single quote character
with the same mechanism the others were entered, and HEX() on the field?

Trust the (hexedecimal) value in the mysqldump BTW, try to find a hex
editor if you can't see it. (I personnaly use ultraedit, is payware, but
worth it, and it comes with a trail period).

>> Something like:
>> UPDATE tablename SET field = REPLACE(field,0x123456,'"');
>> ... where 0x123456 should be altered in the real value (don't know it
>> myself, maybe this page will give you some insight if you can't deduce
>> it
>> directly from the database(value):
>> <http://64.233.183.104/search?q=cache:8MG9btDGDGEJ:weblogs.asp.net/sbe...>).

--
Rik Wasmus
  Réponse avec citation
Vieux 12/02/2008, 14h09   #6
lawpoop@gmail.com
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: *finding* and replacing bad characters

On Feb 11, 4:09 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
>
> > Is there a way that I can explode the field into characters to see the
> > hex values of each character? Right now, I have a hard time even
> > figuring out what the characters are. They look different in
> > mysqldumps versus phpMyAdmin.

>
> Silly idea: can't just simply enter such a weird single quote character
> with the same mechanism the others were entered, and HEX() on the field?
>


No, that's a good idea, but we can't figure out who entered it or
exactly what character it was.

Anyway, our best guess it that it's an MS Word curly quote, but we
don't know if that curly quote was mangled in the HTML form.

Hey, that's an idea -- I could write an HTML form to search and
replace for specific characters.

  Réponse avec citation
Réponse


Outils de la discussion

Règles de messages
Vous ne pouvez pas créer de nouvelles discussions
Vous ne pouvez pas envoyer des réponses
Vous ne pouvez pas envoyer des pièces jointes
Vous ne pouvez pas modifier vos messages

Les balises BB sont activées : oui
Les smileys sont activés : oui
La balise [IMG] est activée : oui
Le code HTML peut être employé : non
Trackbacks are oui
Pingbacks are oui
Refbacks are oui


Fuseau horaire GMT +1. Il est actuellement 13h17.


Édité par : vBulletin® version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0 RC5 Tous droits réservés.
Version française #16 par l'association vBulletin francophone
PHWinfo est un site Éducation Sans Frontières ©2000-2008
Ad Management by RedTyger
©Tous droits réservés par les parties respectives
Page generated in 0,17836 seconds with 14 queries