|
|
|
|
||||||
![]() |
|
|
LinkBack | Outils de la discussion |
|
|
#1 |
|
Messages: n/a
Hébergeur: |
Hello all -
I have a database with a text field that has bad characters in it. I think what happened was that a user pasted 'curly-quotes' from MS Word into an HTML text form. Anyway, all I know is that there are bad characters in our database, and I need to find and replace them. The problem is, because they are so weird, my terminal program doesn't copy them properly, so I can't properly search for them. Here's the bad characters encapsulating the word 'good': ââ'¬Å"goodââ'¬Â But when I paste those girls into the putty terminal, I get something else. Even though the characters on the screen are the same, they're something else internally. For instance, this query SELECT field FROM table WHERE field LIKE '%ââ'¬Â%' returns results, whereas this one SELECT field FROM table WHERE field LIKE '%ââ,¬Â%' doesn't. the only differences is that the second one has been pasted into Windows notepad. So is there a way I can crawl through each character in a field and find out exactly what it is and its encoding? |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
On Mon, 11 Feb 2008 21:56:47 +0100, <lawpoop@gmail.com> wrote:
> Hello all - > > I have a database with a text field that has bad characters in it. I > think what happened was that a user pasted 'curly-quotes' from MS Word > into an HTML text form. Anyway, all I know is that there are bad > characters in our database, and I need to find and replace them. > > The problem is, because they are so weird, my terminal program doesn't > copy them properly, so I can't properly search for them. Here's the > bad characters encapsulating the word 'good': > > ââ'¬Å"goodââ'¬Â > > But when I paste those girls into the putty terminal, I get something > else. Even though the characters on the screen are the same, they're > something else internally. For instance, this query > > SELECT field FROM table WHERE field LIKE '%ââ'¬Â%' > > returns results, whereas this one > > SELECT field FROM table WHERE field LIKE '%ââ,¬Â%' > > doesn't. the only differences is that the second one has been pasted > into Windows notepad. Find out what the hexadecimal value is, and replace that? Something like: UPDATE tablename SET field = REPLACE(field,0x123456,'"'); .... where 0x123456 should be altered in the real value (don't know it myself, maybe this page will give you some insight if you can't deduce it directly from the database(value): <http://64.233.183.104/search?q=cache:8MG9btDGDGEJ:weblogs.asp.net/sbehera/archive/2006/02/28/439299.aspx>). -- Rik Wasmus |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
On Mon, 11 Feb 2008 12:56:47 -0800 (PST), lawpoop@gmail.com wrote:
> Hello all - > > I have a database with a text field that has bad characters in it. I > think what happened was that a user pasted 'curly-quotes' from MS Word > into an HTML text form. Anyway, all I know is that there are bad > characters in our database, and I need to find and replace them. > > The problem is, because they are so weird, my terminal program doesn't > copy them properly, so I can't properly search for them. Here's the > bad characters encapsulating the word 'good': > > ââ'¬Å"goodââ'¬Â > > But when I paste those girls into the putty terminal, I get something > else. Even though the characters on the screen are the same, they're > something else internally. For instance, this query > > SELECT field FROM table WHERE field LIKE '%ââ'¬Â%' > > returns results, whereas this one > > SELECT field FROM table WHERE field LIKE '%ââ,¬Â%' > > doesn't. the only differences is that the second one has been pasted > into Windows notepad. > > So is there a way I can crawl through each character in a field and > find out exactly what it is and its encoding? Find out what it is? Probably not in an automated fashion. You can, however, make an expereinced guess that this is actually UTF-8 data in whatever character set your column is. That's what Windows typically does when pasting multibyte characters. Unfortuantely, my terminal ain't the same as your terminal, so I can't use my favorite gizmo[1] to determine EXACTLY what contents are. The first step for a fix for this is to make a copy of the table structure, copy the affected records off, then use ALTER to change the column first to binary, then to UTF-8. [1]http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder -- 42. When I capture the hero, I will make sure I also get his dog, monkey, ferret, or whatever sickeningly cute little animal capable of untying ropes and filching keys happens to follow him around. --Peter Anspach's list of things to do as an Evil Overlord |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
On Feb 11, 3:37 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
> > Find out what the hexadecimal value is, and replace that? Is there a way that I can explode the field into characters to see the hex values of each character? Right now, I have a hard time even figuring out what the characters are. They look different in mysqldumps versus phpMyAdmin. > > Something like: > UPDATE tablename SET field = REPLACE(field,0x123456,'"'); > ... where 0x123456 should be altered in the real value (don't know it > myself, maybe this page will give you some insight if you can't deduce it > directly from the database(value): > <http://64.233.183.104/search?q=cache:8MG9btDGDGEJ:weblogs.asp.net/sbe...>). > -- > Rik Wasmus |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
On Mon, 11 Feb 2008 23:02:59 +0100, <lawpoop@gmail.com> wrote:
> On Feb 11, 3:37 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote: >> Find out what the hexadecimal value is, and replace that? > > Is there a way that I can explode the field into characters to see the > hex values of each character? Right now, I have a hard time even > figuring out what the characters are. They look different in > mysqldumps versus phpMyAdmin. Silly idea: can't just simply enter such a weird single quote character with the same mechanism the others were entered, and HEX() on the field? Trust the (hexedecimal) value in the mysqldump BTW, try to find a hex editor if you can't see it. (I personnaly use ultraedit, is payware, but worth it, and it comes with a trail period). >> Something like: >> UPDATE tablename SET field = REPLACE(field,0x123456,'"'); >> ... where 0x123456 should be altered in the real value (don't know it >> myself, maybe this page will give you some insight if you can't deduce >> it >> directly from the database(value): >> <http://64.233.183.104/search?q=cache:8MG9btDGDGEJ:weblogs.asp.net/sbe...>). -- Rik Wasmus |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
On Feb 11, 4:09 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
> > > Is there a way that I can explode the field into characters to see the > > hex values of each character? Right now, I have a hard time even > > figuring out what the characters are. They look different in > > mysqldumps versus phpMyAdmin. > > Silly idea: can't just simply enter such a weird single quote character > with the same mechanism the others were entered, and HEX() on the field? > No, that's a good idea, but we can't figure out who entered it or exactly what character it was. Anyway, our best guess it that it's an MS Word curly quote, but we don't know if that curly quote was mangled in the HTML form. Hey, that's an idea -- I could write an HTML form to search and replace for specific characters. |
|
![]() |
| Outils de la discussion | |
|
|