Stripping MS Word code from my forms once and for all.

Posted 15/09/2007, 17:43   #1
FFMG


Hi,

I have a form that allows users to comment, add entries and so on.
But what a lot of them do is copy and paste directly from MS Word to my
forms.

Almost all browsers will accept the post and give the impression that
everything is saved properly.

But that is not the case when it comes time to display the message
on my page.

So how can I strip/replace all the MS Word invalid code from my
$_POSTs?

Thanks

FFMG


--

'webmaster forum' (http://www.httppoint.com) | 'Free Blogs'
(http://www.journalhome.com/) | 'webmaster Directory'
(http://www.webhostshunter.com/)
'Recreation Vehicle insurance'
(http://www.insurance-owl.com/other/car_rec.php) | 'Free URL
redirection service' (http://urlkick.com/)
------------------------------------------------------------------------
FFMG's Profile: http://www.httppoint.com/member.php?userid=580
View this thread: http://www.httppoint.com/showthread.php?t=20318

Message Posted via the webmaster forum http://www.httppoint.com, (Ad revenue sharing).

Posted 16/09/2007, 01:48   #2
macca

I found this on php.net at http://uk2.php.net/strtr, which may be of
some help:




After battling with strtr trying to strip out MS Word formatting from
things pasted into forms, I ended up coming up with this.

It strips ALL non-ASCII characters (and most control characters),
preserving HTML codes and such, but gets rid of all the characters that
refuse to show in Firefox.

If you look at this page in Firefox you will see a ton of "question
mark" characters, so it is not possible to copy and paste those to
remove them from strings. (This fixes that issue nicely, though I
admit it could be done a bit better.)

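<?php
// Keep only tab, newline, carriage return and printable ASCII (32-126);
// every other byte, including Word's "smart" punctuation, is dropped.
function fixoutput($str) {
    $good = array(9, 10, 13);          // tab, nl, cr
    for ($a = 32; $a < 127; $a++) {
        $good[] = $a;                  // printable ASCII
    }
    $newstr = '';                      // initialise before appending
    $len = strlen($str);
    for ($b = 0; $b < $len; $b++) {    // stop at $len; $len+1 read past the end
        if (in_array(ord($str[$b]), $good)) {
            $newstr .= $str[$b];
        }
    }
    return $newstr;
}
?>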
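
For comparison, here is a minimal sketch of the same whitelist written as a single regular expression. The function name strip_non_ascii is made up for illustration and is not part of the php.net comment; it keeps tab, LF, CR and printable ASCII and drops every other byte.

```php
<?php
// Hypothetical helper (name is an assumption, not from the thread).
// Removes every byte except tab (0x09), LF (0x0A), CR (0x0D)
// and printable ASCII (0x20-0x7E).
function strip_non_ascii($str)
{
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $str);
}

// Example: Windows-1252 curly quotes (0x93, 0x94) are dropped.
echo strip_non_ascii("x\x93y\x94\tz\n");
```

Like the loop version, this also throws away legitimate accented letters, so it only suits sites that are happy with ASCII-only comments.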

Posted 16/09/2007, 03:48   #3
Sanders Kaufman

FFMG wrote:

> So how can I strip/replace all the MS Word invalid code from my
> $_POSTs?


I presume you're referring to all the MS Office XML markup.
That's actually good stuff, sometimes.

What you need to do is read the document as an XML file, then all the MS
crap will make sense... and more importantly, be easily stripped away.

Before you strip it away though, you might want to go through it because
you might find that some of the document properties are useful to your
application.

Posted 16/09/2007, 15:19   #4
FFMG


Sanders Kaufman;92056 Wrote:
> FFMG wrote:
>
> > So how can I strip/replace all the MS Word invalid code from my
> > $_POSTs?

>
> I presume you're referring to all the MS Office XML markup.
> That's actually good stuff, sometimes.
>


No, sorry, I was actually talking about some non-standard characters
that MS Word inserts.

Some browsers will (maybe wrongly) not display any invalid characters
in the textarea itself, giving the user the impression that everything
is fine.

But when I then try to display the comment/entry I get a bunch of
question marks for the characters that were invalid.
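
One way to replace rather than strip these characters is sketched below. It assumes the form data arrives as single-byte Windows-1252 text, which is what Word's curly quotes, dashes and ellipsis usually turn into on a non-UTF-8 page; the thread itself never confirms the encoding. The helper name word_to_ascii and the choice of replacements are illustrative only.

```php
<?php
// Hypothetical helper; assumes $str is Windows-1252, NOT UTF-8.
// On UTF-8 input these byte values can occur inside multi-byte
// sequences, and replacing them would corrupt the text.
function word_to_ascii($str)
{
    $map = array(
        "\x85" => '...',  // horizontal ellipsis
        "\x91" => "'",    // left single quote
        "\x92" => "'",    // right single quote / apostrophe
        "\x93" => '"',    // left double quote
        "\x94" => '"',    // right double quote
        "\x96" => '-',    // en dash
        "\x97" => '-',    // em dash
    );
    return strtr($str, $map);
}

echo word_to_ascii("\x93It\x92s\x94 \x96 ok\x85");
```

The map covers only the most common Word punctuation; anything else outside ASCII passes through unchanged.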

FFMG



Posted 17/09/2007, 13:10   #5
Sanders Kaufman

FFMG wrote:
> Sanders Kaufman;92056 Wrote:
>> FFMG wrote:
>>
>>> So how can I strip/replace all the MS Word invalid code from my
>>> $_POSTs?

>> I presume you're referring to all the MS Office XML markup.
>> That's actually good stuff, sometimes.
>>

>
> No, sorry, I was actually talking about some non-standard characters
> that MS Word inserts.
>
> Some browsers will (maybe wrongly) not display any invalid characters
> in the textarea itself, giving the user the impression that everything
> is fine.
>
> But when I then try to display the comment/entry I get a bunch of
> question marks for the characters that were invalid.


Ah, so. You're having a character set problem.
Rather than have a big old off-topic thread about it here, you should
probably take the question to an Office or HTML group.
PHP won't help you much.

Posted 17/09/2007, 14:09   #6
FFMG


Sanders Kaufman;92237 Wrote:
> > No, sorry, I was actually talking about some non-standard characters
> > that MS Word inserts.
> >
> > Some browsers will (maybe wrongly) not display any invalid characters
> > in the textarea itself, giving the user the impression that everything
> > is fine.
> >
> > But when I then try to display the comment/entry I get a bunch of
> > question marks for the characters that were invalid.
>
> Ah, so. You're having a character set problem.
> Rather than have a big old off-topic thread about it here, you should
> probably take the question to an Office or HTML group.
> PHP won't help you much.

No I am not, read the question again, carefully this time.
Textareas of most browsers will, (wrongly), accept MS Word pasted
code.

By the time it gets to my server I have to clean it up.
My PHP code must handle it.

Is that on topic enough for you?

FFMG



Posted 18/09/2007, 01:51   #7
Jerry Stuckle

FFMG wrote:
> Sanders Kaufman;92237 Wrote:
>>> No, sorry, I was actually talking about some non-standard characters
>>> that MS Word inserts.
>>>
>>> Some browsers will (maybe wrongly) not display any invalid characters
>>> in the textarea itself, giving the user the impression that everything
>>> is fine.
>>>
>>> But when I then try to display the comment/entry I get a bunch of
>>> question marks for the characters that were invalid.
>>
>> Ah, so. You're having a character set problem.
>> Rather than have a big old off-topic thread about it here, you should
>> probably take the question to an Office or HTML group.
>> PHP won't help you much.
>
> No I am not, read the question again, carefully this time.
> Textareas of most browsers will, (wrongly), accept MS Word pasted
> code.
>
> By the time it gets to my server I have to clean it up.
> My PHP code must handle it.
>
> Is that on topic enough for you?
>
> FFMG

Yes, this has been asked before - but I don't remember what the answer was.

The easiest way would be to check for non-alphanumeric chars using a
regex. If you find any, tell the user to use a plain text editor.

You could use a regex to strip non-alphanumeric characters, but this
might have some problems. For instance, what happens if you have a
control sequence which happens to contain a character, e.g. 0x010231?
The 0x31 would be taken as the character '1', even though it's part of a
control sequence. But you could clean it up fairly well this way.
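
A rough sketch of the "check and reject" idea follows. It tests for bytes outside tab, LF, CR and printable ASCII rather than strictly non-alphanumeric characters, since ordinary punctuation would otherwise be rejected too. The helper name is_plain_text and the form field name 'comment' are assumptions made for the example.

```php
<?php
// Hypothetical helper: true when $str holds only tab, LF, CR
// and printable ASCII (bytes 0x20-0x7E).
function is_plain_text($str)
{
    return preg_match('/^[\x09\x0A\x0D\x20-\x7E]*$/D', $str) === 1;
}

// 'comment' is an assumed field name, not taken from the thread.
$comment = isset($_POST['comment']) ? $_POST['comment'] : '';
if (!is_plain_text($comment)) {
    echo 'Please paste your text from a plain text editor.';
}
```

Rejecting the post keeps the decision with the user, at the cost of also refusing accented letters and other legitimate non-ASCII input.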

Try googling this newsgroup for something like "MS WORD". It's been a
few months.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Posted 18/09/2007, 03:46   #8
Sanders Kaufman

FFMG wrote:
> Sanders Kaufman;92237 Wrote:


>> Ah, so. You're having a character set problem.
>> Rather than have a big old off-topic thread about it here, you should
>> probably take the question to an Office or HTML group.
>> PHP won't help you much.

>
> No I am not, read the question again, carefully this time.
> Textareas of most browsers will, (wrongly), accept MS Word pasted
> code.


There is nothing in the HTML specification requiring HTML to reject MS
Word, Open Office, or any other format. That would be a bug, not a feature.


> By the time it gets to my server I have to clean it up.
> My PHP code must handle it.
>
> Is that on topic enough for you?


No, and it won't likely be topic(al) enough for most of the other folks
here in the PHP group, either.

While you are indeed trying to process the data through PHP, you appear
to be perfectly capable of programming in PHP, and thus need very little
help with PHP.

Instead, you need to identify the correct character set to use in
interpreting the Office document, and to apply that character set to the
data retrieved through the HTML FORM element.

That means that the help you need is with Office and HTML, not PHP.
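
For what it is worth, applying a character set to the submitted data can be sketched in PHP as below. The sketch assumes that anything which is not already valid UTF-8 was sent as Windows-1252 and that the site stores and serves UTF-8; neither assumption is confirmed anywhere in the thread, and the helper name to_utf8 is made up.

```php
<?php
// Hypothetical helper; Windows-1252 as the source encoding is an assumption.
function to_utf8($input)
{
    // Valid UTF-8 (which includes plain ASCII) is returned untouched.
    // preg_match with the /u flag returns false on malformed UTF-8.
    if (preg_match('//u', $input) === 1) {
        return $input;
    }
    // Otherwise reinterpret the bytes as Windows-1252.
    return iconv('Windows-1252', 'UTF-8', $input);
}

// Curly quotes 0x93/0x94 become U+201C/U+201D encoded as UTF-8.
echo to_utf8("\x93Hi\x94");
```

Converting keeps the curly quotes and dashes instead of discarding them, provided the page that displays the comment declares UTF-8 as well.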

Posted 18/09/2007, 12:04   #9
FFMG


Sanders Kaufman;92371 Wrote:
> FFMG wrote:
> > Sanders Kaufman;92237 Wrote:

>
> >> Ah, so. You're having a character set problem.
> >> Rather than have a big old off-topic thread about it here, you should
> >> probably take the question to an Office or HTML group.
> >> PHP won't help you much.

> >
> > No I am not, read the question again, carefully this time.
> > Textareas of most browsers will, (wrongly), accept MS Word pasted
> > code.

>
> There is nothing in the HTML specification requiring HTML to reject MS
> Word, Open Office, or any other format. That would be a bug, not a
> feature.
>


Great, one more reason to strip MS Word characters.

Sanders Kaufman;92371 Wrote:
>
>
> > By the time it gets to my server I have to clean it up.
> > My PHP code must handle it.
> >
> > Is that on topic enough for you?

>
> No, and it won't likely be topic(al) enough for most of the other folks
> here in the PHP group, either.
>
> While you are indeed trying to process the data through PHP, you appear
> to be perfectly capable of programming in PHP, and thus need very little
> help with PHP.
>
> Instead, you need to identify the correct character set to use in
> interpreting the Office document, and to apply that character set to the
> data retrieved through the HTML FORM element.
>
> That means that the help you need is with Office and HTML, not PHP.


Well, I tend to disagree.
Because I am trying to process data in PHP I think that asking fellow
programmers on the PHP group for input is not as off-topic as you
think.

Is your suggestion to convert to an MS Office charset, (even if the
user did not use MS Word), and then convert it back as needed?
Would stripping the MS chars not be faster/better?

FFMG



Posted 18/09/2007, 14:00   #10
Sanders Kaufman

FFMG wrote:
> Sanders Kaufman;92371 Wrote:


>> That means that the help you need is with Office and HTML, not PHP.

>
> Well, I tend to disagree.
> Because I am trying to process data in PHP I think that asking fellow
> programmers on the PHP group for input is not as off-topic as you
> think.


How's that workin' out for ya, champ?
Have you noticed the roar of silence in response to your original request?

Seriously - you'll get a better response in an HTML or MS Office group.


> Is your suggestion to convert to an MS Office charset, (even if the
> user did not use MS Word), and then convert it back as needed?
> Would stripping the MS chars not be faster/better?


There are no such things as "MS characters" or an MS Office Character Set.

Posted 22/09/2007, 15:00   #11
FFMG


Sanders Kaufman;92428 Wrote:
> FFMG wrote:
> > Sanders Kaufman;92371 Wrote:

>
> >> That means that the help you need is with Office and HTML, not PHP.
> >
> > Well, I tend to disagree.
> > Because I am trying to process data in PHP I think that asking fellow
> > programmers on the PHP group for input is not as off-topic as you
> > think.

>
> How's that workin' out for ya, champ?
> ...
>


Read the thread, the answer was given.

I see you could not answer the question so you have to start using
abusive language.

Shame.

FFMG


