comp.lang.php
HTTP Request, character encoding and fsockopen
19/01/2008, 08:18   #1
Vladimir Ghetau
HTTP Request, character encoding and fsockopen

Hi guys,

This is a weird problem, and I'm not sure if I got it right.

Here is a practical example that describes my problem:

I'm connecting to the google.com host on port 80 using fsockopen, and I
send a regular GET request without any specific HTTP headers regarding
the accepted encoding, accepted charset, conditional headers, etc.

After sending the request to the stream opened with fsockopen, I start
reading the response headers, and then comes the body of the web page.
Everything seems logical up to this point.
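To make the "headers, then body" step concrete, here is a minimal sketch of splitting a raw response, as it might be read from an fsockopen() stream, into its status line, headers and body. The function name split_http_response() and the sample response are hypothetical; a literal string stands in for data read from the socket.

```php
<?php
// Hypothetical helper: split a raw HTTP response into status line,
// header map and body. The headers end at the first empty line (CRLF CRLF).
function split_http_response(string $raw): array
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return ['status' => '', 'headers' => [], 'body' => $raw];
    }
    $lines = explode("\r\n", substr($raw, 0, $pos));
    $status = array_shift($lines);
    $headers = [];
    foreach ($lines as $line) {
        // Header names are case-insensitive, so store them in lower case.
        // A repeated header overwrites the earlier value in this sketch.
        [$name, $value] = array_pad(explode(':', $line, 2), 2, '');
        $headers[strtolower(trim($name))] = trim($value);
    }
    return ['status' => $status, 'headers' => $headers, 'body' => substr($raw, $pos + 4)];
}

// Sample response standing in for what fsockopen() would return.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nTransfer-Encoding: chunked\r\n\r\n"
     . "5\r\nHello\r\n0\r\n\r\n";
$parts = split_http_response($raw);
echo $parts['headers']['transfer-encoding'], "\n"; // chunked
```

Note that the body still starts with "5" and ends with "0": the header split alone does not remove those markers.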

The problem is that, just after the headers, the body of the page
contains a few odd alphanumeric values, about 4 characters long, which
look like hex values, e.g. 2A, or maybe two values: 8c9d... Then comes
the regular HTML code of the page, if any.

At the end of the content, there's also one of these alphanumeric
groups, or a "0" (zero).

For some reason I tend to believe that the characters right after the
headers are used by browsers to identify the encoding of the stream,
e.g. bytes that tell them my page is going to be UTF-8 encoded?
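For comparison, HTTP normally announces the character encoding in the charset parameter of the Content-Type response header (for example "text/html; charset=UTF-8"), not through bytes at the start of the body. An HTML page may also declare it in a meta tag. A minimal sketch of reading the header value, with a hypothetical helper name:

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value. Returns null when no charset is given.
function charset_from_content_type(string $contentType): ?string
{
    // Match ";charset=value" or ';charset="value"', case-insensitively.
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null; // no charset: the client has to fall back to a default
}

var_dump(charset_from_content_type('text/html; charset=UTF-8'));        // string(5) "UTF-8"
var_dump(charset_from_content_type('text/html; charset="iso-8859-1"')); // string(10) "ISO-8859-1"
var_dump(charset_from_content_type('text/html'));                       // NULL
```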

Anyway, the question is how to make sure I get the page right, and why
file_get_contents(url_goes_here) doesn't return those alphanumeric
characters, given that it already strips the response headers.

I still think it's some sort of "stream's first byte" that informs the
application about the encoding of the content, but I'm here to hear your
input and solutions on this.

Thank you,

Vladimir Ghetau

http://www.Vladimirated.com/
19/01/2008, 08:56   #2
petersprc
Re: HTTP Request, character encoding and fsockopen

Hi,

You could try using HTTP/1.0 or simply leaving off the HTTP version.
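A minimal sketch of that suggestion: build an HTTP/1.0 GET request, since HTTP/1.1 servers must not send transfer codings such as chunked to an HTTP/1.0 client. The helper name build_http10_get() is hypothetical, and the socket part is shown only as comments because it needs network access.

```php
<?php
// Hypothetical helper: build a plain HTTP/1.0 GET request.
function build_http10_get(string $host, string $path = '/'): string
{
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"       // lets name-based virtual hosts route the request
         . "Connection: close\r\n"
         . "\r\n";                   // blank line ends the request headers
}

$request = build_http10_get('www.google.com');
echo $request;

// Sending it over a socket would look roughly like this (not run here):
// $fp = fsockopen('www.google.com', 80, $errno, $errstr, 10);
// fwrite($fp, $request);
// $raw = stream_get_contents($fp);
// fclose($fp);
```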

HTTP/1.1 clients must be able to handle "chunked transfer coding",
which is the encoding you're seeing. Each chunk is preceded by its
size in hex, and a chunk of size 0 marks the end of the body.
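A minimal sketch of decoding a chunked body that has already been read into a string, following that layout: hex size (optionally followed by ";extension"), CRLF, the data, CRLF, repeated until a zero-size chunk. The name decode_chunked() is hypothetical, and trailer headers after the last chunk are simply ignored here.

```php
<?php
// Hypothetical helper: decode an HTTP/1.1 chunked body held in a string.
function decode_chunked(string $body): string
{
    $out = '';
    $pos = 0;
    $len = strlen($body);
    while ($pos < $len) {
        $eol = strpos($body, "\r\n", $pos);
        if ($eol === false) {
            throw new UnexpectedValueException('Missing CRLF after chunk size');
        }
        // Drop any chunk extension after ';' and parse the hex size (e.g. "2A" = 42).
        $sizeLine = substr($body, $pos, $eol - $pos);
        $size = hexdec(trim(explode(';', $sizeLine, 2)[0]));
        if ($size <= 0) {
            break; // last chunk: the body is complete
        }
        $out .= substr($body, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2; // skip the data and its trailing CRLF
    }
    return $out;
}

echo decode_chunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"), "\n"; // Wikipedia
```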

Details:

http://www.w3.org/Protocols/rfc2616/....html#sec3.6.1

Peace,
John Peters

19/01/2008, 19:09   #3
Manuel Lemos
Re: HTTP Request, character encoding and fsockopen

Hello,

on 01/19/2008 06:18 AM Vladimir Ghetau said the following:
> The problem is, just after the headers are received, the body of the
> page, contains few odd alphanumeric values , about 4 elements in
> length, and it seems it's a hexa value. e.g.. 2A, or two values
> maybe: 8c9d... then comes the regular HTML code of the page if any.


Those are chunked transfer encoding blocks. You need to decode and
reassemble the blocks. They let the client know when the server response
has ended for responses whose length is not known in advance, for
instance pages generated dynamically with PHP.
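Decoding can also be done while reading, directly from the stream returned by fsockopen() once the headers have been consumed. A minimal sketch, assuming the stream is positioned at the first chunk-size line; the function name read_chunked_stream() is hypothetical, and an in-memory stream stands in for the socket.

```php
<?php
// Hypothetical helper: read and decode a chunked body from an open stream.
function read_chunked_stream($fp): string
{
    $body = '';
    while (($line = fgets($fp)) !== false) {
        // Chunk-size line: hex size, optional ";extension", CRLF.
        $size = hexdec(trim(explode(';', $line, 2)[0]));
        if ($size <= 0) {
            break; // last chunk: the response body is complete
        }
        // fread() on a socket may return fewer bytes than requested, so loop.
        $remaining = $size;
        while ($remaining > 0 && !feof($fp)) {
            $data = fread($fp, $remaining);
            if ($data === false || $data === '') {
                break;
            }
            $body .= $data;
            $remaining -= strlen($data);
        }
        fgets($fp); // consume the CRLF that follows each chunk's data
    }
    return $body;
}

// Usage with an in-memory stream standing in for a socket:
$fp = fopen('php://memory', 'r+');
fwrite($fp, "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n");
rewind($fp);
echo read_chunked_stream($fp), "\n"; // Wikipedia
fclose($fp);
```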

You may want to take a look at this HTTP client class to learn how to
decode them:

http://www.phpclasses.org/httpclient


--

Regards,
Manuel Lemos

All times are GMT +1.