Automated web browsing

17 Jan 2008, 21:52   #1
mr_marcin
Automated web browsing

Hi

Does anybody have some idea how to input some text into an input box on
one page, then press some button on that page, which will load another
page, and finally read the response? Suppose I want to write a price
comparison engine, where I would like to parse shops' websites for
prices each time the user wants.

I have found a similar feature in the Symfony framework, called sfBrowser
(or sfTestBrowser). These are made for automated functional testing,
but should provide the functionality I am requesting.

The question is: will this be efficient enough? Maybe there are other
ways to achieve this? Of course I can always try to make it more
manually - look for some pattern in url (search is usually done via
GET), and parse output html.

Thanks for help
Marcin

17 Jan 2008, 22:09   #2
Manuel Lemos
Re: Automated web browsing

on 01/17/2008 07:52 PM mr_marcin said the following:
> Hi
>
> Does anybody have some idea how to input some text into inputbox on
> one page, then press some button on that page, that will load another
> page, and finally read the response? Suppose I want to write a price
> comparison engine, where I would like to parse shops website for
> price each time user wants.
>
> I have found similar feature in Symfony framework, called sfBrowser
> (or sfTestBrowser). These are made for automated functional testing,
> but should provide the functionality I am requesting.
>
> The question is: will this be efficient enough? Maybe there are other
> ways to achieve this? Of course I can always try to make it more
> manually - look for some pattern in url (search is usually done via
> GET), and parse output html.


You may want to try this HTTP client class. Basically it acts like a
browser accessing pages, submitting forms, collecting cookies, handling
redirection, etc., which seems to be what you need to retrieve the pages with
the prices you want to grab.

http://www.phpclasses.org/httpclient


--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

18 Jan 2008, 00:15   #3
Jerry Stuckle
Re: Automated web browsing

mr_marcin wrote:
> Hi
>
> Does anybody have some idea how to input some text into inputbox on
> one page, then press some button on that page, that will load another
> page, and finally read the response? Suppose I want to write a price
> comparison engine, where I would like to parse shops website for
> price each time user wants.
>
> I have found similar feature in Symfony framework, called sfBrowser
> (or sfTestBrowser). These are made for automated functional testing,
> but should provide the functionality I am requesting.
>
> The question is: will this be efficient enough? Maybe there are other
> ways to achieve this? Of course I can always try to make it more
> manually - look for some pattern in url (search is usually done via
> GET), and parse output html.
>
> Thanks for
> Marcin
>


cURL will allow you to get or post to pages, and will return the data.
I much prefer it over the HTTPClient class. It's more flexible.
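
A minimal sketch of that approach with PHP's curl extension, assuming a hypothetical shop whose search form takes a field named q. The helper names encodeFormFields and submitForm, the URL, and the field name are illustrative assumptions, not part of any library:

```php
<?php
// Hypothetical helper: encode form fields the way a browser does for an
// application/x-www-form-urlencoded submission.
function encodeFormFields(array $fields): string
{
    return http_build_query($fields);
}

// Hypothetical helper: submit the form with cURL and return the response body.
// With $usePost = false the fields are appended to the URL (a GET search).
function submitForm(string $url, array $fields, bool $usePost = true): string
{
    $query = encodeFormFields($fields);
    if (!$usePost) {
        $url .= (strpos($url, '?') === false ? '?' : '&') . $query;
    }

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects after the submit
        CURLOPT_COOKIEFILE     => '',    // empty string turns on in-memory cookies
        CURLOPT_TIMEOUT        => 15,
    ]);
    if ($usePost) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $query);
    }

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Usage (assumed URL and field name):
// $html = submitForm('http://shop.example/search', ['q' => 'usb cable']);
```

The returned string is the raw HTML of the result page; extracting the price from it is a separate step.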

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

18 Jan 2008, 09:42   #4
Marlin Forbes
Re: Automated web browsing

mr_marcin wrote:
> Hi
>
> Does anybody have some idea how to input some text into inputbox on
> one page, then press some button on that page, that will load another
> page, and finally read the response? Suppose I want to write a price
> comparison engine, where I would like to parse shops website for
> price each time user wants.


Hi there,

SimpleTest has a class included called SimpleBrowser, which does what
you want, with a very intuitive API. It's not too fast tho...

SimpleTest: http://www.lastcraft.com/simple_test.php

Or, you can interactively set up browsing sessions with the Selenium IDE
and then use the PHP client for the Selenium Remote Control to run them...

Selenium IDE: http://www.openqa.org/selenium-ide/
Selenium RC: http://www.openqa.org/selenium-rc/
PHP Client for Selenium: http://pear.php.net/package/Testing_Selenium

Misc:
http://blog.thinkphp.de/archives/133...-Selenium.html


Regards,
Marlin Forbes
Freelance Developer
Data Shaman
datashaman.com
+27 (0)82 501-6647

18 Jan 2008, 12:22   #5
mr_marcin
Re: Automated web browsing

> Or, you can interactively set up browsing sessions with the Selenium IDE
> and then use the PHP client for the Selenium Remote Control to run them...
>
> Selenium IDE:http://www.openqa.org/selenium-ide/
> Selenium RC:http://www.openqa.org/selenium-rc/
> PHP Client for Selenium:http://pear.php.net/package/Testing_Selenium


This sounds like quite an easy-to-use package, but will it be
efficient enough? I will check all options next week.

18 Jan 2008, 12:23   #6
mr_marcin
Re: Automated web browsing

> cURL will allow you to get or post to pages, and will return the data.
> I much prefer it over the HTTPClient class. It's more flexible.
>


I guess this approach requires some manual work, but you are right -
that's the most flexible and probably most effective way.

18 Jan 2008, 13:09   #7
R. Rajesh Jeba Anbiah
Re: Automated web browsing

On Jan 18, 2:52 am, mr_marcin <mar...@cme.pl> wrote:
> Hi
>
> Does anybody have some idea how to input some text into inputbox on
> one page, then press some button on that page, that will load another
> page, and finally read the response? Suppose I want to write a price
> comparison engine, where I would like to parse shops website for
> price each time user wants.
>
> I have found similar feature in Symfony framework, called sfBrowser
> (or sfTestBrowser). These are made for automated functional testing,
> but should provide the functionality I am requesting.
>
> The question is: will this be efficient enough? Maybe there are other
> ways to achieve this? Of course I can always try to make it more
> manually - look for some pattern in url (search is usually done via
> GET), and parse output html.


1. If you're looking for client tools http://www.iopus.com/imacros/firefox/
2. Web scraping with cURL or HTTPClient class
3. Look for the Web services (SOAP, XML, etc)
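
For option 2, once the page has been fetched the price still has to be pulled out of the HTML. A rough sketch using PHP's DOM extension; the function name extractPrice and the "price" class are assumptions about a hypothetical shop's markup, so the XPath expression would need adjusting per site:

```php
<?php
// Hypothetical helper: return the first price found in a result page, or null.
// The default XPath assumes the shop marks prices with a class containing "price".
function extractPrice(string $html, string $xpath = '//*[contains(@class, "price")]'): ?float
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Keep digits and separators only, and treat a comma as a decimal point.
    // Thousands separators are not handled in this sketch.
    $text = preg_replace('/[^0-9.,]/', '', $nodes->item(0)->textContent);
    $text = str_replace(',', '.', $text);

    return is_numeric($text) ? (float) $text : null;
}
```

A parser-based lookup like this tends to survive small markup changes better than a regular expression over the raw page, though it still breaks when the shop redesigns its templates.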

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

18 Jan 2008, 16:40   #8
Manuel Lemos
Re: Automated web browsing

Hello,

on 01/17/2008 10:15 PM Jerry Stuckle said the following:
>
> cURL will allow you to get or post to pages, and will return the data. I
> much prefer it over the HTTPClient class. It's more flexible.


I wonder which HTTP client you are talking about. The HTTP client I
mentioned wraps around Curl or socket functions depending on which is
more convenient to use in each PHP setup. This is the HTTP client class
I meant:

http://www.phpclasses.org/httpclient

As for Curl being flexible, I wonder what you are talking about.

Personally I find it very odd that you cannot read retrieved pages with
Curl in small chunks at a time without having to use callbacks. This is
bad because it makes it very difficult to retrieve and process large pages
without using external files or exceeding the PHP memory limits.

--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

19 Jan 2008, 01:46   #9
Jerry Stuckle
Re: Automated web browsing

Manuel Lemos wrote:
> Hello,
>
> on 01/17/2008 10:15 PM Jerry Stuckle said the following:
>> cURL will allow you to get or post to pages, and will return the data. I
>> much prefer it over the HTTPClient class. It's more flexible.

>
> I wonder which HTTP client you are talking about. The HTTP client I
> mentioned wraps around Curl or socket functions depending on which is
> more convenient to use in each PHP setup. This is the HTTP client class
> I meant:
>
> http://www.phpclasses.org/httpclient
>


The same one.

> As for Curl being flexible, I wonder what you are talking about.
>


I can do virtually anything with it that I can do with a browser, with
the exception of client side scripting. Also much less overhead than
the httpclient class.

> Personally I find it very odd that you cannot read retrieved pages with
> Curl in small chunks at a time without having to use callbacks. This is
> bad because it makes it very difficult to retrieve and process large pages
> without using external files or exceeding the PHP memory limits.
>


So? I never needed to. First of all, I have no need to retrieve huge
pages. The largest I've ever downloaded (a table with lots of info) was
a little over 3MB and Curl and PHP handled it just fine.

But if the text were split, you need to do additional processing to
handle splits at inconvenient locations. Much easier to add everything
to a temporary file and read it back in the way I need to do it.

But that's one of the advantages of cURL - it gives me the option of
doing the callbacks or not.
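
Both points can be shown in a few lines: a cURL write callback receives the body in pieces, and a small buffer deals with pieces that end mid-line. This is only a sketch; LineBuffer and fetchLines are hypothetical names, and splitting on newlines is an assumption about what counts as a convenient unit:

```php
<?php
// Collects chunks of arbitrary size and hands back only complete lines,
// keeping any trailing partial line until the next chunk arrives.
final class LineBuffer
{
    private string $pending = '';

    /** @return string[] complete lines seen so far, without line endings */
    public function feed(string $chunk): array
    {
        $parts = explode("\n", $this->pending . $chunk);
        $this->pending = array_pop($parts);  // last piece may be incomplete
        return array_map(fn ($line) => rtrim($line, "\r"), $parts);
    }

    /** Return whatever is left once the transfer has finished. */
    public function flush(): string
    {
        $rest = $this->pending;
        $this->pending = '';
        return $rest;
    }
}

// Stream a URL line by line. The write callback must return the number of
// bytes it handled, otherwise cURL aborts the transfer.
function fetchLines(string $url, callable $onLine): void
{
    $buffer = new LineBuffer();
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION,
        function ($ch, string $chunk) use ($buffer, $onLine): int {
            foreach ($buffer->feed($chunk) as $line) {
                $onLine($line);
            }
            return strlen($chunk);
        }
    );
    if (curl_exec($ch) === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    $rest = $buffer->flush();
    if ($rest !== '') {
        $onLine($rest);
    }
}
```

Memory use stays at roughly one chunk plus one partial line, whatever the size of the page.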

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

19 Jan 2008, 03:17   #10
Manuel Lemos
Re: Automated web browsing

Hello,

on 01/18/2008 11:46 PM Jerry Stuckle said the following:
>>> cURL will allow you to get or post to pages, and will return the data. I
>>> much prefer it over the HTTPClient class. It's more flexible.

>>
>> I wonder which HTTP client you are talking about. The HTTP client I
>> mentioned wraps around Curl or socket functions depending on which is
>> more convenient to use in each PHP setup. This is the HTTP client class
>> I meant:
>>
>> http://www.phpclasses.org/httpclient
>>

>
> The same one.
>
>> As for Curl being flexible, I wonder what you are talking about.
>>

>
> I can do virtually anything with it that I can do with a browser, with
> the exception of client side scripting. Also much less overhead than
> the httpclient class.


In practice the real overhead is in the network access.

Anyway, as I mentioned above the HTTP client class uses curl library
functions for SSL if you are running an older version than PHP 4.3.0.
From PHP 4.3.0 with OpenSSL enabled it uses PHP fsockopen, fread, fwrite
functions.

If your hosting company does not have Curl enabled, at least with the
HTTP client class you are not stuck. I think this is more flexible than
relying on curl library availability.


>> Personally I find it very odd that you cannot read retrieved pages with
>> Curl in small chunks at a time without having to use callbacks. This is
>> bad because it makes it very difficult to retrieve and process large pages
>> without using external files or exceeding the PHP memory limits.
>>

>
> So? I never needed to. First of all, I have no need to retrieve huge
> pages. The largest I've ever downloaded (a table with lots of info) was
> a little over 3MB and Curl and PHP handled it just fine.


That is because 3MB is below the PHP 8MB limits. You are talking
specifically of your needs. People with higher needs will not be able to
handle it with Curl functions.


> But if the text were split, you need to do additional processing to
> handle splits at inconvenient locations. Much easier to add everything
> to a temporary file and read it back in the way I need to do it.
>
> But that's one of the advantages of cURL - it gives me the option of
> doing the callbacks or not.


With the HTTP client class you do not need callbacks. You just need to
read response in small chunks and process them on demand.

The ability to stream data in limited size chunks is not a minor
feature. For instance, Cesar Rodas used the HTTP client class
to write a cool stream wrapper class that lets you store and retrieve files
of any size in the Amazon S3 service:

http://www.phpclasses.org/gs3

Same thing for SVN client stream wrapper:

http://www.phpclasses.org/svnclient

Another interesting use of the stream wrapper streaming capabilities is
the Print IPP class. It lets you print any documents sending them
directly to a networked printer. IPP is a protocol that works on top of
HTTP. IPP is the protocol used by CUPS (printing system for Linux and
Unix systems). Nowadays there are many networked printers (especially
the wireless ones) that have IPP support built-in.

http://www.phpclasses.org/printipp

Anyway, streaming capabilities are just one of the ways in which the HTTP
client class provides flexibility.

The HTTP client was not developed to compete with the curl functions,
but rather to provide a solution that complements the curl HTTP access
or even replace it when it is not enabled.

If you browse the HTTP client class forum, you may find people who had
difficulties when they tried the curl library functions but succeeded
with the HTTP client class.

http://www.phpclasses.org/discuss/package/3/

Maybe it is not your case now, but maybe one day you will stumble on one
of those difficulties that prevent you from using curl functions. In
that case feel free to use the HTTP client class. ;-)

--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

19 Jan 2008, 03:28   #11
Jerry Stuckle
Re: Automated web browsing

Manuel Lemos wrote:
> Hello,
>
> on 01/18/2008 11:46 PM Jerry Stuckle said the following:
>>>> cURL will allow you to get or post to pages, and will return the data. I
>>>> much prefer it over the HTTPClient class. It's more flexible.
>>> I wonder which HTTP client you are talking about. The HTTP client I
>>> mentioned wraps around Curl or socket functions depending on which is
>>> more convenient to use in each PHP setup. This is the HTTP client class
>>> I meant:
>>>
>>> http://www.phpclasses.org/httpclient
>>>

>> The same one.
>>
>>> As for Curl being flexible, I wonder what you are talking about.
>>>

>> I can do virtually anything with it that I can do with a browser, with
>> the exception of client side scripting. Also much less overhead than
>> the httpclient class.

>
> In practice the real overhead is in the network access.
>
> Anyway, as I mentioned above the HTTP client class uses curl library
> functions for SSL if you are running an older version than PHP 4.3.0.
> From PHP 4.3.0 with OpenSSL enabled it uses PHP fsockopen, fread, fwrite
> functions.
>


Which means it has more overhead than using cURL directly. It's another
layer on top of cURL.

> If your hosting company does not have Curl enabled, at least with the
> HTTP client class you are not stuck. I think this is more flexible than
> relying on curl library availability.
>


I only use VPS's and dedicated servers. But even when I was using
shared hosting, I was able to find hosting companies who either had cURL
enabled or would do it for you.

OTOH, I've found more who won't allow fsockopen() than cURL.

But either way, if your hosting company won't provide what you need,
there's an easy answer.

>
>>> Personally I find it very odd that you cannot read retrieved pages with
>>> Curl in small chunks at a time without having to use callbacks. This is
>>> bad because it makes it very difficult to retrieve and process large pages
>>> without using external files or exceeding the PHP memory limits.
>>>

>> So? I never needed to. First of all, I have no need to retrieve huge
>> pages. The largest I've ever downloaded (a table with lots of info) was
>> a little over 3MB and Curl and PHP handled it just fine.

>
> That is because 3MB is below the PHP 8MB limits. You are talking
> specifically of your needs. People with higher needs will not be able to
> handle it with Curl functions.
>


Exactly how many pages do you know which are larger than 8MB? And BTW -
8MB is only the default. On some servers where I have customers with
needs for large amounts of data, I raise it as high as 128 MB.

But again - you can do it with even 1MB by providing the appropriate
callback functions. And it's not hard at all to do.

>
>> But if the text were split, you need to do additional processing to
>> handle splits at inconvenient locations. Much easier to add everything
>> to a temporary file and read it back in the way I need to do it.
>>
>> But that's one of the advantages of cURL - it gives me the option of
>> doing the callbacks or not.

>
> With the HTTP client class you do not need callbacks. You just need to
> read response in small chunks and process them on demand.
>


So - what's the problem with callbacks? They're quick and easy. And
they give you much more control over what's going on.

For instance - you may not be interested in everything. It's very easy
for the callback to throw away what you don't want. You can't do that
with the HTTP client class.

> The ability to stream data in limited size chunks is not a minor
> feature. For instance, Cesar Rodas used the HTTP client class
> to write a cool stream wrapper class that lets you store and retrieve files
> of any size in the Amazon S3 service:
>
> http://www.phpclasses.org/gs3
>
> Same thing for SVN client stream wrapper:
>
> http://www.phpclasses.org/svnclient
>
> Another interesting use of the stream wrapper streaming capabilities is
> the Print IPP class. It lets you print any documents sending them
> directly to a networked printer. IPP is a protocol that works on top of
> HTTP. IPP is the protocol used by CUPS (printing system for Linux and
> Unix systems). Nowadays there are many networked printers (especially
> the wireless ones) that have IPP support built-in.
>
> http://www.phpclasses.org/printipp
>


Which has absolutely nothing to do with this conversation. Please limit
your comments to the topic at hand.


> Anyway, streaming capabilities are just one of the ways in which the HTTP
> client class provides flexibility.
>


No problem with that. But it is still less flexible than cURL.

> The HTTP client was not developed to compete with the curl functions,
> but rather to provide a solution that complements the curl HTTP access
> or even replace it when it is not enabled.
>


Fine. No problem. My only comment was that I prefer cURL because it is
more flexible. You challenged that. Now you're arguing completely
different topics to try to "prove" that the httpclient class is "better".

> If you browse the HTTP client class forum, you may find people who had
> difficulties when they tried the curl library functions but succeeded
> with the HTTP client class.
>
> http://www.phpclasses.org/discuss/package/3/
>


Sure. And there are people who have had problems with the httpclient
class and found the cURL functions work. That proves nothing.

> Maybe it is not your case now, but maybe one day you will stumble on one
> of those difficulties that prevent you from using curl functions. In
> that case feel free to use the HTTP client class. ;-)
>


Nope. I've tried the httpclient class. I find it too limiting with
excessive overhead for my needs.

But as I said above - you tell me they don't compete. But then you keep
trying to tell me how the httpclient class is "better". Which is it?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

19 Jan 2008, 20:32   #12
Manuel Lemos
Re: Automated web browsing

Hello,

on 01/19/2008 01:28 AM Jerry Stuckle said the following:
>>>> As for Curl being flexible, I wonder what you are talking about.
>>>>
>>> I can do virtually anything with it that I can do with a browser, with
>>> the exception of client side scripting. Also much less overhead than
>>> the httpclient class.

>>
>> In practice the real overhead is in the network access.
>>
>> Anyway, as I mentioned above the HTTP client class uses curl library
>> functions for SSL if you are running an older version than PHP 4.3.0.
>> From PHP 4.3.0 with OpenSSL enabled it uses PHP fsockopen, fread, fwrite
>> functions.
>>

>
> Which means it has more overhead than using cURL directly. It's another
> layer on top of cURL.


If you mean the class PHP code execution overhead, that is negligible.
What is a few microseconds executing PHP code when you have to wait
seconds for the data to be sent or received from remote Web servers?


>> If your hosting company does not have Curl enabled, at least with the
>> HTTP client class you are not stuck. I think this is more flexible than
>> relying on curl library availability.
>>

>
> I only use VPS's and dedicated servers. But even when I was using
> shared hosting, I was able to find hosting companies who either had cURL
> enabled or would do it for you.


I found users complaining in the HTTP client class forum that they could
not use the curl library functions in their PHP setup.


> OTOH, I've found more who won't allow fsockopen() than cURL.


That is another aspect that using the HTTP client class is more
flexible. If curl support is missing, the class will use fsockopen and
vice-versa.
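
One way to picture that kind of fallback, as a sketch only: this is not the HTTP client class's actual code, and pickTransport is a hypothetical name. The optional argument exists so the choice can be exercised without changing the PHP setup:

```php
<?php
// Hypothetical helper: pick the first usable transport, preferring curl.
// Pass $available to override detection, e.g. ['curl' => false, 'socket' => true].
function pickTransport(?array $available = null): string
{
    $available ??= [
        'curl'   => function_exists('curl_init'),
        'socket' => function_exists('fsockopen'),
    ];

    foreach (['curl', 'socket'] as $name) {
        if (!empty($available[$name])) {
            return $name;
        }
    }
    throw new RuntimeException('No HTTP transport available');
}
```

The caller would then branch on the returned name and use either the curl functions or fsockopen with fread and fwrite.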



> But either way, if your hosting company won't provide what you need,
> there's an easy answer.


Many developers do not have a choice of hosting company because it is up
to their clients to decide and often they do not want to move.



>>>> Personally I find it very odd that you cannot read retrieved pages with
>>>> Curl in small chunks at a time without having to use callbacks. This is
>>>> bad because it makes it very difficult to retrieve and process large pages
>>>> without using external files or exceeding the PHP memory limits.
>>>>
>>> So? I never needed to. First of all, I have no need to retrieve huge
>>> pages. The largest I've ever downloaded (a table with lots of info) was
>>> a little over 3MB and Curl and PHP handled it just fine.

>>
>> That is because 3MB is below the PHP 8MB limits. You are talking
>> specifically of your needs. People with higher needs will not be able to
>> handle it with Curl functions.
>>

>
> Exactly how many pages do you know which are larger than 8MB? And BTW -


It is very easy to find people that need to download or upload files via
HTTP that are larger than 8MB.


> 8MB is only the default. On some servers where I have customers with
> needs for large amounts of data, I raise it as high as 128 MB.


Many shared hosting clients cannot change php.ini options.



> But again - you can do it with even 1MB by providing the appropriate
> callback functions. And it's not hard at all to do.


I wonder if you really tried using callbacks to stream data sent to or
received from the HTTP server.

The last time I tried, it seemed that your callbacks had to manually craft
HTTP requests and interpret raw HTTP responses, basically implementing an
HTTP client inside the callback functions. It seemed that you would have
to know the whole HTTP protocol to sort the data you need to send or
receive.

Basically that is what the HTTP client class does without requiring that
you learn and implement the HTTP protocol by hand.


>>> But if the text were split, you need to do additional processing to
>>> handle splits at inconvenient locations. Much easier to add everything
>>> to a temporary file and read it back in the way I need to do it.
>>>
>>> But that's one of the advantages of cURL - it gives me the option of
>>> doing the callbacks or not.

>>
>> With the HTTP client class you do not need callbacks. You just need to
>> read response in small chunks and process them on demand.
>>

>
> So - what's the problem with callbacks? They're quick and easy. And
> they give you much more control over what's going on.


Other than the complexity of dealing with raw HTTP data, the main
problem that I see is that callbacks do not pass control to your
application. You need to do something with the data and return control
to the curl library.

For instance, if you want to download a large data block retrieved with
one HTTP request, and then upload it to another server with another HTTP
request, it does not seem you can do it passing small chunks of data
using curl callbacks.


> For instance - you may not be interested in everything. It's very easy
> for the callback to throw away what you don't want. You can't do that
> with the HTTP client class.


I do not want to deal with raw HTTP protocol data. I developed the class
precisely for it to do that for me.

If callbacks were useful for me, I would have added support in the class
to invoke whatever callback functions.


>> The ability to stream data in limited size chunks is not a minor
>> feature. For instance, Cesar Rodas used the HTTP client class
>> to write a cool stream wrapper class that lets you store and retrieve files
>> of any size in the Amazon S3 service:
>>
>> http://www.phpclasses.org/gs3
>>
>> Same thing for SVN client stream wrapper:
>>
>> http://www.phpclasses.org/svnclient
>>
>> Another interesting use of the stream wrapper streaming capabilities is
>> the Print IPP class. It lets you print any documents sending them
>> directly to a networked printer. IPP is a protocol that works on top of
>> HTTP. IPP is the protocol used by CUPS (printing system for Linux and
>> Unix systems). Nowadays there are many networked printers (especially
>> the wireless ones) that have IPP support built-in.
>>
>> http://www.phpclasses.org/printipp
>>

>
> Which has absolutely nothing to do with this conversation. Please limit
> your comments to the topic at hand.


On the contrary, this has all to do with what I am explaining to you.

For instance, with the classes above that use the HTTP client class
streaming capabilities, you copy large files without exceeding your PHP
memory limits just using this:

copy('svn://server/file', 's3://bucket/file');
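
A copy like this can stay within the memory limit when data moves between the two streams a piece at a time. A minimal sketch of that idea with plain PHP stream functions, assuming blocking streams; copyInChunks is a hypothetical helper and not part of the classes mentioned:

```php
<?php
// Hypothetical helper: copy from one open stream to another while holding
// at most $chunkSize bytes in memory. Returns the number of bytes copied.
function copyInChunks($source, $dest, int $chunkSize = 8192): int
{
    $total = 0;
    while (!feof($source)) {
        $chunk = fread($source, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;  // read error or nothing left to read
        }
        // fwrite may accept fewer bytes than offered, so loop until all are written.
        for ($written = 0; $written < strlen($chunk); $written += $n) {
            $n = fwrite($dest, substr($chunk, $written));
            if ($n === false || $n === 0) {
                throw new RuntimeException('Write failed');
            }
        }
        $total += strlen($chunk);
    }
    return $total;
}

// Usage with any two stream resources, for example local files:
// copyInChunks(fopen('big.dat', 'rb'), fopen('copy.dat', 'wb'));
```

PHP's built-in stream_copy_to_stream() does the same job; the loop is spelled out here only to show where the memory bound comes from.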


>> The HTTP client was not developed to compete with the curl functions,
>> but rather to provide a solution that complements the curl HTTP access
>> or even replace it when it is not enabled.
>>

>
> Fine. No problem. My only comment was that I prefer cURL because it is
> more flexible. You challenged that. Now you're arguing completely
> different topics to try to "prove" that the httpclient class is "better".


Jerry, relax. There seems to be a misunderstanding here. I did not
challenge you. I was just curious to know in what relevant respects you
find curl more flexible to use than the HTTP client class.

The class has evolved according to the needs of users that found
limitations on it and told me about it. So I wanted to understand what
you are talking about.

So far you keep telling me that curl is more flexible, but I have
yet to see where the flexibility is.


>> If you browse the HTTP client class forum, you may find people who had
>> difficulties when they tried the curl library functions but succeeded
>> with the HTTP client class.
>>
>> http://www.phpclasses.org/discuss/package/3/
>>

>
> Sure. And there are people who have had problems with the httpclient
> class and found the cURL functions work. That proves nothing.


Like for instance?

Please understand that I am not here to prove anything, even less to
compete with your arguments.

I just want to learn which are the relevant limitations that people have
found in the HTTP client, so I can work on them. That is helpful for me
because by addressing other people's needs I will eventually be addressing
my own needs, if not present, at least future.

--

Regards,
Manuel Lemos

PHP professionals looking for PHP jobs
http://www.phpclasses.org/professionals/

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

19/01/2008, 20:52   #13
Jerry Stuckle
Re: Automated web browing

Manuel Lemos wrote:

<snip junk>

Manuel,

I'm not going to argue with you about whether the HTTPClass is easier to
use or whatever.

My single point was that cURL is more flexible. You can do anything
with cURL that you can with the HTTPClient class and more. That is
pretty obvious - because the HTTPClient class is built on cURL - so if
cURL can't do it, neither can the HTTPClient class.

But being built on cURL, the HTTPClient class restricts what you can do.
So it is less flexible.

You can sit there and argue all you want as to the other merits of your
class. I won't bite. Because that was not my point.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

19/01/2008, 21:06   #14
Gary L. Burnore
Re: Automated web browing

On Sat, 19 Jan 2008 15:52:42 -0500, Jerry Stuckle
<jstucklex@attglobal.net> wrote:

>Manuel Lemos wrote:
>
><snip junk>
>
>Manuel,
>
>I'm not going to argue with you about whether the HTTPClass is easier to
>use or whatever.
>
>My single point was that cURL is more flexible. You can do anything
>with cURL that you can with the HTTPClient class and more. That is
>pretty obvious - because the HTTPClient class is built on cURL


Why would you build a class on top of something if it doesn't do more
than what it's based on, or at least make that easier to do?

Based on your argument, he should use C instead of curl because it's
more flexible. In fact, he should use C because it's more flexible
than PHP.
--
gburnore at DataBasix dot Com
---------------------------------------------------------------------------
How you look depends on where you go.
---------------------------------------------------------------------------
Gary L. Burnore
Official .sig, Accept no substitutes.
Black Helicopter Repair Services, Ltd.
===========================================================================

19/01/2008, 21:09   #15
Manuel Lemos
Re: Automated web browing

Hello,

on 01/19/2008 06:52 PM Jerry Stuckle said the following:
> I'm not going to argue with you about whether the HTTPClass is easier to
> use or whatever.
>
> My single point was that cURL is more flexible. You can do anything
> with cURL that you can with the HTTPClient class and more. That is
> pretty obvious - because the HTTPClient class is built on cURL - so if
> cURL can't do it, neither can the HTTPClient class.
>
> But being built on cURL, the HTTPClient class restricts what you can do.
> So it is less flexible.


No, that is not the way it works. I already explained that to you.

The HTTP client class uses curl when fsockopen calls cannot be used
under the current PHP setup. Curl is used as a better-than-nothing solution.

For instance, before PHP 4.3.0 you could only make SSL requests with curl.
The class used curl for SSL requests but, of course, with curl it cannot
send or receive streamed data in small chunks that never exceed the
PHP memory limits.

If you want that flexibility you need to use PHP 4.3.0 or newer. Then
the class will use fsockopen for SSL requests.

In any case, the HTTP client class abstracts that for you. You do not
need to adapt your application code depending on the PHP version, as the
class does it for you.
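
The fallback described above can be sketched roughly as follows. This is not the HTTP client class's actual logic: detect_capabilities and choose_transport are hypothetical names, the code assumes PHP 7 or newer, and it simplifies the decision to "use sockets when possible, otherwise curl". It only uses the standard function_exists() and extension_loaded() checks.

```php
<?php
/**
 * Report which HTTP transports this PHP setup appears to offer.
 * (Hypothetical helper for illustration only.)
 */
function detect_capabilities(): array
{
    return [
        'fsockopen' => function_exists('fsockopen'),
        'openssl'   => extension_loaded('openssl'), // needed for ssl:// sockets
        'curl'      => function_exists('curl_init'),
    ];
}

/**
 * Pick a transport: prefer plain sockets, fall back to curl.
 * $ssl says whether the request is HTTPS.
 */
function choose_transport(bool $ssl, array $caps): string
{
    // Sockets are usable for HTTP, and for HTTPS only with OpenSSL support.
    $socketsUsable = $caps['fsockopen'] && (!$ssl || $caps['openssl']);
    if ($socketsUsable) {
        return 'fsockopen';
    }
    if ($caps['curl']) {
        return 'curl';
    }
    throw new RuntimeException('No usable HTTP transport available');
}
```

Application code would call choose_transport($ssl, detect_capabilities()) once and never branch on the PHP version itself, which is the kind of abstraction being described.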

I developed the HTTP client class not just as a mere curl wrapper, but
to actually add some benefits on top of curl/fsockopen. So, it was meant
to add flexibility, not to remove it.

That is why I questioned you about your flexibility statement. Maybe you
tried an old version of the HTTP client class and you found some
limitations that no longer exist. But if you still find it less
flexible, I want to understand what you are talking about.



--

Regards,
Manuel Lemos


20/01/2008, 03:56   #16
Jerry Stuckle
Re: Automated web browing

Manuel Lemos wrote:
> Hello,
>
> on 01/19/2008 06:52 PM Jerry Stuckle said the following:
>> I'm not going to argue with you about whether the HTTPClass is easier to
>> use or whatever.
>>
>> My single point was that cURL is more flexible. You can do anything
>> with cURL that you can with the HTTPClient class and more. That is
>> pretty obvious - because the HTTPClient class is built on cURL - so if
>> cURL can't do it, neither can the HTTPClient class.
>>
>> But being built on cURL, the HTTPClient class restricts what you can do.
>> So it is less flexible.

>
> No, that is not the way it works. I already explained that to you.
>
> The HTTP client class uses Curl when fsockopen calls cannot be used
> under the current PHP setup. Curl is used as a better than nothing solution.
>
> For instance before PHP 4.3.0 you can only make SSL request with curl.
> The class used curl for SSL requests, but of course, with curl it cannot
> send or receive streamed data in small chunks that never exceed the
> PHP memory limits.
>
> If you want that flexibility you need to use PHP 4.3.0 or newer. Then
> the class will use fsockopen for SSL requests.
>
> In any case, the HTTP client class abstracts that for you. You do not
> need to adapt your application code depending on the PHP version, as the
> class does it for you.
>
> I developed the HTTP client class not just as a mere curl wrapper, but
> to actually add some benefits on top of curl/fsockopen. So, it was meant
> to add flexibility, not to remove it.
>
> That is why I questioned you about your flexibility statement. Maybe you
> tried an old version of the HTTP client class and you found some
> limitations that no longer exist. But if you still find it less
> flexible, I want to understand what you are talking about.
>
>
>


Manuel,

You are obviously not able to step back and take an objective look at
your classes. I have tried to discuss much of this with you previously,
but you have consistently argued about unrelated things.

I really don't feel like continuing this argument. Please let me know
when you can look at it objectively, and I will be happy to *discuss* it
with you.

I will continue to recommend cURL for the reasons I have outlined. The
difference is I have no relationship with cURL, other than as a user of
the library.


20/01/2008, 12:10   #17
Paul Lautman
Re: Automated web browing

mr_marcin wrote:
> Hi
>
> Does anybody have some idea how to input some text into inputbox on
> one page, than press some button on that page, that will load another
> page, and finally read the responde? Suppose I want to write a price
> comparision engine, where I would like to parse shops website for
> price each time user wants.
>
> I have found similar feature in Symfony framework, called sfBrowser
> (or sfTestBrowser). These are made for automated functional testing,
> but should provide the functinality I am requesting.
>
> The question is: will this be efficient enough? Maybe there are other
> ways to achieve this? Of course I can always try to make it more
> manually - look for some pattern in url (search is usually done via
> GET), and parse output html.
>
> Thanks for
> Marcin


Take a look at Snoopy
http://sourceforge.net/project/showf...?group_id=2091


20/01/2008, 21:09   #18
Mark A. Boyd
Re: Automated web browing

Jerry Stuckle <jstucklex@attglobal.net> posted in comp.lang.php:

> Manuel Lemos wrote:
>>
>> I developed the HTTP client class not just as a mere curl wrapper, but
>> to actually add some benefits on top of curl/fsockopen. So, it was
>> meant to add flexibility, not to remove it.
>>
>> That is why I questioned you about your flexibility statement. Maybe you
>> tried an old version of the HTTP client class and you found some
>> limitations that no longer exist. But if you still find it less
>> flexible, I want to understand what you are talking about.
>>
>>

>
> Manuel,
>
> You are obviously not able to step back and take an objective look at
> your classes. I have tried to discuss much of this with you previously,
> but you have consistently argued about unrelated things.
>
> I really don't feel like continuing this argument. Please let me know
> when you can look at it objectively, and I will be happy to *discuss* it
> with you.
>
> I will continue to recommend cURL for the reasons I have outlined. The
> difference is I have no relationship with cURL, other than as a user of
> the library.


PMFBI, but I don't understand why you don't see the flexibility offered by
Manuel's class.

I suspect that most of us are not hired at the consultant level, but as
application/Web developers. As such, we likely have much less influence over
our clients' decisions about Web server configurations. If a client is paying
consultant fees for advice, I would think they are more willing to listen to
such advice. When hiring/contracting a Web developer, they are more apt to
set the requirements that he/she must comply with.

So, if one can build a library of reusable code via the httpClient class that
works with/without cURL, well, isn't that flexible?

Or are you suggesting that one should develop this library with code to
handle either situation oneself? If you were to do that, would you create two
separate libraries of code or would you create a class that can handle either
situation? (Ponder this question as a developer, not as a higher-paid
consultant.)

Or would you simply turn down jobs that cannot use cURL?


(Note: I've only used cURL myself, but then I only work on our own sites -
unfortunately inheriting some frightening stuff.)


--
Mark A. Boyd
Keep-On-Learnin'

20/01/2008, 21:58   #19
Manuel Lemos
Re: Automated web browing

Hello,

on 01/20/2008 01:56 AM Jerry Stuckle said the following:
> You are obviously not able to step back and take an objective look at
> your classes. I have tried to discuss much of this with you previously,
> but you have consistently argued about unrelated things.
>
> I really don't feel like continuing this argument. Please let me know
> when you can look at it objectively, and I will be happy to *discuss* it
> with you.
>
> I will continue to recommend cURL for the reasons I have outlined. The
> difference is I have no relationship with cURL, other than as a user of
> the library.


Jerry, never mind, this is not important for me.

I am afraid you continue to avoid my points. You keep talking about
arguing, discussing, A is better than B, competing. But I did not come
here for that.

I am not interested in competition with you. For me, cooperating is
better than competing.

I am interested in spreading my class because the constructive feedback
that I get from the users helps me to improve the class, so it will be
better prepared for my present and future needs.

You said you tried my class in the past. So I invited you to specify the
limitations that you found in it. Unfortunately you avoided being
specific, and so you failed to provide any constructive feedback.

You just made vague assertions about it not being flexible, without
providing a real-world use case that demonstrates where my class was not
able to solve a problem that you had in accessing resources via HTTP.

When I demonstrated the limitations of relying on the Curl library, you
basically just tried to minimize the problems or just ignored their
relevance.

Maybe I am getting this wrong, but the impression that you are giving,
at least to me, is that you really are not interested in helping, but
rather want to minimize the work and participation of other people in
this newsgroup, so your participation can prevail.

The bottom line is: if you are not interested in being helpful and your
interest is just competing with arguments, don't bother, I am not
interested.


--

Regards,
Manuel Lemos


20/01/2008, 23:01   #20
Jerry Stuckle
Re: Automated web browing

Mark A. Boyd wrote:
> Jerry Stuckle <jstucklex@attglobal.net> posted in comp.lang.php:
>
>> Manuel Lemos wrote:
>>> I developed the HTTP client class not just as a mere curl wrapper, but
>>> to actually add some benefits on top of curl/fsockopen. So, it was
>>> meant to add flexibility, not to remove it.
>>>
>>> That is why I questioned you about your flexibility statement. Maybe you
>>> tried an old version of the HTTP client class and you found some
>>> limitations that no longer exist. But if you still find it less
>>> flexible, I want to understand what you are talking about.
>>>
>>>

>> Manuel,