php page scrapping challenge!

Posted 05/05/2008, 05:26   #1
paragasu
Subject: php page scrapping challenge!

Well, this is going to be fun.

The website I am trying to scrape is http://www.cathayholdings.com.my/
It is a cinema website with a very irritating design. They tried so hard to
impose security that the site is really not user friendly. The whole website
is written in ASP.

I really hate having to hunt around for the show times of the latest movies,
so I decided to build my own simple website to display the movies and show
times from Cathay cinema my own way.

But it has proven not so easy to do. The show times are buried deep inside the
online booking, so the user can see them only after clicking the online
booking link. After the user clicks it, the link opens in a new window and
generates a value that becomes part of the URL. So basically, two values are
passed to the server (one in the GET request and one in the HTTP header).

Apart from that, they use JavaScript (AJAX?) to pull the show times from the
server, and only after you have clicked three times. OMG.. I only want to know
the times and have to go through this whole process.

The idea is to use the PHP curl library to simulate the requests, just to get
the movie names and show time lists from the server. Is it possible? Post your
code..

** No reward, just for PHP programming fun..

Posted 05/05/2008, 06:06   #2
Craige Leeder
Subject: Fwd: [PHP] php page scrapping challenge!

Hey Paragasu,

Sounds like fun, though not really that difficult. It is a truly
horrible site, but it shouldn't take much to write the script
for it. They do not, in fact, use JavaScript to pull the movie times from
the database. They reload the page with added query string
variables (from my run-through):

isSearchBy=cin // How are we searching
visCinID=1000 // What is the cinema ID
visMovieName=Iron+Man // What movie do we want to see?
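
A minimal sketch of how those three variables could be assembled into a request URL in PHP. The base URL below is a placeholder (the real booking path is not given in this thread), and `buildShowtimeUrl` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: builds a showtime query URL from the three
// variables listed above. $baseUrl is an assumption; the real
// booking page path is not shown in this thread.
function buildShowtimeUrl(string $baseUrl, string $cinemaId, string $movieName): string
{
    $query = http_build_query([
        'isSearchBy'   => 'cin',       // search by cinema
        'visCinID'     => $cinemaId,   // cinema ID, e.g. 1000
        'visMovieName' => $movieName,  // spaces are encoded as "+"
    ], '', '&');                       // force "&" regardless of php.ini
    return $baseUrl . '?' . $query;
}

echo buildShowtimeUrl('http://example.invalid/booking.aspx', '1000', 'Iron Man'), "\n";
```

Using `http_build_query` rather than string concatenation keeps the encoding of movie names with spaces or punctuation consistent.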

I'd give it a try, but I am not set up to use curl at the moment, and
don't anticipate being set up in time to do this.

What you need to do is access the URL that assigns you your session
ID, and store it for subsequent curl calls to the server. You should
pass it with all of them. You also need to find out where they
generate the (what I assume is) dynamic part of their URL so you can
use it to access the actual movie URL. The part in my URL was:
.../(3yujtbmepau3jb45a22gju55)/...
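
A rough sketch of that approach with PHP's curl extension: let curl keep the session cookie in a cookie jar, follow any redirect, and read the parenthesised token out of the final URL. The cookie file path and both function names are assumptions for illustration, and the token pattern is guessed from the single example above.

```php
<?php
// Pull the parenthesised token out of a URL such as
// ".../(3yujtbmepau3jb45a22gju55)/...". Returns null if none is found.
// The pattern is a guess based on the one example in this thread.
function extractUrlToken(string $url): ?string
{
    if (preg_match('#/\(([A-Za-z0-9]+)\)/#', $url, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Fetch a page while keeping cookies between calls, and report the URL
// curl ended up at after redirects. $cookieFile is a hypothetical path.
function fetchWithSession(string $url, string $cookieFile): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_COOKIEJAR      => $cookieFile, // store received cookies here
        CURLOPT_COOKIEFILE     => $cookieFile, // send previously stored cookies
    ]);
    $body = curl_exec($ch);                    // false on failure
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return ['body' => $body, 'url' => $finalUrl];
}
```

If the site issues the token through a redirect, `extractUrlToken($result['url'])` on the result of `fetchWithSession` should return it. Otherwise it would have to be parsed out of a link in the returned body.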

Good luck with this project, and let us know how it goes.
- Craige



Posted 05/05/2008, 06:25   #3
paragasu
Subject: Re: [PHP] php page scrapping challenge!



Oops.. nice to hear someone agrees with me =) .. at least I am not alone in
saying that.
A beautiful website with very poor usability is really horrible to use. I
will give it a try, in fact I should, because I don't like the website's
usability.. I'll get back to you later..



All times are GMT +1.

