On Tue, 18 Sep 2007 01:54:17 -0000, yawnmoth put finger to keyboard
and typed:
>Websites like amazon.com, newegg.com, bestbuy.com, etc, have pages,
>replete with information, for each and every single product they
>sell. My question is... where do they get this information from?
Their suppliers, usually.
>Websites like facebook.com and digg.com have APIs that let you pull
>information in an easy to parse XML format that, in theory, should
>never change, even whilst the layout of the website does. Is that
>what websites like bestbuy.com do? Does bestbuy.com pull its
>information in an easy-to-parse format from its vendors?
That depends on the vendor. Some make their data available via XML
over the web, others supply CSV files via FTP, others supply the data
in a proprietary format on disk, and some just provide a printed
catalogue.
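Where a vendor does offer an XML feed, reading it is straightforward. This is a minimal sketch using Python's standard xml.etree.ElementTree; the feed layout (a <catalogue> of <product> elements with a sku attribute, <name> and <price>) is invented for illustration, since every vendor picks its own element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical vendor feed; real vendors each choose their own element names.
FEED = """<catalogue>
  <product sku="A100">
    <name>USB Cable</name>
    <price currency="GBP">4.99</price>
  </product>
  <product sku="B200">
    <name>Keyboard</name>
    <price currency="GBP">19.50</price>
  </product>
</catalogue>"""

def parse_feed(xml_text):
    """Return a list of dicts, one per <product> element in the feed."""
    root = ET.fromstring(xml_text)
    products = []
    for node in root.findall("product"):
        products.append({
            "sku": node.get("sku"),
            "name": node.findtext("name"),
            "price": float(node.findtext("price")),
        })
    return products

for p in parse_feed(FEED):
    print(p["sku"], p["name"], p["price"])
```

Because the code only depends on element names, the vendor can redesign its website without breaking it; it breaks only if the feed format itself changes.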
>Or maybe bestbuy.com just uses a bunch of regular expressions and
>parses its vendors' webpages that way? If so, they're liable to have
>to recode their parsing tools every time their vendor gets a facelift.
Not all vendors put their catalogue on the web, but where they do,
this is often the way it's done.
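To show why screen-scraping is fragile, here is a minimal sketch in Python using the standard re module. The HTML snippet and its class names are hypothetical; the point is that the patterns are tied to one exact layout and simply stop matching when the vendor changes it.

```python
import re

# Hypothetical vendor product page; the markup is invented for illustration.
PAGE = '''<div class="product">
  <h1 class="title">USB Cable</h1>
  <span class="price">&pound;4.99</span>
</div>'''

# These patterns depend on the exact markup above. If the vendor renames a
# class or restructures the page, they silently stop matching.
TITLE_RE = re.compile(r'<h1 class="title">(.*?)</h1>', re.S)
PRICE_RE = re.compile(r'<span class="price">[^0-9]*([0-9]+\.[0-9]{2})</span>')

def scrape(html):
    title = TITLE_RE.search(html)
    price = PRICE_RE.search(html)
    if title is None or price is None:
        # Layout changed (or this is not a product page): time to recode.
        return None
    return {"name": title.group(1).strip(), "price": float(price.group(1))}

print(scrape(PAGE))
# After a "facelift" that renames one class, the same regexes find nothing.
print(scrape(PAGE.replace('class="price"', 'class="cost"')))
```

Returning None rather than guessing makes a layout change show up as missing data that someone can notice, instead of wrong prices quietly entering the database.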
>The third possibility that occurs to me is that maybe they just have
>people enter this information into a computer, all day?
For some vendors, yes, this is the only way to get the data into your
database.
>Any ideas? If they have an easy to parse API format, is that
>something that other vendors are likely to have, as well?
There are no standards. Each vendor does it their own way.
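In practice that usually means writing one small adapter per vendor that converts its format into your own record layout. This is a minimal sketch in Python; the two vendors, their column names, delimiters and price units are all hypothetical, chosen only to show the same kind of data arriving in different shapes.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class Product:
    sku: str
    name: str
    price: float  # in pounds

# Two hypothetical vendors sending the same kind of data in different layouts.
VENDOR_A_CSV = "sku,description,price\nA100,USB Cable,4.99\n"
VENDOR_B_CSV = "Name;Price (pence);Code\nKeyboard;1950;B200\n"

def from_vendor_a(text):
    # Comma-separated, price already in pounds.
    return [Product(r["sku"], r["description"], float(r["price"]))
            for r in csv.DictReader(io.StringIO(text))]

def from_vendor_b(text):
    # Semicolon-separated, price in pence, columns in a different order.
    return [Product(r["Code"], r["Name"], int(r["Price (pence)"]) / 100)
            for r in csv.DictReader(io.StringIO(text), delimiter=";")]

# One adapter per vendor; adding a vendor means writing one more function.
ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}

catalogue = (ADAPTERS["vendor_a"](VENDOR_A_CSV)
             + ADAPTERS["vendor_b"](VENDOR_B_CSV))
for p in catalogue:
    print(p)
```

Keeping the vendor-specific knowledge inside each adapter means the rest of the system only ever sees Product records, whichever format the data arrived in.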
Mark
--
http://www.BritishSurnames.co.uk - What does your surname say about you?
"Life is bigger, it's bigger than you"