|
|
|
#1 |
|
Messages: n/a
Host: |
What's the best way to pull down XML from a URL? fopen($URL), then
using xml_parse? Or should I be using XML_Parser or SimpleXML? Thanks, Waynn |
|
|
|
#2 |
|
2008/5/12 Waynn Lue <waynnlue@gmail.com>:
> What's the best way to pull down XML from a URL? fopen($URL), then
> using xml_parse? Or should I be using XML_Parser or SimpleXML?

XML parsers fall into two general camps - DOM and SAX. DOM parsers represent an entire XML document as a tree, in memory, when they are first instantiated. They are generally more memory-hungry and take longer to instantiate, but they can answer queries like "what is the path to this node" or "give me the siblings of this node".

SAX parsers are stream- or event-based, and are much more lightweight - they parse the XML incrementally, as the document is read, and can't answer much more than "give me the next node".

If you just need the data, a SAX parser will probably do everything you need. If you need the tree structure implicit in an XML document, use a DOM parser.

Expat, which XML Parser (http://uk3.php.net/manual/en/book.xml.php) is based on, is a SAX parser. DOM XML (http://uk3.php.net/manual/en/book.domxml.php) is, obviously, a DOM parser. I don't know, off the top of my head, which camp SimpleXML falls into. |
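To make the two DOM-style queries concrete, here is a minimal sketch. It assumes PHP 5's DOM extension (DOMDocument) rather than the PHP 4 DOM XML extension linked in the post, and the sample document is made up for illustration.

```php
<?php
// Hypothetical sample document, not taken from the thread.
$xml = '<head><a><b>First</b></a><a><b>Second</b></a></head>';

$doc = new DOMDocument();
$doc->loadXML($xml);   // the whole tree is built in memory at this point

$firstB = $doc->getElementsByTagName('b')->item(0);

// "What is the path to this node?"
$path = $firstB->getNodePath();

// "Give me the sibling of this node's parent."
$sibling = $firstB->parentNode->nextSibling;

echo $path, "\n";
echo $sibling->nodeName, ' contains ', $sibling->textContent, "\n";
```

A SAX-style parser cannot answer either question directly, because it never holds more than the current event.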
|
|
|
#3 |
|
So if I'm looking to parse certain attributes out of an XML tree, if I
use SAX, it seems that I would need to keep track of state internally. E.g., if I have a tree like

<head>
  <a>
    <b></b>
  </a>
  <a>
    <b></b>
  </a>
</head>

and say I'm interested in all that's between <b> underneath any <a>, I'd need to have a state machine that looked for an <a> followed by a <b>. If I'm doing that, though, it seems like I should just start using a DOM parser instead?

Thanks for any insight,
Waynn |
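The state tracking described in this post could look roughly like the sketch below, which uses PHP's Expat-based XML Parser functions. The sample document, the text inside each <b>, and the variable names are assumptions made for illustration; the sketch also assumes <b> elements are never nested inside each other.

```php
<?php
// Hypothetical sample document; the last <b> is not under an <a> and should be skipped.
$xml = '<head><a><b>First</b></a><a><b>Second</b></a><b>Ignored</b></head>';

$stack  = [];    // names of the elements that are currently open
$found  = [];    // text of every <b> whose parent is an <a>
$buffer = null;  // non-null only while inside a matching <b>

$parser = xml_parser_create();
// Expat upper-cases element names by default; keep them as written instead.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    // Start tag: begin collecting if this is a <b> directly under an <a>.
    function ($parser, $name, $attrs) use (&$stack, &$buffer) {
        if ($name === 'b' && end($stack) === 'a') {
            $buffer = '';
        }
        $stack[] = $name;
    },
    // End tag: store whatever was collected for a matching <b>.
    function ($parser, $name) use (&$stack, &$found, &$buffer) {
        array_pop($stack);
        if ($name === 'b' && $buffer !== null) {
            $found[] = $buffer;
            $buffer = null;
        }
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$buffer) {
        if ($buffer !== null) {
            $buffer .= $data;
        }
    }
);

if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

print_r($found);
```

The stack of open elements is the "state machine" here: one check against its top entry is enough to answer "am I a child of <a>?".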
|
|
|
#4 |
|
For simpler XML documents that are not too large (up to about 1 MB), I would use

$X = simplexml_load_file($URL);

SimpleXML is fairly fast and very easy to use: it works with foreach loops, lets you access attributes array-style, etc.

On May 12, 2008, at 9:02 AM, Waynn Lue wrote:

> What's the best way to pull down XML from a URL? fopen($URL), then
> using xml_parse? Or should I be using XML_Parser or SimpleXML?
>
> Thanks,
> Waynn
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php

Bojan Tesanovic
http://www.carster.us/ |
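Since simplexml_load_file() returns false on failure, a thin wrapper with error handling can be useful. The following is only a sketch: loadXml() is a hypothetical helper name, and a temporary file stands in for the remote URL so the example runs without network access. Loading from an http:// URL additionally requires allow_url_fopen to be enabled.

```php
<?php
/**
 * Load XML from a URL or file path.
 * loadXml() is a hypothetical helper, not part of PHP.
 */
function loadXml(string $source): SimpleXMLElement
{
    // Collect libxml errors instead of letting them surface as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml      = simplexml_load_file($source);
    $errors   = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        $first = $errors ? trim($errors[0]->message) : 'unknown error';
        throw new RuntimeException("Could not load XML from $source: $first");
    }
    return $xml;
}

// A temporary file stands in for the URL in this sketch.
$tmp = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($tmp, '<head><a><b>First</b></a></head>');

$feed = loadXml($tmp);
echo $feed->a->b, "\n";

unlink($tmp);
```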
|
|
|
#5 |
|
Here is a very simple way:

<?php
// Sample document: two <a> elements, each wrapping a <b>.
$XML = <<<XMLL
<head>
  <a href='/asas'>
    <b>First</b>
  </a>
  <a href='/bla'>
    <b class='klas'>Second</b>
  </a>
</head>
XMLL;

$X = simplexml_load_string($XML);

// Child elements are read as properties, attributes with array syntax.
foreach ($X->a as $a) {
    echo $a->b . "\n";
    if ($a->b['class']) {
        echo 'B has class - ' . $a->b['class'] . "\n";
    }
}
?>

Bojan Tesanovic
http://www.carster.us/ |
|
|
|
#6 |
|
2008/5/12 Waynn Lue <waynnlue@gmail.com>:
> and say I'm interested in all that's between <b> underneath any <a>,
> I'd need to have a state machine that looked for an <a> followed by a
> <b>. If I'm doing that, though, it seems like I should just start
> using a DOM parser instead?

Yeah, I think you've got it nailed, although your example is simple enough (you're only holding one state value - "am I a child of <a>?") that I'd probably still reflexively reach for the lightweight solution. I use SAX for lightweight hacks, one step up from regexes - I know the information I want is between <tag> and </tag>, and I don't care about the rest of the document. The more I need to navigate the document, the more likely I am to use DOM. I could build my own data structures on top of a SAX parser, but why bother reinventing the wheel? Of course, you have to factor document size into that - parsing a big XML document into a tree can be slow.

You might also want to explore XPath (http://uk.php.net/manual/en/function...ment-xpath.php http://uk.php.net/manual/en/class.domxpath.php)... XPath is to XML as regexes are to text files. There's a good chance you'll be able to roll all your parsing up into a couple of XPath queries.

I probably should have added that simple parsers come in two flavours - push parsers and pull parsers. I tend to think (lazily) of push and pull as variations on SAX, but strictly speaking they are different. |
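For the <b>-under-<a> case discussed in this thread, an XPath query could replace the hand-written state tracking. This is a minimal sketch using DOMXPath; the sample document and its text content are assumptions made for illustration.

```php
<?php
// Hypothetical sample document; the last <b> is not under an <a>.
$xml = '<head><a><b>First</b></a><a><b class="klas">Second</b></a><b>Ignored</b></head>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Every <b> that is a direct child of an <a>, anywhere in the document.
$texts = [];
foreach ($xpath->query('//a/b') as $b) {
    $texts[] = $b->textContent;
}

// Only those <b> elements that carry a class attribute.
$classes = [];
foreach ($xpath->query('//a/b[@class]') as $b) {
    $classes[] = $b->getAttribute('class');
}

print_r($texts);
print_r($classes);
```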
|
|
|
#7 |
|
Hi All,

I am using a PHP Mailer to send mass mails. How can I identify which mails have bounced?

Chetan Dattaram Rane
Software Engineer |
|
|
|
#8 |
|
Chetan Rane wrote:
> Hi All
>
> I am using a PHP Mailer to send mass mails.
> How can I identify which mails have bounced?

Hi,

I guess you have to read some RFCs to get an idea about e-mail protocols.

--
Aschwin Wesselius

/'What you would like to be done to you, do that to the other....'/ |
|
|
|
#9 |
|
Seems like the general way is to create a mailbox (POP3 or IMAP) to accept the bounces, then check it periodically and mark the emails as invalid in your local database.

I would set thresholds so you don't mark something as failed that only bounced once - it could have been a mail setup error or something else; I'd say wait for 3 failures in a 7-day period at least. If you get 3 bounces by that point, the address is probably safely dead.

You can use PHP's IMAP functions to check the mailbox (even for POP3), one of a million existing classes, or your own functions working directly on the socket (POP3 is a simple protocol). It also helps if you parse the bounced email message to extract the return address and the mail code; perhaps build something better than just "3 failures = invalid", and actually determine whether they're outright failures or just temporary bounces, etc.

Another method: you could just parse the mail logs, if you have access to them. |
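The "3 failures in a 7-day period" rule can be sketched as a small function. isAddressDead() and its parameters are hypothetical names, and the sketch assumes the bounce timestamps have already been extracted from the bounce mailbox and stored per address; reading the mailbox itself is not shown.

```php
<?php
/**
 * Decide whether an address should be marked invalid.
 * isAddressDead() is a hypothetical helper for illustration.
 *
 * @param int[] $bounceTimes Unix timestamps of recorded bounces for one address
 */
function isAddressDead(array $bounceTimes, int $now, int $threshold = 3, int $windowDays = 7): bool
{
    $cutoff = $now - $windowDays * 86400;
    // Keep only the bounces that fall inside the window ending at $now.
    $recent = array_filter(
        $bounceTimes,
        fn (int $t): bool => $t >= $cutoff && $t <= $now
    );
    return count($recent) >= $threshold;
}

$now = strtotime('2008-05-14 12:00:00 UTC');
$bounces = [
    strtotime('2008-05-01 09:00:00 UTC'), // older than 7 days, not counted
    strtotime('2008-05-10 09:00:00 UTC'),
    strtotime('2008-05-12 09:00:00 UTC'),
];
var_dump(isAddressDead($bounces, $now)); // two recent bounces: still alive

$bounces[] = strtotime('2008-05-13 09:00:00 UTC');
var_dump(isAddressDead($bounces, $now)); // three recent bounces: dead
```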
|
|
|
#10 |
|
Chetan Rane wrote:
> Hi All
>
> I am using a PHP Mailer to send mass mails.
> How can I identify which mails have bounced?

You send them with a bounce address that uniquely identifies the recipient - when the email bounces, you know exactly which recipient it was. I typically have my mailserver do a quick database update to set a status for such bounces.

/Per Jessen, Zürich |
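One common way to build such a per-recipient bounce address is to embed the recipient in the local part of the envelope sender (a convention often called VERP). The sketch below is an assumption about how that could be done, not the poster's actual setup; encodeBounceAddress(), decodeBounceAddress(), the "bounce+" prefix and bounces.example.org are all hypothetical.

```php
<?php
/**
 * Build a bounce address that encodes the recipient.
 * Hypothetical helper; assumes $recipient contains exactly one "@".
 */
function encodeBounceAddress(string $recipient, string $bounceDomain): string
{
    // "@" is not allowed inside a local part, so swap it for "=".
    return 'bounce+' . str_replace('@', '=', $recipient) . '@' . $bounceDomain;
}

/**
 * Recover the original recipient from a bounce address, or null if the
 * address does not follow the scheme above.
 */
function decodeBounceAddress(string $bounceAddress): ?string
{
    if (!preg_match('/^bounce\+(.+)=([^=@]+)@[^@]+$/', $bounceAddress, $m)) {
        return null;
    }
    return $m[1] . '@' . $m[2];
}

$sender = encodeBounceAddress('user@example.com', 'bounces.example.org');
echo $sender, "\n";
echo decodeBounceAddress($sender), "\n";
```

The encoded address has to be used as the envelope sender (Return-Path), not only as the From header. With PHP's mail() on a sendmail-compatible setup this is typically done through the additional parameters argument (for example '-f' followed by the address), and PHPMailer exposes a Sender property for the same purpose.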
|
|
|
#11 |
|
mike wrote:
> Seems like the general way is to create a mailbox (POP3 or IMAP) to
> accept the bounces, then check it periodically and mark the emails as
> invalid in your local database.
>
> [...]
>
> You can use PHP's IMAP functions to check the mailbox (even for POP3)
> or a million classes or your own functions directly on the socket
> (POP3 is a simple protocol) - it also helps if you parse the bounced
> email message to process the return address and the mail code; perhaps
> build something better than just 3 failures = invalid, but actually
> determine if they're full out failures, or if they're just temporary
> bounces, etc.

I use this method and it works reasonably well. The hard part is the last sentence - there are so many ways to say "mailbox full" - half don't include SMTP error codes, the rest tell you the same thing in thousands of different ways.

--
Postgresql & php tutorials
http://www.designmagick.com/ |
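When a bounce does include an enhanced status code (the class.subject.detail form defined in RFC 3463, where class 4 is a temporary failure and class 5 a permanent one), a first-pass classifier can be quite small. This is a heuristic sketch only: classifyBounce() is a hypothetical helper, the sample bodies are made up, and, as the post points out, many real bounces carry no such code and will come back as 'unknown'. Any other number of the same shape in the body (a version number, for instance) would also be picked up.

```php
<?php
/**
 * Classify a bounce message body as 'permanent', 'temporary' or 'unknown'
 * based on the first enhanced status code found in it.
 * classifyBounce() is a hypothetical helper for illustration.
 */
function classifyBounce(string $body): string
{
    // Match codes such as 5.1.1 or 4.2.2, but not parts of longer dotted
    // numbers such as IP addresses.
    $pattern = '/(?<![\d.])([245])\.\d{1,3}\.\d{1,3}(?!\.?\d)/';
    if (!preg_match($pattern, $body, $m)) {
        return 'unknown';
    }
    switch ($m[1]) {
        case '5':
            return 'permanent';
        case '4':
            return 'temporary';
        default:
            return 'unknown'; // class 2 means success, not a bounce
    }
}

echo classifyBounce("Status: 5.1.1\nUser unknown"), "\n";
echo classifyBounce('452 4.2.2 Mailbox full'), "\n";
echo classifyBounce('Sorry, this mailbox is over quota.'), "\n";
```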
|
|
|
#12 |
|
Ok, thanks so much for the help. I went with DOM parsing to begin with; I'll explore XPath + SimpleXML later.

Thanks,
Waynn |
|
|
|
#13 |
|
On Tue, May 13, 2008 at 7:29 PM, Waynn Lue <waynnlue@gmail.com> wrote:
> Ok, thanks so much for the help. I went with DOM-parsing to begin
> with, I'll explore XPath + SimpleXML later.

Just FYI, you're likely to get more bang for your buck starting off with SimpleXML. DOM is the successor to DOMXML from PHP 4. It's a bulky, yet powerful interface into the DOM. SimpleXML is also a DOM parser; however, the interface is simpler in exchange for less power. The good news is that in PHP 5 you can switch back and forth between DOM and SimpleXML easily at virtually no cost.

My motto in PHP 5 is to use SimpleXML unless there is a real need for DOM, and in that case, most likely, you can get away with converting to DOM at runtime (again, very little cost there), doing a few operations, then carrying on with SimpleXML.

-nathan |
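Switching between the two APIs is done with dom_import_simplexml() and simplexml_import_dom(). A minimal sketch follows; the sample document and the appended element are made up for illustration.

```php
<?php
$sx = simplexml_load_string('<head><a><b>First</b></a><a><b>Second</b></a></head>');

// SimpleXML -> DOM: the DOMElement refers to the same underlying node.
$domHead = dom_import_simplexml($sx);

// Do the part that needs the DOM API: append <a><b>Third</b></a>.
$a = $domHead->ownerDocument->createElement('a');
$a->appendChild($domHead->ownerDocument->createElement('b', 'Third'));
$domHead->appendChild($a);

// Carry on with SimpleXML: the change is visible through the original object.
$texts = [];
foreach ($sx->a as $item) {
    $texts[] = (string) $item->b;
}
print_r($texts);

// DOM -> SimpleXML works the same way in the other direction.
$back = simplexml_import_dom($domHead);
echo count($back->a), "\n";
```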
|
|
|
#14 |
|
Thank you all.
Maybe you didn't get what I meant. I have it working excellently now. I have summarized the solution: http://phparch.cn

--
Regards,
Shelley |
|
|
|