#1
Hi all,

I have a simple question: how can I ensure that 2 files are identical?

How about this?

--------8<------------------------------------------------------

function files_identical($path1, $path2) {
    // Strict comparison (===): with ==, PHP compares numeric-looking
    // strings numerically, so contents such as "1e3" and "1000" would
    // be reported as identical.
    return (file_get_contents($path1) === file_get_contents($path2));
}

--------8<------------------------------------------------------

Note that I would like to compare any type of files (text and binary).

Thanks for any help,

--
Mathieu

#2
mathieu leddet wrote:
> I have a simple question: how can I ensure that 2 files are identical?
> [...]
> Note that I would like to compare any type of files (text and binary).

http://php.net/md5_file

-Stut

--
http://stut.net/

#3
Quoting mathieu leddet <mathieu.leddet@mobilescope.com>:

> Hi all,
>
> I have a simple question: how can I ensure that 2 files are identical?
> [...]
> Note that I would like to compare any type of files (text and binary).

You could use "md5_file" for this. Something like:

function files_identical($path1, $path2) {
    // Use === rather than ==: md5_file() returns hex strings, and two
    // digests of the form "0e" followed only by digits would both be
    // read as the number 0 by == and compare equal.
    return (md5_file($path1) === md5_file($path2));
}

#4
mathieu leddet wrote:
> Hi all,
>
> I have a simple question: how can I ensure that 2 files are identical?
> [...]

I would say, use an md5 checksum on both files:

function files_identical($path1, $path2) {
    // Note: file_get_contents() still reads each whole file into memory
    // before hashing; md5_file() hashes from a stream instead.
    return (md5(file_get_contents($path1)) === md5(file_get_contents($path2)));
}

--
Aschwin Wesselius
/'What you would like to be done to you, do that to the other....'/

#5
> -----Original Message-----
> From: mathieu leddet [mailto:mathieu.leddet@mobilescope.com]
> Sent: 12 March 2008 11:04
> To: php-general@lists.php.net
> Subject: [php] Comparing files
>
> Hi all,
>
> I have a simple question: how can I ensure that 2 files are identical?
> [...]
> Note that I would like to compare any type of files (text and binary).

Depending upon the size of the files, I would expect it would be quicker to compare a hash of each file.

Edward
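Edward's hash suggestion might look like this, as a sketch assuming PHP 7 or later. hash_file() (like md5_file() and sha1_file()) reads the file through a stream rather than loading it all into memory. The function name and the default sha256 algorithm are illustrative choices, not something proposed in the thread.

```php
<?php
// Sketch: compare two files by hashing each one with hash_file().
// files_identical_by_hash() is a hypothetical name; $algo can be any
// algorithm listed by hash_algos().
function files_identical_by_hash(string $path1, string $path2, string $algo = 'sha256'): bool
{
    $h1 = hash_file($algo, $path1);
    $h2 = hash_file($algo, $path2);

    if ($h1 === false || $h2 === false) {
        throw new RuntimeException('Could not read one of the files');
    }

    // Strict comparison: with ==, two hex digests that look like "0e123..."
    // would both be treated as the number 0 and compare equal.
    return $h1 === $h2;
}
```

Passing 'md5' or 'sha1' as $algo produces the same digests as md5_file() and sha1_file().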

#6
> -----Original Message-----
> From: Edward Kay [mailto:edward@labhut.com]
> Sent: Wednesday, March 12, 2008 7:13 AM
> To: mathieu leddet; php-general@lists.php.net
> Subject: RE: [php] Comparing files
>
> [...]
> Depending upon the size of the files, I would expect it would be quicker to
> compare a hash of each file.
>
> Edward

I don't understand how comparing hashes can be faster than comparing contents, except for big files, for which you will likely hit the memory limit first, and for files that only differ from each other at the very end, so that the comparison only stops there. If the file sizes vary too much, however, a mixed strategy would be the winner; and certainly, you will want to store path names and calculated hashes in a database of some kind to save yourself from hogging the server each time (yeah, CPU and RAM are cheap, but they are not unlimited resources).

Comparing hashes means that a hash must be calculated for files A and B, and the related overhead will grow with the file size (right or wrong?). Comparing the file contents has an associated overhead for buffering and moving the file contents into memory, and it's also a linear operation (strings are compared byte by byte until there's a difference). So... why not do the following?

1 - Compare file sizes (this is just a property stored in the file system structures, right?). If the sizes are different, the files are different. Otherwise move to step 2.
2 - If the files are smaller than a certain size (up to you to find the optimal threshold), just compare contents through, say, file_get_contents. Otherwise move to step 3.
3 - Grab some random bytes at the beginning, in the middle and at the end of both files and compare them. If they are different, the files are different. Otherwise move to step 4.
4 - If you reach this point, you are doomed. You have 2 big files that you must compare and they are apparently equal so far. Comparing contents would be overkill, if possible at all, so you will want to generate hashes and compare them. Run md5_file on both files (it would be great if you already have, say, file A's hash calculated and stored in a DB or data file) and compare the results.

It always depends on what kind of files you are dealing with: if the files often differ only at the end of the stream, you may want to skip step 2. But this is what I would generally do.

By the way, md5 is a great hashing function, but it is not bullet-proof; collisions may happen (though it's much better than crc32, for example). So you may also want to think about how critical false positives are for you (files that md5_file considers equal when they are not) and perhaps use some diff-like solution instead of md5_file. Anyway, having compared sizes and random bytes (steps 1 through 3), it's very likely that md5_file will catch it if two files differ in just a few bytes.

Regards,

Rob

Andrés Robinet | Lead Developer | BESTPLACE CORPORATION
5100 Bayview Drive 206, Royal Lauderdale Landings, Fort Lauderdale, FL 33308 | TEL 954-607-4207 | FAX 954-337-2695 |
Email: info@bestplace.net | MSN Chat: best@bestplace.net | SKYPE: bestplace | Web: bestplace.biz | Web: seo-diy.com
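Rob's four steps might be sketched in PHP as follows, assuming PHP 7 or later. The function name, the 1 MiB threshold and the 4 KiB sample size are hypothetical values to tune; the samples are taken at fixed offsets (start, middle, end) rather than at random positions, and the DB-backed hash cache Rob mentions is left out.

```php
<?php
// Sketch of the four-step strategy: size check, direct comparison for
// small files, sampled bytes, then md5_file() as the last resort.
// files_identical_mixed(), $smallLimit and $sample are hypothetical.
function files_identical_mixed(string $path1, string $path2,
                               int $smallLimit = 1048576, int $sample = 4096): bool
{
    // PHP caches stat() results such as filesize(), so clear that cache.
    clearstatcache();

    // Step 1: different sizes mean different files.
    $size1 = filesize($path1);
    $size2 = filesize($path2);
    if ($size1 === false || $size2 === false) {
        throw new RuntimeException('Could not stat one of the files');
    }
    if ($size1 !== $size2) {
        return false;
    }
    $size = $size1;

    // Step 2: small files are compared directly (strict ===, byte for byte).
    if ($size <= $smallLimit) {
        return file_get_contents($path1) === file_get_contents($path2);
    }

    // Step 3: compare $sample bytes from the start, middle and end.
    $f1 = fopen($path1, 'rb');
    $f2 = fopen($path2, 'rb');
    if ($f1 === false || $f2 === false) {
        throw new RuntimeException('Could not open one of the files');
    }
    $same = true;
    foreach ([0, intdiv($size, 2), max(0, $size - $sample)] as $offset) {
        fseek($f1, $offset);
        fseek($f2, $offset);
        if (fread($f1, $sample) !== fread($f2, $sample)) {
            $same = false;
            break;
        }
    }
    fclose($f1);
    fclose($f2);
    if (!$same) {
        return false;
    }

    // Step 4: the samples match, so fall back to hashing both files.
    return md5_file($path1) === md5_file($path2);
}
```

As Rob notes, step 4 can in principle report a false positive on an md5 collision; a chunk-by-chunk read of both files would avoid that at a similar I/O cost.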

#7
> -----Original Message-----
> From: Andrés Robinet [mailto:agrobinet@bestplace.biz]
> Sent: 12 March 2008 12:33
> To: 'Edward Kay'; 'mathieu leddet'; php-general@lists.php.net
> Subject: RE: [php] Comparing files
>
> [...]
> I don't understand how comparing hashes can be faster than comparing contents,
> except for big files for which you will likely hit the memory limit first and
> for files who only differ from each other at the very end of them, so the
> comparison will only be halted then.
> [...]
> Anyway, having compared sizes and random bytes (steps 1 through 3), it's very
> likely that md5_file will catch it if two files are different in just a few bytes.

Agreed. In my first reply, I meant that hashes would likely be quicker and more memory-friendly when handling larger files, but this is just a hunch: I haven't benchmarked anything. It was really meant to give the OP other possibilities to look into.

Edward
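The memory benefit Edward mentions can also be had without hashing. A minimal sketch, assuming PHP 7 or later (the function name and the 8 KiB chunk size are illustrative, not from the thread): read both files in fixed-size chunks and stop at the first chunk that differs. Memory use stays at two chunks whatever the file size, each file is still read at most once (as when hashing), and there is no collision risk.

```php
<?php
// Sketch: byte-exact comparison that reads both files chunk by chunk.
// files_identical_chunked() and the default $chunk size are hypothetical.
function files_identical_chunked(string $path1, string $path2, int $chunk = 8192): bool
{
    // Cheap first check; clear PHP's stat cache so filesize() is current.
    clearstatcache();
    if (filesize($path1) !== filesize($path2)) {
        return false;
    }

    $f1 = fopen($path1, 'rb');
    $f2 = fopen($path2, 'rb');
    if ($f1 === false || $f2 === false) {
        throw new RuntimeException('Could not open one of the files');
    }

    // Both files have the same size, so each pair of reads returns
    // strings of equal length; stop at the first pair that differs.
    $same = true;
    while (!feof($f1)) {
        if (fread($f1, $chunk) !== fread($f2, $chunk)) {
            $same = false;
            break;
        }
    }

    fclose($f1);
    fclose($f2);
    return $same;
}
```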

#8
Quoting Andrés Robinet <agrobinet@bestplace.biz>:

> I don't understand how comparing hashes can be faster than comparing contents,
> except for big files for which you will likely hit the memory limit first and
> for files who only differ from each other at the very end of them, so the
> comparison will only be halted then. If the file sizes vary too much, however, a
> mixed strategy would be the winner; and certainly, you will want to store path
> names and calculated hashes in a database of some kind to save yourself from
> hogging the server each time (yeah, CPU and RAM are cheap, but not unlimited
> resources).

I must agree that a mixed solution would be best here.

> Comparing hashes means that a hash must be calculated for files A and B and the
> related overhead will increase according to the file size (right or wrong?).
> [...] So... why not doing the following?
>
> 1 - Compare file sizes (this is just a property stored in the file system
> structures, right?). If sizes are different, the files are different. Otherwise
> move to step 2.

I like this idea. It's fast and will catch most differences.

> 2 - If the file sizes are smaller than certain size (up to you to find the
> optimal file size), just compare contents through, say, file_get_contents.
> Otherwise move to step 3.
> 3 - Grab some random bytes at the beginning, at the middle and at the end of
> both files and compare them. If they are different, the files are different.
> Otherwise move to step 4.

Not sure about this one. Won't all the file operations create too much overhead if you are dealing with large files?

> 4 - If you reach this point, you are doomed. [...] Run md5_file on both files
> [...] and compare results.
> [...]
> By the way, md5 is a great hashing function, but it is not bullet-proof,
> collisions may happen (though it's much better than crc32, for example).

MD5 is certainly not bullet-proof. You could always switch to sha1_file for a bit more security.

#9
That would work fine.

If you happen to be on UNIX, cmp -s is another possibility:

// Returns 0 if the files are identical or 1 if the files differ
// (the opposite sense of a boolean "identical" check). Any other
// exit code from cmp signals an error, e.g. a missing file.
function cmpFiles($a, $b) {
    // escapeshellarg() quotes each path; 2>&1 captures cmp's error
    // messages in $output.
    $cmd = 'cmp -s ' . escapeshellarg($a) . ' ' . escapeshellarg($b) . ' 2>&1';
    exec($cmd, $output, $exitCode);
    if ($exitCode != 0 && $exitCode != 1) {
        throw new Exception("Command \"$cmd\" failed with exit "
            . "code $exitCode: " . join("\n", $output));
    }
    return $exitCode;
}

Regards,

John Peters

On Mar 12, 7:04 am, mathieu.led...@mobilescope.com ("mathieu leddet") wrote:
> Hi all,
>
> I have a simple question: how can I ensure that 2 files are identical?
> [...]
> Note that I would like to compare any type of files (text and binary).