Comparing files

12/03/2008, 13:04   #1
mathieu leddet

Hi all,

I have a simple question: how can I ensure that 2 files are identical?

How about this?

--------8<------------------------------------------------------

function files_identical($path1, $path2) {
    return (file_get_contents($path1) == file_get_contents($path2));
}

--------8<------------------------------------------------------

Note that I would like to compare any type of file (text and binary).

Thanks for any help,


--
Mathieu

12/03/2008, 13:08   #2
Stut
Re: [PHP] Comparing files



http://php.net/md5_file

-Stut

--
http://stut.net/

12/03/2008, 13:08   #3
Thijs Lensselink
Re: [PHP] Comparing files



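You could use "md5_file" for this. Something like:

function files_identical($path1, $path2) {
    // Strict comparison: with ==, two digests that both look numeric
    // (for example "0e..." followed by digits) would be compared as numbers.
    return (md5_file($path1) === md5_file($path2));
}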


12/03/2008, 13:09   #4
Aschwin Wesselius
Re: [PHP] Comparing files


I would say, use an MD5 checksum on both files:


function files_identical($path1, $path2) {
    return (md5(file_get_contents($path1)) === md5(file_get_contents($path2)));
}
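
A note on the strict === used in the comparison above, as a small sketch. With the loose == operator, PHP compares two strings as numbers when both look numeric, so different strings can compare equal. The values below are made up for illustration (the two "0e..." strings are not real MD5 sums).

```php
<?php
// Two different strings that both look like numbers to PHP.
$a = '1e3';
$b = '1000';

var_dump($a == $b);   // bool(true): loose comparison treats both as the number 1000
var_dump($a === $b);  // bool(false): strict comparison looks at the bytes

// Same effect with hex digests that happen to look like "0e<digits>".
// Illustrative values only, not real digests.
$h1 = '0e111111111111111111111111111111';
$h2 = '0e222222222222222222222222222222';

var_dump($h1 == $h2);   // bool(true): both are read as zero
var_dump($h1 === $h2);  // bool(false)
```

So === is the safer choice both for raw file contents and for hex digests.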



--

Aschwin Wesselius

/'What you would like to be done to you, do that to the other....'/


12/03/2008, 13:13   #5
Edward Kay
RE: [PHP] Comparing files





Depending upon the size of the files, I would expect it would be quicker to
compare a hash of each file.

Edward


12/03/2008, 14:33   #6
Andrés Robinet
RE: [PHP] Comparing files

> Depending upon the size of the files, I would expect it would be quicker to
> compare a hash of each file.
>
> Edward
>


I don't understand how comparing hashes can be faster than comparing contents,
except in two cases: big files, for which you will likely hit the memory limit
first, and files that differ only at the very end, so the comparison is halted
only then. If the file sizes vary a lot, however, a mixed strategy would be the
winner; and you will certainly want to store path names and calculated hashes in
a database of some kind to save yourself from hogging the server each time
(yeah, CPU and RAM are cheap, but they are not unlimited resources).

Comparing hashes means that a hash must be calculated for files A and B, and the
related overhead grows with the file size (right or wrong?). Comparing the file
contents has its own overhead for buffering and moving the contents into memory,
and it is also a linear operation (strings are compared byte by byte until there
is a difference). So... why not do the following?

1 - Compare file sizes (this is just a property stored in the file system
structures, right?). If the sizes are different, the files are different.
Otherwise move to step 2.
2 - If the files are smaller than a certain size (up to you to find the
optimal threshold), just compare contents through, say, file_get_contents.
Otherwise move to step 3.
3 - Grab some random bytes at the beginning, the middle and the end of both
files (at the same offsets in each) and compare them. If they are different, the
files are different. Otherwise move to step 4.
4 - If you reach this point, you are doomed. You have 2 big files that you must
compare and they are apparently equal so far. Comparing contents in memory would
be overkill, if possible at all, so you will want to generate hashes and compare
them. Run md5_file on both files (it would be great if you had, say, file A's
hash already calculated and stored in a DB or data file) and compare the results.
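
The four steps above can be sketched roughly as follows. This is only an illustration: the function name, the 1 MiB threshold and the 4 KiB sample length are assumptions picked for the example, not tuned values, and the type hints assume PHP 7 or later. Fixed offsets (start, middle, end) stand in for the "random bytes" of step 3.

```php
<?php
/**
 * Staged comparison (hypothetical helper following steps 1 to 4).
 * $smallLimit and $sampleLen are illustrative defaults, not benchmarks.
 */
function files_identical_staged(string $path1, string $path2,
                                int $smallLimit = 1048576,
                                int $sampleLen = 4096): bool
{
    // filesize() results are cached by PHP; clear the cache to avoid stale sizes.
    clearstatcache();

    // Step 1: different sizes means different files.
    $size1 = filesize($path1);
    $size2 = filesize($path2);
    if ($size1 === false || $size2 === false) {
        throw new RuntimeException('Cannot read file size');
    }
    if ($size1 !== $size2) {
        return false;
    }

    // Step 2: small files are compared directly in memory.
    if ($size1 <= $smallLimit) {
        $c1 = file_get_contents($path1);
        $c2 = file_get_contents($path2);
        if ($c1 === false || $c2 === false) {
            throw new RuntimeException('Cannot read file contents');
        }
        return $c1 === $c2;
    }

    // Step 3: compare samples taken at the start, middle and end.
    $offsets = [0, intdiv($size1, 2), max(0, $size1 - $sampleLen)];
    foreach ($offsets as $offset) {
        $s1 = file_get_contents($path1, false, null, $offset, $sampleLen);
        $s2 = file_get_contents($path2, false, null, $offset, $sampleLen);
        if ($s1 !== $s2) {
            return false;
        }
    }

    // Step 4: fall back to comparing hashes of the whole files.
    $h1 = md5_file($path1);
    $h2 = md5_file($path2);
    if ($h1 === false || $h2 === false) {
        throw new RuntimeException('Cannot hash file');
    }
    return $h1 === $h2;
}
```

A stored hash for file A (from a DB or data file) could replace the first md5_file call in step 4; that part is left out here.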

It always depends on what kind of files you are dealing with: if the files
often differ only at the end of the stream, you may want to skip step 2. But
this is what I would generally do.

By the way, md5 is a great hashing function, but it is not bullet-proof:
collisions may happen (though it's much better than crc32, for example). So you
may also want to think about how critical it is to you to have some false
positives (files that md5_file considers equal when they are not) and possibly
use some diff-like solution instead of md5_file. Anyway, having compared sizes
and random bytes (steps 1 through 3), it's very likely that md5_file will catch
it if two files differ in just a few bytes.
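
If false positives are not acceptable, one possible "diff-like" alternative is an exact comparison that reads both files in chunks, so neither file has to fit in memory. A minimal sketch, assuming PHP 7 or later; the function name and the 8 KiB chunk size are arbitrary choices for the example.

```php
<?php
/**
 * Exact, chunked comparison (hypothetical helper).
 * Stops at the first differing chunk and never loads a whole file.
 */
function files_identical_streamed(string $path1, string $path2,
                                  int $chunkSize = 8192): bool
{
    // Cheap early exit: files of different sizes cannot be identical.
    clearstatcache();
    if (filesize($path1) !== filesize($path2)) {
        return false;
    }

    $fh1 = fopen($path1, 'rb');
    $fh2 = fopen($path2, 'rb');
    if ($fh1 === false || $fh2 === false) {
        if ($fh1 !== false) {
            fclose($fh1);
        }
        if ($fh2 !== false) {
            fclose($fh2);
        }
        throw new RuntimeException('Cannot open file');
    }

    try {
        // Read both files in lockstep and compare chunk by chunk.
        while (!feof($fh1) && !feof($fh2)) {
            if (fread($fh1, $chunkSize) !== fread($fh2, $chunkSize)) {
                return false;
            }
        }
        // Identical only if both files ended at the same time.
        return feof($fh1) && feof($fh2);
    } finally {
        fclose($fh1);
        fclose($fh2);
    }
}
```

This is still linear in the file size when the files are equal, but memory use stays at roughly two chunks.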

Regards,

Rob





12/03/2008, 15:15   #7
Edward Kay
RE: [PHP] Comparing files





Agreed. In my first reply, I meant that hashes would likely be quicker/more
memory friendly when handling larger files, but this is just a hunch - I
haven't benchmarked anything. It was really meant to give the OP other
possibilities to look into.

Edward


12/03/2008, 15:21   #8
Thijs Lensselink
RE: [PHP] Comparing files

Quoting Andrés Robinet <agrobinet@bestplace.biz>:


> I don't understand how comparing hashes can be faster than comparing contents,
> except for big files for which you will likely hit the memory limit first and
> for files who only differ from each other at the very end of them, so the
> comparison will only be halted then. If the file sizes vary too much, however, a
> mixed strategy would be the winner; and certainly, you will want to store path
> names and calculated hashes in a database of some kind to save yourself from
> hogging the server each time (yeah, CPU and RAM are cheap, but not unlimited
> resources).


I must agree that a mixed solution would be best here.

> Comparing hashes means that a hash must be calculated for files A and B and the
> related overhead will increase according to the file size (right or wrong?).
> Comparing the file contents will have an associated overhead for buffering and
> moving the file contents into memory, and it's also a linear operation (strings
> are compared byte to byte till there's a difference). So... why not doing the
> following?
>
> 1 - Compare file sizes (this is just a property stored in the file system
> structures, right?). If sizes are different, the files are different. Otherwise
> move to step 2.


I like this idea. It's fast and will catch most differences.

> 2 - If the file sizes are smaller than certain size (up to you to find the
> optimal file size), just compare contents through, say, file_get_contents.
> Otherwise move to step 3.
> 3 - Grab some random bytes at the beginning, at the middle and at the end of
> both files and compare them. If they are different, the files are different.
> Otherwise move to step 4.


Not sure about this one. Won't all the file operations create too much
overhead if you are dealing with large files?

> 4 - If you reach this point, you are doomed. You have 2 big files that you must
> compare and they are apparently equal so far. Comparing contents will be over
> killing if at all possible, so you will want to generate hashes and compare
> them. Run md5_file on both files (it would be great if you have, say, file A's
> hash already calculated and stored in a DB or data file) and compare results.
>
> It is always up to what kind of files you are dealing with, if the files are
> often different only at the end of the stream, you may want to skip step 2. But
> this is what I would generally do.
>
> By the way, md5 is a great hashing function, but it is not bullet-proof,
> collisions may happen (though it's much better than crc32, for example). So, you


MD5 is for sure not bullet-proof. You could always switch to sha1_file for a
bit more security.
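
As a sketch of that suggestion: hash_file() takes the algorithm name as a parameter, so the same function can use sha1, sha256 or whatever the installed PHP supports. The function name and the 'sha256' default below are my own assumptions for the example; the type hints assume PHP 7 or later.

```php
<?php
/**
 * Compare two files by digest (hypothetical helper).
 * $algo can be any name listed by hash_algos(), e.g. 'sha1' or 'sha256'.
 */
function files_identical_hashed(string $path1, string $path2,
                                string $algo = 'sha256'): bool
{
    $h1 = hash_file($algo, $path1);
    $h2 = hash_file($algo, $path2);

    // hash_file() returns false when a file cannot be read; without this
    // check two unreadable files would compare as "identical".
    if ($h1 === false || $h2 === false) {
        throw new RuntimeException('Cannot hash file');
    }

    // Strict comparison so digests are compared as strings, not numbers.
    return $h1 === $h2;
}
```

Calling it with 'sha1' as the third argument should give the same result as comparing sha1_file() on both paths.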

> may also think of how critical is to you to have some false positives (some
> files that are considered equal by md5_file and they are not) and probably use
> some diff-like solution instead of md5_file. Anyway, having compared sizes and
> random bytes (steps 1 through 3), it's very likely that md5_file will catch it
> if two files are different in just a few bytes.




12/03/2008, 20:45   #9
petersprc
Re: Comparing files

That would work fine.

If you happen to be on UNIX, cmp -s is another possibility:

// Returns 0 if the files are identical or
// 1 if the files differ.

function cmpFiles($a, $b)
{
    $cmd = 'cmp -s ' . escapeshellarg($a) . ' ' .
        escapeshellarg($b) . ' 2>&1';
    exec($cmd, $output, $exitCode);
    if ($exitCode != 0 && $exitCode != 1) {
        throw new Exception("Command \"$cmd\" failed with exit " .
            "code $exitCode: " . join("\n", $output));
    }
    return $exitCode;
}

Regards,

John Peters


