Thread: Slow Regex Code
Posted 10/06/2008, 13:39   #10
Mirco Wahab
boost::regex - open ranges a no no? was: Slow Regex Code

Mirco Wahab wrote:

I modified the expression:

> ...
> boost::regex reg("\\b\\d{2,}\\b");
> ...


to:
...
boost::regex reg("\\b\\d\\d+\\b");
...

with tremendous improvements:

> [Windows XP-32, Athlon-64/3200+,@2290MHz]
> - Visual Studio 2008 + Boost 1.35.0 9.3 sec
> - Perl 5.10 (Active-) 10.4 sec


[Windows XP(32bit), Athlon-64/3200+ @2290MHz]
Visual Studio 2008 + Boost 1.35.0 1.8 sec
Perl 5.10.003 (AP, use64bitint=undef) 9.5 sec

> [Linux 2.6.23, Pentium4,@2660MHz]
> - gcc 4.3, -O2, Boost 1.33.1 13.2 sec
> - Perl 5.8.8 8.2 sec


[Linux 2.6.23(32bit), Pentium4/NW @2660MHz]
gcc 4.3.1 -O2, Boost 1.33.1 1.2 sec (user)
Perl 5.8.8 (32bit, use64bitint=undef) 6.2 sec (user)

> [Linux 2.6.23, Core2/Q6600,@3240MHz]
> - gcc 4.3, -O2, Boost 1.33.1 6.3 sec
> - Perl 5.8.8 (i586, use64bitint=undef) 3.2 sec


[Linux 2.6.23(32bit), Core2/Q6600,@3240MHz]
gcc 4.3.1 -O2, Boost 1.33.1 0.55sec (user)
Perl 5.8.8 (32bit, use64bitint=undef) 2.4 sec (user)

> [Linux 2.6.24, Core2/Q9300,@3338MHz]
> - gcc 4.3, -O2, Boost 1.34.1 'std::runtime_error' (??)
> - Perl 5.10 (i586, use64bitint=undef) 10.4 sec


[Linux 2.6.25(32bit), Core2/Q9300,@3338MHz]
gcc 4.3.1, -O3, Boost 1.34.1 0.42sec (user)[*]
Perl 5.10.0 (32bit, use64bitint=undef) 4.0 sec (user)
[*] => after kernel update & gcc update,
g++ -O3 -c boostrg.cxx -o boostrg.o
works now


modified Code, C++:
==>
#include <boost/regex.hpp>
#include <fstream>
#include <iostream>


int number_count(const char *block, unsigned int len)
{
    boost::match_flag_type flags = boost::match_default;
    boost::regex reg("\\b\\d\\d+\\b");
    boost::cmatch what;

    const char *from = block, *to = block+len;
    int n = 0;
    while( boost::regex_search(from, to, what, reg, flags) ) {
        from = what[0].second;   // continue right after the last match
        ++n;
    }
    return n;
}

int main ()
{
    // binary mode, so the size from tellg() matches what read() delivers
    std::ifstream in("nietzsche8.txt", std::ios::binary); // this is a 112 MB file,
                                       // it's 8 x the Nietzsche
    if(in) {                           // fulltext in plain ASCII
        in.seekg(0, std::ios::end);    // get to EOF
        unsigned int len = in.tellg(); // read file pointer
        in.seekg(0, std::ios::beg);    // back to pos 0

        char *block = new char [len+1];   // don't be stingy
        in.read(block, len);              // slurp the file
        int n = number_count(block, len); // process data
        std::cout << "The text (" << len/1024 << "KB) has "
                  << n << " numbers >= 10!" << std::endl;
        delete [] block;                  // play fair
    }
    return 0;
}
<==

modified Code, Perl:
==>

open my $fh, '<', 'nietzsche8.txt' or die "what? $!";
my $block;
do { local $/; $block = <$fh> };
close $fh;

my $n;
++$n while $block =~ /\b\d\d+\b/g; # process data
print "The text (" . int(length($block)/1024) ."KB) has $n numbers >= 10!\n";

<==


At least for me, a very interesting difference.
Boost.Regex now beats Perl by a significant margin.
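
The open range \d{2,} and the rewritten \d\d+ should accept exactly the same strings, so the change ought to affect only speed, not the count. A minimal sketch to check that on a small sample follows. Note the assumptions: it uses std::regex (C++11, ECMAScript syntax) in place of boost::regex so that it builds without Boost, and count_matches is a hypothetical helper name that does not appear in the code above. It says nothing about timings.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: count the non-overlapping matches of `pattern`
// in `text`. Uses std::regex as a stand-in for boost::regex.
std::size_t count_matches(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::sregex_iterator first(text.begin(), text.end(), re);
    std::sregex_iterator last;   // default-constructed = end of sequence
    return static_cast<std::size_t>(std::distance(first, last));
}
```

Calling count_matches(sample, "\\b\\d{2,}\\b") and count_matches(sample, "\\b\\d\\d+\\b") on the same sample string should return the same number; if they ever differ, the two patterns are not interchangeable for that regex engine.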

Regards

Mirco