PHWinfo banniere

Titres
PORTAIL ANNUAIRE ARTICLES COMPARATEUR HÉBERGEURS DEVIS FORUMS RÉDUCTEUR D'URL
Précédent   PHWinfo > Forums Hébergement > Forum Serveur - Sécurité et techniques > comp.unix.shell > Column mapping
S'inscrire FAQ Membres Recherche Messages du jour Marquer les forums comme lus
comp.unix.shell Using and programming the Unix shell.

Column mapping

Réponse
 
LinkBack Outils de la discussion
Vieux 02/05/2008, 15h59   #1
Sashi
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Column mapping

Hi all, I have two files thus:
File 1:

Col_A Col_B
a1 b1
a2 b2
.....


File 2:

Col_A Col_B
a4
a2
a7
a32
.......

So in the second file, Col_B is empty.

How do I output the corresponding second column for the second file
based on the mapping in the first file?

I think I can do something like this:

for line in $(cat f2)
do echo -n $line; echo -n '\t'; grep $line f1 | awk '{print $2;}'
done

Looks a little ugly and is probably not a very smart thing to do.
Any better suggestions?

Thanks,
Sashi
  Réponse avec citation
Vieux 02/05/2008, 16h12   #2
Janis Papanagnou
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

Sashi wrote:
> Hi all, I have two files thus:
> File 1:
>
> Col_A Col_B
> a1 b1
> a2 b2
> ....
>
>
> File 2:
>
> Col_A Col_B
> a4
> a2
> a7
> a32
> ......
>
> So in the second file, Col_B is empty.
>
> How do I output the corresponding second column for the second file
> based on the mapping in the first file?
>
> I think I can do something like this:
>
> for line in $(cat f2)
> do echo -n $line; echo -n '\t'; grep $line f1 | awk '{print $2;}'
> done


(Looks very inefficient.)

>
> Looks a little ugly and is probably not a very smart thing to do.
> Any better suggestions?


Modulo formatting something like...

awk 'BEGIN{OFS="\t"} NR==FNR{m[$1]=$2;next} NF==1{$2=m[$1]} 1' File1 File2


Janis

>
> Thanks,
> Sashi

  Réponse avec citation
Vieux 03/05/2008, 17h51   #3
Dan Stromberg
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

On Fri, 02 May 2008 07:59:53 -0700, Sashi wrote:

> Hi all, I have two files thus:
> File 1:
>
> Col_A Col_B
> a1 b1
> a2 b2
> ....
>
>
> File 2:
>
> Col_A Col_B
> a4
> a2
> a7
> a32
> ......
>
> So in the second file, Col_B is empty.
>
> How do I output the corresponding second column for the second file
> based on the mapping in the first file?
>
> I think I can do something like this:
>
> for line in $(cat f2)
> do echo -n $line; echo -n '\t'; grep $line f1 | awk '{print $2;}' done
>
> Looks a little ugly and is probably not a very smart thing to do. Any
> better suggestions?
>
> Thanks,
> Sashi


Isn't this what join(1) is for?

  Réponse avec citation
Vieux 03/05/2008, 18h26   #4
Dave B
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

On Saturday 3 May 2008 18:51, Dan Stromberg wrote:

> Isn't this what join(1) is for?


AFAIK join operates on sorted files, which doesn't seem to be the case here.

--
D.
  Réponse avec citation
Vieux 04/05/2008, 03h15   #5
Dan Stromberg
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

On Sat, 03 May 2008 19:26:04 +0200, Dave B wrote:

> On Saturday 3 May 2008 18:51, Dan Stromberg wrote:
>
>> Isn't this what join(1) is for?

>
> AFAIK join operates on sorted files, which doesn't seem to be the case
> here.


It seems to me either you accept the nlogn sort, or you end up with an
n^2 pair of nested loops, like what the OP had.

  Réponse avec citation
Vieux 04/05/2008, 09h08   #6
Dave B
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

On Sunday 4 May 2008 04:15, Dan Stromberg wrote:

> On Sat, 03 May 2008 19:26:04 +0200, Dave B wrote:
>
>> On Saturday 3 May 2008 18:51, Dan Stromberg wrote:
>>
>>> Isn't this what join(1) is for?

>>
>> AFAIK join operates on sorted files, which doesn't seem to be the case
>> here.

>
> It seems to me either you accept the nlogn sort, or you end up with an
> n^2 pair of nested loops, like what the OP had.


Not if you use awk. With awk, you have a linear solution on the number of
input records (assuming awk's hash lookups are efficient).

--
D.
  Réponse avec citation
Vieux 04/05/2008, 09h32   #7
Dave B
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

On Sunday 4 May 2008 04:15, Dan Stromberg wrote:

> On Sat, 03 May 2008 19:26:04 +0200, Dave B wrote:
>
>> On Saturday 3 May 2008 18:51, Dan Stromberg wrote:
>>
>>> Isn't this what join(1) is for?

>>
>> AFAIK join operates on sorted files, which doesn't seem to be the case
>> here.

>
> It seems to me either you accept the nlogn sort, or you end up with an
> n^2 pair of nested loops, like what the OP had.


Not if you use awk. With awk, you have a linear solution on the number of
input records (assuming awk's hash lookups are efficient).
And, furthermore, I think it cannot be assumed that the files can be
reordered at will.

--
D.
  Réponse avec citation
Vieux 04/05/2008, 09h42   #8
Dave B
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

On Sunday 4 May 2008 10:32, Dave B wrote:

>> It seems to me either you accept the nlogn sort, or you end up with an
>> n^2 pair of nested loops, like what the OP had.

>
> Not if you use awk. With awk, you have a linear solution on the number of
> input records (assuming awk's hash lookups are efficient).
> And, furthermore, I think it cannot be assumed that the files can be
> reordered at will.


Actually, there is another reason against sorting here: the OP showed the
input files with a heading line (Col_A Col_B). If that line is really
part of the input files and must be preserved, then sorting becomes
problematic (not that it can't be done of course, but not in a simple way).

--
D.
  Réponse avec citation
Vieux 04/05/2008, 12h26   #9
Janis Papanagnou
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

Dave B wrote:
> On Sunday 4 May 2008 04:15, Dan Stromberg wrote:
>
>
>>On Sat, 03 May 2008 19:26:04 +0200, Dave B wrote:
>>
>>
>>>On Saturday 3 May 2008 18:51, Dan Stromberg wrote:
>>>
>>>
>>>>Isn't this what join(1) is for?
>>>
>>>AFAIK join operates on sorted files, which doesn't seem to be the case
>>>here.

>>
>>It seems to me either you accept the nlogn sort, or you end up with an
>>n^2 pair of nested loops, like what the OP had.

>
>
> Not if you use awk. With awk, you have a linear solution on the number of
> input records (assuming awk's hash lookups are efficient).


Are you sure awk use hash lookups? Occasionally done efficiency
analysis makes me suspect that a "standard" awk will use a tree
representation for associative arrays. The consequence would be
a O(N logN) complexity for accessing every element individually.

A full traversal, OTOH, could at least in theory be done in O(N),
even with a tree representation when additional measures would
be taken.

I'd be interested whether anyone knows about the implementation
of associative arrays in the various existing awks. (The POSIX
document seems not to specify any complexity requirements.)

Janis

> And, furthermore, I think it cannot be assumed that the files can be
> reordered at will.
>

  Réponse avec citation
Vieux 04/05/2008, 18h35   #10
Dave B
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

On Sunday 4 May 2008 13:26, Janis Papanagnou wrote:

>> Not if you use awk. With awk, you have a linear solution on the number of
>> input records (assuming awk's hash lookups are efficient).

>
> Are you sure awk use hash lookups? Occasionally done efficiency
> analysis makes me suspect that a "standard" awk will use a tree
> representation for associative arrays. The consequence would be
> a O(N logN) complexity for accessing every element individually.


Not sure, but I always thought that that would be a reasonable
implementation of associative arrays (or maps). Now that you mention trees,
however, I'm not so sure anymore, since it's true that implementing maps
with trees has its advantages too (see eg
http://en.wikipedia.org/wiki/Associative_array for a not-too-technical
discussion).

> A full traversal, OTOH, could at least in theory be done in O(N),
> even with a tree representation when additional measures would
> be taken.
>
> I'd be interested whether anyone knows about the implementation
> of associative arrays in the various existing awks. (The POSIX
> document seems not to specify any complexity requirements.)


FWIW, GNU awk (the only awk I have access to the sources ATM) seems to use
hash tables.
According to http://www.gnu.org/manual/gawk/html_node/internals.html,
lookups are performed using the assoc_lookup() function. Looking at the
sources (gawk 3.1.5), the function is in array.c, lines 483-562. The hash
function itself is at lines 331-414 in the same file.

--
D.
  Réponse avec citation
Vieux 04/05/2008, 18h41   #11
Dave B
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

On Sunday 4 May 2008 19:35, Dave B wrote:

> According to http://www.gnu.org/manual/gawk/html_node/internals.html,


Sorry, the link is

http://www.gnu.org/manual/gawk/html_node/Internals.html

(uppercase I in "Internals.html").

--
D.
  Réponse avec citation
Vieux 04/05/2008, 19h36   #12
Stephane CHAZELAS
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

2008-05-04, 19:35(+02), Dave B:
[...]
> FWIW, GNU awk (the only awk I have access to the sources ATM) seems to use
> hash tables.
> According to http://www.gnu.org/manual/gawk/html_node/internals.html,
> lookups are performed using the assoc_lookup() function. Looking at the
> sources (gawk 3.1.5), the function is in array.c, lines 483-562. The hash
> function itself is at lines 331-414 in the same file.

[..]

From reading the code in the heirloom toolchest, looks like the
original awk from 1979 (and nawk) by A, W and K also used hash
tables.

So does mawk.

--
Stéphane
  Réponse avec citation
Vieux 04/05/2008, 22h20   #13
Janis Papanagnou
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

Stephane CHAZELAS wrote:
> 2008-05-04, 19:35(+02), Dave B:
> [...]
>
>>FWIW, GNU awk (the only awk I have access to the sources ATM) seems to use
>>hash tables.
>>According to http://www.gnu.org/manual/gawk/html_node/internals.html,
>>lookups are performed using the assoc_lookup() function. Looking at the
>>sources (gawk 3.1.5), the function is in array.c, lines 483-562. The hash
>>function itself is at lines 331-414 in the same file.

>
> [..]
>
> From reading the code in the heirloom toolchest, looks like the
> original awk from 1979 (and nawk) by A, W and K also used hash
> tables.
>
> So does mawk.
>


And some brief timing tests with gawk seem to confirm linear complexity.

Janis
  Réponse avec citation
Vieux 05/05/2008, 01h47   #14
Dan Stromberg
Aucun Avatar
 
Messages: n/a
Hébergeur:
Par défaut Re: Column mapping

On Sun, 04 May 2008 10:32:33 +0200, Dave B wrote:

> On Sunday 4 May 2008 04:15, Dan Stromberg wrote:
>
>> On Sat, 03 May 2008 19:26:04 +0200, Dave B wrote:
>>
>>> On Saturday 3 May 2008 18:51, Dan Stromberg wrote:
>>>
>>>> Isn't this what join(1) is for?
>>>
>>> AFAIK join operates on sorted files, which doesn't seem to be the case
>>> here.

>>
>> It seems to me either you accept the nlogn sort, or you end up with an
>> n^2 pair of nested loops, like what the OP had.

>
> Not if you use awk. With awk, you have a linear solution on the number
> of input records (assuming awk's hash lookups are efficient). And,
> furthermore, I think it cannot be assumed that the files can be
> reordered at will.


Whether awk uses hashing or not, I realized shortly after I posted that,
and shortly before I canceled the post, that hashing would be an
interesting 3rd kind of solution.

  Réponse avec citation
Réponse


Outils de la discussion

Règles de messages
Vous ne pouvez pas créer de nouvelles discussions
Vous ne pouvez pas envoyer des réponses
Vous ne pouvez pas envoyer des pièces jointes
Vous ne pouvez pas modifier vos messages

Les balises BB sont activées : oui
Les smileys sont activés : oui
La balise [IMG] est activée : oui
Le code HTML peut être employé : non
Trackbacks are oui
Pingbacks are oui
Refbacks are oui


Fuseau horaire GMT +1. Il est actuellement 06h59.


Édité par : vBulletin® version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Friendly URLs by vBSEO 3.2.0 RC5 Tous droits réservés.
Version française #16 par l'association vBulletin francophone
PHWinfo est un site Éducation Sans Frontières ©2000-2008
Ad Management by RedTyger
©Tous droits réservés par les parties respectives
Page generated in 0,17467 seconds with 22 queries