|
|
|
#1 |
|
Does anybody know an easy way to test if a word is singular or plural --
something a bit smarter than just checking if there is an s on the end!

Thanks,

~ Mark

--
Posted via http://www.ruby-forum.com/. |
|
|
|
#2 |
|
Mark Dodwell wrote:
> Does anybody know an easy way to test if a word is singular or plural --
> something a bit smarter than just checking if there is an s on the end!
>
> Thanks,
>
> ~ Mark

You might be able to use some form of dictionary lookup, and that will help with words like mice, but it still will not help with words like moose, where the singular and plural are the same. |
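A minimal sketch of the dictionary-lookup idea, assuming two tiny hand-made tables (`IRREGULAR_PLURALS` and `INVARIANT` are hypothetical names, and a real word list would be far larger). It shows why a lookup settles "mice" but can only answer "ambiguous" for "moose":

```ruby
# Hypothetical lookup tables for illustration only.
IRREGULAR_PLURALS = {
  "mouse" => "mice",
  "man"   => "men",
  "child" => "children"
}.freeze

# Words whose singular and plural forms coincide.
INVARIANT = %w[moose sheep fish].freeze

# Returns :singular, :plural, :ambiguous, or :unknown for a single word.
def grammatical_number(word)
  w = word.downcase
  return :ambiguous if INVARIANT.include?(w)
  return :singular  if IRREGULAR_PLURALS.key?(w)
  return :plural    if IRREGULAR_PLURALS.value?(w)
  :unknown # not in the dictionary; a suffix rule would have to take over
end

puts grammatical_number("mice")   # plural
puts grammatical_number("mouse")  # singular
puts grammatical_number("moose")  # ambiguous
```

Anything outside the tables comes back as `:unknown`, so in practice the lookup would sit in front of a rule-based fallback rather than replace it.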
|
|
|
#3 |
|
-------- Original Message --------
> Date: Sat, 24 May 2008 07:45:01 +0900
> From: "Michael W. Ryder" <_mwryder@worldnet.att.net>
> To: ruby-talk@ruby-lang.org
> Subject: Re: #plural? or #singular?

> Mark Dodwell wrote:
> > Does anybody know an easy way to test if a word is singular or plural --
> > something a bit smarter than just checking if there is an s on the end!
> >
> > Thanks,
> >
> > ~ Mark
>
> You might be able to use some form of dictionary lookup and that will
> help with words like mice, but it still will not help with words like
> moose where the singular and plural are the same.

Dear Mark,

for the simpler task, where there are different forms for singular and plural (e.g., mouse-mice, house-houses), you could use this:

http://api.rubyonrails.org/classes/Inflector.html

For the more difficult cases, where singular and plural forms coincide (and for the easier cases as well), a part-of-speech tagger can be helpful. I don't know of any written in Ruby, but I can recommend the tree-tagger, which you can script from Ruby to suit your needs. It is available for several languages, so you can find irregular plurals of words in different languages. It is here:

http://www.ims.uni-stuttgart.de/proj...ex/TreeTagger/

Best regards,

Axel |
|
|
|
#4 |
|
On May 23, 6:07 pm, Mark Dodwell <s...@mkdynamic.co.uk> wrote:
> Does anybody know an easy way to test if a word is singular or plural --
> something a bit smarter than just checking if there is an s on the end!

English gem may help. If you devise #plural? and #singular? I'd be happy to add them to the API.

T. |
|
|
|
#5 |
|
http://www.deveiate.org/projects/Linguistics/ might also be useful to you. |
|
|
|
#6 |
|
There are lots of difficulties here.

Is "sheep" singular or plural?
Is "fish" singular or plural?
Is "the government" singular or plural?
Is "England" singular or plural? (England is a country / England are bound to lose the match)
Is "English" singular or plural? (English is a language / The English are eccentric)

So even an exhaustive list of words is not going to give you the right answer all the time. You need to take the word in context, i.e. you need to parse the sentence grammatically. Here There Be Dragons. |
|
|
|
#7 |
|
On Sat, May 24, 2008 at 5:08 PM, Dave Bass <davebass@musician.org> wrote:
Even worse, sometimes it is undefined, I guess, or capitalization may play a role.

I can see data.

Maybe some native speakers will tell me that this is not a correct sentence, I do not know, but then there is

I can see Data.

Languages (plural) are just a big mess (singular)

Robert

--
http://ruby-smalltalk.blogspot.com/

---
Whereof one cannot speak, thereof one must be silent.
Ludwig Wittgenstein |
|
|
|
#8 |
|
-------- Original Message --------
> Date: Sun, 25 May 2008 00:27:48 +0900
> From: "Robert Dober" <robert.dober@gmail.com>
> To: ruby-talk@ruby-lang.org
> Subject: Re: #plural? or #singular?

> On Sat, May 24, 2008 at 5:08 PM, Dave Bass <davebass@musician.org> wrote:
> Even worse, sometimes it is undefined, I guess, or capitalization may play a role.
>
> I can see data.
>
> Maybe some native speakers will tell me that this is not a correct
> sentence, I do not know, but then there is
>
> I can see Data.
>
> Languages (plural) are just a big mess (singular)
>
> Robert

Dear Robert and Dave,

well, this is what tree-tagger (see tags output below, for the tagset see my previous post) says:

I can see data. (noun plural)
I can see Data. (proper noun singular)
England is a country. (proper noun singular)
England are bound to lose the match. (proper noun singular) (nobody is perfect).
English is a language. (proper noun singular)
The English are eccentric. (noun plural)
Languages (noun plural) are just a big mess (noun singular).

Parts-of-speech tagging uses a Bayesian decision model, requiring training on a set of human-tagged text. There are large amounts of texts available for many languages, such as newspaper articles. The authors of tree-tagger claim about 96 % correct tagging somewhere in the docs (can't find it right now). It's also fast - you can tag an entire novel in just a few seconds - and it's available for several major languages, not just English.

Best regards,

Axel

-----------------------------------
I PP I
can MD can
see VV see
data NNS datum
SENT .

I PP I
can MD can
see VV see
Data NP Data
SENT .

England NP England
is VBZ be
a DT a
country NN country
SENT .

England NP England
are VBP be
bound VVN bind
to TO to
lose VV lose
the DT the
match NN match
SENT .

English NP English
is VBZ be
a DT a
language NN language
SENT .

The DT the
English NNS English
are VBP be
eccentric JJ eccentric
SENT .

Languages NNS language
are VBP be
just RB just
a DT a
big JJ big
mess NN mess
SENT . |
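The tagger output above is a sequence of token, tag, lemma triples, so reading the grammatical number back out of it from Ruby is mostly a table lookup. A minimal sketch, assuming one whitespace-separated triple per line and a hypothetical `NOUN_NUMBER` table (the `NPS` entry does not appear in the sample output and is included as an assumption about the tagset):

```ruby
# Map noun tags to a grammatical number.
# NN, NNS and NP appear in the sample output; NPS is an assumed addition.
NOUN_NUMBER = {
  "NN"  => :singular,
  "NNS" => :plural,
  "NP"  => :singular,
  "NPS" => :plural
}.freeze

# Parse tagger output (one "token tag lemma" triple per line) and return
# [token, number] pairs for the nouns only.
def noun_numbers(tagger_output)
  tagger_output.each_line.map do |line|
    token, tag, _lemma = line.split
    number = NOUN_NUMBER[tag]
    [token, number] if number
  end.compact
end

SAMPLE = <<~OUT
  The DT the
  English NNS English
  are VBP be
  eccentric JJ eccentric
OUT

p noun_numbers(SAMPLE) # [["English", :plural]]
```

Lines that are blank or carry a non-noun tag are simply skipped, so the same method can be fed the whole output of a tagging run.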
|
|
|
#9 |
|
On May 24, 2008, at 9:14 AM, Axel Etzold wrote:
> well, this is what tree-tagger (see tags output below, for the tagset
> see my previous post) says:

> England are bound to lose the match. (proper noun singular) (nobody
> is perfect).

The collective noun in American English is singular, while in British English the collective noun is plural. In American English, we would say "England is bound to lose the match," so your results are correct, if the language under consideration is American English. (Although I'm not sure what to make of the plural verb.)

> Parts-of-speech tagging uses a Bayesian decision model, requiring
> training on a set of human-tagged text.

Did you train tree-tagger on a data set of American English?

Ray |
|
|
|
#10 |
|
> I can see data. (noun plural)
> I can see Data. (proper noun singular)
> England is a country. (proper noun singular)
> England are bound to lose the match. (proper noun singular) (nobody is perfect).
> English is a language. (proper noun singular)
> The English are eccentric. (noun plural)
> Languages (noun plural) are just a big mess (noun singular).

Impressive, I have to admit.

> Parts-of-speech tagging uses a Bayesian decision model, requiring
> training on a set of human-tagged text.
> There are large amounts of texts available for many languages, such
> as newspaper articles.
> The authors of tree-taggers claim about 96 % correct tagging somewhere
> in the docs ( can't find it right now).
> It's also fast - you can tag an entire novel in just a few seconds -
> and it's available for several major languages, not just English.

Even more so!!!
Thanx for sharing

R. |
|
|
|
#11 |
|
On Sat, May 24, 2008 at 11:14 AM, Axel Etzold <AEtzold@gmx.de> wrote:
> Dear Robert and Dave,
>
> well, this is what tree-tagger (see tags output below, for the tagset
> see my previous post) says:
>
> I can see data. (noun plural)
> I can see Data. (proper noun singular)
> England is a country. (proper noun singular)
> England are bound to lose the match. (proper noun singular) (nobody is perfect).
> English is a language. (proper noun singular)
> The English are eccentric. (noun plural)
> Languages (noun plural) are just a big mess (noun singular).

You will always have problems with collective nouns (brood, flock, pride, etc), especially if you train yourself on languages that aren't spoken.

> Parts-of-speech tagging uses a Bayesian decision model, requiring
> training on a set of human-tagged text.
> There are large amounts of texts available for many languages, such
> as newspaper articles.
> The authors of tree-taggers claim about 96 % correct tagging somewhere
> in the docs ( can't find it right now).
> It's also fast - you can tag an entire novel in just a few seconds -
> and it's available for several major languages, not just English.

I think many people balk at your question because you didn't specify the terms of the problem. What language? What vernacular? What venue?

cheerio (plural),
Todd |
|
|
|
#12 |
|
> > England are bound to lose the match. (proper noun singular) (nobody
> > is perfect).
>
> The collective noun in American English is singular, while in British
> English the collective noun is plural. In American English, we would
> say "England is bound to lose the match," so your results are correct,
> if the language under consideration is American English. (Although I'm
> not sure what to make of the plural verb.)
>
> > Parts-of-speech tagging uses a Bayesian decision model, requiring
> > training on a set of human-tagged text.
>
> Did you train tree-tagger on a data set of American English?

Dear Ray,

I didn't know about that difference between American and British English in whether a collective noun is treated as singular or plural. I gather from the docs that the authors trained treetagger on the Wall Street Journal and some other American English sources.

I am not a native English speaker myself. So, being easily impressionable as a continental European from Germany, at some point I was sent to an English school in south-west England (it's called a grammar school, even though they teach many subjects and mostly to English people), where I was taught that

a) Speaking "proper English" is of paramount importance (see the musical "My Fair Lady").
b) Proper English is spoken only in England.
c) Americans don't use English at all - don't believe them if they claim they do. (See the musical "My Fair Lady", song: "Why Can't the English?", "... well in America, they haven't used it [English] for years").

Further, a), b) and c) are true because

d) The terms "proper English" and "Queen's English" can be used interchangeably.
e) Americans have continuously failed to come up with a Queen - Jackie Kennedy or a future female President are no acceptable substitutes.
f) Admitting anything else would harm or destroy the very profitable language industry in England.

It seems I am still somewhat under the influence of that ....

Best regards,

Axel |
|
|
|
#13 |
|
A really cheap way to do this with ActiveSupport would be to do something like this:

  # Load ActiveSupport's String#singularize and String#pluralize.
  require "active_support"
  require "active_support/core_ext/string/inflections"

  class String
    # True if singularizing the word leaves it unchanged.
    def singular?
      self.singularize == self
    end

    # True if pluralizing the word leaves it unchanged.
    def plural?
      self.pluralize == self
    end
  end

In the console, it looks like this:

  >> "things".plural?
  => true
  >> "things".singular?
  => false
  >> "sheep".plural?
  => true
  >> "sheep".singular?
  => true

I don't know if that's the best solution, but it works.

--Jeremy

On Fri, May 23, 2008 at 6:07 PM, Mark Dodwell <seo@mkdynamic.co.uk> wrote:
> Does anybody know an easy way to test if a word is singular or plural --
> something a bit smarter than just checking if there is an s on the end!
>
> Thanks,
>
> ~ Mark
> --
> Posted via http://www.ruby-forum.com/.

--
http://jeremymcanally.com/
http://entp.com

Read my books:
Ruby in Practice (http://manning.com/mcanally/)
My free Ruby e-book (http://humblelittlerubybook.com/)

Or, my blogs:
http://mrneighborly.com
http://rubyinpractice.com |
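The round-trip trick above can be folded into a single three-way answer, which makes the "sheep" case explicit instead of returning true twice. This is only a sketch: `TinyInflector` is a hypothetical stand-in covering a handful of words so that it runs without ActiveSupport, and `number_of` is an invented helper name.

```ruby
# TinyInflector is a hypothetical, deliberately incomplete stand-in for a
# real inflector; it exists only to make the example self-contained.
module TinyInflector
  IRREGULAR   = { "mouse" => "mice", "man" => "men" }.freeze
  UNCOUNTABLE = %w[sheep moose fish].freeze

  def self.pluralize(word)
    return word if UNCOUNTABLE.include?(word)
    return word if IRREGULAR.value?(word)          # already an irregular plural
    return IRREGULAR[word] if IRREGULAR.key?(word)
    word.end_with?("s") ? word : "#{word}s"
  end

  def self.singularize(word)
    return word if UNCOUNTABLE.include?(word)
    return IRREGULAR.key(word) if IRREGULAR.value?(word)
    word.end_with?("s") ? word.chomp("s") : word
  end
end

# Same round-trip test as String#singular? / String#plural? above,
# collapsed into one result.
def number_of(word)
  singular = TinyInflector.singularize(word) == word
  plural   = TinyInflector.pluralize(word) == word
  if singular && plural then :either
  elsif plural          then :plural
  elsif singular        then :singular
  else                       :unknown
  end
end

puts number_of("things") # plural
puts number_of("mouse")  # singular
puts number_of("sheep")  # either
```

The approach is only as good as the inflector behind it: with this toy rule set "bus" is reported as plural because it ends in "s", which is exactly the naive check the original question wanted to avoid.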
|
|
|
#14 |
|
-------- Original Message --------
> Date: Sun, 25 May 2008 03:14:18 +0900
> From: "Todd Benson" <caduceass@gmail.com>
> To: ruby-talk@ruby-lang.org
> Subject: Re: #plural? or #singular?

> You will always have problems with collective nouns (brood, flock,
> pride, etc), especially if you train yourself on languages that aren't
> spoken.
>
> I think many people balk at your question because you didn't specify
> the terms of the problem. What language? What vernacular? What
> venue?
>
> cheerio (plural),
> Todd

Dear Todd,

well, I didn't start the thread ... so I don't have to specify the problem. The OP wanted to decide whether a given noun is singular or plural.

As I see it, in English, nouns can be grouped into four groups:

1) Those that form a plural by adding an 's': e.g., house -> houses
2) Those that don't belong to the first group and have different forms for singular and plural: e.g., man -> men, mouse -> mice
3) Those that don't belong to the first two groups, because singular and plural forms both exist and coincide (e.g., moose -> moose)
4) Those that don't belong to the previous groups, as they don't have two forms, because they describe some collective (e.g., police (at least in British English)) or something uncountable (e.g., pride).

The first two groups and the last can be dealt with by a program that generates a plural from a singular (i.e., the linguistics gem). Especially due to the group 3 nouns, a program that 'pluralizes' a given noun doesn't answer the OP's question, because it cannot decide (lacking information about the context) whether a given noun is singular or plural. Dave and Robert gave several examples of this. My point is that there exists a type of software - parts-of-speech taggers - that can resolve these questions from contextual information - not always correctly, as it's a computer program relying on probabilities, but remarkably well.

I didn't understand your point about languages that aren't spoken ... if you had a Latin text, say (there's a large collection available on Project Gutenberg), and you manually tagged a part of it, to let a Bayesian classification program learn probabilities, it would be able to identify the parts of speech of another Latin text, e.g., identify plural nouns in it. That's certainly much easier in Latin than in English, as there's hardly anything in group 3 for Latin - I'd bet you'd find a nice little list of such words printed in bold in every grammar (oh, please remember - 'hand' is 'manus' and 'hands' is also 'manus').

> What language? What vernacular? What
> venue?

I assume that the OP is talking about some standard written form of a language, like standard English, French, German, etc.

Now, you get ready-made taggers on the net for some of these languages, so your computer can say that this Italian word is a plural noun, even if you don't know any Italian. If you wanted to tell plural nouns from singular ones in Turkish, you could still use e.g. treetagger for that, but you have to get a Turkish text tagged manually first to teach the program the probabilities that a given word form is a plural or a singular - it pays to have a native speaker of Turkish do that. For those languages for which ready-made solutions are offered, somebody has already taken a large amount of typical texts (novels, newspaper articles, poems etc.), tagged them manually and provided parameter files for download, so no training on the user's part is necessary anymore.

Best regards,

Axel |
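To make the "learn probabilities from hand-tagged text" step concrete, here is a toy sketch. It is not the algorithm TreeTagger uses; it only counts how often a tag was seen for a word after a given previous word and picks the most frequent one, falling back to context-free counts. The class name `ToyTagger` and the three training sentences are invented for illustration.

```ruby
# A toy context-sensitive tagger trained on hand-tagged sentences.
# Real taggers use far richer models and smoothing.
class ToyTagger
  def initialize
    # @counts[[previous_word, word]][tag] => number of occurrences
    @counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  end

  # sentence: array of [word, tag] pairs in reading order.
  def train(sentence)
    previous = :start
    sentence.each do |word, tag|
      w = word.downcase
      @counts[[previous, w]][tag] += 1
      @counts[[:any, w]][tag] += 1 # context-free fallback counts
      previous = w
    end
  end

  # Most frequent tag for word after previous_word, or nil if unseen.
  def tag(previous_word, word)
    w = word.downcase
    keys = [[previous_word.downcase, w], [:any, w]]
    table = keys.map { |key| @counts.fetch(key, nil) }.compact.first
    return nil unless table
    table.max_by { |_tag, count| count }.first
  end
end

tagger = ToyTagger.new
tagger.train([%w[a DT], %w[sheep NN], %w[grazes VVZ]])
tagger.train([%w[two CD], %w[sheep NNS], %w[graze VVP]])
tagger.train([%w[many JJ], %w[sheep NNS], %w[graze VVP]])

puts tagger.tag("a", "sheep")   # NN
puts tagger.tag("two", "sheep") # NNS
```

With more tagged text the counts approximate the probability of a tag given its context, which is what lets a tagger separate "a sheep" from "two sheep" even though the word form is identical.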
|
|
|
#15 |
|
On Sat, May 24, 2008 at 5:35 PM, Ray Baxter <ray.baxter@gmail.com> wrote:
> The collective noun in American English is singular, while in British
> English the collective noun is plural.

While this is completely off-topic, I feel impelled to correct your assumption here: there is no such prescription in so-called 'British English'[1]. I quite regularly hear people using both the singular and the plural referring to the same collective noun even in the same breath, e.g. "The government's in disarray. They're going to have a tough time recovering from this defeat".

Regards,
Sean

[1] Here, in Britain we tend to think that what we speak is yer actual original English, so it doesn't require qualification |
|
|
|
#16 |
|
On May 25, 2008, at 6:18 AM, Sean O'Halpin wrote:
> On Sat, May 24, 2008 at 5:35 PM, Ray Baxter <ray.baxter@gmail.com>
> wrote:
>
>> The collective noun in American English is singular, while in British
>> English the collective noun is plural.
>
> While this is completely off-topic, I feel impelled to correct your
> assumption here: there is no such prescription in so-called 'British
> English'[1]. I quite regularly hear people using both the singular and
> the plural referring to the same collective noun even in the same
> breath, e.g. "The government's in disarray. They're going to have a
> tough time recovering from this defeat".

My favourite (sic) is the British use of the possessive for no reason: "We're going to Tesco's".

///ark |
|
|
|
#17 |
|
>> The collective noun in American English is singular, while in British
>> English the collective noun is plural.
>
> While this is completely off-topic, I feel impelled to correct your
> assumption here: there is no such prescription in so-called 'British
> English'[1]. I quite regularly hear people using both the singular and
> the plural referring to the same collective noun even in the same
> breath, e.g. "The government's in disarray. They're going to have a
> tough time recovering from this defeat".

"The Who are on tour again."

Brit style works in American English, but only in terms of groups of artists. |
|
|
|
#18 |
|
Mark Wilden wrote:
> My favourite (sic) is the British use of the possessive for no reason:
> "We're going to Tesco's".

I think you'll find it's short for "We're going to Tesco's [store]". Similar in principle to "We met up at Fred's [house]" or "Homer is often to be found in Moe's [bar]".

A fairly recent usage is the possessive pronoun for a similar purpose: "After dinner we went back to hers" meaning "...back to her place". |
|
|
|
#19 |
|
On Sun, May 25, 2008 at 6:28 PM, Dave Bass <davebass@musician.org> wrote:
> Mark Wilden wrote:
>> My favourite (sic) is the British use of the possessive for no reason:
>> "We're going to Tesco's".
>
> I think you'll find it's short for "We're going to Tesco's [store]".

And if I might add, I guess this too will be used frequently:

We're going to Tesco's because *they* have some new -- dunno what they are selling tho

Which, if it occurs often in the training text, might become some strange plural form, but I am really just imagining here.

But in reality we are not talking about plural or singular forms anymore; we are right into the miraculous wonders of languages, where social context mixes with syntax, semantics and meaning.

Cheers
Robert

http://ruby-smalltalk.blogspot.com/

---
Whereof one cannot speak, thereof one must be silent.
Ludwig Wittgenstein |
|
|
|
#20 |
|
On Sat, May 24, 2008 at 5:34 PM, Axel Etzold <AEtzold@gmx.de> wrote:
>> I think many people balk at your question because you didn't specify
>> the terms of the problem. What language? What vernacular? What
>> venue?
>>
>> cheerio (plural),
>> Todd
>
> Dear Todd,
>
> well, I didn't start the thread ... so I don't have to specify the problem.
> The OP wanted to decide whether a given noun is singular or plural.

I was talking to the OP, but I guess I didn't say that outright.

> As I see it, in English, nouns can be grouped into four groups:
>
> 1) Those that form a plural by adding an 's' : eg., house -> houses
> 2) Those that don't belong to the first group and have different forms
> for singular and plural : eg., man -> men, mouse->mice
> 3) Those that don't belong to the first two groups, because singular and
> plural forms both exist and coincide (eg. moose->moose)
> 4) Those that don't belong to the previous groups, as they don't have two
> forms, because they describe some collective (eg. police (at least in
> British English)) or something uncountable (eg. pride).
>
> The first two groups and the last can be dealt with by a program
> that generates a plural from a singular (ie., the linguistics gem).
> Especially due to the group 3 nouns, a program that 'pluralizes'
> a given noun doesn't answer the OP's question, because it cannot decide
> (from the missing information of the circumstances) whether a given noun
> is singular or plural.
> Dave and Robert gave several examples for this.
> My point is that there exists a type of software - parts-of-speech
> taggers - that can resolve these questions from circumstance information -
> not always correctly, as it's a computer program relying on probabilities,
> but remarkably well.
>
> I didn't understand your point about languages that aren't spoken ...
> if you had a Latin text, say, (there's a large collection available
> on project Gutenberg), and you manually tagged a part of it, to let
> a Bayesian classification program learn probabilities, it would be able
> to identify the parts-of-speech of another Latin text, e.g., identify
> plural nouns in it in Latin (that's certainly much easier than in English,
> as there's hardly anything in the group 3 for Latin - I'd bet you'd find
> a nice little list of words printed in fat in every grammar (oh, please
> remember - hand is 'manus' and 'hands' is also 'manus').
>
>> What language? What vernacular? What
>> venue?
>
> I assume that the OP is talking about some standard written form
> of a language, like standard English, French, German, etc ..

Hmm. Most people unconsciously change their use of communication by location or the company they are in.

> Now, you get ready-made taggers on the net for some
> of these languages, so your computer can say, this Italian word is a plural
> noun, even if you don't know any Italian.
> If you wanted to identify plural nouns from singular ones in Turkish, you
> could still use eg. treetagger for that, but you have to get a Turkish text
> tagged manually first to teach the program the probabilities that a given
> word form is a plural or a singular - it pays to have a native-language
> Turk to do that.
> For those language that there are ready-made solutions offered, somebody
> has already taken a large amount of typical texts (novels, newspaper
> articles, poems etc.), tagged them manually and provided parameter
> files for download, so no training from the user's part is necessary anymore.
>
> Best regards,
>
> Axel

I pretty much agree with you, but I still think the side cases pop up more frequently than we think. With the non-spoken languages point, I meant things like symbology, programming languages, formal logic, and the like. "Plural" may take on a different meaning. |
|
|
|
#21 |
|
On May 25, 2008, at 11:24 AM, Robert Dober wrote:
> On Sun, May 25, 2008 at 6:28 PM, Dave Bass <davebass@musician.org>
> wrote:
>> Mark Wilden wrote:
>>> My favourite (sic) is the British use of the possessive for no
>>> reason:
>>> "We're going to Tesco's".
>>
>> I think you'll find it's short for "We're going to Tesco's [store]".
> And if I might add, I guess this too will be used frequently:
> We're going to Tesco's because *they* have some new -- dunoo what they
> are selling tho

Well, OK, but it seems to me the same as saying "We're going to England's." I mean, when you think about it, any possessive -could- have an implied noun. I think the usage simply arises from the fact that many stores do have possessive names, so it feels "natural."

///ark |
|
|
|
#22 |
|
> possessive names, so it feels "natural."

I guess this sentence is a very precise way to express the complexity of the problem: it is about feelings.

Cheers
Robert

--
http://ruby-smalltalk.blogspot.com/

---
Whereof one cannot speak, thereof one must be silent.
Ludwig Wittgenstein |
|
|
|