#51

Mark McIntyre wrote:
> On Sun, 21 Oct 2007 11:00:37 -0400, in comp.lang.c , Ernie Wright
> <erniew@comcast.net> wrote:
>
>>Mark McIntyre wrote:
>>
>>> There is no intrinsic reason (other than ludditism) to distrust
>>> either search facility.
>>
>>It's not clear to me what you mean by "instrinsic" here.
>
> intrinsic as in built in?

Well, yeah, but as opposed to what? I couldn't tell if I disagreed with
you or not--whether the reasons I offered were ones you characterize as
instrinsic.

>>I have a couple of reasons for distrusting the search facility for PDFs
>>in Adobe Acrobat. On general principle, it's much easier to screw up
>>the programming of a search facility for a complex format,
>
> This falls under the heading of "ludditism"... :-)

Or Luddism, even. You say that like it's a bad thing.

>> and it's also more likely that a richer encoding will contain more
>> errors merely by chance.
>
> AFAIK the plaintext is generated _from_ the PDF. I doubt its proofread
> afterwards, either.

Wait, I thought we were talking about the general case. If we're talking
solely about ISO C drafts, we don't have to guess about which ones are
better for searching. That's testable.

>> But more specifically, I have several times seen a PDF search fail
>> for items that I can actually see. There are any number of reasons
>> this can happen with PDF, none of which apply to plain text.
>
> While there are different reasons why the plaintext search might fail.

There's no symmetry here. There are *more* reasons that a PDF search
might fail. What, for example, do you think is going on in this search:

   http://home.comcast.net/~erniew/images/pdfsearch.gif

- Ernie  http://home.comcast.net/~erniew

#52

Richard wrote:
> Ernie Wright <erniew@comcast.net> writes:
>
>>I have a couple of reasons for distrusting the search facility for PDFs
>>in Adobe Acrobat. On general principle, it's much easier to screw up
>>the programming of a search facility for a complex format, and it's also
>>more likely that a richer encoding will contain more errors merely by
>
> More spelling errors?

No.

> Or are you saying the same thing twice? ie its harder to program a
> search algorithm for more complex formats? Which is as obvious as
> water is wetter than dry sand.

Evidently it's *not* that obvious.

>>chance. But more specifically, I have several times seen a PDF search
>>fail for items that I can actually see. There are any number of reasons
>>this can happen with PDF, none of which apply to plain text.
>
> There is one reason : that the search algorithm is full of errors.

Are you saying that's the only possible reason? There are many others.

Or are you saying that this is one reason that applies to both PDF and
plain text? Clearly there is more than one of those also. I was
specifically talking about reasons that apply to PDF and do not apply to
plain text.

> I have never personally had a PDF search not work.

I have.

   http://home.comcast.net/~erniew/images/pdfsearch.gif

- Ernie  http://home.comcast.net/~erniew

#53

On Sat, 20 Oct 2007 18:09:04 -0400, in comp.lang.c , CBFalconer
<cbfalconer@yahoo.com> wrote:

>No, you misinterpret my comment. The point is that PDF searches
>can only be done by a PDF reader. With text you have a choice,
>such as grep, a text editor, or any other piece of text handling
>software on your system.

Right - so whereas a PDF can only be searched by a tool which can read
PDFs, a text file can only be searched by a tool which can read
text*.... hmm.

>So you can suit your search methods to software familiar to you.

Why would anyone under the age of 30 be familiar with arcane stuff
like grep? I find it hard to find anyone under that age who even knows
that computers _have_ a commandline, and for GUI users, double-clicking
a file of any sort merely starts the default viewer with whatever
search facilities there are.

(* okay, I grant you, you could use a disk editor to search a text
file - probably, if it was 7-bit ascii, and stored in sequential
bytes...)
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
 Therefore, if you write the code as cleverly as possible, you are,
 by definition, not smart enough to debug it."
--Brian Kernighan

#54

On Mon, 22 Oct 2007 05:40:01 +0000 (UTC), in comp.lang.c , Harald van
Dijk <truedfx@gmail.com> wrote:

>On Sun, 21 Oct 2007 23:44:24 +0100, Mark McIntyre wrote:
>> On Sun, 21 Oct 2007 08:13:50 +0000 (UTC), in comp.lang.c , Harald
>> van Dijk <truedfx@gmail.com> wrote:
>>
>>>The text version of n869 could
>>>have contained spellings that cause problems for searching, but doesn't.
>>
>> You assert.
>
>I have verified, but you conveniently chose to ignore this repeatedly.

Not at all. My point is, unless you choose to abandon all reason you
must accept your claim is unprovable except by me doing my own search
or by you publishing some evidence that there are no spelling errors
in the file.

I'm not sure how you could usefully do that. You could I suppose load
it into a spell checker, and carefully train it to ignore all the
specialist language, diacriticals, punctuation marks etc etc etc.
You'd need to get someone to double-check your results, to remove any
possibility of you making a minor mistake, or overlooking an error.

So I'm betting that what you mean is "I have yet to detect any
spelling errors in the plaintext version" - which is fair enough, but
not the same as "there are no errors, period".

Meanwhile since I've used the PDF extensively with no issues
searching, I must disconcur with your findings regarding that.
--
Mark McIntyre

#55

On Mon, 22 Oct 2007 12:25:56 -0400, in comp.lang.c , Ernie Wright
<erniew@comcast.net> wrote:

>Mark McIntyre wrote:
>
>> On Sun, 21 Oct 2007 11:00:37 -0400, in comp.lang.c , Ernie Wright
>> <erniew@comcast.net> wrote:
>>
>>>Mark McIntyre wrote:
>>>
>>>> There is no intrinsic reason (other than ludditism) to distrust
>>>> either search facility.
>>>
>>>It's not clear to me what you mean by "instrinsic" here.
>>
>> intrinsic as in built in?
>
>Well, yeah, but as opposed to what?

Sorry, but I pretty much assumed my audience could read English. Not
trying to be offensive, but I have no intention of defining what an
intrinsic property is.

>> This falls under the heading of "ludditism"... :-)
>
>Or Luddism, even. You say that like it's a bad thing.

From Wikipedia:
An official announcement, 12th February 1811

"Any person who breaks or destroys machinery in any mill used in the
preparing or spinning of wool or cotton or other material for the use
of the stocking or lace manufacture, on being lawfully convicted
.....shall suffer death."

Seems pretty bad to me...

>> While there are different reasons why the plaintext search might fail.
>
>There's no symmetry here. There are *more* reasons that a PDF search
>might fail.

I disagree. There are *different* reasons.

>What, for example, do you think is going on in this search:
>
>   http://home.comcast.net/~erniew/images/pdfsearch.gif

*shrug*
Probably there's an embedded n-dash sized space in "angle".

Plaintext searches can and do fail for similar reasons - I've seen
searches fail because of embedded (invisible) characters outside the
range 32-127.
--
Mark McIntyre
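
The plain-text failure described in this post, a search defeated by an invisible character, is something that can be checked for mechanically. The following is only a minimal sketch in Python: the function name `find_unexpected_chars` and the sample string are made up for illustration, and it treats 32-126 as the printable range (the post says 32-127, but 127 is DEL, which is not printable either).

```python
def find_unexpected_chars(text):
    """Return (line_no, column, repr(char)) for each character outside
    printable ASCII (32-126). Tabs are tolerated; lines are split on
    '\n' only, so a stray '\r' is reported rather than hidden."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                continue
            if not 32 <= ord(ch) <= 126:
                hits.append((line_no, col, repr(ch)))
    return hits


# Hypothetical sample: the first "angle" hides a soft hyphen (U+00AD).
sample = "an\u00adgle of incidence\nangle\n"

print("angle" in sample.split("\n")[0])   # the naive search misses line 1
print(find_unexpected_chars(sample))      # reports line 1, column 3
```

Running something like this over a text file before trusting a negative search result would show whether hidden characters are a plausible explanation.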

#56

On Mon, 22 Oct 2007 05:38:50 +0000 (UTC), in comp.lang.c , Harald
van Dijk <truedfx@gmail.com> wrote:

>The problem with the spellings in this thread was that you can't search
>for __func__ and find where it's referenced. (Or more accurately, that
>you can't search for __FUNCTION__ to find that it's not referenced.)

Perhaps not - but you can search for UNCTI, which is highly likely to
be unique.
--
Mark McIntyre

#57

Mark McIntyre wrote:
> On Sat, 20 Oct 2007 18:09:04 -0400, in comp.lang.c , CBFalconer
> <cbfalconer@yahoo.com> wrote:
>
>>No, you misinterpret my comment. The point is that PDF searches
>>can only be done by a PDF reader. With text you have a choice,
>>such as grep, a text editor, or any other piece of text handling
>>software on your system. So you can suit your search methods to
>>software familiar to you.
>
> Why would anyone under the age of 30 be familiar with arcane stuff
> like grep?

People who read C Standards are not likely to be a "Joe" user,
notwithstanding their age.

<snip>

#58

CBFalconer <cbfalconer@yahoo.com> writes:
> Harald van Dijk wrote:
>> Keith Thompson wrote:
>>
>>> As far as I can tell, it has not yet been established that, for
>>> example, n1256.pdf contains any incorrect spellings.
>>
>> From n869.txt, from the foreword:
>>        -- __func__ predefined identifier
>>
>> From n1256.pdf:
>>        _ _func_ _ predeﬁned identiﬁer
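
The two spellings quoted above differ in the blanks between the underscores, which is why a literal search for `__func__` fails against the PDF wording. One way around that is a search pattern that tolerates whitespace between characters. This is a rough sketch in Python, not a description of how any PDF reader searches; `spaced_pattern` is a hypothetical helper name, and the sample strings use an ordinary "fi" so that only the spacing problem is shown.

```python
import re


def spaced_pattern(identifier):
    """Compile a regex that matches `identifier` even when whitespace
    has been inserted between any two adjacent characters."""
    return re.compile(r"\s*".join(re.escape(ch) for ch in identifier))


pdf_text = "_ _func_ _ predefined identifier"    # spelling as quoted from n1256.pdf
txt_text = "-- __func__ predefined identifier"   # spelling as quoted from n869.txt

pattern = spaced_pattern("__func__")
print("__func__" in pdf_text)           # False: the literal search fails
print(bool(pattern.search(pdf_text)))   # True
print(bool(pattern.search(txt_text)))   # True
print(bool(spaced_pattern("__FUNCTION__").search(pdf_text)))  # False
```

The pattern is deliberately permissive (it would also match "f u n c"), so it suits a quick check rather than an exact search.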

#59

Mark McIntyre <markmcintyre@spamcop.net> wrote:
> On Mon, 22 Oct 2007 05:38:50 +0000 (UTC), in comp.lang.c , Harald
> van Dijk <truedfx@gmail.com> wrote:
>
> >The problem with the spellings in this thread was that you can't search
> >for __func__ and find where it's referenced. (Or more accurately, that
> >you can't search for __FUNCTION__ to find that it's not referenced.)
>
> Perhaps not - but you can search for UNCTI, which is highly likely to
> be unique.

Or you could - perhaps? revolutionary idea, I know - check the index.

I know that this is the C Standard we're talking about, not the Perl...
erm... lack of any standard, but there _is_ more than one way to do it.

Richard

#60

Mark McIntyre wrote:
>>>On Sun, 21 Oct 2007 11:00:37 -0400, in comp.lang.c , Ernie Wright
>>><erniew@comcast.net> wrote:
>>>
>>>>It's not clear to me what you mean by "instrinsic" here.

Geez, I just noticed that I typed that extraneous 's' twice in two
different posts.

> Sorry, but I pretty much assumed my audience could read English. Not
> trying to be offensive, but I have no intention of defining what an
> intrinsic property is.

That's OK. We can come back to this if need be.

>>   http://home.comcast.net/~erniew/images/pdfsearch.gif
>
> *shrug*
> Probably there's an embedded n-dash sized space in "angle".

It's a bug in the Acrobat browser plug-in. Both Acrobat Pro and the
plug-in, when it's working correctly, find 80 occurrences of "angle" in
the document, including several embedded in words like "triangle."

In my experience, this kind of flakiness with PDF isn't that unusual.
If the result of a search is important, and I have both PDF and ASCII
text versions of a document, and the search result is negative in the
PDF version, it doesn't seem unreasonable at all to repeat the search
using the ASCII version in a text editor.

- Ernie  http://home.comcast.net/~erniew

#61

Ben Bacarisse wrote:
> CBFalconer <cbfalconer@yahoo.com> writes:
>
> .... snip ...
>
>> The blanks between '_' chars are an effect of the font used. The
>> other anomaly is due to the use of some peculiar character to
>> represent the sequence 'fi'. So there are no incorrect spelling
>> identified, but one more of the penalties of .PDF publication is
>> exposed.
>
> I disagree. From what I can see the "_ _" problem *is* a case of
> incorrect spelling (but a well-intentioned one). It seems to have
> been put in deliberately to make the double underscore obvious.
> The "fi" ligature is simply correct and good quality PDF readers
> will cut and paste it as "fi" (two characters) and match it in a
> search for "f" followed by "i". In other words, it works just
> fine.
>
> I don't think either says anything about the penalty of PDF
> publication. In fact for interactive (i.e. non scripted) searches
> I like the PDF better, now. I can search for text, jump to
> specific pages or go right to a given section just by typing any
> part of the section number or name.

We'll just have to agree to disagree. My main point was that with
text you have no problem selecting the search software to suit your
tastes and needs.

--
 Chuck F (cbfalconer at maineline dot net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>
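
The "fi" ligature behaviour described in the quoted text can be reproduced on text that has already been extracted from a PDF. The sketch below assumes the extracted text contains the single ligature character U+FB01 and uses Unicode compatibility normalization (NFKC) from Python's standard `unicodedata` module to fold it back to "f" followed by "i". `fold_ligatures` is a hypothetical helper name; whether a given PDF reader does anything similar internally is not something this thread establishes.

```python
import unicodedata


def fold_ligatures(text):
    """Apply NFKC normalization, which (among other things) replaces
    compatibility ligatures such as U+FB01 with their plain letters."""
    return unicodedata.normalize("NFKC", text)


# Hypothetical extracted text containing the single-character fi ligature.
extracted = "prede\ufb01ned identi\ufb01er"

print("identifier" in extracted)                   # False: ligature blocks the match
print("identifier" in fold_ligatures(extracted))   # True after folding
```

NFKC also folds other compatibility characters (full-width forms, superscripts and so on), so it changes more than ligatures; for a search index that is usually acceptable, but it is a design choice rather than a neutral cleanup.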

#62

On Tue, 23 Oct 2007 06:14:22 GMT, in comp.lang.c ,
rlb@hoekstra-uitgeverij.nl (Richard Bos) wrote:

>Or you could - perhaps? revolutionary idea, I know - check the index.

Typically, an index doesn't index every single word!

A - see pages 2-1200 inclusive.
But - see pages 2,3,4,5,6,7,8,11,12,14,16,....
for - see pages 2-1200 except 1104 and 896
--
Mark McIntyre

#63

Mark McIntyre <markmcintyre@spamcop.net> wrote:
> On Tue, 23 Oct 2007 06:14:22 GMT, in comp.lang.c ,
> rlb@hoekstra-uitgeverij.nl (Richard Bos) wrote:
>
> >Or you could - perhaps? revolutionary idea, I know - check the index.
>
> Typically, an index doesn't index every single word!
>
> A - see pages 2-1200 inclusive.
> But - see pages 2,3,4,5,6,7,8,11,12,14,16,....
> for - see pages 2-1200 except 1104 and 896

Typically, one doesn't search for indefinite articles or coordinating
conjunctions. One searches for important nouns - precisely the ones that
are found in a good index. In this case, __func__ is in the index, while
__FUNCTION__ is not; one look at this index at the start of this whole
useless argument about PDF versus text versus underscores versus
identifiers would have settled the matter.

Richard

#64

On Fri, 26 Oct 2007 06:26:20 GMT, in comp.lang.c ,
rlb@hoekstra-uitgeverij.nl (Richard Bos) wrote:

>In this case, __func__ is in the index, while
>__FUNCTION__ is not; one look at this index at the start of this whole
>useless argument about PDF versus text versus underscores versus
>identifiers would have settled the matter.

I agree, but I think the point being made is that indices do not index
everything, and can thus only tell what _is_ in the book.
--
Mark McIntyre

#65

Mark McIntyre <markmcintyre@spamcop.net> wrote:
> rlb@hoekstra-uitgeverij.nl (Richard Bos) wrote:
>
> >In this case, __func__ is in the index, while
> >__FUNCTION__ is not; one look at this index at the start of this whole
> >useless argument about PDF versus text versus underscores versus
> >identifiers would have settled the matter.
>
> I agree, but I think the point being made is that indices do not index
> everything, and can thus only tell what _is_ in the book.

A good index, which I think one should assume the Standard does have,
indexes everything important. Thus, if something is not in the
Standard's index, either it isn't in the Standard, or it's only
mentioned in passing. Since __func__ is in it but __FUNCTION__ is not...
draw your own conclusion.

Richard

#66

On Mon, 29 Oct 2007 13:26:40 GMT, in comp.lang.c ,
rlb@hoekstra-uitgeverij.nl (Richard Bos) wrote:

>Mark McIntyre <markmcintyre@spamcop.net> wrote:
>
>> rlb@hoekstra-uitgeverij.nl (Richard Bos) wrote:
>>
>> >In this case, __func__ is in the index, while
>> >__FUNCTION__ is not; one look at this index at the start of this whole
>> >useless argument about PDF versus text versus underscores versus
>> >identifiers would have settled the matter.
>>
>> I agree, but I think the point being made is that indices do not index
>> everything, and can thus only tell what _is_ in the book.
>
>A good index, ....
>indexes everything important. ....
>Since __func__ is in it but __FUNCTION__ is not...
>draw your own conclusion.

I agree, but it's not a proof and cannot settle the matter.
--
Mark McIntyre