|
|
|
|
||||||
![]() |
|
|
LinkBack | Outils de la discussion |
|
|
#1 |
|
Messages: n/a
Hébergeur: |
Initial issue: read in an arbitrary-length piece of text.
Perceived issue: handle variable-length data The code below is a suggestion for implementing a variable length buffer that could be used to read text or handle arrays of arbitrary length. I don't have the expertise in C of many folks here so I feel like I'm offering a small furry animal for sacrifice to a big armour plated one... but will offer it anyway. Please do suggest improvements or challenge the premise. It would be great if it could be improved to become a generally useful piece of code. Well, here goes. This should be fun. :-? - The following utility code is passed a buffer (allocated by the caller) and maintains it at an appropriate size. The main function increases the allocation (when necessary) by factors - rather than fixed amounts - for speed. There is a secondary function to trim a buffer back to a specific size. An extra byte (one more than is requested) is always left at the end. /* * Expanding buffer */ #define EBUF_SIZE_INIT 128 #define EBUF_SIZE_MIN 128 #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */ #include <stdio.h> #include <stdlib.h> #include <malloc.h> int ebuf_full(char **buf, size_t *buf_size, size_t offset) { size_t new_size; char *new_buf; if (*buf_size < offset + 2) { /* NB last pos left empty */ new_size = *buf_size * EBUF_INCREASE + 1; if (new_size < offset + 2) new_size = offset + 2; if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN; if ((new_buf = realloc(*buf, new_size)) == NULL) { return 1; /* Failed to realloc buffer */ } *buf = new_buf; *buf_size = new_size; } return 0; /* Reallocated successfuly */ } int ebuf_trim(char **buf, size_t *buf_size, size_t offset) { int new_size = offset + 2; /* Includes empty char */ char *new_buf; if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN; if (new_size != *buf_size) { if ((new_buf = realloc(*buf, new_size)) == NULL) { return 1; /* Reallocation failed (unlikely) */ } *buf = new_buf; *buf_size = new_size; } return 0; /* Reallocation succeeded */ } |
|
|
|
#2 |
|
Messages: n/a
Hébergeur: |
On 30 May, 14:12, James Harris <james.harri...@googlemail.com> wrote:
> Initial issue: read in an arbitrary-length piece of text. > Perceived issue: handle variable-length data > > The code below is a suggestion for implementing a variable length > buffer that could be used to read text or handle arrays of arbitrary > length. I don't have the expertise in C of many folks here so I feel > like I'm offering a small furry animal for sacrifice to a big armour > plated one... but will offer it anyway. Please do suggest improvements > or challenge the premise. It would be great if it could be improved to > become a generally useful piece of code. > > Well, here goes. This should be fun. :-? > > - > > The following utility code is passed a buffer (allocated by the > caller) and maintains it at an appropriate size. The main function > increases the allocation (when necessary) by factors - rather than > fixed amounts - for speed. There is a secondary function to trim a > buffer back to a specific size. An extra byte (one more than is > requested) is always left at the end. > > /* > * Expanding buffer > */ > > #define EBUF_SIZE_INIT 128 > #define EBUF_SIZE_MIN 128 > #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */ > > #include <stdio.h> > #include <stdlib.h> > #include <malloc.h> > > int ebuf_full(char **buf, size_t *buf_size, size_t offset) { > size_t new_size; > char *new_buf; > > if (*buf_size < offset + 2) { /* NB last pos left empty */ > new_size = *buf_size * EBUF_INCREASE + 1; > if (new_size < offset + 2) new_size = offset + 2; > if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN; > if ((new_buf = realloc(*buf, new_size)) == NULL) { > return 1; /* Failed to realloc buffer */ > } > *buf = new_buf; > *buf_size = new_size; > } > return 0; /* Reallocated successfuly */ > > } > > int ebuf_trim(char **buf, size_t *buf_size, size_t offset) { > int new_size = offset + 2; /* Includes empty char */ > char *new_buf; > > if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN; > if (new_size != *buf_size) { > if ((new_buf = realloc(*buf, new_size)) == NULL) { > return 1; /* Reallocation failed (unlikely) */ > } > *buf = new_buf; > *buf_size = new_size; > } > return 0; /* Reallocation succeeded */ > > } An example of intended use follows. Note that the routines are coded to expect buffer and current size as parameters. Despite the error handling the code is intended to be fast. Including "if (offset + 2 > buf1_size)" in the main code the function should only be called if the buffer is too small. The cost of one integer comparison is small. int main() { char *buf1; size_t buf1_size = EBUF_SIZE_INIT; size_t offset; if ((buf1 = malloc(buf1_size)) == NULL) { fprintf(stderr, "Buffer initial malloc of %d bytes failed\n", buf1_size); exit(1); } ... offset = <position in buffer to write to> ... /* Check buf1 is big enough */ if (offset + 2 > buf1_size && ebuf_full(&buf1, &buf1_size, offset)) { fprintf(stderr, "Buffer overflow - have %d bytes but need %d bytes", buf1_size, offset + 2); exit(1); } buf1[offset] = 0; ... free(buf1); } |
|
|
|
#3 |
|
Messages: n/a
Hébergeur: |
On 30 May, 14:12, James Harris <james.harri...@googlemail.com> wrote:
> Initial issue: read in an arbitrary-length piece of text. > Perceived issue: handle variable-length data > > The code below is a suggestion for implementing a variable length > buffer that could be used to read text or handle arrays of arbitrary > length. I don't have the expertise in C of many folks here so I feel > like I'm offering a small furry animal for sacrifice to a big armour > plated one... but will offer it anyway. Please do suggest improvements > or challenge the premise. It would be great if it could be improved to > become a generally useful piece of code. > > Well, here goes. This should be fun. :-? > > - > > The following utility code is passed a buffer (allocated by the > caller) and maintains it at an appropriate size. The main function > increases the allocation (when necessary) by factors - rather than > fixed amounts - for speed. There is a secondary function to trim a > buffer back to a specific size. An extra byte (one more than is > requested) is always left at the end. > > /* > * Expanding buffer > */ > > #define EBUF_SIZE_INIT 128 > #define EBUF_SIZE_MIN 128 > #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */ > > #include <stdio.h> > #include <stdlib.h> > #include <malloc.h> > > int ebuf_full(char **buf, size_t *buf_size, size_t offset) { > size_t new_size; > char *new_buf; > > if (*buf_size < offset + 2) { /* NB last pos left empty */ > new_size = *buf_size * EBUF_INCREASE + 1; > if (new_size < offset + 2) new_size = offset + 2; > if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN; > if ((new_buf = realloc(*buf, new_size)) == NULL) { > return 1; /* Failed to realloc buffer */ > } > *buf = new_buf; > *buf_size = new_size; > } > return 0; /* Reallocated successfuly */ > > } > > int ebuf_trim(char **buf, size_t *buf_size, size_t offset) { > int new_size = offset + 2; /* Includes empty char */ > char *new_buf; > > if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN; > if (new_size != *buf_size) { > if ((new_buf = realloc(*buf, new_size)) == NULL) { > return 1; /* Reallocation failed (unlikely) */ > } > *buf = new_buf; > *buf_size = new_size; > } > return 0; /* Reallocation succeeded */ > > } Here's another piece of example code to use the proposed functions. This one is to read an arbitrary-length line. Hopefully when compared with a custom line-reading function the code below keeps a far simpler interface while allowing any necessary options. It should also be fast in that, again, the function only gets called if there is a need for more space. Since the function allocates memory in ever-increasing chunks for most iterations the function will not be called. #define ENDCHAR '\n' FILE *infile = stdin; char *buffer; size_t bufsize = 100; /* Initial size only */ size_t offset; ... (allocate buffer) /* Read to 'endchar' */ for (offset = 0; (ch = getc(infile)) != EOF; ) { if (offset + 2 > bufsize && ebuf_full(&buffer, &bufsize, offset) { fprintf(stderr, "Line too long for memory"); exit(1); } buffer[offset++] = ch; if (ch == ENDCHAR) break; } ... (free buffer) Notably since we invoke getc() we could easily have more than one termination character such as if (ch == '\n' || ch == '\0' || ch == ',') etc. which is intended to be a big advantage over calling a line reader function. -- James |
|
|
|
#4 |
|
Messages: n/a
Hébergeur: |
"James Harris" <james.harris.1@googlemail.com> wrote in message > Initial issue: read in an arbitrary-length piece of text. > Perceived issue: handle variable-length data > > The code below is a suggestion for implementing a variable length > buffer that could be used to read text or handle arrays of arbitrary > length. I don't have the expertise in C of many folks here so I feel > like I'm offering a small furry animal for sacrifice to a big armour > plated one... but will offer it anyway. Please do suggest improvements > or challenge the premise. It would be great if it could be improved to > become a generally useful piece of code. > Firstly, don't worry about the actual code bodies at this stage. Any reasonably competent C programmer should be able to provide those. The thing is the interfaces. The first problem is that if we use char *, the functions will only work on character arrays. If we use void *s, this problem disappears, but there might be issues about too many casts to access the actual data. The second issue is whether to use a structure for the buffer, or, as you have done, pass in several parameters to represent size and capacity. There's a nasty C stitch-up if we use void *s with option 2. ebuf_full(void **buf ...) char *buffer; /* this is illegal */ ebuf_full(&buffer) buffer has to be assigned to a dummy void *first. Which makes the function unusable. -- Free games and programming goodies. http://www.personal.leeds.ac.uk/~bgy1mm |
|
|
|
#5 |
|
Messages: n/a
Hébergeur: |
On 30 May, 15:14, Eric Sosman <Eric.Sos...@sun.com> wrote:
> JamesHarriswrote: > > On 30 May, 14:12,JamesHarris<james.harri...@googlemail.com> wrote: > >> Initial issue: read in an arbitrary-length piece of text. > >> Perceived issue: handle variable-length data > > >> The code below is a suggestion for implementing a variable length > >> buffer that could be used to read text or handle arrays of arbitrary > >> length. I don't have the expertise in C of many folks here so I feel > >> like I'm offering a small furry animal for sacrifice to a big armour > >> plated one... but will offer it anyway. Please do suggest improvements > >> or challenge the premise. It would be great if it could be improved to > >> become a generally useful piece of code. > > >> Well, here goes. This should be fun. :-? > > >> - > > >> The following utility code is passed a buffer (allocated by the > >> caller) and maintains it at an appropriate size. The main function > >> increases the allocation (when necessary) by factors - rather than > >> fixed amounts - for speed. There is a secondary function to trim a > >> buffer back to a specific size. An extra byte (one more than is > >> requested) is always left at the end. > > >> /* > >> * Expanding buffer > >> */ > > >> #define EBUF_SIZE_INIT 128 > >> #define EBUF_SIZE_MIN 128 > >> #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */ > > >> #include <stdio.h> > >> #include <stdlib.h> > >> #include <malloc.h> > > Only a small, furry animal offering itself up for sacrifice > would use a non-standard header like <malloc.h>. Consider > yourself eaten by the Ravenous Bugblatter Beast. Haha - the sacrificial animal of my analogy was the code I was offering up - rather than me!! But you've raised - and eaten - some good points. I wasn't aware not to use malloc.h, for example. > Also, there seems to be no reason for <stdio.h> in the > buffer-bashing code; it doesn't hurt to #include extraneous > baggage, but it doesn't either. Everything you need > is in <stdlib.h>. OK > >> int ebuf_full(char **buf, size_t *buf_size, size_t offset) { > >> size_t new_size; > >> char *new_buf; > > >> if (*buf_size < offset + 2) { /* NB last pos left empty */ > > This magical `2' appears in quite a few places. Maybe > it deserves a #define of its own? Agreed, it's a bit scabby as it stands. The reason for the +2 is that there's a +1 to change from an offset to a length - e.g. an offset of 7 means a length of 8 - and I wanted to leave one extra byte after the specified length. I'd rather avoid the clutter of another defined constant. I'll rewrite to consistently use offsets rather than lengths and thus avoid the +2. > >> new_size = *buf_size * EBUF_INCREASE + 1; > > As a small matter of personal preference and prejudice, > I myself would avoid floating-point arithmetic here and do > the calculation in integers. Not a big deal, though. Me too. The reason for including a factor of 1.5 was simply to demonstrate that we don't need to settle for integer factors. > >> if (new_size < offset + 2) new_size = offset + 2; > >> if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN; > >> if ((new_buf = realloc(*buf, new_size)) == NULL) { > >> return 1; /* Failed to realloc buffer */ > >> } > >> *buf = new_buf; > >> *buf_size = new_size; > >> } > >> return 0; /* Reallocated successfuly */ > > >> } > > >> int ebuf_trim(char **buf, size_t *buf_size, size_t offset) { > >> int new_size = offset + 2; /* Includes empty char */ > >> char *new_buf; > > >> if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN; > >> if (new_size != *buf_size) { > >> if ((new_buf = realloc(*buf, new_size)) == NULL) { > >> return 1; /* Reallocation failed (unlikely) */ > > There's an interface design decision lurking here: Should > this be considered a "failure," or just an "unsuccessful > attempt to optimize?" Arguments can be made for both points > of view. IMHO you've chosen rightly, because it's possible > that ebuf_trim() could fail in an attempt to *increase* the > size of the buffer, in which case the calling program might > be, er, surprised to discover that the buffer was too small > for the offset. OK > >> } > >> *buf = new_buf; > >> *buf_size = new_size; > >> } > >> return 0; /* Reallocation succeeded */ > > >> } > > > An example of intended use follows. Note that the routines are coded > > to expect buffer and current size as parameters. Despite the error > > handling the code is intended to be fast. Including "if (offset + 2 > > > buf1_size)" in the main code the function should only be called if the > > buffer is too small. The cost of one integer comparison is small. > > > int main() { > > `int main(void)' would be very slightly better. > > > char *buf1; > > size_t buf1_size = EBUF_SIZE_INIT; > > size_t offset; > > > if ((buf1 = malloc(buf1_size)) == NULL) { > > fprintf(stderr, "Buffer initial malloc of %d bytes failed\n", > > buf1_size); > > What is the type of buf1_size? Answer: size_t. What type > of operand does the "%d" specifier convert? Answer: int. Are > size_t and int the same? Answer: No. What should you do to > fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I > didn't mean that ...) I've no idea how to print these values, then. Would they be better as unsigned ints? I guess this would mean unsigned ints would have to be wide enough for any memory offset. Not sure if that can be relied upon. > For fprintf() and so on, <stdio.h> *is* needed. > > > exit(1); > > ITYM `exit(EXIT_FAILURE);'. Or `return EXIT_FAILURE;'. OK. Was trying to keep the interface light. As such I wanted to reduce the number of defines. The procedure name in meant to indicate the meaning of a zero or non-zero return. The function can exist in an if statement as if (ebuf_full(....)) handle error > > } > > Instead of pre-allocating the initial buffer, why not > set buf1=NULL and buf1_size=0 and just let ebuf_full() > take care of everything? I didn't know this could be done. Will include in the rewrite. > > ... > > > offset = <position in buffer to write to> > > > ... > > > /* Check buf1 is big enough */ > > if (offset + 2 > buf1_size && ebuf_full(&buf1, &buf1_size, offset)) > > { > > fprintf(stderr, "Buffer overflow - have %d bytes but need %d > > bytes", > > buf1_size, offset + 2); > > exit(1); > > } > > buf1[offset] = 0; > > > ... > > > free(buf1); > > ... and since main() returns an int value, you should ...? > (C99 introduced a special rule for main() that says falling > off the end is equivalent to returning zero, but IMHO this > should be viewed as a concession to the large amount of sloppy > code already in existence, not as an encouragement to further > sloppiness. Besides, C99 implementations have not exactly > taken the world by storm, and lots of C90 implementations are > still in use.) OK. That was a miss on my part. > > } > > It seems to me you understand the basic ideas of how to > use realloc() to grow a buffer (although the fact that you > can reallocate a NULL may have escaped you). There are a > few glitches in the way you've done things, easily fixable. > > If you want to package something like this for wider > use as a buffer-managing utility, you might consider putting > the buffer information in a struct and passing a single > struct pointer to the functions. Not only would this make > the interface clearer by reducing the argument count, but > it would also make it easy for you to add further fillips > of functionality later on, just by adding a few elements > to the struct and leaving the calls alone. I thought about that but chose against it. Options seem to be 1. Address and size are scalars in the caller - limits other info that can be stored 2. Struct holding address, size, factor and other parameters - simplfies calls to ebuf-trim - requires normal use of pointers to be dereferenced via the struct 3. Struct holding parameters other than the address - still needs extra parameter to be passed to ebuf_full - requires ebuf_full to locate parameter block On balance the first option seemed best. It keeps the system simple without losing function. |
|
|
|
#6 |
|
Messages: n/a
Hébergeur: |
On 31 May, 09:05, "Malcolm McLean" <regniz...@btinternet.com> wrote:
> "JamesHarris" <james.harri...@googlemail.com> wrote in message > > Initial issue: read in an arbitrary-length piece of text. > > Perceived issue: handle variable-length data > > > The code below is a suggestion for implementing a variable length > > buffer that could be used to read text or handle arrays of arbitrary > > length. I don't have the expertise in C of many folks here so I feel > > like I'm offering a small furry animal for sacrifice to a big armour > > plated one... but will offer it anyway. Please do suggest improvements > > or challenge the premise. It would be great if it could be improved to > > become a generally useful piece of code. > > Firstly, don't worry about the actual code bodies at this stage. Any > reasonably competent C programmer should be able to provide those. > > The thing is the interfaces. > > The first problem is that if we use char *, the functions will only work on > character arrays. If we use void *s, this problem disappears, but there > might be issues about too many casts to access the actual data. AFAIK casts tend to make code less safe and I try to avoid them. Is there a good solution to this? > The second issue is whether to use a structure for the buffer, or, as you > have done, pass in several parameters to represent size and capacity. > There's a nasty C stitch-up if we use void *s with option 2. > > ebuf_full(void **buf ...) > > char *buffer; > /* this is illegal */ > ebuf_full(&buffer) > > buffer has to be assigned to a dummy void *first. Which makes the function > unusable. Not nice! Having a separate struct would allow other advantages such as having a per-buffer size increase factor but I think it would need the pointer to be dereferenced when it is used normally. On balance I think address and length (or, better, address and offset) is better. Here's a rewrite where I've improved the code slightly by simplifying a few bits of it. It now uses offsets rather than lengths and bases the increase on the requested offset so eliminating some of the checks. It does require the factor to be greater than or equal to 1. I'll include the functions and a sample main in one go. /* * Test expanding buffer */ #define EBUF_INCREASE 1.5 /* Factor (>= 1) for space increase */ #include <stdio.h> #include <stdlib.h> int ebuf_full(char **buf, size_t *buf_limit, size_t offset) { size_t new_limit; char *new_buf; if (offset >= *buf_limit) { new_limit = offset * EBUF_INCREASE; if ((new_buf = realloc(*buf, new_limit + 1)) == NULL) { return 1; /* Failed to realloc */ } *buf = new_buf; *buf_limit = new_limit; } return 0; /* Realloc succeeded */ } int ebuf_trim(char **buf, size_t *buf_limit, size_t offset) { char *new_buf; if ((new_buf = realloc(*buf, offset + 1)) == NULL) { return 1; /* Realloc failed */ } *buf = new_buf; *buf_limit = offset; return 0; /* Succeeded */ } int main(void) { char *buf1 = NULL; size_t buf1_limit = 0; size_t offset; for (offset = 0; offset < 1000; offset += 200) { fprintf(stderr, "\n---Checking for offset %d\n", offset); if (offset >= buf1_limit && ebuf_full(&buf1, &buf1_limit, offset)) { fprintf(stderr, "-Ebuf overflow %d/%d bytes", buf1_limit, offset); exit(1); } buf1[offset] = 'x'; } fprintf(stderr, "\n---Trim from %d to %d\n", buf1_limit, offset); if (ebuf_trim(&buf1, &buf1_limit, offset)) { fprintf(stderr, "-Buffer trim to %d failure\n", offset); exit(1); } free(buf1); return 0; } |
|
|
|
#7 |
|
Messages: n/a
Hébergeur: |
James Harris wrote:
> On 30 May, 15:14, Eric Sosman <Eric.Sos...@sun.com> wrote: >> JamesHarriswrote: >>> [...] >>> size_t buf1_size = EBUF_SIZE_INIT; >>> size_t offset; >>> if ((buf1 = malloc(buf1_size)) == NULL) { >>> fprintf(stderr, "Buffer initial malloc of %d bytes failed\n", >>> buf1_size); >> What is the type of buf1_size? Answer: size_t. What type >> of operand does the "%d" specifier convert? Answer: int. Are >> size_t and int the same? Answer: No. What should you do to >> fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I >> didn't mean that ...) > > I've no idea how to print these values, then. Would they be better as > unsigned ints? I guess this would mean unsigned ints would have to be > wide enough for any memory offset. Not sure if that can be relied > upon. If you can count on a C99 implementation, there's a length modifier "z" for printing size_t values: printf ("Size = %zu\n", buf1_size); If you need to live with the more widely available C90 systems, there's no "z" modifier and you need to convert the size_t to something printf() knows how to handle: printf ("Size = %u\n", (unsigned int)buf1_size); or (safer): printf ("Size = %lu\n", (unsigned long)buf1_size); or even (extremely safe, extremely unusual): printf ("Size = %.0f\n", (double)buf1_size); These will work as well on C99 as they do on C90. -- Eric.Sosman@sun.com |
|
|
|
#8 |
|
Messages: n/a
Hébergeur: |
On 25 Jul, 22:33, Eric Sosman <Eric.Sos...@sun.com> wrote:
> James Harris wrote: > > On 30 May, 15:14, Eric Sosman <Eric.Sos...@sun.com> wrote: > >> JamesHarriswrote: > >>> [...] > >>> size_t buf1_size = EBUF_SIZE_INIT; > >>> size_t offset; > >>> if ((buf1 = malloc(buf1_size)) == NULL) { > >>> fprintf(stderr, "Buffer initial malloc of %d bytes failed\n", > >>> buf1_size); > >> What is the type of buf1_size? Answer: size_t. What type > >> of operand does the "%d" specifier convert? Answer: int. Are > >> size_t and int the same? Answer: No. What should you do to > >> fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I > >> didn't mean that ...) > > > I've no idea how to print these values, then. Would they be better as > > unsigned ints? I guess this would mean unsigned ints would have to be > > wide enough for any memory offset. Not sure if that can be relied > > upon. > > If you can count on a C99 implementation, there's a length > modifier "z" for printing size_t values: > > printf ("Size = %zu\n", buf1_size); > > If you need to live with the more widely available C90 > systems, there's no "z" modifier and you need to convert the > size_t to something printf() knows how to handle: > > printf ("Size = %u\n", (unsigned int)buf1_size); > > or (safer): > > printf ("Size = %lu\n", (unsigned long)buf1_size); > > or even (extremely safe, extremely unusual): > > printf ("Size = %.0f\n", (double)buf1_size); > > These will work as well on C99 as they do on C90. > Rather than use size_t would I be better to use a type of unsigned int or unsigned long in the first place? -- James |
|
![]() |
| Outils de la discussion | |
|
|