Re: Expanding buffer - response to "Determine the size of malloc"query
On 30 May, 15:14, Eric Sosman <Eric.Sos...@sun.com> wrote:
> JamesHarriswrote:
> > On 30 May, 14:12,JamesHarris<james.harri...@googlemail.com> wrote:
> >> Initial issue: read in an arbitrary-length piece of text.
> >> Perceived issue: handle variable-length data
>
> >> The code below is a suggestion for implementing a variable length
> >> buffer that could be used to read text or handle arrays of arbitrary
> >> length. I don't have the expertise in C of many folks here so I feel
> >> like I'm offering a small furry animal for sacrifice to a big armour
> >> plated one... but will offer it anyway. Please do suggest improvements
> >> or challenge the premise. It would be great if it could be improved to
> >> become a generally useful piece of code.
>
> >> Well, here goes. This should be fun. :-?
>
> >> -
>
> >> The following utility code is passed a buffer (allocated by the
> >> caller) and maintains it at an appropriate size. The main function
> >> increases the allocation (when necessary) by factors - rather than
> >> fixed amounts - for speed. There is a secondary function to trim a
> >> buffer back to a specific size. An extra byte (one more than is
> >> requested) is always left at the end.
>
> >> /*
> >> * Expanding buffer
> >> */
>
> >> #define EBUF_SIZE_INIT 128
> >> #define EBUF_SIZE_MIN 128
> >> #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */
>
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> #include <malloc.h>
>
> Only a small, furry animal offering itself up for sacrifice
> would use a non-standard header like <malloc.h>. Consider
> yourself eaten by the Ravenous Bugblatter Beast.
Haha - the sacrificial animal of my analogy was the code I was
offering up - rather than me!! But you've raised - and eaten - some
good points. I wasn't aware not to use malloc.h, for example.
> Also, there seems to be no reason for <stdio.h> in the
> buffer-bashing code; it doesn't hurt to #include extraneous
> baggage, but it doesn't either. Everything you need
> is in <stdlib.h>.
OK
> >> int ebuf_full(char **buf, size_t *buf_size, size_t offset) {
> >> size_t new_size;
> >> char *new_buf;
>
> >> if (*buf_size < offset + 2) { /* NB last pos left empty */
>
> This magical `2' appears in quite a few places. Maybe
> it deserves a #define of its own?
Agreed, it's a bit scabby as it stands. The reason for the +2 is that
there's a +1 to change from an offset to a length - e.g. an offset of
7 means a length of 8 - and I wanted to leave one extra byte after the
specified length. I'd rather avoid the clutter of another defined
constant. I'll rewrite to consistently use offsets rather than lengths
and thus avoid the +2.
> >> new_size = *buf_size * EBUF_INCREASE + 1;
>
> As a small matter of personal preference and prejudice,
> I myself would avoid floating-point arithmetic here and do
> the calculation in integers. Not a big deal, though.
Me too. The reason for including a factor of 1.5 was simply to
demonstrate that we don't need to settle for integer factors.
> >> if (new_size < offset + 2) new_size = offset + 2;
> >> if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
> >> if ((new_buf = realloc(*buf, new_size)) == NULL) {
> >> return 1; /* Failed to realloc buffer */
> >> }
> >> *buf = new_buf;
> >> *buf_size = new_size;
> >> }
> >> return 0; /* Reallocated successfuly */
>
> >> }
>
> >> int ebuf_trim(char **buf, size_t *buf_size, size_t offset) {
> >> int new_size = offset + 2; /* Includes empty char */
> >> char *new_buf;
>
> >> if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
> >> if (new_size != *buf_size) {
> >> if ((new_buf = realloc(*buf, new_size)) == NULL) {
> >> return 1; /* Reallocation failed (unlikely) */
>
> There's an interface design decision lurking here: Should
> this be considered a "failure," or just an "unsuccessful
> attempt to optimize?" Arguments can be made for both points
> of view. IMHO you've chosen rightly, because it's possible
> that ebuf_trim() could fail in an attempt to *increase* the
> size of the buffer, in which case the calling program might
> be, er, surprised to discover that the buffer was too small
> for the offset.
OK
> >> }
> >> *buf = new_buf;
> >> *buf_size = new_size;
> >> }
> >> return 0; /* Reallocation succeeded */
>
> >> }
>
> > An example of intended use follows. Note that the routines are coded
> > to expect buffer and current size as parameters. Despite the error
> > handling the code is intended to be fast. Including "if (offset + 2 >
> > buf1_size)" in the main code the function should only be called if the
> > buffer is too small. The cost of one integer comparison is small.
>
> > int main() {
>
> `int main(void)' would be very slightly better.
>
> > char *buf1;
> > size_t buf1_size = EBUF_SIZE_INIT;
> > size_t offset;
>
> > if ((buf1 = malloc(buf1_size)) == NULL) {
> > fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
> > buf1_size);
>
> What is the type of buf1_size? Answer: size_t. What type
> of operand does the "%d" specifier convert? Answer: int. Are
> size_t and int the same? Answer: No. What should you do to
> fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
> didn't mean that ...)
I've no idea how to print these values, then. Would they be better as
unsigned ints? I guess this would mean unsigned ints would have to be
wide enough for any memory offset. Not sure if that can be relied
upon.
> For fprintf() and so on, <stdio.h> *is* needed.
>
> > exit(1);
>
> ITYM `exit(EXIT_FAILURE);'. Or `return EXIT_FAILURE;'.
OK. Was trying to keep the interface light. As such I wanted to reduce
the number of defines. The procedure name in meant to indicate the
meaning of a zero or non-zero return. The function can exist in an if
statement as
if (ebuf_full(....)) handle error
> > }
>
> Instead of pre-allocating the initial buffer, why not
> set buf1=NULL and buf1_size=0 and just let ebuf_full()
> take care of everything?
I didn't know this could be done. Will include in the rewrite.
> > ...
>
> > offset = <position in buffer to write to>
>
> > ...
>
> > /* Check buf1 is big enough */
> > if (offset + 2 > buf1_size && ebuf_full(&buf1, &buf1_size, offset))
> > {
> > fprintf(stderr, "Buffer overflow - have %d bytes but need %d
> > bytes",
> > buf1_size, offset + 2);
> > exit(1);
> > }
> > buf1[offset] = 0;
>
> > ...
>
> > free(buf1);
>
> ... and since main() returns an int value, you should ...?
> (C99 introduced a special rule for main() that says falling
> off the end is equivalent to returning zero, but IMHO this
> should be viewed as a concession to the large amount of sloppy
> code already in existence, not as an encouragement to further
> sloppiness. Besides, C99 implementations have not exactly
> taken the world by storm, and lots of C90 implementations are
> still in use.)
OK. That was a miss on my part.
> > }
>
> It seems to me you understand the basic ideas of how to
> use realloc() to grow a buffer (although the fact that you
> can reallocate a NULL may have escaped you). There are a
> few glitches in the way you've done things, easily fixable.
>
> If you want to package something like this for wider
> use as a buffer-managing utility, you might consider putting
> the buffer information in a struct and passing a single
> struct pointer to the functions. Not only would this make
> the interface clearer by reducing the argument count, but
> it would also make it easy for you to add further fillips
> of functionality later on, just by adding a few elements
> to the struct and leaving the calls alone.
I thought about that but chose against it. Options seem to be
1. Address and size are scalars in the caller
- limits other info that can be stored
2. Struct holding address, size, factor and other parameters
- simplfies calls to ebuf-trim
- requires normal use of pointers to be dereferenced via the struct
3. Struct holding parameters other than the address
- still needs extra parameter to be passed to ebuf_full
- requires ebuf_full to locate parameter block
On balance the first option seemed best. It keeps the system simple
without losing function.
|