Delphi Techniques: A Simple Alternative to TStringList
Sample code for this article can be downloaded from
Code Central.
The TStringList class is widely used in Delphi apps; they're just one of
those utility classes that are simple and useful enough that you find
yourself using them all the time. But I have to be honest with you, I get
really tired of writing try...finally blocks every time that I want to
use a TStringList. Wouldn't it be nice if there were a more convenient
way?
Enter 'advanced records'. Advanced records are simply old-school records
with the addition of methods, properties, and constructors. They're more
powerful than traditional records, but less capable than classes.
The nice thing about records (traditional or advanced) is that unlike
classes the memory for records is automatically allocated; ie: there's no
need for an explicit call to Create or Free. This allows us to write this
code:
var
list: TStringListRecord;
begin
list.LoadFromFile('test.txt');
...
end;
rather than this code:
var
list: TStringList;
begin
list := TStringList.Create;
try
list.LoadFromFile('test.txt');
...
finally
list.Free;
end;
end;
The Basics
Let's implement this record type by starting with the bare essentials.
The two most important properties are Count and Items, and let's add some
of the more common methods at the same time. We'll use a dynamic array to
hold the strings.
TStringListRecord = record
private
FItems: array of string;
function GetCount: integer; inline;
function GetItem(index: integer): string;
procedure SetItem(index: integer; const Value: string);
public
function Add(const line: string): integer;
procedure Clear;
function IndexOf(const s: string): integer;
property Count: integer read GetCount;
property Items[index: integer]: string read GetItem write SetItem; default;
end;
The interface is what you would expect. Note that private and public are
the only visibility categories allowed. Since advanced records don't
support inheritance protected visibility doesn't have any meaning.
The implementation is equally straight forward:
function TStringListRecord.Add(const line: string): integer;
begin
SetLength(FItems,Count+1);
FItems[Count-1] := line;
result := Count - 1;
end;
procedure TStringListRecord.Clear;
begin
SetLength( FItems, 0 );
end;
function TStringListRecord.GetCount: integer;
begin
result := length(FItems);
end;
function TStringListRecord.GetItem(index: integer): string;
begin
result := FItems[index];
end;
procedure TStringListRecord.SetItem(index: integer; const Value: string);
begin
FItems[index] := Value;
end;
function TStringListRecord.IndexOf(const s: string): integer;
var
k: integer;
begin
result := -1;
for k := 0 to Count - 1 do begin
if FItems[k] = s then begin
result := k;
break;
end;
end;
end;
The Delete Method
The Delete method proves to be a little more interesting. The general
approach is to simply slide the elements in the array down by one to take
the place of the item that was deleted. One idea would be to use a for
loop to slide the elements one by one, but that sounded slow to me so I
decided to use the Move procedure. My first attempt looked like this (the
lines have been numbered for easy reference):
{D1} procedure TStringListRecord.Delete(Index: Integer);
{D2} begin
{D3} if (Index < 0) or (Index >= Count) then raise EStringListError.Create('List index out of bounds');
{D4}
{D5} if Index < Count - 1 then begin
{D6} move( FItems[Index+1], FItems[Index], sizeof(FItems[0]) * (Count - Index - 1) );
{D7} end;
{D8}
{D9} SetLength( FItems, Count - 1 );
{D10}end;
I used the code below to test the Delete method:
{T1} FList.Add('one');
{T2} FList.Add('two');
{T3} FList.Add('three');
{T4} FList.Delete(1);
{T5} FList.Clear;
Alas, when I ran the code I received a EInvalidPointer error on line T5,
and the FastMM memory manager reported a memory leak when I exited the
application. The problem is that by using the Move procedure we are going
behind Delphi's back and preventing it from correctly managing the memory
associated with the strings.
Here's what happening. After line T3, memory looks like this:
On the left is the FItems array with three elements, each containing a
string. The actual data for the strings is stored on the heap and is
shown on the right. Each string has a four byte reference count and a
four byte length followed by the actual text, and then finally a null
terminating byte (not shown in the diagram).
The Delete method is then called. Line D6 slides the contents of slot 2
downward into slot 1, essentially copying the value in slot 2 into slot
1. After line D6, memory looks like this:
As you can see, the string 'two' has been leaked, and there are now two
pointers to 'three' although the reference count is only 1. When line D9
is executed Delphi notices that slot 2 points to a valid string and so it
decrements the reference count and then frees the memory, thus causing
slot 1 to point to deallocated memory. At this point the memory layout
is:
An exception won't be raised until line T5, "FList.Clear", is executed.
Delphi will release the memory for the string stored in slot 0, and will
attempt to release the memory for the string stored in slot 1. However,
since that memory has already been freed, a EInvalidPointer exception
will be raised. When the application exits, the FastMM memory manager
will detect that the 'two' string has been leaked and will display a
warning message.
I decided to try a simpler technique:
procedure TStringListRecord.Delete(Index: Integer);
var
k: integer;
begin
if (Index < 0) or (Index >= Count) then raise EStringListError.Create('List index out of bounds');
for k := Index to Count - 2 {2nd last element} do begin
FItems[k] := FItems[k+1];
end;
SetLength( FItems, Count - 1 );
end;
This works well, except for the minor detail that it is 100 times slower.
Let's go back to the original technique and try to make it work
correctly.
The first problem was that a string was being leaked. This can be
resolved by assigning an empty string to that slot. Behind the scenes
Delphi will free the previous contents of the slot and then assign the
new contents. Delphi represents zero-length long strings as nil, so slot
1 will contain nil rather than a pointer to memory:
After the call to the Move procedure is completed we need to assign nil
to the last slot to ensure that Delphi doesn't deallocate that memory
when we resize the array. Simply assigning an empty string will of course
cause Delphi to deallocate the memory for the string so we need to take a
sneakier approach:
PPointer(@FItems[Count - 1])^ := nil;
This gives us the result that we want, and we can now resize the FItems
array safely.
Properties
TStringList has a number of other properties in addition to Count and
Items such as CaseSensitive, Duplicates, Sorted, QuoteChar, etc. These
are a bit more difficult to deal with.
The problem is that in Delphi most variables (including records) which
are locally declared are not initialized. Thus, in the code below the
value of list.ACardinal is undefined. It might be 0, 57, or any arbitrary
value.
type
TStringListRecord = record
private
ACardinal: cardinal;
end;
procedure Test;
var
list: TStringListRecord;
begin
ShowMessageFmt('%d',[list.ACardinal]);
end;
One way is to add a Initialize procedure to the record which would assign
default values, but this would require that the programmer remember to
call Initialize every time that a TStringListRecord is used. This is an
error waiting to happen.
You'll notice that the Items property works fine. That's because Delphi
does automatically initializes dynamic arrays when they're
declared locally. This is done out of necessity. You'll recall from
earlier that when a dynamic array is resized, Delphi will attempt to free
any memory associated with unused elements. If the dynamic array variable
contained garbage an access violation error would occur during a resize
operation.
This gives rise to a second way to initializing properties - use dynamic
arrays to store the property. When the record is declared the array will
have zero elements, and when a value is assigned it will be resized to
have precisely one element. For example:
TStringListRecord = record
private
FCaseSensitive: array of boolean;
function GetCaseSensitive: boolean;
procedure SetCaseSensitive(const Value: boolean);
public
property CaseSensitive: boolean read GetCaseSensitive write SetCaseSensitive;
end;
function TStringListRecord.GetCaseSensitive: boolean;
begin
result := (length(FCaseSensitive) = 0) or FCaseSensitive[0];
end;
procedure TStringListRecord.SetCaseSensitive(const Value: boolean);
begin
if length(FCaseSensitive) = 0 then begin
SetLength(FCaseSensitive,1);
end;
FCaseSensitive[0] := Value;
end;
The third and final option is to keep the TStringListRecord class simple
and ignore the other properties. Since TStringListRecord is intended to
be used more for convenience than power we'll take this approach and
resort to using TStringList when we need more functionality.
Performance
We've taken a fairly simplistic approach to how the strings are stored.
Each time we add a string the FItems array is resized and the string is
assigned. In comparison, TStringList uses a more sophisticated algorithm.
TStringLists have both a Count property and a Capacity property where the
Capacity is always greater than equal to the Count. Capacity refers to
the number of elements in the underlying array, and Count refers to the
number of elements that are actually in use.
When the array needs to increase in size TStringList will add several
elements at a time rather than just one. This allows TStringList to
reduce the number of resizes. The key functionality occurs in
TStringList's Grow method:
procedure TStringList.Grow;
var
Delta: Integer;
begin
if FCapacity > 64 then Delta := FCapacity div 4 else
if FCapacity > 8 then Delta := 16 else
Delta := 4;
SetCapacity(FCapacity + Delta);
end;
Thus the Capacity will grow by either 4 elements, 8 elements, or 25% of
the current Capacity depending on the current number of elements.
Based on this you would think that TStringList would be considerably
faster than TStringListRecord. In fact they're quite similar in
performance. Presumably the additional functionality found in TStringList
offsets the advantage of the more intelligent algorithm.
Conclusion
Using advanced records to implement a replacement for TStringList allows
us to eliminate the housekeeping code associated with a TStringList. The
result is simpler, more readable code. While in this article
TStringListRecord was kept quite simple, there's no reason that the full
functionality of the TStringList class could not be implemented.
f