Review of VB.NET Data Storage Structures
Part two of an eight-part series of blogs

There are a bewildering array (excuse the pun) of data storage structures available to you in Visual Basic. Choose from arrays, ArrayLists, SortedLists, Dictionaries, HashTables, Lists and DataTables, among others. This blog gives an example of each type of structure, and benchmarks them to show which perform best and worst.

  1. VB.NET Data Storage Types Compared and Benchmarked
  2. VB.NET Benchmarking Test of speeds of data structures (this blog)
  3. Arrays - Visual Basic data structures
  4. ArrayLists and SortedLists - Visual Basic data structures
  5. Dictionaries and HashTables - Visual Basic data structures
  6. Lists - Visual Basic data structures
  7. Using data tables - Visual Basic data structures
  8. Benchmark Results and Recommendations

Posted by Andy Brown on 24 August 2011

You need a minimum screen resolution of about 700 pixels width to see our blogs. This is because they contain diagrams and tables which would not be viewable easily on a mobile phone or small laptop. Please use a larger tablet, notebook or desktop computer, or change your screen resolution settings.

VB.NET Benchmarking Test

I've tried to make the test as fair as I can.  This page explains in some details the tests I've run, together with how (and why) they're set up the way they are. 

This probably doesn't make for terribly interesting reading, so you might like to skip this page and go on to the rest of the blog.

The Aim

I've heard about so many data structures in Visual Basic 2010 (VB.NET), including:

  • Arrays
  • ArrayLists
  • SortedLists
  • Dictionaries
  • HashTables
  • Lists
  • Data tables

The ones I use regularly when coding are arrays and data tables (the first and last items in the list above), but I've always wondered whether I should be widening my horizons.

My aim, then, has been to find out how quickly each of the above data structures will accomplish 4 basic tasks:

  • Writing values into the data structure
  • Sorting the values
  • Reading them back out again by their numeric position
  • Reading them back out again by their text value

All of this needs some explanation!

Key/Value Pairs

For all of the data structures, I'm storing for each bit of information:

  • Its index number within the structure (1, 2, 3, etc)
  • The corresponding text

Thus if I set up a structure containing the names of the seven dwarves, it might be reasonable to say that the value of the 7th element in the data structure was Dopey (although actually it would probably be the 6th element, since they tend to be numbered from 0).

The above should penalise data tables (which are designed to hold rows of data consisting of multiple columns), and benefit structures like dictionaries and hash tables, which are designed specifically to store key/value pairs for quick access.

Generating Each Test Value

Given a number (such as 3) I wanted to be able to generate a unique text value.  I had two requirements:

  • The text value should be sortable (thus, it can't just be something like 00003, as this would give the sort algorithm too little to do)
  • The time taken to generate the text value should be the same for all data structures

The code I've used to generate test values for each integer is as follows:

Function TestCode(ByVal i As Integer) As String

'given integer, creates random(ish) string

'(the idea is to create an algorithm which always takes the same time, but gives data which can then be sorted)

'algorithm: take number, reverse and prefix with Y

'eg 3257 would become X75230

Dim s As String = "x"

Dim num As String = i.ToString.PadLeft(MaxRunSize, "0"c)

Dim j As Integer

For j = MaxRunSize To 1 Step -1

s &= num.Substring(j - 1, 1)

Next

Return s

End Function

What this does is generate a text string for any number.  The length of the string depends on the value of a variable called MaxRunSize.  Here are a couple of examples:

MaxRunSize Input Output
5 3 x30000
5 3257 x75230

It really doesn't matter what the algorithm does, as long as it takes the same amount of time to do it each time that it's called.

The Run Size

The run size for the test can be set to be any single digit.  For example:

Run size Number of values generated
1 10
2 100
3 1,000
4 10,000
5 100,000

So a run size of 5 would create a data structure with 100,000 values, ranging from x00000 to x99999.

The 4 Tests

The 4 timed tests are as follows:

Test What it involves
Writing Writing out the values into the structure (for example, for a run size of 1 this would involve writing out 10 values only).
Sorting Sorting these 10 values alphabetically by the text value (so, for example, x9000 would come after x8999, even though they correspond to the numbers 9 and 9998 respectively).
Reading (by number) Ths test involves reading 10 values in "at random" from the structure generated.  For example, if the run size is 3, the test will read in the text values for items number 999, 899, 799, 699, 599, 499, 399, 299, 199 and 99.  This tests the ability of the structure to locate items by number quickly.
Reading (by text value) This test is similar to the previous one, but involves locating items by their text value.  For example, if the run size is 2, the test will involve finding the position withn the data structure of the text values x000, x001, x002, x003, x004, x005, x006, x007, x008 and x009.  These are interspersed throughout the structure, to make this a fair test.

To generate the 10 text values to read in, I've used the following code:

Sub StoreTenValues()

'store 10 values (used in third test)

Dim j As Integer

j = 0

For i = 0 To (10 ^ RunSize) - 1 Step (10 ^ (RunSize - 1))

TextValues(j) = TestCode(i)

j = j + 1

Next

 

'make sure this isn't timed

RestartTimer()

End Sub

I'm expecting that arrays will score badly on the last test, since while they're good for getting values by position within the array, they shouldn't be so good for finding text values.  We shall see!

Other Code You'll Need

For anyone trying to reconstruct the tests, you'll need the following variables:

#Region "Variables"

Private TextValues(9) As String

Private RunSize As Integer = Nothing

Private Const MaxRunSize As Integer = 5

Private timer As System.Diagnostics.Stopwatch = Nothing

Private WriteTime As Integer = Nothing

Private SortTime As Integer = Nothing

Private Read1Time As Integer = Nothing

Private Read2Time As Integer = Nothing

#End Region

You'll also need to set the RunSize variable to a single digit (1, 2, 3 ...).

I've also used the StopWatch diagnostic class to time events:

#Region "Timing things"

Sub StartTimer()

'begin timing

timer = New System.Diagnostics.Stopwatch

timer.Start()

End Sub

Sub RestartTimer()

timer.Reset()

timer = New System.Diagnostics.Stopwatch

timer.Start()

End Sub

ReadOnly Property ElapsedTime As Integer

Get

timer.Stop()

Dim TicksGone As Integer = timer.ElapsedTicks

RestartTimer()

Return TicksGone

End Get

End Property

#End Region

The stopwatch measures things in ticks.

Test Machine

One final thing, for anoraks: the tests were performed on a Dell Precision M4500 laptop running Windows 7 Business Edition.  It has an Intel Core i7 processor running at 2.67 ghz, 32-bit operating system, 4 gb of memory.

This blog has 0 threads Add post