
    It has been a while since my last post due to vacation and common laziness – but now I’m back with a fresh post. This post concerns something that almost all programmers do on a daily basis: string concatenation.

    If you’re making a lot of string concatenations, you might experience performance problems. The problem with string concatenation is that strings in .NET are immutable. That means that every concatenation creates a new string object containing the concatenated text and discards the old one. This process carries some overhead and can hurt the performance of the program.
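
    To make the immutability concrete, here is a minimal sketch (the module and variable names are made up for illustration). It shows that appending to a string leaves the old string object untouched and gives the variable a new object instead:

    ```vbnet
    Imports System

    Module ImmutabilitySketch
        Sub Main()
            Dim original As String = "abc"
            Dim another As String = original   ' both variables refer to the same string object

            original &= "def"                  ' builds a brand-new string "abcdef"

            Console.WriteLine(another)                                     ' prints abc
            Console.WriteLine(Object.ReferenceEquals(original, another))   ' prints False
        End Sub
    End Module
    ```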

    As most programmers know, it can be a good idea to use the StringBuilder class when you’re concatenating many times. The rule of thumb is that, if the number of concatenations is very low, the speed gained by concatenating with the StringBuilder is exceeded by the overhead of instantiating the StringBuilder object. But how big is the overhead of instantiating the StringBuilder object? And how many concatenations does it take for the StringBuilder to outperform normal concatenation?

    The StringBuilder keeps the appended text in an internal buffer and produces the final string when the ToString() method is called. But what if you put the strings in a string array yourself and call the Join() method when all the strings have been added – will the plain string array outperform the StringBuilder?

    To shed some light on this matter I designed some tests. I decided to test how long it took to make X concatenations and whether the length of the text string made a difference. This is the code I used:

    Imports System.Diagnostics
    Imports System.Text

    Partial Public Class _Default
        Inherits System.Web.UI.Page

        Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
            ' Test strings of 1, 5, 10 and 20 characters.
            Dim mStr() As String = {"a", "aaaaa", "aaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaa"}
            Dim mNumberOfConcats() As Integer = {10000, 25000, 50000, 100000}

            ' Run all three methods for every combination of string and count.
            For Each mStrElement As String In mStr
                For Each mConcatNumber As Integer In mNumberOfConcats
                    Response.Write("Normal concatenating '" & mStrElement & "' " & mConcatNumber & " times:<br />Operation took: " & _
                                        ConcatNormal(mStrElement, mConcatNumber) & "<br /><br />")
                    Response.Write("StrBuilder concatenating '" & mStrElement & "' " & mConcatNumber & " times:<br />Operation took: " & _
                                        ConcatStringBuilder(mStrElement, mConcatNumber) & "<br /><br />")
                    Response.Write("Array concatenating '" & mStrElement & "' " & mConcatNumber & " times:<br />Operation took: " & _
                                        ArrayConcat(mStrElement, mConcatNumber) & "<br />_______________________________________________<br />")
                    Response.Flush()
                Next
            Next
        End Sub

        ' Fills a string array and joins it once; returns the elapsed milliseconds.
        Private Function ArrayConcat(ByVal pStr As String, ByVal pNumberOfConcats As Integer) As Long
            Dim mStopWatch As New Stopwatch()
            GC.Collect()
            mStopWatch.Start()
            ' VB arrays are declared by upper bound, so this holds exactly pNumberOfConcats items.
            Dim mStrArray(pNumberOfConcats - 1) As String
            For i As Integer = 0 To pNumberOfConcats - 1
                mStrArray(i) = pStr
            Next
            Dim mFoo As String
            mFoo = [String].Join("", mStrArray)
            mStopWatch.Stop()
            ArrayConcat = mStopWatch.ElapsedMilliseconds
        End Function

        ' Appends with += in a loop; returns the elapsed milliseconds.
        Private Function ConcatNormal(ByVal pStr As String, ByVal pNumberOfConcats As Integer) As Long
            Dim mStopWatch As New Stopwatch()
            Dim mStr As String
            mStr = ""
            GC.Collect()
            mStopWatch.Start()
            For i As Integer = 1 To pNumberOfConcats
                mStr += pStr
            Next
            mStopWatch.Stop()
            ConcatNormal = mStopWatch.ElapsedMilliseconds
        End Function

        ' Appends to a StringBuilder and converts once; returns the elapsed milliseconds.
        Private Function ConcatStringBuilder(ByVal pStr As String, ByVal pNumberOfConcats As Integer) As Long
            Dim mStopWatch As New Stopwatch()
            Dim mStr As String
            mStr = ""
            GC.Collect()
            mStopWatch.Start()
            ' The builder is created after the stopwatch starts, so its instantiation is timed too.
            Dim mStrBuilder As New StringBuilder()
            For i As Integer = 1 To pNumberOfConcats
                mStrBuilder.Append(pStr)
            Next
            mStr = mStrBuilder.ToString()
            mStopWatch.Stop()
            ConcatStringBuilder = mStopWatch.ElapsedMilliseconds
        End Function
    End Class

    Method / Number of concatenations    10000    25000    50000   100000
    Normal 1 char                           79      432     2538    17336
    StringBuilder 1 char                     0        0        1        3
    Array Join 1 char                        0       41        1        3
    Normal 5 chars                         502     5664    24183    95160
    StringBuilder 5 chars                    0        1        2        4
    Array Join 5 chars                       0        1        2        4
    Normal 10 chars                       1716    11777    47859   209215
    StringBuilder 10 chars                   0        1        3        7
    Array Join 10 chars                      0        1        2        5
    Normal 20 chars                       4395    24340   105893   454174
    StringBuilder 20 chars                   1        2        5       16
    Array Join 20 chars                      2        1        3        7

     

    The results are in milliseconds and clearly show that the overhead of instantiating the StringBuilder has no measurable performance hit compared to the normal concatenation. Even when making 10000 concatenations on a 20 character string the StringBuilder only uses 1 millisecond!

    What is more interesting is the impact that the length of the string you’re concatenating has on the normal concatenation. Concatenating a 5 character string 10000 times is over 6 times slower than concatenating a 1 character string (502 ms versus 79 ms), and a 10 character string is more than 21 times slower (1716 ms).
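
    A rough way to see why normal concatenation degrades so quickly is to count copied characters. This is a simplified model, not something measured in the tests above: it assumes every += copies the whole current result into a new string. Appending a string of length k a total of n times then copies

    ```latex
    \sum_{i=1}^{n} i \cdot k \;=\; k \, \frac{n(n+1)}{2}
    \qquad\text{characters, e.g.}\qquad
    n = 10000:\quad
    k = 1 \Rightarrow 50\,005\,000, \qquad
    k = 20 \Rightarrow 1\,000\,100\,000 .
    ```

    The work grows with the square of the number of concatenations, whereas a StringBuilder or an array join copies each character only a small number of times. The measured times grow even faster than this model predicts, so memory allocation and garbage collection probably play a part as well.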

    The reason that we even bother thinking about performance is that humans do not like to wait too long before getting a response to their action. Usually we don’t want to wait more than 2-3 seconds for a response. We must assume that our application has to do other things than just the concatenation. That probably means that the concatenation process can take at most between 500ms and 1000ms. Assuming that we use normal concatenation – how many concatenations can be made within that timespan?

    String length / Runtime    100ms    250ms    500ms   1000ms
    50                          1252     1815     2456     3419
    100                          482     1228     1670     2413
    200                          565      866     1206     4689
    400                          387      606      827     1178

     

    This again shows that the length of the strings that you’re concatenating is extremely important with regard to performance.
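
    The post does not show the code behind this second table. One possible way to take such a measurement is sketched below, with hypothetical names (BudgetSketch, CountConcatsWithin): keep concatenating until the time budget is used up and count the iterations. The result will vary from machine to machine.

    ```vbnet
    Imports System
    Imports System.Diagnostics

    Module BudgetSketch
        ' Counts how many normal concatenations of pStr complete before pBudgetMs has elapsed.
        Function CountConcatsWithin(ByVal pStr As String, ByVal pBudgetMs As Long) As Integer
            Dim result As String = ""
            Dim count As Integer = 0
            Dim watch As Stopwatch = Stopwatch.StartNew()
            While watch.ElapsedMilliseconds < pBudgetMs
                result &= pStr
                count += 1
            End While
            Return count
        End Function

        Sub Main()
            Dim sample As New String("a"c, 50)   ' a 50 character test string
            Console.WriteLine(CountConcatsWithin(sample, 100))
        End Sub
    End Module
    ```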

    To sum up, my advice would be to always use the StringBuilder class. It might require 2-3 extra lines of code, but there’s no measurable penalty and you will ensure that your application scales well. The way I see it, there’s no reason not to use the StringBuilder. Better safe than sorry… always use the StringBuilder, people!
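
    For reference, the "2-3 extra lines" look roughly like this. It is a minimal sketch; the input array and the separator are made up for illustration:

    ```vbnet
    Imports System
    Imports System.Text

    Module StringBuilderSketch
        Sub Main()
            Dim parts() As String = {"alpha", "beta", "gamma"}   ' hypothetical input

            ' The extra lines: create the builder, append to it, convert back to a string.
            Dim builder As New StringBuilder()
            For Each part As String In parts
                builder.Append(part).Append(";")
            Next
            Dim result As String = builder.ToString()

            Console.WriteLine(result)   ' prints alpha;beta;gamma;
        End Sub
    End Module
    ```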

    What are your thoughts on the subject?

    Are there any aspects that I’ve missed in my tests?

    Feel free to comment!

     

    Yesterday I started a preparation course for Microsoft’s 70-536 exam (Application Development Foundation). It is the first part of Microsoft’s MCTS in ASP.NET, which I plan to finish during 2010. Even though a lot of the curriculum is fairly basic, it is still very interesting stuff. It turns out that there are a lot of things that I didn’t know about the framework and how it fundamentally works. I figured that the stuff that I find interesting might be interesting for others too, and therefore I’ve decided to write a few posts about the stuff that we cover in the course.

    This blog post is about some of the inner workings of .NET: the stack and the heap. Most programmers have a fairly good idea about what they are – but what is the actual difference between the two? And how are they used?

    Basically, both the heap and the stack are memory that .NET uses for the different variables and objects. The stack is a memory block which only contains variables that have a constant size – also called value types. Int, double and Boolean are all value types, because they all have a fixed size (32 bits, 64 bits and 1 byte). Strings are not value types. This is because a string does not have a predefined length, and therefore the stack does not know how much space is required to store the string. String and all other classes are reference types. They are called reference types because a variable of such a type holds only a reference on the stack, pointing to the place on the heap where the object itself is stored.
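
    The practical consequence shows up on assignment. In the minimal sketch below (all names are made up for illustration), copying a value type copies the value itself, while copying a reference type only copies the reference:

    ```vbnet
    Imports System
    Imports System.Text

    Module ValueVersusReferenceSketch
        Sub Main()
            ' Value type: assignment copies the value itself.
            Dim a As Integer = 1
            Dim b As Integer = a
            b = 2
            Console.WriteLine(a)   ' prints 1

            ' Reference type: assignment copies the reference, not the object.
            Dim first As New StringBuilder("x")
            Dim second As StringBuilder = first
            second.Append("y")
            Console.WriteLine(first.ToString())   ' prints xy
        End Sub
    End Module
    ```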

    The heap holds all of our objects and is a much more dynamic type of memory than the stack. The main difference between the two is that the stack is much faster than the heap. Accessing objects on the heap is not slower than accessing values on the stack as such; the cost lies in the overhead required to maintain the heap. The maintenance of the heap is called garbage collection: the operation of going through the objects in the heap, removing all the objects that can no longer be reached through a reference, and compacting (defragmenting) the memory.
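
    To watch the garbage collector at work, you can ask the runtime how many collections it has run so far. The sketch below (hypothetical module name) counts generation 0 collections around a loop that creates many short-lived strings. The number printed depends on the machine and the runtime, so no output is shown:

    ```vbnet
    Imports System

    Module GcSketch
        Sub Main()
            Dim before As Integer = GC.CollectionCount(0)

            ' Every &= leaves the previous string behind as garbage on the heap.
            Dim s As String = ""
            For i As Integer = 1 To 20000
                s &= "a"
            Next

            Dim after As Integer = GC.CollectionCount(0)
            Console.WriteLine("Gen 0 collections during the loop: " & (after - before))
        End Sub
    End Module
    ```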

    A small twist to the types that can be stored on the stack is that structures are also value types and can therefore be stored on the stack. For instance, System.Drawing.Point is a structure. It represents an ordered pair of integer x- and y-coordinates that defines a point in a two-dimensional plane. It contains only two integers, so the whole value is stored directly. If a structure has a field of a reference type, the structure is still a value type; only the reference is stored inside it, and the object it refers to lives on the heap.
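
    A minimal sketch of the same idea follows. Point2D is a hypothetical structure, loosely modelled on System.Drawing.Point so that the example needs no extra assembly reference. Assigning one structure variable to another copies the fields, and the runtime reports the type as a value type:

    ```vbnet
    Imports System

    Module StructureSketch
        ' Hypothetical structure holding two integers, similar in spirit to System.Drawing.Point.
        Structure Point2D
            Public X As Integer
            Public Y As Integer
        End Structure

        Sub Main()
            Dim p1 As Point2D
            p1.X = 1
            p1.Y = 2

            Dim p2 As Point2D = p1   ' copies both fields
            p2.X = 99

            Console.WriteLine(p1.X)                          ' prints 1
            Console.WriteLine(GetType(Point2D).IsValueType)  ' prints True
        End Sub
    End Module
    ```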