One of the first pieces of efficiency advice most .NET developers learn is "use StringBuilder to concatenate strings". A little bit like "exceptions are expensive", this is a misunderstood piece of received wisdom. (Fortunately it's not nearly as harmful as the exception performance myth, but it comes up about as often...)
Before reading the rest of this page, you should be aware of the basics of the String type. For the sake of readability, I'll stick to "string" rather than "String" or "string" from here onwards. Please let me know if you find this confusing, and I'll change it.
I've included this in my list of general framework articles rather than in the list of C#-specific articles because I suspect every language targeting .NET is likely to use the same way of concatenating strings under the hood.
The Problem We're Trying To Avoid
There is a very real problem that this wisdom correctly addresses - that of building a load of strings (usually of increasing size) which are never used apart from contributing to the creation of other strings. Here's an example:
using System;
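A minimal sketch of the kind of program being described, assuming 100,000 iterations and Stopwatch for timing. The class name, method name and iteration count are illustrative choices, not taken from the original listing, and the time you see will depend entirely on your machine and runtime.

```csharp
using System;
using System.Diagnostics;

public class ConcatenationInLoop
{
    // Builds a string of `count` exclamation marks using +=.
    // Every iteration allocates a brand new string and copies the old contents into it.
    public static string BuildWithOperator(int count)
    {
        string x = "";
        for (int i = 0; i < count; i++)
        {
            x += "!";
        }
        return x;
    }

    public static void Main()
    {
        // 100000 is an assumed, illustrative iteration count.
        Stopwatch watch = Stopwatch.StartNew();
        string result = BuildWithOperator(100000);
        watch.Stop();
        Console.WriteLine("Length: {0}, time taken: {1}", result.Length, watch.Elapsed);
    }
}
```

Doubling the iteration count and running it again is a simple way to see whether the time grows faster than linearly on your own setup.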
x += "!" is absolutely equivalent to x = x + "!". The concatenation here is creating an entirely new string, allocating enough memory for everything, copying all the data from the existing value of x and then copying the data from the string being appended ("!"). As the string grows, the amount of data it has to copy each time grows too, which is why the time taken didn't just double when I doubled the number of iterations. This is clearly inefficient. If someone asked you to add something to a shopping list, you wouldn't write a new copy of the shopping list first, would you? Enter StringBuilder...

The StringBuilder Solution
Here is an equivalent (in terms of the final value of x) program, which is much, much faster:
using System;
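A sketch of what such a program might look like, under the same assumptions as before (an illustrative count of 100,000; class and method names are hypothetical). The second method additionally passes the known final length to the StringBuilder(int capacity) constructor, which is only possible when you know the size in advance.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public class BuilderInLoop
{
    // Appends into the builder's internal buffer instead of creating a new string each time.
    public static string BuildWithBuilder(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append("!");
        }
        return builder.ToString();
    }

    // Same result, but the buffer starts at the final size, so it never needs to grow.
    public static string BuildWithPresizedBuilder(int count)
    {
        StringBuilder builder = new StringBuilder(count);
        for (int i = 0; i < count; i++)
        {
            builder.Append("!");
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // 100000 is an assumed, illustrative iteration count.
        Stopwatch watch = Stopwatch.StartNew();
        string result = BuildWithBuilder(100000);
        watch.Stop();
        Console.WriteLine("Length: {0}, time taken: {1}", result.Length, watch.Elapsed);
    }
}
```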
StringBuilder maintains an internal buffer and appends to that, only copying its buffer when there isn't room for any more data. (In the framework versions this was written against, the internal buffer is just a string - strings are immutable from a public interface perspective, but not from within the mscorlib assembly. From .NET 4 onwards the implementation uses a chain of character-array chunks instead, but the principle is the same.) We could make the above code even more efficient by passing the final size of the string (which we happen to know in this case) to the constructor of StringBuilder to make it use a buffer of the right size to start with - then there'd be no unnecessary copying at all. Unless you're in a situation where you have that information readily to hand though, it's usually not worth worrying about - StringBuilder doubles its buffer size when it runs out of room, so it doesn't end up copying the data very many times anyway.

So I Should Use StringBuilder Everywhere, Right?
No, quite simply. The above is an explanation of why the received wisdom of "use StringBuilder for concatenation" is right some of the time. However, many people take it at face value without understanding the reasoning behind it. They start turning code like this:
string name = firstName + " " + lastName;
into code like this:
// Bad code! Do not use!
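For illustration, here is a sketch of the two versions side by side, assuming firstName and lastName are ordinary string variables. The class name, method names and sample values are hypothetical; the second method shows the typical shape of such a rewrite rather than any particular listing.

```csharp
using System;
using System.Text;

public class NameExample
{
    // Version 1: everything is concatenated in a single expression.
    public static string WithOperator(string firstName, string lastName)
    {
        return firstName + " " + lastName;
    }

    // Version 2: the StringBuilder rewrite the text warns against.
    // It produces the same string, with an extra object and a buffer of unknown size.
    public static string WithBuilder(string firstName, string lastName)
    {
        StringBuilder builder = new StringBuilder();
        builder.Append(firstName);
        builder.Append(" ");
        builder.Append(lastName);
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WithOperator("Ada", "Lovelace")); // prints "Ada Lovelace"
        Console.WriteLine(WithBuilder("Ada", "Lovelace"));  // prints "Ada Lovelace"
    }
}
```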
However, the second version is actually less efficient than the first! Not much less efficient - and if the second version were more readable, I'd go with it for the reasons above - but when the entire point of using StringBuilder is to improve efficiency, using the above is just plain nuts.

The first version (assuming that firstName and lastName are real variables, and not constants - I'll come onto that later) compiles to a call to String.Concat, like this:
string name = String.Concat (firstName, " ", lastName);
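String.Concat comes in several overloads. The sketch below exercises three of them (three strings, two objects, and a string array); the wrapper class and method names are hypothetical and exist only to make the calls easy to try out.

```csharp
using System;

public class ConcatOverloads
{
    // String.Concat(string, string, string)
    public static string FromStrings(string a, string b, string c)
    {
        return String.Concat(a, b, c);
    }

    // String.Concat(object, object): each argument is converted to a string first.
    public static string FromObjects(object a, object b)
    {
        return String.Concat(a, b);
    }

    // String.Concat(string[])
    public static string FromArray(string[] parts)
    {
        return String.Concat(parts);
    }

    public static void Main()
    {
        Console.WriteLine(FromStrings("Ada", " ", "Lovelace"));       // prints "Ada Lovelace"
        Console.WriteLine(FromObjects("Count: ", 3));                 // prints "Count: 3"
        Console.WriteLine(FromArray(new string[] { "a", "b", "c" })); // prints "abc"
    }
}
```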
String.Concat takes a bunch of strings (or objects) and concatenates them together, plain and simple. There are various overloads - some take strings, some take objects (which are just converted into strings), some take arrays of objects or arrays of strings. They all do the same thing though.

Now, String.Concat can work out the lengths of all the strings involved before it concatenates them together (at least if you pass it strings - if you pass it objects, it needs to create temporary strings and then concatenate those together). This means that no extra copying is involved - the data is copied once into the new string, which is of exactly the right length.

Compare this with the StringBuilder version. It doesn't know at construction time how big to make the buffer (because we haven't told it - doing so would make the code even less readable). That means it may have to copy the buffer, and it's likely to end up with a buffer which is actually larger than it needs to be. Oh, and there's the overhead of an extra object (the StringBuilder itself). Remind me why this was meant to be a good idea?

The important difference between this example and the previous one is that we can easily present all the strings which need to be concatenated together in one call to String.Concat. That means that no intermediate strings are needed. StringBuilder is efficient in the first example because it acts as a container for the intermediate result without having to copy that result each time - when there's no intermediate result anyway, it has no advantage.

Constants
Things get even crazier when it comes to string constants (literals, const string members). What do you suppose string x = "hello" + " " + "there"; is compiled to? It would be reasonable to expect it to be another call to String.Concat - but it isn't. It's actually compiled to the exact same code as string x = "hello there"; The compiler knows that all the parts are constant, so it does all the concatenation at compile time, storing the full string in the compiled code. Converting that to use StringBuilder is inefficient in both memory and speed, as well as reducing readability.

Rules Of Thumb
So, when should you use StringBuilder, and when should you use the string concatenation operators?
- Definitely use StringBuilder when you're concatenating in a non-trivial loop - especially if you don't know for sure (at compile time) how many iterations you'll make through the loop. For example, reading a file a character at a time, building up a string as you go using the += operator, is potentially performance suicide.
- Definitely use the concatenation operator when you can (readably) specify everything which needs to be concatenated in one statement. (If you have an array of things to concatenate, consider calling String.Concat explicitly - or String.Join if you need a delimiter.)
- Don't be afraid to break literals up into several concatenated bits - the result will be the same. You can aid readability by breaking a long literal into several lines, for instance, with no harm to performance.
- If you need the intermediate results of the concatenation for something other than feeding the next iteration of concatenation, StringBuilder isn't going to help you. For instance, if you build up a full name from a first name and a last name, and then add a third piece of information (the nickname, maybe) to the end, you'll only benefit from using StringBuilder if you don't need the (first name + last name) string for any other purpose (as we do in the example which creates a Person object).
- If you just have a few concatenations to do, and you really want to do them in separate statements, it doesn't really matter which way you go. Which way is more efficient will depend on the number of concatenations, the sizes of the strings involved, and what order they're concatenated in. If you really believe that piece of code to be a performance bottleneck, profile or benchmark it both ways.
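A few of these rules can be sketched in code. The class, member names and sample data below are hypothetical. The const declaration is a handy check on the literal rule: a const string only compiles if its value is a compile-time constant, so the fact that it builds at all shows the compiler has combined the pieces itself.

```csharp
using System;

public class RulesOfThumb
{
    // Compiles only because the whole expression is a compile-time constant;
    // breaking a literal into pieces costs nothing at execution time.
    public const string Greeting = "hello" +
                                   " " +
                                   "there";

    // An array of things plus a delimiter: String.Join does it in one call.
    public static string JoinWithCommas(string[] items)
    {
        return String.Join(", ", items);
    }

    // An array of things with no delimiter: call String.Concat explicitly.
    public static string ConcatAll(string[] items)
    {
        return String.Concat(items);
    }

    public static void Main()
    {
        string[] items = new string[] { "milk", "eggs", "bread" };
        Console.WriteLine(Greeting);              // prints "hello there"
        Console.WriteLine(JoinWithCommas(items)); // prints "milk, eggs, bread"
        Console.WriteLine(ConcatAll(items));      // prints "milkeggsbread"
    }
}
```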