Extension methods for copying or cloning objects

C# 3.0 includes a new feature known as extension methods, and fiddling with it triggered the idea of creating a mechanism for copying or cloning (virtually) any .NET object or graph of objects. The manifestation of that idea has become a rather decent little framework for copying objects. It performs a deep copy as automatically as it possibly can, and provides mechanisms to easily solve many of the cases which cannot be covered automatically. It is great for copying your custom object hierarchies, and saves you the pain of a solution like implementing ICloneable for an entire hierarchy of objects. Click here to grab it now, and read on for a presentation.

Let’s start off with a few words on extension methods, which are best explained through an example. Say we want to calculate the area of a Size. Wouldn’t it be nice to add a GetArea method to the already existing Size type? Well, let’s do so!

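using System.Drawing;

public static class ExtensionMethods
{
  // The 'this' modifier on the first parameter makes GetArea callable as size.GetArea()
  public static int GetArea(this Size size)
  {
    return size.Width * size.Height;
  }
}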
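As you can see, the this modifier on the method’s first parameter tells the compiler that GetArea extends Size. Any Size value can now call it as if it were an instance method, as in mySize.GetArea(), as long as ExtensionMethods is in scope (typically by importing its namespace with a using directive). Behind the scenes, the compiler turns such a call into the ordinary static call ExtensionMethods.GetArea(mySize).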
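To see that an extension call really is just a static call, here is a small self-contained sketch. It assumes System.Drawing.Size (on .NET Core and .NET 5+ it lives in System.Drawing.Primitives, which ships with the runtime); SizeExtensions and ExtensionDemo are hypothetical names chosen for the illustration.

```csharp
using System.Drawing;

// Hypothetical extension class, equivalent to ExtensionMethods above.
public static class SizeExtensions
{
    public static int GetArea(this Size size)
    {
        return size.Width * size.Height;
    }
}

public static class ExtensionDemo
{
    // Calls the same method twice: once with extension syntax, once explicitly.
    public static (int ViaExtension, int ViaStatic) Compare(Size size)
    {
        int viaExtension = size.GetArea();               // extension method syntax
        int viaStatic = SizeExtensions.GetArea(size);    // the static call the compiler emits
        return (viaExtension, viaStatic);
    }
}
```

Both calls return the same value; the extension syntax is purely a compile-time convenience and does not modify the Size type itself.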

As mentioned, I had the idea of extending the very base of the C# class hierarchy (System.Object) with a method for copying or cloning “any” object. Obviously, the method cannot automatically copy any object, since it cannot possibly know how to construct an object from an arbitrary class. Hence, a small framework needed to be created. The goals were to:

  • Enable copying of many objects automatically.
  • Enable copying of virtually any object with very little effort.
  • Automate and hide away as much as possible (The KISS Principle).

The result is Copyable (pun intended).

The Copyable framework

Copyable is a small framework for copying (or cloning, if you will) objects. The straightforward way of using it is to just reference the assembly it’s in from your project, and start copying!

SomeType instance = new SomeType();
// ...do lots of stuff to the object...
SomeType copy = instance.Copy(); // Create a deep copy

The instance copy is now a deep copy of instance, no matter how complex the object graph of instance is. The relations in the copied graph are the same as in the original, but every object in the copied graph is a copy of the corresponding object in instance.
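The Copyable source is not shown in this post. As a rough illustration of how a deep copy can keep "the relations in the copy graph the same", here is a minimal reflection-based sketch (hypothetical names DeepCopySketch, DeepCopy and Node; requires .NET 5+ for ReferenceEqualityComparer). It keeps a map from each original object to its copy, so shared references and cycles are reproduced rather than duplicated, and it relies on a parameterless constructor, which is the first of the requirements listed next.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sketch, not the Copyable implementation: a reflection-based deep copy
// that preserves object identity (shared references and cycles) in the copy.
public static class DeepCopySketch
{
    public static T DeepCopy<T>(this T source)
    {
        // Maps each original object to its copy, compared by reference.
        var copies = new Dictionary<object, object>(ReferenceEqualityComparer.Instance);
        return (T)CopyObject(source, copies);
    }

    private static object CopyObject(object source, Dictionary<object, object> copies)
    {
        if (source == null)
            return null;

        Type type = source.GetType();

        // Primitives (including IntPtr), enums, strings and decimals are returned as-is.
        if (type.IsPrimitive || type.IsEnum || type == typeof(string) || type == typeof(decimal))
            return source;

        // Already copied: reuse that copy so the shape of the graph is preserved.
        if (copies.TryGetValue(source, out object existing))
            return existing;

        if (type.IsArray)
        {
            var array = (Array)source;
            if (array.Rank != 1)
                throw new NotSupportedException("This sketch only handles one-dimensional arrays.");
            Array arrayCopy = Array.CreateInstance(type.GetElementType(), array.Length);
            copies[source] = arrayCopy;
            for (int i = 0; i < array.Length; i++)
                arrayCopy.SetValue(CopyObject(array.GetValue(i), copies), i);
            return arrayCopy;
        }

        // Needs a parameterless constructor (public or not); throws MissingMethodException otherwise.
        object copy = Activator.CreateInstance(type, nonPublic: true);
        copies[source] = copy;

        // Copy every instance field declared on the type and on each of its base types.
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        for (Type t = type; t != null; t = t.BaseType)
        {
            foreach (FieldInfo field in t.GetFields(flags))
                field.SetValue(copy, CopyObject(field.GetValue(source), copies));
        }
        return copy;
    }
}

// Hypothetical example type with a parameterless constructor.
public class Node
{
    public int Value;
    public Node Next;
}
```

Note that IntPtr fields are copied by value, which is exactly the unmanaged-handle pitfall discussed under Limitations and pitfalls.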

For the automated copy to work, though, one of the following statements must hold for instance:

  • Its type must have a parameterless constructor, or
  • It must be a Copyable, or
  • It must have an IInstanceProvider registered for its type.

Besides the Copy method, the Copyable class and the IInstanceProvider interface are the two major building blocks of the Copyable framework. Each of them enables copying of objects that cannot be copied automatically.

The Copyable base class

Copyable is an abstract base class for objects that can be copied. To create a copyable class, you simply subclass Copyable and call its constructor with the arguments of your constructor.

class MyClass : Copyable
{
  public MyClass(int a, double b, string c)
    : base(a, b, c)
  {
  }
}

The code above makes MyClass a copyable class. Note that if MyClass had a parameterless constructor, subclassing Copyable would not be necessary.

MyClass can now be copied just like in the previous example.

MyClass a = new MyClass(1, 2.0, "3");
MyClass b = a.Copy();
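Why hand the constructor arguments to the base class? One plausible reading, and this is an assumption rather than a description of Copyable’s source, is that the base class records them so it can construct a fresh instance of the derived type when a copy is needed. A minimal sketch of that idea, using the hypothetical names CopyableSketch and CreateBlankInstance:

```csharp
using System;

// Hypothetical sketch: a base class that records the arguments its subclass was
// constructed with, so a new instance of the same runtime type can be created later.
public abstract class CopyableSketch
{
    private readonly object[] _constructorArgs;

    protected CopyableSketch(params object[] constructorArgs)
    {
        _constructorArgs = constructorArgs;
    }

    // Re-invokes the public constructor of the derived type that matches the
    // recorded arguments. A copy mechanism would then copy field values onto it.
    public object CreateBlankInstance()
    {
        return Activator.CreateInstance(GetType(), _constructorArgs);
    }
}

public class MyClass : CopyableSketch
{
    public int A { get; }
    public double B { get; }
    public string C { get; }

    public MyClass(int a, double b, string c)
        : base(a, b, c)
    {
        A = a;
        B = b;
        C = c;
    }
}
```

Reference-type arguments are shared between the original and the new instance at this point, so a deep copy would still have to replace them afterwards.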

The introduction of the Copyable base class solves many problems, but not all. Let’s say you wanted to copy a System.Drawing.SolidBrush. This class does not have a parameterless constructor, which means it cannot be copied “automatically” by the framework. Also, you cannot alter it so that it subclasses Copyable. So, what do you do? You create an instance provider.

The IInstanceProvider interface

An instance provider is defined by the interface IInstanceProvider. As the name suggests, an implementation is a provider of instances, and each instance provider provides instances of one given type. The Copyable framework automatically detects IInstanceProvider implementations in all assemblies in its application domain, so all you need to do to create a working instance provider is to define it. No registration or other additional steps are required. To simplify implementing instance providers, the framework also includes an abstract class, InstanceProvider, that implements the IInstanceProvider interface.

using System.Drawing;

public class SolidBrushProvider : InstanceProvider<SolidBrush>
{
  // Builds a new SolidBrush with the same color as the original
  public override SolidBrush CreateTypedCopy(SolidBrush s)
  {
    return new SolidBrush(s.Color);
  }
}

This implementation will be used automatically by the Copyable framework. NOTE: To be usable, the instance provider MUST have a parameterless constructor.
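The post does not show how Copyable finds its providers. As an illustration of the general technique, here is a minimal sketch that scans the assemblies loaded in the current application domain with reflection; IInstanceProviderSketch, ProviderRegistry, Money and MoneyProvider are hypothetical names (Money stands in for SolidBrush, since System.Drawing.Common is Windows-only on modern .NET). It also shows why providers need a parameterless constructor: the registry has to instantiate each one without knowing anything about it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical provider contract: creates instances of one given type.
public interface IInstanceProviderSketch
{
    Type ProvidedType { get; }
    object Provide(object template);
}

public static class ProviderRegistry
{
    // Scans the loaded assemblies once, on first use; assemblies loaded later are missed.
    private static readonly Lazy<Dictionary<Type, IInstanceProviderSketch>> Providers =
        new Lazy<Dictionary<Type, IInstanceProviderSketch>>(Discover);

    public static bool TryGet(Type type, out IInstanceProviderSketch provider) =>
        Providers.Value.TryGetValue(type, out provider);

    private static Dictionary<Type, IInstanceProviderSketch> Discover()
    {
        var result = new Dictionary<Type, IInstanceProviderSketch>();
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            if (assembly.IsDynamic)
                continue;

            Type[] types;
            try { types = assembly.GetTypes(); }
            catch (ReflectionTypeLoadException e) { types = e.Types.Where(t => t != null).ToArray(); }

            foreach (Type type in types)
            {
                if (type.IsAbstract || type.IsInterface || type.ContainsGenericParameters ||
                    !typeof(IInstanceProviderSketch).IsAssignableFrom(type))
                    continue;

                // Only providers with a public parameterless constructor can be created here.
                if (type.GetConstructor(Type.EmptyTypes) == null)
                    continue;

                var provider = (IInstanceProviderSketch)Activator.CreateInstance(type);
                result[provider.ProvidedType] = provider;
            }
        }
        return result;
    }
}

// Hypothetical type without a parameterless constructor.
public sealed class Money
{
    public Money(decimal amount) { Amount = amount; }
    public decimal Amount { get; }
}

// Hypothetical provider for Money; found by the scan without any registration.
public sealed class MoneyProvider : IInstanceProviderSketch
{
    public Type ProvidedType => typeof(Money);
    public object Provide(object template) => new Money(((Money)template).Amount);
}
```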

The instance provider pattern does not cover the case where you want your SolidBrush copies to start from different initial states depending on the context in which you copy them. For those cases, an overload of Copy() takes an already created instance as an argument. That instance becomes the copy.

SolidBrush instance = new SolidBrush(Color.Red);
instance.Color = Color.Black;
SolidBrush copy = new SolidBrush(Color.Red);
instance.Copy(copy); // Create a deep copy

In this example, copy is now of the color Color.Black.

Limitations and pitfalls

Although this solution works in most cases, it’s not a silver bullet. Be careful when you copy classes that hold unmanaged resources such as handles. If such classes are designed on the premise that their resources are exclusive to them, they will manage those resources as they see fit. Imagine copying a class that holds a handle, disposing the original instance, and continuing to use the copy. The handle will (probably) have been freed by the original instance, and the copy will generate an access violation when it attempts to read or write the freed memory.
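The pitfall is easy to reproduce with a handle stored in a field. In this sketch, the hypothetical NativeBuffer type uses Marshal.AllocHGlobal as a stand-in for any unmanaged resource. A field-by-field copy duplicates the IntPtr value, so both objects end up pointing at the same native memory. The safer copy, which is roughly what an instance provider for such a type would have to do, allocates its own block and copies the bytes.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical type that owns a block of unmanaged memory through an IntPtr handle.
public sealed class NativeBuffer : IDisposable
{
    public IntPtr Handle { get; private set; }
    public int Size { get; }

    public NativeBuffer(int size)
    {
        Size = size;
        Handle = Marshal.AllocHGlobal(size);
    }

    private NativeBuffer(IntPtr handle, int size)
    {
        Handle = handle;
        Size = size;
    }

    // What a naive field-by-field copy amounts to: the handle value is duplicated,
    // so both instances now believe they own the same native memory.
    public NativeBuffer NaiveCopy() => new NativeBuffer(Handle, Size);

    // A safe copy allocates its own native memory and copies the contents into it.
    public NativeBuffer SafeCopy()
    {
        var copy = new NativeBuffer(Size);
        var bytes = new byte[Size];
        Marshal.Copy(Handle, bytes, 0, Size);
        Marshal.Copy(bytes, 0, copy.Handle, Size);
        return copy;
    }

    public void Dispose()
    {
        if (Handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Handle);
            Handle = IntPtr.Zero;
        }
    }
}
```

After the original is disposed, the naive copy holds a dangling handle and must be neither used nor disposed; the safe copy remains valid.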

That’s it! The Copyable framework can be downloaded from here. For those interested in reading more on extension methods, MSDN provides an excellent explanation in the C# Programming Guide, and Scott Guthrie has an introduction article here.

Enjoy Copyable, and please let me know if you find it useful or come across any problems with it.

UPDATE 2009-12-11: Due to popular demand, I have made the source code for Copyable available under the MIT license. The source can be downloaded here.

UPDATE 2010-01-31: The requirement of parameterless constructors has been removed in the latest version of Copyable available on GitHub. A new release will follow soon.

Posted in C#, Code, Technical | 48 Comments

Five pieces of advice on implementing a cache

I’ve spent the last few days at work implementing a cache in the data access layer (DAL) of one of our services. The cache works great: it speeds up our service considerably in some cases, and somewhat in all cases. I’ve implemented caches before, and experienced many of the difficulties that arise when introducing one. It always seems rather easy, and it always has unwanted side effects. The general advice is of course not to do it (and the advice from the database guys is always not to do it), but here are my five best pieces of advice on what you should consider if you decide to do it anyway.

1. Make sure the cache is transparent.

The system shall not in any way notice that there is a cache handing objects to it instead of the database, nor should the introduction of a cache require changes in any layers above the layer where the cache resides. If you decide that changes are required, be aware that you are making changes to code that does work, and that re-verifying its behavior is a hard and expensive task.
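One common way to keep a cache transparent is the decorator pattern: the caching class implements the same interface as the data access class it wraps, so the layers above cannot tell the difference. This is a minimal sketch; ICustomerRepository, Customer, CachingCustomerRepository and CountingRepository are hypothetical names, since the post does not describe how its DAL is structured.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical immutable entity returned by the data access layer.
public sealed record Customer(int Id, string Name);

// Hypothetical DAL contract that the layers above depend on.
public interface ICustomerRepository
{
    Customer GetById(int id);
}

// Decorator: same interface as the real repository, so callers are unaware of the cache.
public sealed class CachingCustomerRepository : ICustomerRepository
{
    private readonly ICustomerRepository _inner;
    private readonly ConcurrentDictionary<int, Customer> _cache = new ConcurrentDictionary<int, Customer>();

    public CachingCustomerRepository(ICustomerRepository inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Loads from the wrapped repository on a miss and serves later calls from memory.
    // Under concurrent misses for the same id, the loader may run more than once.
    public Customer GetById(int id) => _cache.GetOrAdd(id, key => _inner.GetById(key));
}

// Hypothetical stand-in for the database-backed repository; counts round trips.
public sealed class CountingRepository : ICustomerRepository
{
    public int Calls { get; private set; }

    public Customer GetById(int id)
    {
        Calls++;
        return new Customer(id, "Customer " + id);
    }
}
```

Because callers only see ICustomerRepository, adding or removing the cache is a change in how objects are composed, not in the layers above. Entries in this sketch never expire, which the fourth point below argues against.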

2. If your system is transactional, make sure that the lifetime and mutability of objects in your cache matches your transaction isolation level.

The obvious solution for a cache in a transactional system is a cache that lives per transaction, but this is not guaranteed to work. As an example, say that transaction 1 reads some data and starts operating on it. Meanwhile, transaction 2 reads, alters, and commits some data that partially or fully depends on the data read by transaction 1. After that commit, transaction 1 reads some other data that depends on the new data committed by transaction 2. In this case, a cache in transaction 1 alters the system behavior if the isolation level is too low, because transaction 1 gets old values from its cache where an uncached read would have seen the committed changes. (The altered behavior is most likely correct, but it does not need to be, and it certainly changes the behavior nonetheless; see the Wikipedia article on isolation levels for an explanation of transaction isolation levels.) Be aware of your isolation level, and know that the default isolation level differs from RDBMS to RDBMS. As an example, MySQL (InnoDB) uses repeatable read as the default isolation level, while MS SQL Server uses read committed. With an isolation level below repeatable read, a cache with any mutable data in it is essentially useless.
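The isolation-level dependency can be made explicit in code. This is a minimal sketch of a per-transaction cache built on System.Transactions, in which entries live only as long as the ambient transaction and nothing is cached unless that transaction runs at repeatable read or serializable. PerTransactionCache and the loader delegate are hypothetical; keep in mind that a TransactionScope created without options defaults to Serializable.

```csharp
using System;
using System.Collections.Concurrent;
using System.Transactions;

// Hypothetical sketch: entries live only as long as the ambient transaction, and
// nothing is cached unless the isolation level guarantees repeatable reads.
public sealed class PerTransactionCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<string, ConcurrentDictionary<TKey, TValue>> _perTransaction =
        new ConcurrentDictionary<string, ConcurrentDictionary<TKey, TValue>>();

    public TValue GetOrLoad(TKey key, Func<TKey, TValue> load)
    {
        Transaction tx = Transaction.Current;

        // Outside a transaction, or below repeatable read, always go to the database.
        if (tx == null ||
            (tx.IsolationLevel != IsolationLevel.RepeatableRead &&
             tx.IsolationLevel != IsolationLevel.Serializable))
        {
            return load(key);
        }

        string id = tx.TransactionInformation.LocalIdentifier;
        var entries = _perTransaction.GetOrAdd(id, transactionId =>
        {
            // Drop this transaction's entries when it commits or rolls back.
            tx.TransactionCompleted += (sender, args) => _perTransaction.TryRemove(transactionId, out _);
            return new ConcurrentDictionary<TKey, TValue>();
        });
        return entries.GetOrAdd(key, load);
    }
}
```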

3. Use the cache for immutable data as much as possible.

Also, use the cache for mutable data as little as possible, since this significantly increases the difficulty of the cache implementation, and with it the risk of errors.

4. Give the objects in the cache as short a lifetime as possible.

When implementing a cache, you want the objects in your cache to live as long as possible, since accessing the cache is much faster than accessing the database. But think about this: a cache whose objects live forever is actually a replacement for your database, which is not what you want to achieve. To avoid an implementation that is difficult, hard to verify, and unnecessarily complex, keep object lifetimes as short as possible while still gaining speed.
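A fixed time-to-live is one simple way to bound object lifetimes. This is a minimal sketch that assumes an absolute expiry per entry and an injectable clock, so that expiry can be tested without waiting; ExpiringCache is a hypothetical name, and .NET’s MemoryCache offers similar expiration policies if you prefer a library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: every entry expires a fixed time after it was loaded.
public sealed class ExpiringCache<TKey, TValue>
{
    private readonly Dictionary<TKey, (TValue Value, DateTime ExpiresAt)> _entries =
        new Dictionary<TKey, (TValue Value, DateTime ExpiresAt)>();
    private readonly TimeSpan _timeToLive;
    private readonly Func<DateTime> _utcNow;
    private readonly object _lock = new object();

    public ExpiringCache(TimeSpan timeToLive, Func<DateTime> utcNow = null)
    {
        _timeToLive = timeToLive;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public TValue GetOrLoad(TKey key, Func<TKey, TValue> load)
    {
        // The lock is held during the load for simplicity; a production cache would avoid that.
        lock (_lock)
        {
            DateTime now = _utcNow();
            if (_entries.TryGetValue(key, out var entry) && now < entry.ExpiresAt)
                return entry.Value;

            // Missing or expired: reload from the source and restart the entry's lifetime.
            TValue value = load(key);
            _entries[key] = (value, now + _timeToLive);
            return value;
        }
    }
}
```

Expired entries are only replaced when they are requested again; a real cache would also purge them periodically.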

5. Be absolutely certain that the keys you use in your cache identify objects uniquely and unambiguously.

This sounds obvious, but with complex object hierarchies and caching of different parts of the hierarchy at different levels, it quickly becomes very hard. In general, cache either the topmost or the bottommost objects in your hierarchy. Which approach to choose depends on how you access your data. The best way to decide is a thorough analysis of your data and the objects that represent it: how you access these objects, how you use them, and in which cases you are most likely to gain speed by introducing a cache.
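One cheap way to rule out a common source of ambiguity is to make the entity type part of the key, so that order 42 and customer 42 can never collide, and to use a key type with value equality. A sketch with the hypothetical CacheKey type (a C# 10 readonly record struct) and two hypothetical entity types that share an id space:

```csharp
using System;

// Hypothetical composite key: the entity type and its id together identify one object.
// A record struct provides value equality and hashing over both components.
public readonly record struct CacheKey(Type EntityType, long Id)
{
    public static CacheKey For<T>(long id) => new CacheKey(typeof(T), id);
}

// Hypothetical entity types whose ids may overlap.
public sealed class OrderRow { }
public sealed class CustomerRow { }
```

Used as a dictionary key, CacheKey.For<OrderRow>(42) and CacheKey.For<CustomerRow>(42) occupy two separate entries.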

Posted in Software Design, Technical

Silverlight Project Template for Visual Studio 2005

For those of you who do not want to grab the Visual Studio 2008 beta and/or wait for the release, I have created a Silverlight Project Template for Visual Studio 2005. This template makes it possible to develop Silverlight solutions under Visual Studio 2005. Grab the installer here if you can’t wait, or read on for further information.

In case you are not familiar with Silverlight, here’s the short version: Silverlight is Microsoft’s new cross-browser, cross-platform plugin for creating rich interactive applications. It implements a subset of the .NET runtime and, in particular, of Windows Presentation Foundation. Cross-platform means Windows and Mac OS X; there is no Linux support yet, but the Mono project is working on its own implementation of Silverlight, dubbed Moonlight.

What I find particularly interesting is that Silverlight implements a new runtime platform known as the Dynamic Language Runtime (DLR), which makes it possible to use (and blend) dynamic languages such as Python, Ruby, and JScript when creating Silverlight applications. More generally, the DLR is a runtime for dynamic languages of any kind, which makes it one of the most interesting recent additions to the programming universe.

Longing to fiddle with Silverlight, I came across documentation on how to create Silverlight assemblies in Visual Studio 2005 written by Michael Schwarz, and created the project template based on that.

To ease the task of creating Silverlight projects, the template includes a wizard which asks for the path to your Silverlight installation, and remembers it for the future if it is correct. No further input is needed.

Note that, to my knowledge, debugging Silverlight assemblies is not possible under Visual Studio 2005. If I am wrong, let me know, and I will try to add support for out-of-the-box debugging when launching the project from Visual Studio 2005. For now, you will have to point your browser to the Default.html file included in each project.

Posted in Announcements, Technical | 25 Comments

LINQ vs Loop – A performance test

I just installed Visual Studio 2008 beta 2 to see what the future holds for C#. The addition of LINQ has brought a variety of query keywords to the language. “Anything” can be queried: SQL databases (naturally), XML documents, and regular collections. Custom queryable objects can also be created by implementing IQueryable. Sadly, like every abstraction, these goodies come at a cost. The question is: how much?

I decided to create a simple test to see how much of a performance hit LINQ is. The test I devised finds the numbers in an array that are less than 10. The code is quoted below.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class LinqPerformance
{
    public void LinqTest()
    {
        const int SIZE = 10000, RUNS = 1000;
        int[] ints = new int[SIZE];
        for (int i = 0; i < SIZE; i++)
            ints[i] = i;

        // LINQ version: the typed range variable ("from int i") compiles to ints.Cast<int>().Where(...)
        DateTime start = DateTime.Now;
        for (int t = 0; t < RUNS; ++t)
        {
            int[] less = (from int i in ints
                          where i < 10
                          select i).ToArray();
        }
        TimeSpan spent = DateTime.Now - start;
        Trace.WriteLine(string.Format("LINQ: {0}, avg. {1}",
            spent, new TimeSpan(spent.Ticks / RUNS)));

        // Loop version: filter into a List<int>, then convert it to an array
        DateTime start2 = DateTime.Now;
        for (int t = 0; t < RUNS; ++t)
        {
            var l = new List<int>();
            foreach (var i in ints)
                if (i < 10)
                    l.Add(i);
            int[] less2 = l.ToArray();
        }
        TimeSpan spent2 = DateTime.Now - start2;
        Trace.WriteLine(string.Format("Loop: {0}, avg. {1}",
            spent2, new TimeSpan(spent2.Ticks / RUNS)));
    }
}

Initially, I assumed the performance impact would not be too large, since the query is equivalent to a straightforward imperative loop, which should not be too hard for a compiler to deduce given static typing and a single collection to iterate across. Or is it?


LINQ: 00:00:04.1052060, avg. 00:00:00.0041052
Loop: 00:00:00.0790965, avg. 00:00:00.0000790

As you can see, the performance impact is huge: in this test, LINQ performs roughly 50 times worse than the traditional loop (4.11 s versus 0.079 s for 1,000 runs). This seems rather wild at first glance, but the explanation is this: the keywords introduced by LINQ are syntactic sugar for method invocations to a set of generic routines that iterate across collections and filter through lambda expressions. Naturally, this will not perform as well as a traditional imperative loop, and less optimization is possible.
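Concretely, the C# language specification translates query expressions into ordinary method calls. For the query in the test, the explicitly typed range variable (from int i) adds a Cast<int>() call, the where clause becomes a Where() call with a lambda, and the trivial select i is dropped because it follows another clause. This sketch (QueryTranslation is a hypothetical name) puts the two forms side by side.

```csharp
using System.Linq;

public static class QueryTranslation
{
    // Query syntax, as in the test above.
    public static int[] QuerySyntax(int[] ints) =>
        (from int i in ints
         where i < 10
         select i).ToArray();

    // The method calls the compiler produces for the query above: Cast<int>() for the
    // typed range variable, Where() with a lambda for the filter, and no call for 'select i'.
    public static int[] MethodSyntax(int[] ints) =>
        ints.Cast<int>().Where(i => i < 10).ToArray();
}
```

Every element the query sees passes through an enumerator and a delegate invocation for the lambda, which is the overhead the hand-written loop avoids.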

Having seen the performance impact, I am still of the view that LINQ is a great step towards a more declarative world for developers. Instead of saying “take these numbers, iterate over all of them, and insert them into this list if they are less than ten”, which is an informal description of a classical imperative loop, you can now say “from these numbers, give me those that are less than ten”. The difference may be subtle, but the latter is in my opinion far more declarative and easier to read.

This may very well be the next big thing, but it comes at a cost. For now, my advice is to create simple performance tests for the cases where you consider adopting LINQ, so that you spot possible pitfalls as early as possible.
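As a starting point for such tests, here is a minimal sketch of a timing helper; Bench, Measure, LinqFilter and LoopFilter are hypothetical names. Unlike the test above, it uses Stopwatch, which has a much finer resolution than DateTime.Now, and it runs the action once before timing so that JIT compilation is excluded. Run it in a Release build without a debugger attached; for more rigorous measurements, a dedicated tool such as BenchmarkDotNet is a better fit.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class Bench
{
    // Returns the average time per run, measured after one untimed warm-up run.
    public static TimeSpan Measure(Action action, int runs)
    {
        if (runs <= 0)
            throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // Warm-up: triggers JIT compilation outside the timed region.

        Stopwatch watch = Stopwatch.StartNew();
        for (int r = 0; r < runs; r++)
            action();
        watch.Stop();

        // Elapsed is a TimeSpan, so its Ticks are 100 ns units.
        return TimeSpan.FromTicks(watch.Elapsed.Ticks / runs);
    }

    public static int[] LinqFilter(int[] ints) => ints.Where(i => i < 10).ToArray();

    public static int[] LoopFilter(int[] ints)
    {
        var result = new List<int>();
        foreach (int i in ints)
            if (i < 10)
                result.Add(i);
        return result.ToArray();
    }
}
```

Usage follows the shape of the original test, for example Bench.Measure(() => Bench.LinqFilter(ints), 1000) compared with Bench.Measure(() => Bench.LoopFilter(ints), 1000).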

Posted in Code, Technical | 20 Comments

…and we’re back!

Cross the ceiling and circle the calendar!

The site is back after being down for close to 6 months. I hope you still enjoy the stuff that is here, and I hope to bring you more interesting stuff in the future.

Posted in Announcements