How to combine Git, Windows, and non-ASCII letters

The default Git installation on Windows works really badly if you’re using cmd.exe and have non-ASCII letters in your commit information and/or code.

Thankfully, Git is highly configurable, and the fix is rather easy:

  1. Set i18n.commitencoding to the codepage you’re on in cmd.exe (I’m on windows-1252)
  2. Set i18n.logoutputencoding to the same codepage.
  3. Set the LESSCHARSET environment variable to a proper name for the code page you’re on (I’m on latin1), either by:
    • Adding a user environment variable in Control Panel > System and Security > System > Advanced System Settings > Advanced > Environment Variables..., or…
    • Setting it in your cmd.exe session (e.g. set LESSCHARSET=latin1)

Boilerplate version for your copy-paste convenience (replace encodings as necessary):

git config --global i18n.commitencoding windows-1252
git config --global i18n.logoutputencoding windows-1252
set LESSCHARSET=latin1

The first setting tells Git how your commit messages, including your author information, are encoded. The second tells Git what encoding it should use when writing output from a command like git log. The third and final setting tells less, the pager that Git runs git log output through, what encoding to use.

Diffie-Hellman support in Node.js

Yay! My patch implementing support for Diffie-Hellman key exchange in Node.js has finally been merged into the Node.js master branch. This will simplify the OpenID for Node.js codebase a lot. It will also make the OpenID association phase run a lot faster, since the current code does Diffie-Hellman in JavaScript while the Node.js crypto version does it all in native code using OpenSSL.

A brief API overview:

  • crypto.createDiffieHellman(prime_length)
    • Creates a Diffie-Hellman key exchange object and generates a prime of the given bit length. The generator used is 2.
  • crypto.createDiffieHellman(prime, encoding='binary')
    • Creates a Diffie-Hellman key exchange object using the supplied prime. The generator used is 2. Encoding can be 'binary', 'hex', or 'base64'.
  • diffieHellman.generateKeys(encoding='binary')
    • Generates private and public Diffie-Hellman key values, and returns the public key in the specified encoding. This key should be transferred to the other party. Encoding can be 'binary', 'hex', or 'base64'.
  • diffieHellman.computeSecret(other_public_key, input_encoding='binary', output_encoding=input_encoding)
    • Computes the shared secret using other_public_key as the other party’s public key and returns the computed shared secret. Supplied key is interpreted using specified input_encoding, and secret is encoded using specified output_encoding. Encodings can be 'binary', 'hex', or 'base64'. If no output encoding is given, the input encoding is used as output encoding.
  • diffieHellman.getPrime(encoding='binary')
    • Returns the Diffie-Hellman prime in the specified encoding, which can be 'binary', 'hex', or 'base64'.
  • diffieHellman.getGenerator(encoding='binary')
    • Returns the Diffie-Hellman generator in the specified encoding, which can be 'binary', 'hex', or 'base64'.
  • diffieHellman.getPublicKey(encoding='binary')
    • Returns the Diffie-Hellman public key in the specified encoding, which can be 'binary', 'hex', or 'base64'.
  • diffieHellman.getPrivateKey(encoding='binary')
    • Returns the Diffie-Hellman private key in the specified encoding, which can be 'binary', 'hex', or 'base64'.
  • diffieHellman.setPublicKey(public_key, encoding='binary')
    • Sets the Diffie-Hellman public key. Key encoding can be 'binary', 'hex', or 'base64'.
  • diffieHellman.setPrivateKey(private_key, encoding='binary')
    • Sets the Diffie-Hellman private key. Key encoding can be 'binary', 'hex', or 'base64'.

NOTE: The API is still subject to change.
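
To make the overview above a bit more concrete, here is a minimal sketch of a key exchange between two parties in one process. The names alice and bob are just illustrative, the 512-bit prime is chosen only to keep the demo fast (far too small for real use), and I'm assuming the 'hex' encoding throughout so that keys travel as plain strings. Since the API may still change, treat this as a sketch rather than a reference.

```javascript
const crypto = require('crypto');

// Alice generates a fresh prime (512 bits: demo only, not secure).
const alice = crypto.createDiffieHellman(512);
const alicePublicKey = alice.generateKeys('hex');

// Bob builds his object from Alice's prime; the generator is 2 on both sides.
const bob = crypto.createDiffieHellman(alice.getPrime('hex'), 'hex');
const bobPublicKey = bob.generateKeys('hex');

// Each party combines its own private key with the other party's public key.
const aliceSecret = alice.computeSecret(bobPublicKey, 'hex', 'hex');
const bobSecret = bob.computeSecret(alicePublicKey, 'hex', 'hex');

// Both sides should arrive at the same shared secret.
console.log(aliceSecret === bobSecret);
```

In a real exchange only the prime and the public keys would cross the wire; the private keys and the computed secret stay with each party.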

I would appreciate getting a note if you actually do something useful with it. :) Play around with it and let me know what you think!

Mocking HtmlHelper in ASP.NET MVC 2 and 3 using Moq

Still having trouble mocking HtmlHelper? This is an update to my previous post on mocking HtmlHelper way back when ASP.NET MVC RC1 was released. Eric notified me through a comment on the post and a question on StackOverflow that the code for ASP.NET MVC RC1 did not work with ASP.NET MVC 2. The code in this post should work with ASP.NET MVC 2 and ASP.NET MVC 3 Preview 1.


public static HtmlHelper CreateHtmlHelper(ViewDataDictionary vd)
{
    Mock<ViewContext> mockViewContext = new Mock<ViewContext>(
        new ControllerContext(
            new Mock<HttpContextBase>().Object,
            new RouteData(),
            new Mock<ControllerBase>().Object),
        new Mock<IView>().Object,
        vd,
        new TempDataDictionary(),
        new Mock<TextWriter>().Object);
    var mockViewDataContainer = new Mock<IViewDataContainer>();
    mockViewDataContainer.Setup(v => v.ViewData)
        .Returns(vd);
    return new HtmlHelper(mockViewContext.Object,
                            mockViewDataContainer.Object);
}

Copyable available on GitHub

People actually download and use Copyable, and they tend to use it in scenarios I haven’t used it in. This results in bug reports and patch submissions. So far, these have been given to me by e-mail or by blog comment, neither of which is a particularly great way of receiving them. So after receiving another one today, I finally got around to putting Copyable on GitHub.

The version I put up includes several enhancements from the latest release:

  • It uses FormatterServices.GetUninitializedObject and hence does not depend on a parameterless constructor or custom instance provider (but you can of course still create an instance provider if you want to control object initialization)
  • The bug with copy semantics for already visited objects submitted by Walter Oesch has been fixed
  • The bug with inherited fields found by Alex, and the patch submitted for it, has been incorporated

Bleeding edge Copyable can be found at http://github.com/havard/copyable. The clone URL is git://github.com/havard/copyable.git. Now go fix your own bugs! Or even better, enhance the framework.

Minimalistic MapReduce in .NET 4.0 with the new Task Parallel Library (TPL)

Among the new features in .NET 4.0 are several additions by the Parallel Computing Platform Team. As I wandered through the documentation of the Task library with cloud computing and parallelism buzz in the back of my head, I got the idea of using tasks to create a minimalistic MapReduce. Here’s the result: a rather crude and simple, but efficient, MapReduce for you to play with and use!

What is MapReduce?

For those of you who don’t know what MapReduce is: MapReduce is a simplified interface for parallel data processing. MapReduce was initially described by the Google engineers Jeffrey Dean and Sanjay Ghemawat in the 2004 paper titled MapReduce: Simplified data processing on large clusters.

MapReduce processes data by splitting the processing into a set of transformations. In functional programming, this is called the “map” function: it maps, or transforms, an input to an output. The results of the transformations are then combined into a single result. In functional programming, this is called the “reduce” function: it reduces a set of values to a single value. On a side note, LINQ has equivalent functions, but the names are different, presumably to make them more familiar to people with SQL knowledge. In LINQ, map is called Select, and reduce is called Aggregate.

In short, to process a huge set of data, you split the data into chunks and process each chunk in parallel. This creates a new set of intermediate results, which is then reduced to a single result.

Implementing a minimalistic MapReduce in .NET 4.0

The signature of my MapReduce function is


static Task<TResult> Start<TInput, TPartial, TResult>(
  Func<TInput, TPartial> map, 
  Func<TPartial[], TResult> reduce, 
  params TInput[] inputs);

In other words, to start a MapReduce run, you supply a map function, a reduce function, and a set of inputs. Each input will be turned into an intermediate result (of type TPartial). Inputs are transformed concurrently. When all inputs are transformed, the reduce function is called to transform the partial results into a final result (of type TResult). Cool!

The map part is implemented by starting a task for each supplied input using Task.Factory.StartNew().


Task.Factory.StartNew(() => map(input));

The reduce part is implemented as a continuation of all the map tasks, meaning that the reduce task waits for all the map tasks to complete, and then executes. This is achieved using Task.Factory.ContinueWhenAll.


Task.Factory.ContinueWhenAll(
  mapTasks, 
  tasks => PerformReduce(reduce, tasks));

As you can see, the implementation is minimalistic and simple, and usage is likewise.

Here’s a simple example using MapReduce to calculate the root mean square (RMS) of a set of values:


var task = MapReduce.Start<int, int, double>(
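  // Map: square each input
  i => i * i,
  // Reduce: sum the squares, divide by the count (as a double, to avoid integer division), and take the root
  s => Math.Sqrt(s.Aggregate((a, b) => a + b) / (double)s.Length),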
  1, 2, 3, 4, 5);
// Wait for result
task.Wait();
// Prints 3.3166...
Console.WriteLine(task.Result);

Actual applications of MapReduce are of course far more interesting than this simple example.

Applications of MapReduce

MapReduce can essentially be applied to any problem where you need a number of things to be done in parallel. It can even be applied in cases where you don’t need a final result. Just return an arbitrary value as the result (or even better, implement a variant of my MapReduce which uses Action<T>).

A few obvious use cases:

  • Distributed search
  • Distributed sort
  • Tokenization
  • Indexing
  • Log processing
  • Machine learning
  • General artificial intelligence
  • General data mining
  • Large scale image processing

The list goes on and on; these are just a few things off the top of my head.

You can grab the source code for MapReduce here. Since this is done in .NET 4.0, it requires Visual Studio 2010 Beta 2 or later.

As usual, play around with it, have fun, and let me know if you find it useful!