<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Circles and Crosses &#187; Technical</title>
	<atom:link href="http://ox.no/posts/tag/technical/feed" rel="self" type="application/rss+xml" />
	<link>http://ox.no</link>
	<description>Håvard Stranden&#039;s website</description>
	<lastBuildDate>Sat, 20 Aug 2011 00:11:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Copyable available on GitHub</title>
		<link>http://ox.no/posts/copyable-available-on-github</link>
		<comments>http://ox.no/posts/copyable-available-on-github#comments</comments>
		<pubDate>Thu, 10 Dec 2009 23:10:29 +0000</pubDate>
		<dc:creator>Håvard</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[.net]]></category>
		<category><![CDATA[copy]]></category>
		<category><![CDATA[copyable]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[open source]]></category>

		<guid isPermaLink="false">http://ox.no/?p=144</guid>
		<description><![CDATA[People actually download and use Copyable, and they tend to use it in scenarios I haven&#8217;t used it in. This results in bug reports and patch submissions. So far, these have been given to me by e-mail or by blog &#8230; <a href="http://ox.no/posts/copyable-available-on-github">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>People actually download and use Copyable, and they tend to use it in scenarios I haven&#8217;t used it in. This results in bug reports and patch submissions. So far, these have been given to me by e-mail or by blog comment, neither of which is a particularly great way of receiving them. So after receiving another one today, I finally got around to putting Copyable on <a href="http://github.com">GitHub</a>.</p>

<p>The version I put up includes several enhancements since the latest release:</p>

<ul>
<li>It uses <code>FormatterServices.GetUninitializedObject</code> and hence does not depend on a parameterless constructor or custom instance provider (but you can of course still create an instance provider if you want to control object initialization)</li>
<li>The bug with copy semantics for already visited objects submitted by Walter Oesch has been fixed</li>
<li>The bug with inherited fields found by Alex, and the patch submitted for it, has been incorporated</li>
</ul>
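
<p>As a minimal sketch of what <code>FormatterServices.GetUninitializedObject</code> does (this is an illustration, not code taken from Copyable; the <code>Point</code> class and the <code>Uninitialized.Create</code> helper are made up for the example), the following creates an instance of a class that has no parameterless constructor:</p>

<p><pre class="brush: csharp; ">

using System;
using System.Runtime.Serialization;

// Hypothetical type with no parameterless constructor
class Point
{
    public readonly int X;
    public Point(int x) { X = x; }
}

static class Uninitialized
{
    // Allocates a zeroed instance without running any constructor
    public static T Create&lt;T&gt;() where T : class
    {
        return (T)FormatterServices.GetUninitializedObject(typeof(T));
    }
}

class Program
{
    static void Main()
    {
        Point p = Uninitialized.Create&lt;Point&gt;();
        // The constructor never ran, so X still has its default value
        Console.WriteLine(p.X); // Prints 0
    }
}

</pre></p>

<p>Since no constructor runs, every field starts out zeroed or null. A copier can then populate the fields itself, which is why a parameterless constructor is no longer needed.</p>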

<p>Bleeding edge Copyable can be found at <a href="http://github.com/havard/copyable">http://github.com/havard/copyable</a>. The clone URL is <a href="git://github.com/havard/copyable.git">git://github.com/havard/copyable.git</a>. Now go fix your own bugs! Or even better, enhance the framework.</p>
]]></content:encoded>
			<wfw:commentRss>http://ox.no/posts/copyable-available-on-github/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Minimalistic MapReduce in .NET 4.0 with the new Task Parallel Library (TPL)</title>
		<link>http://ox.no/posts/minimalistic-mapreduce-in-net-4-0-with-the-new-task-parallel-library-tpl</link>
		<comments>http://ox.no/posts/minimalistic-mapreduce-in-net-4-0-with-the-new-task-parallel-library-tpl#comments</comments>
		<pubDate>Tue, 03 Nov 2009 22:58:52 +0000</pubDate>
		<dc:creator>Håvard</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Concurrency]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[.NET 4.0]]></category>
		<category><![CDATA[Parallel]]></category>
		<category><![CDATA[TPL]]></category>

		<guid isPermaLink="false">http://ox.no/?p=119</guid>
		<description><![CDATA[Among the new features in .NET 4.0 are several additions by the Parallel Computing Platform Team. As I wandered through the documentation of the Task library with cloud computing and parallelism buzz in the back of my head, I got the &#8230; <a href="http://ox.no/posts/minimalistic-mapreduce-in-net-4-0-with-the-new-task-parallel-library-tpl">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Among the new features in .NET 4.0 are several additions by the <a href="http://blogs.msdn.com/pfxteam/">Parallel Computing Platform Team</a>. As I wandered through the documentation of the Task library with cloud computing and parallelism buzz in the back of my head, I got the idea of using tasks to create a minimalistic MapReduce. Here&#8217;s the result: a rather crude and simple, but efficient, MapReduce for you to play with and use!</p>

<!-- more -->

<h2>What is MapReduce?</h2>

<p>For those of you who don&#8217;t know what MapReduce is: MapReduce is a simplified interface for parallel data processing. MapReduce was initially described by the Google engineers Jeffrey Dean and Sanjay Ghemawat in the 2004 paper titled <a href="http://labs.google.com/papers/mapreduce.html">MapReduce: Simplified data processing on large clusters</a>.</p>

<p>MapReduce processes data by splitting the processing into a set of transformations. In functional programming, this is called the &#8220;map&#8221; function: it maps, or transforms, an input to an output. The results of the transformations are then combined into a single result. In functional programming, this is called the &#8220;reduce&#8221; function: it reduces a set of values to a single value. On a side note, LINQ has equivalent functions, but the names are different, presumably to make them more familiar to people with SQL knowledge. In LINQ, map is called <code>Select</code>, and reduce is called <code>Aggregate</code>.</p>

<p>In short, to process a huge set of data, you split the data into chunks and process each chunk in parallel. This produces a set of intermediate results, which are then reduced to a single result.</p>
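
<p>To make the LINQ analogy concrete, here is a small sequential sketch (the class and method names are made up for the example). <code>Select</code> plays the role of map, and <code>Aggregate</code> plays the role of reduce:</p>

<p><pre class="brush: csharp; ">

using System;
using System.Linq;

static class LinqMapReduce
{
    // Hypothetical helper: squares each input (map), then sums the squares (reduce)
    public static int SumOfSquares(params int[] inputs)
    {
        return inputs
            .Select(i =&gt; i * i)           // map: transform each input
            .Aggregate((a, b) =&gt; a + b);  // reduce: combine into one value
    }
}

class Program
{
    static void Main()
    {
        // 1 + 4 + 9 + 16 + 25
        Console.WriteLine(LinqMapReduce.SumOfSquares(1, 2, 3, 4, 5)); // Prints 55
    }
}

</pre></p>

<p>Nothing runs in parallel here. The point is only the shape of the computation: a per-element transformation followed by a combining step.</p>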

<h2>Implementing a minimalistic MapReduce in .NET 4.0</h2>

<p>The signature of my MapReduce function is
<pre class="brush: csharp; ">

static Task&lt;TResult&gt; Start&lt;TInput, TPartial, TResult&gt;(
  Func&lt;TInput, TPartial&gt; map, 
  Func&lt;TPartial[], TResult&gt; reduce, 
  params TInput[] inputs);

</pre></p>

<p>In other words, to start a MapReduce run, you supply a <code>map</code> function, a <code>reduce</code> function, and a set of inputs. Each input will be turned into an intermediate result (of type <code>TPartial</code>). Inputs are transformed concurrently. When all inputs are transformed, the <code>reduce</code> function is called to transform the partial results into a final result (of type <code>TResult</code>). Cool!</p>

<p>The map part is implemented by starting a task for each supplied input using <code>Task.Factory.StartNew()</code>.</p>

<p><pre class="brush: csharp; ">

Task.Factory.StartNew(() =&gt; map(input));

</pre></p>

<p>The reduce part is implemented as a <a href="http://en.wikipedia.org/wiki/Continuation">continuation</a> of all the map tasks, meaning that the reduce task waits for all the map tasks to complete, and then executes. This is achieved using <code>Task.Factory.ContinueWhenAll</code>.</p>

<p><pre class="brush: csharp; ">

Task.Factory.ContinueWhenAll(
  mapTasks, 
  tasks =&gt; PerformReduce(reduce, tasks));

</pre></p>
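
<p>Put together, the two pieces might look roughly like the sketch below. Treat it as a reconstruction under assumptions rather than the exact downloadable source: in particular, the body of <code>PerformReduce</code> shown here is an assumption (a helper that collects the results of the map tasks and hands them to <code>reduce</code>), and the string-length example in <code>Main</code> is made up for illustration.</p>

<p><pre class="brush: csharp; ">

using System;
using System.Linq;
using System.Threading.Tasks;

static class MapReduce
{
    public static Task&lt;TResult&gt; Start&lt;TInput, TPartial, TResult&gt;(
        Func&lt;TInput, TPartial&gt; map,
        Func&lt;TPartial[], TResult&gt; reduce,
        params TInput[] inputs)
    {
        // Map: one task per input.
        // Note: ContinueWhenAll throws if the task array is empty,
        // so at least one input is required.
        Task&lt;TPartial&gt;[] mapTasks = inputs
            .Select(input =&gt; Task.Factory.StartNew(() =&gt; map(input)))
            .ToArray();

        // Reduce: runs once all map tasks have completed
        return Task.Factory.ContinueWhenAll(
            mapTasks,
            tasks =&gt; PerformReduce(reduce, tasks));
    }

    // Assumed helper: gathers the partial results and reduces them.
    // Reading Result rethrows any exception thrown by a map task.
    static TResult PerformReduce&lt;TPartial, TResult&gt;(
        Func&lt;TPartial[], TResult&gt; reduce,
        Task&lt;TPartial&gt;[] tasks)
    {
        TPartial[] partials = tasks.Select(t =&gt; t.Result).ToArray();
        return reduce(partials);
    }
}

class Program
{
    static void Main()
    {
        // Total length of the two strings: 3 + 6
        var task = MapReduce.Start&lt;string, int, int&gt;(
            s =&gt; s.Length,
            lengths =&gt; lengths.Sum(),
            "map", "reduce");
        Console.WriteLine(task.Result); // Prints 9
    }
}

</pre></p>

<p>Creating the map tasks with <code>Select</code> rather than a <code>foreach</code> loop means each lambda captures its own <code>input</code>, which sidesteps the shared loop variable pitfall of closures in C# 4.</p>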

<p>As you can see, the implementation is minimalistic and simple, and so is its usage.</p>

<p>Here&#8217;s a simple example using MapReduce to calculate the <a href="http://en.wikipedia.org/wiki/Root_mean_square">root mean square (RMS)</a> of a set of values:
<pre class="brush: csharp; ">

var task = MapReduce.Start&lt;int, int, double&gt;(
  i =&gt; i * i,
  s =&gt; Math.Sqrt(s.Aggregate((a, b) =&gt; a + b) / (double)s.Length),
  1, 2, 3, 4, 5);
// Wait for result
task.Wait();
// Prints 3.3166...
Console.WriteLine(task.Result);

</pre></p>

<p>Actual applications of MapReduce are of course far more interesting than this simple example.</p>

<h2>Applications of MapReduce</h2>

<p>MapReduce can essentially be applied to any problem where you need a number of things to be done in parallel. It can even be applied in cases where you don&#8217;t need a final result. Just return an arbitrary value as the result (or even better, implement a variant of my MapReduce which uses <code>Action&lt;T&gt;</code>).</p>
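
<p>A variant without a final result could look something like the following sketch. The class name <code>ParallelActions</code> and the summing example are hypothetical; the idea is simply to run an <code>Action&lt;T&gt;</code> for each input and return a task that completes when all of them are done.</p>

<p><pre class="brush: csharp; ">

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ParallelActions
{
    // Hypothetical variant: runs the action once per input, no result value.
    // As with ContinueWhenAll in general, at least one input is required.
    public static Task Start&lt;TInput&gt;(Action&lt;TInput&gt; action, params TInput[] inputs)
    {
        Task[] tasks = inputs
            .Select(input =&gt; Task.Factory.StartNew(() =&gt; action(input)))
            .ToArray();

        // WaitAll returns immediately here since all tasks are done,
        // but it rethrows any exceptions so the returned task faults too
        return Task.Factory.ContinueWhenAll(
            tasks,
            completed =&gt; Task.WaitAll(completed));
    }
}

class Program
{
    static void Main()
    {
        int total = 0;
        // Interlocked keeps the shared counter safe across tasks
        var task = ParallelActions.Start&lt;int&gt;(
            i =&gt; Interlocked.Add(ref total, i),
            1, 2, 3);
        task.Wait();
        Console.WriteLine(total); // Prints 6
    }
}

</pre></p>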

<p>A few obvious use cases:</p>

<ul>
<li>Distributed search</li>
<li>Distributed sort</li>
<li>Tokenization</li>
<li>Indexing</li>
<li>Log processing</li>
<li>Machine learning</li>
<li>General artificial intelligence</li>
<li>General data mining</li>
<li>Large scale image processing</li>
<li>&#8230;</li>
</ul>

<p>The list goes on and on; these are just a few things off the top of my head.</p>

<p>You can grab the <a href="http://ox.no/files/MapReduce.cs">source code for MapReduce here</a>. Since this is done in .NET 4.0, it requires Visual Studio 2010 Beta 2 or later.</p>

<p>As usual, play around with it, have fun, and let me know if you find it useful!</p>
]]></content:encoded>
			<wfw:commentRss>http://ox.no/posts/minimalistic-mapreduce-in-net-4-0-with-the-new-task-parallel-library-tpl/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

