.NET 3.5 brought a bunch of great extension methods to the framework, most of which extend the IEnumerable<T> interface implemented by just about every collection type (or its IQueryable<T> counterpart). These include simple ones such as Where, OrderBy, and FirstOrDefault, as well as more complex ones like Aggregate and GroupBy. The Enumerable class really covers a lot of ground, but I’ve created a couple more that I use from time to time in my “Common” assembly.

The extension method I’m writing about today is called Batch. It allows you to take any input sequence (array, collection, iterator, etc.) and process it in batches of a specified size.

Often you will have an input sequence that contains a lot of items, but you can’t process the entire sequence at once (memory constraints, wanting to update a console with progress, etc.), and processing one item at a time is too slow.

Before, I used to write code that would look like this.

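var batch = new List<Record>();

foreach (var item in sequence) {

    batch.Add(item);

    if (batch.Count == 100) {
        BulkInsert(batch);
        batch.Clear();
    }

}

// don't forget the final, partially filled batch
if (batch.Count > 0) {
    BulkInsert(batch);
}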

This isn’t too bad but it’s a lot of ceremony and extension methods are all about turning globs like that into more readable, concise code. The method signature for the Batch method looks like the following.

/// <summary>
/// Enumerates a sequence in chunks, yielding batches of a certain size to the enumerator.
/// </summary>
/// <typeparam name="T">The type of item in the batch.</typeparam>
/// <param name="sequence">The sequence of items to be enumerated.</param>
/// <param name="batchSize">The maximum number of items to include in a batch.</param>
/// <returns>A sequence of arrays, with each array containing at most
/// <paramref name="batchSize"/> elements.</returns>
public static IEnumerable<T[]> Batch<T>( this IEnumerable<T> sequence, int batchSize )

As the summary suggests, it converts an input sequence into a sequence of arrays of a specified size. The last array in the sequence will have fewer items than the batch size if the input count is not evenly divisible by batchSize. Calling it is very simple. The first example can be rewritten as the following.

foreach (Record[] batch in sequence.Batch(100)) {
    BulkInsert(batch);
}

The complete code example is shown below.

/// <summary>
/// Enumerates a sequence in chunks, yielding batches of a certain size to the enumerator.
/// </summary>
/// <typeparam name="T">The type of item in the batch.</typeparam>
/// <param name="sequence">The sequence of items to be enumerated.</param>
/// <param name="batchSize">The maximum number of items to include in a batch.</param>
/// <returns>A sequence of arrays, with each array containing at most
/// <paramref name="batchSize"/> elements.</returns>
public static IEnumerable<T[]> Batch<T>( this IEnumerable<T> sequence, int batchSize )
{

    var batch = new List<T>( batchSize );

    foreach ( var item in sequence ) {

        batch.Add( item );

        // when we've accumulated enough in the
        // batch, send it out
        if ( batch.Count >= batchSize ) {
            yield return batch.ToArray( );
            batch.Clear( );
        }   // if

    }   // foreach

    // send out any leftovers
    if ( batch.Count > 0 ) {
        yield return batch.ToArray( );
        batch.Clear( );
    }   // if

}
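One caveat with iterator methods: the body does not run until the caller starts enumerating, so a bad batchSize would not surface at the call site. A minimal sketch of one way around that, assuming you want argument checks to fire immediately, is to split the method into an eager wrapper and a private iterator. The names BatchExtensions, BatchIterator, and BatchDemo are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchExtensions
{
    // Public entry point: validates eagerly, then hands off to the iterator.
    public static IEnumerable<T[]> Batch<T>( this IEnumerable<T> sequence, int batchSize )
    {
        if ( sequence == null ) throw new ArgumentNullException( "sequence" );
        if ( batchSize <= 0 ) throw new ArgumentOutOfRangeException( "batchSize" );
        return BatchIterator( sequence, batchSize );
    }

    // Same loop as the listing above; it runs lazily as the caller enumerates.
    private static IEnumerable<T[]> BatchIterator<T>( IEnumerable<T> sequence, int batchSize )
    {
        var batch = new List<T>( batchSize );

        foreach ( var item in sequence ) {
            batch.Add( item );
            if ( batch.Count >= batchSize ) {
                yield return batch.ToArray( );
                batch.Clear( );
            }
        }

        // send out any leftovers
        if ( batch.Count > 0 ) {
            yield return batch.ToArray( );
        }
    }
}

public static class BatchDemo
{
    public static void Main( )
    {
        // Seven items in batches of three: prints 1,2,3 then 4,5,6 then 7
        foreach ( int[] batch in Enumerable.Range( 1, 7 ).Batch( 3 ) ) {
            Console.WriteLine( string.Join( ",", batch.Select( n => n.ToString( ) ).ToArray( ) ) );
        }
    }
}
```

The last batch holds only the single leftover item, which matches the "at most batchSize elements" wording in the doc comment. If you are on .NET 6 or later, the built-in Enumerable.Chunk method covers the same ground.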

Regular expressions in .NET are pretty easy to use (assuming you understand the regex syntax, which is beside the point), and you can certainly think of some useful extension methods for System.String that would let you quickly validate against a particular regular expression pattern. But regular expressions have another great feature that you maybe don’t use as much: the ability to capture subexpressions into groups so that you can pull out a piece of the match.

The way this is done using the Regex class is pretty straightforward.

// Find a word that starts with H followed by a word that starts with W
string input = "Hello World";
string pattern = @"(H\w+) (W\w+)";

Match m = Regex.Match(input, pattern);
if (m.Success) {
    string hWord = m.Groups[1].Value;
    string wWord = m.Groups[2].Value;
}

This is easy enough, but I am annoyed by the code that accesses the groups. Group and Match objects seem like such a mundane detail to worry about. What if the code could be simplified like the following:

string input = "Hello World";
string pattern = @"(H\w+) (W\w+)";

string hWord, wWord;
if (input.MatchInto(pattern, out hWord, out wWord)) {
    // use hWord and wWord here
}

It may not look like much of a savings in terms of lines of code, but I find that it looks much cleaner and hides the Match and Group plumbing. The full code is below.

/// <summary>
/// Performs a regular expression match against the specified string, and places the captured groups into the output parameters.
/// </summary>
/// <param name="input">The input string to match against.</param>
/// <param name="pattern">The regular expression pattern which should contain grouping expressions.</param>
/// <param name="value1">Receives the value of the 1st capture group (Groups[1]) or null if no match was made.</param>
/// <param name="value2">Receives the value of the 2nd capture group (Groups[2]) or null if no match was made.</param>
/// <param name="value3">Receives the value of the 3rd capture group (Groups[3]) or null if no match was made.</param>
/// <param name="value4">Receives the value of the 4th capture group (Groups[4]) or null if no match was made.</param>
/// <param name="value5">Receives the value of the 5th capture group (Groups[5]) or null if no match was made.</param>
/// <returns>True if the pattern matched (not necessarily all groups) otherwise false.</returns>
public static bool MatchInto( this string input, string pattern, out string value1, out string value2, out string value3, out string value4, out string value5 )
{

    value1 = value2 = value3 = value4 = value5 = null;

    var match = Regex.Match( input, pattern );
    if ( match.Success ) {

        // Value1
        if ( match.Groups.Count > 1 && match.Groups[1].Success ) {
            value1 = match.Groups[1].Value;
        }   // if

        // Value2
        if ( match.Groups.Count > 2 && match.Groups[2].Success ) {
            value2 = match.Groups[2].Value;
        }   // if

        // Value3
        if ( match.Groups.Count > 3 && match.Groups[3].Success ) {
            value3 = match.Groups[3].Value;
        }   // if

        // Value4
        if ( match.Groups.Count > 4 && match.Groups[4].Success ) {
            value4 = match.Groups[4].Value;
        }   // if

        // Value5
        if ( match.Groups.Count > 5 && match.Groups[5].Success ) {
            value5 = match.Groups[5].Value;
        }   // if

        return true;

    }   // if

    return false;

}

/// <summary>
/// Performs a regular expression match against the specified string, and places the captured groups into the output parameters.
/// </summary>
/// <param name="input">The input string to match against.</param>
/// <param name="pattern">The regular expression pattern which should contain grouping expressions.</param>
/// <param name="value1">Receives the value of the 1st capture group (Groups[1]) or null if no match was made.</param>
/// <param name="value2">Receives the value of the 2nd capture group (Groups[2]) or null if no match was made.</param>
/// <param name="value3">Receives the value of the 3rd capture group (Groups[3]) or null if no match was made.</param>
/// <param name="value4">Receives the value of the 4th capture group (Groups[4]) or null if no match was made.</param>
/// <returns>True if the pattern matched (not necessarily all groups) otherwise false.</returns>
public static bool MatchInto( this string input, string pattern, out string value1, out string value2, out string value3, out string value4 )
{
    string value5;
    return MatchInto( input, pattern, out value1, out value2, out value3, out value4, out value5 );
}

/// <summary>
/// Performs a regular expression match against the specified string, and places the captured groups into the output parameters.
/// </summary>
/// <param name="input">The input string to match against.</param>
/// <param name="pattern">The regular expression pattern which should contain grouping expressions.</param>
/// <param name="value1">Receives the value of the 1st capture group (Groups[1]) or null if no match was made.</param>
/// <param name="value2">Receives the value of the 2nd capture group (Groups[2]) or null if no match was made.</param>
/// <param name="value3">Receives the value of the 3rd capture group (Groups[3]) or null if no match was made.</param>
/// <returns>True if the pattern matched (not necessarily all groups) otherwise false.</returns>
public static bool MatchInto( this string input, string pattern, out string value1, out string value2, out string value3 )
{
    string value4;
    string value5;
    return MatchInto( input, pattern, out value1, out value2, out value3, out value4, out value5 );
}

/// <summary>
/// Performs a regular expression match against the specified string, and places the captured groups into the output parameters.
/// </summary>
/// <param name="input">The input string to match against.</param>
/// <param name="pattern">The regular expression pattern which should contain grouping expressions.</param>
/// <param name="value1">Receives the value of the 1st capture group (Groups[1]) or null if no match was made.</param>
/// <param name="value2">Receives the value of the 2nd capture group (Groups[2]) or null if no match was made.</param>
/// <returns>True if the pattern matched (not necessarily all groups) otherwise false.</returns>
public static bool MatchInto( this string input, string pattern, out string value1, out string value2 )
{
    string value3;
    string value4;
    string value5;
    return MatchInto( input, pattern, out value1, out value2, out value3, out value4, out value5 );
}

/// <summary>
/// Performs a regular expression match against the specified string, and places the captured groups into the output parameters.
/// </summary>
/// <param name="input">The input string to match against.</param>
/// <param name="pattern">The regular expression pattern which should contain grouping expressions.</param>
/// <param name="value">Receives the value of the 1st capture group (Groups[1]) or null if no match was made.</param>
/// <returns>True if the pattern matched (not necessarily all groups) otherwise false.</returns>
public static bool MatchInto( this string input, string pattern, out string value )
{
    string value2;
    string value3;
    string value4;
    string value5;
    return MatchInto( input, pattern, out value, out value2, out value3, out value4, out value5 );
}
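To see the "or null" behavior from the doc comments in action, here is a condensed, self-contained sketch. It only includes a three-output overload, uses a hypothetical GroupOrNull helper in place of the repeated if blocks, and the phone-style pattern is invented for the demo, so treat it as an illustration rather than a replacement for the listing above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchIntoSketch
{
    // Returns the text of capture group `index`, or null if that group
    // does not exist or did not participate in the match.
    private static string GroupOrNull( Match match, int index )
    {
        return match.Groups.Count > index && match.Groups[index].Success
            ? match.Groups[index].Value
            : null;
    }

    public static bool MatchInto( this string input, string pattern, out string value1, out string value2, out string value3 )
    {
        var match = Regex.Match( input, pattern );

        // On a failed match no group succeeds, so all three come back null.
        value1 = GroupOrNull( match, 1 );
        value2 = GroupOrNull( match, 2 );
        value3 = GroupOrNull( match, 3 );

        return match.Success;
    }
}

public static class MatchIntoDemo
{
    public static void Main( )
    {
        // The third group (an extension after "x") is optional.
        string pattern = @"^(\d{3})-(\d{4})(?:x(\d+))?$";

        string prefix, line, ext;
        if ( "555-1234".MatchInto( pattern, out prefix, out line, out ext ) ) {
            Console.WriteLine( prefix );       // 555
            Console.WriteLine( line );         // 1234
            Console.WriteLine( ext == null );  // True
        }
    }
}
```

The pattern matches even though the optional third group captured nothing, which is why the return value says "not necessarily all groups" and why callers should null-check any output that sits behind an optional group.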

Here’s a helpful tip if you frequently find yourself wrestling with AssemblyInfo.cs (or AssemblyInfo.vb, etc.) when working with a solution with a large number of projects.

I find that most of the time, almost all the information except the AssemblyTitle, AssemblyDescription, and GUID is the same across all projects. You can even ignore the GUID if you’re not worried about COM visibility.

Just add a SolutionInfo.cs file to the solution (not any project in particular) and put your common assembly details in there. See below for an example.

using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

[assembly: AssemblyCompany( "Einstein Technologies" )]
[assembly: AssemblyProduct( "My Product" )]
[assembly: AssemblyCopyright( "Copyright 2009 Einstein Technologies. All rights reserved." )]
[assembly: AssemblyTrademark( "" )]
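// Leave the culture empty: a non-empty value marks the assembly as a satellite (resource-only) assembly
[assembly: AssemblyCulture( "" )]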

[assembly: ComVisible( false )]

[assembly: AssemblyVersion( "1.0.*" )]

Next, go to each project, right-click it, and choose Add > Existing Item. Browse to the SolutionInfo.cs file you created above, click the drop-down arrow next to the Add button, and choose Add as Link.

Now you can reduce your AssemblyInfo.cs file in that project to the lines below and feel confident that everything else is consistent across the projects.

using System.Reflection;

[assembly: AssemblyTitle( "Plugin Framework" )]
[assembly: AssemblyDescription( "A class library that defines plugin interfaces and stuff." )]

Bonus tip: I like to keep the linked SolutionInfo.cs file in the special “Properties” folder alongside AssemblyInfo.cs. Visual Studio won’t let you add files directly to this folder via the UI, so I add the link at the project root first and then drag it in.
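If dragging isn’t convenient, the same result can be had by editing the project file by hand. The fragment below is a sketch that assumes a classic .csproj sitting one directory below the solution folder, so the relative path in Include is an assumption you will likely need to adjust. The Link element controls where the file shows up in Solution Explorer.

```xml
<ItemGroup>
  <!-- Include is relative to the project file; Link is the display path inside the project -->
  <Compile Include="..\SolutionInfo.cs">
    <Link>Properties\SolutionInfo.cs</Link>
  </Compile>
</ItemGroup>
```

The file on disk stays in one place; each project just references it, so a change to the company name or version only has to be made once.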