.NET 3.5 brought a bunch of great extension methods to the framework, most of which extend IEnumerable<T>, the interface implemented by just about every collection type (with IQueryable<T> counterparts for query providers). These include simple ones such as Where, OrderBy, and FirstOrDefault, as well as more complex ones like Aggregate and GroupBy. The Enumerable class really covers a lot of ground, but I’ve created a couple more that I use from time to time in my “Common” assembly.
The extension method I’m writing about today is called Batch. It allows you to take any input sequence (array, collection, iterator, etc.) and process it in batches of a specified size.
Often you will have an input sequence that contains a lot of items, but for various reasons you can’t process the entire sequence at once (memory constraints, wanting to update a console with progress, etc.), and processing one item at a time is too slow.
I used to write code that looked like this.
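var batch = new List<Record>();
foreach (var item in sequence) {
    batch.Add(item);
    if (batch.Count == 100) {
        BulkInsert(batch);
        batch.Clear();
    }
}
// Flush the final partial batch, if any; without this,
// up to 99 trailing items would never be inserted.
if (batch.Count > 0) {
    BulkInsert(batch);
}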
This isn’t too bad, but it’s a lot of ceremony, and extension methods are all about turning globs like that into more readable, concise code. The signature for the Batch method looks like the following.
/// <summary>
/// Enumerates a sequence in chunks, yielding batches of a certain size to the enumerator.
/// </summary>
/// <typeparam name="T">The type of item in the batch.</typeparam>
/// <param name="sequence">The sequence of items to be enumerated.</param>
/// <param name="batchSize">The maximum number of items to include in a batch.</param>
/// <returns>A sequence of arrays, with each array containing at most
/// <paramref name="batchSize"/> elements.</returns>
public static IEnumerable<T[]> Batch<T>( this IEnumerable<T> sequence, int batchSize )
As the summary suggests, it converts an input sequence into a sequence of arrays, each holding at most batchSize items. Every array is full except possibly the last, which will have fewer items if the input count is not evenly divisible by batchSize. Calling it is very simple. The first example can be rewritten as follows.
foreach (Record[] batch in sequence.Batch(100)) {
    BulkInsert(batch);
}
The complete code example is shown below.
/// <summary>
/// Enumerates a sequence in chunks, yielding batches of a certain size to the enumerator.
/// </summary>
/// <typeparam name="T">The type of item in the batch.</typeparam>
/// <param name="sequence">The sequence of items to be enumerated.</param>
/// <param name="batchSize">The maximum number of items to include in a batch.</param>
/// <returns>A sequence of arrays, with each array containing at most
/// <paramref name="batchSize"/> elements.</returns>
public static IEnumerable<T[]> Batch<T>( this IEnumerable<T> sequence, int batchSize )
{
    var batch = new List<T>( batchSize );
    foreach ( var item in sequence ) {
        batch.Add( item );
        // when we've accumulated enough in the
        // batch, send it out
        if ( batch.Count >= batchSize ) {
            yield return batch.ToArray( );
            batch.Clear( );
        } // if
    } // foreach
    // send out any leftovers
    if ( batch.Count > 0 ) {
        yield return batch.ToArray( );
        batch.Clear( );
    } // if
}
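One thing to be aware of: because the method above is an iterator, none of its body runs until the caller starts enumerating, so a bad batchSize is not reported at the call site. A negative value only throws (from the List<T> constructor) on the first MoveNext, and zero quietly yields one-element arrays. The following is a minimal sketch of one way to check arguments up front, by splitting the method into an eager wrapper and a private iterator. The class name BatchExtensions and the helper name BatchIterator are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container class; the name is an assumption for this sketch.
public static class BatchExtensions
{
    // Eager wrapper: this is not an iterator, so the argument checks
    // run as soon as Batch is called.
    public static IEnumerable<T[]> Batch<T>( this IEnumerable<T> sequence, int batchSize )
    {
        if ( sequence == null )
            throw new ArgumentNullException( "sequence" );
        if ( batchSize <= 0 )
            throw new ArgumentOutOfRangeException( "batchSize", "Batch size must be positive." );
        return BatchIterator( sequence, batchSize );
    }

    // Lazy part: same batching logic as the method shown above.
    private static IEnumerable<T[]> BatchIterator<T>( IEnumerable<T> sequence, int batchSize )
    {
        var batch = new List<T>( batchSize );
        foreach ( var item in sequence ) {
            batch.Add( item );
            // batch is full, send it out and start a new one
            if ( batch.Count >= batchSize ) {
                yield return batch.ToArray( );
                batch.Clear( );
            }
        }
        // send out any leftovers
        if ( batch.Count > 0 ) {
            yield return batch.ToArray( );
        }
    }
}
```

With this variant, batching ten items with a batchSize of 3 yields arrays of length 3, 3, 3, and 1, while calling Batch(0) throws immediately rather than on first enumeration.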
