Technical Ramblings of a .NET Developer

Randomizing Enumerables via Extension Methods

Created on January 22, 2012

Usage:

object randomValue = existingList.Random();

IEnumerable<object> randomizedList = existingList.Randomize();

 

Extension Method to Choose a Random Value

The easiest method to choose a random value from a collection is to say, I've got x amount of values, let's get a random integer corresponding to the number of values and pick that one. However, the IEnumerable interface purposely does not provide us with an item count.

Rather than iterating through the entire Enumerable blindly and counting the number of items, lets do it in one iteration. Luckily, mathematics is on our side, all we have to do is grantee that each item in the collection has the same chance of being picked as all the others. For each item in the list, we can perform a decision to either keep the previously selected value, or to select the new one. The probabilities to select each one, can't be 50/50 (or else we'd have a 50% change of selecting the last item in the list), but must reflect the individual item's weight compared to the number of items already processed.

That weight is 1 / (# of items already processed + 1). The +1 accounts for the item in question. For example, for the forth item in a collection, its probability of being selected will be 1/(3+1) => 1/4 or 25%. When we add the fifth item, it's probability of being selected will be 1/(4+1) => 1/5 or 20% (with an 80% chance of the previous selection still standing). Thus, we preserve the principle that each item in the collection has the same chance of being picked as any other, without having to know how many items are in the collection before processing.

public static T Random<T>(this IEnumerable<T> values)
{
  if (values == null)
  { // Nothing in collection, nothing to return
    return default(T);
  }

  // Seed the random with a GUID so randomizations within close time proximity
  //  result in different randomizations
  var rnd = new Random(Guid.NewGuid().GetHashCode());

  double count = 0;
  T result = default(T);

  foreach (var item in values)
  {
    count++;

    if (rnd.NextDouble() <= 1 / count)
    {
      result = item;
    }
  }

  return result;
}

 

Extension Method to Randomize the order of all Items

The example below is pretty straight forward. We simply associate a random number to each value, sort the values based on those random values and then remove the randomly generated numbers from our results. In the end, the values are in a random order.

public static IEnumerable<T> Randomize<T>(this IEnumerable<T> values)
{
  // Seed the random with a GUID so randomizations within close time proximity
  //  result in different randomizations
  var rnd = new Random(Guid.NewGuid().GetHashCode());

  // Generate a random Key for each value, order by random values, then force iteration
  //   so that we do not randomize the list differently each time the collection is traversed
  return values.Select(v => new KeyValuePair<double, T>(rnd.NextDouble(), v))
               .OrderBy(kvp => kvp.Key)
               .Select(kvp => kvp.Value)
               .ToList();				
}

I would like to bring special attention to the ToList() forcing iteration and storing the output of the Randomize extension method to a new List object. Normally, its not best practice for force iteration for LINQ capable extensions as the goal is to delay execution as long as possible. However, if we do not force processing, the extension method will return a different order, each time the Enumerable is traversed. This will cause problems with any appended LINQ extensions that require the order not to change, or at minimal, confuse a developer on why a particular API call gives him a Enumeration that changes its order when bound to two different controls. However, if randomization per iteration is your goal, simply remove the ToList(), but I would make it clear what your version is doing.

Categories: Microsoft .NET  |  Tags: LINQ  Extensions 
Blog Comments RSS feed