Home > .NET 3.5, C# 3.0, LINQ > Upgrade your C# Skills part 4 – LINQ to Objects

Upgrade your C# Skills part 4 – LINQ to Objects

January 2, 2008

If you’ve been following along recently, you’ll know that I have immersed myself in VS2008 and .NET 3.5. No discussion of these new technologies would be complete without addressing LINQ (Language Integrated Query). My problem is that LINQ has been written about to death by a lot of more qualified people than myself. With the other articles in this series I felt like I had something to offer, but here I am not so sure. So here is what I will do: I will hit the lowlights. Instead of showing you how to use LINQ, which you can get just about anywhere, I’m going to focus on when and why to use LINQ. I’ll also show you how to write your own Types that can be consumed by LINQ.

LINQ actually comes in five flavors:

  • LINQ to Objects – using LINQ to query Collections.
  • LINQ to SQL – using LINQ to directly query SQL Server databases.
  • LINQ to Datasets – using LINQ to query disconnected data stored in a DataSet.
  • LINQ to XML – using LINQ to create, read, manage, and query XML formatted data.
  • LINQ to Entities – using LINQ to query databases via the Entity Data Model Framework.

In this article, I will only cover LINQ to Objects as it provides the foundation for all the others. I will most likely cover LINQ to SQL and LINQ to DataSets in future articles. I have never been much of an XML user, but I may become one with the power of LINQ to XML: if I do, I will write about that at the appropriate time. Same goes for LINQ to Entities: the Entity Data Model Framework should be released sometime in the spring, and I will probably wait until then to cover that topic. So enough of me telling you what I’m not going to write about: let’s get to the real material.

IEnumerable<T>

I teach a weekly class on C# and .NET at my company to a group of procedural programmers, and every time I say something like “IEnumerable of T” I get back a bunch of blank stares. IEnumerable<T> is a generic interface that is implemented by every generic collection type in .NET. It is important to note that this is different from IEnumerator<T>, which we will not be implementing, at least not directly.

IEnumerable<T> is a pretty simple interface. In fact, it only exposes two methods, each called GetEnumerator() – one is generic and one is not. These methods return an IEnumerator<T>, which is just the default looping mechanism for the Collection. You’ll see this implemented in code later in this article.

For now, I want to stress that IEnumerable<T> is really what makes the magic of LINQ work. Anything that implements this interface (or one of its descendants) is queryable by LINQ. For a little terminology, we are going to use the term sequence to refer to anything that implements IEnumerable<T>. This is more accurate than Collection since there are Collections that do not implement this interface. So the answer to the question “When can I use LINQ to Objects?” is “You can use LINQ to Objects against any Sequence”.

Why use LINQ

So now that you know when you can use LINQ, lets talk about why using it is a good idea. Let’s revisit our old favorite, the Person class:

class Person
{
    public event EventHandler<maritalstatuschangedeventargs> MaritalStatusChangedEvent;
    private char _maritalStatus;

    public Person()
    {
    }

    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
    public string DOB { get; set; }
    public bool IsMilitaryVeteran { get; set; }
    public char MaritalStatus
    {
        get
        {
            return _maritalStatus;
        }

        set
        {
            if (_maritalStatus != value)
            {
                _maritalStatus = value;
                if (MaritalStatusChangedEvent != null)
                {
                    MaritalStatusChangedEvent(this,
                        new MaritalStatusChangedEventArgs("Marital Status is now " + _maritalStatus));
                }
            }

        }
    }

    public string GetFullName()
    {
        return string.Format("{0}, {1}", LastName, FirstName);
    }

    public override string ToString()
    {
        return string.Format("{0} {1}", FirstName, LastName);
    }
}

And let’s create a List of Person objects:

var people = new List<Person>()
    {new Person() {FirstName = "John", LastName = "Smith", DOB = "12/31/1968", IsMilitaryVeteran = true, MaritalStatus = 'M'},
     new Person() {FirstName = "Jimmy", LastName = "Crackcorn", DOB = "08/08/1934", IsMilitaryVeteran = true, MaritalStatus = 'S'},
     new Person() {FirstName = "Mary", LastName = "Contrary", DOB = "09/10/1957", IsMilitaryVeteran = false, MaritalStatus = 'D'}};

In a part 3 of this series on Lambda Expressions, we created a custom PersonCollection that inherited from List<Person> because we wanted to be able to filter these objects in specific ways. In that case, we were showing how to use a Lambda Expression with an Extension Method to loop through each Person that matched a certain Predicate<T> and perform some Action<T>. We initially did that by having the PersonCollection return another PersonCollection that was essentially a subset of the original. We could have also passed a more complex Lambda Expression that performed the filtering behavior for us as well as the Action.

In essence, though, the filtering done by the PersonCollection methods is a Query of the sequence. Rather than requiring a custom collection and predetermined methods, LINQ handles the query part for us. This buys us a couple of things. First, it means that we do not need a custom collection just to retrieve a filtered subset. Second, by not embedding this behavior in a Lambda Expression, we are substantively separating filtering behavior from actions on the filtered results. This means that both the filtering and the actions can be modularized.

Let’s look at an example:

var subset = from p in people
             where p.Age > 40
             select p;Console.WriteLine("Over 40:");

foreach (var person in subset)
{
    Console.WriteLine("{0} {1} is {2} years old. DOB: {3}", person.FirstName, person.LastName, person.Age, person.DOB);
}

If you compare these results to the list we created above, you’ll find that only the two over-40 objects are in the subset variable. This is what LINQ to Objects does for you: it allows you to query the Collection for a subset of objects. The resulting list is in fact an IEnumerable<T> for whatever type is in the Collection, so in this case we get back a list of Person objects. It also means I do not need access to the PersonCollection source code: I can execute LINQ statements over the collection directly, even if it is seaed, so I do not need to be able to create custom methods.

This is very powerful. Again, I’m not going to go over the mechanics of LINQ here, but suffice it to say that even this simple example should make you start to salivate. Imagine the possibilities! And it can be done with a very small amount of code.

Something more complex

As you may expect, the queries you write can be much more complex and functional:

var subset = from p in people
             where p.FirstName.StartsWith("J")
                && p.IsMilitaryVeteran
                && p.Age > 40
             select new {
                 LastName = p.LastName,
                 FirstName = p.FirstName,
                 Age = p.Age,
                 DOB = p.DOB
             };

There are some pretty cool things going on here that you need to be aware of. First, notice that I have complete access to all the members of the Person object in my query. Second, while a LINQ statement bears a strong resemblance to SQL, it is not SQL, so don’t judge it based on your SQL experience. Notably, it uses C# syntax and at first glance it appears backwards since the Select statement is last instead of first. This was done in order to support Intellisense, and Kathleen Dollard made a good point at a recent User group presentation: you may start to see SQL switch to this model (at least in the Microsoft World): wouldn’t it be nice to have full Intellisense support when writing SQL statements?

The third item of interest here is also in the Select statement. The first LINQ statement you saw returned an IEnumerable<Person>, but this one does not! In the select statement, we are actually creating an Anonymous Type by selecting only bits and pieces from our Person objects. In fact, these bits and pieces could come from multiple sources or be calculations or even new instances of complex types! This process of creating Anonymous Types from LINQ is known as Projection and it is very intriguing.

We discussed Anonymous Types and Compiler Inference in part 1 of this series, and now we see why they are so crucial. Without them, LINQ would be relegated to simply building subsets of Collections. With Projection though, LINQ becomes enormously powerful. Compiler Inference is used to create and build the Anonymous Type on the fly, and then is used again in order to loop through the newly created sequence. Without it, we would not be able to process the LINQ results, because in the case of projection there is no way to know the resulting Anonymous Type at design time.

Deferred Execution

I would be remiss if I did not at least point this out. A LINQ query is just a statement of execution: in other words, it is only a set of instructions concerning how to query a collection, it does not represent any data itself. No data is processed until the LINQ result object is referenced. This is called deferred execution, and you must be aware of it. If you do not understand this, “unpredictable results may occur”. Here is an example:

// Only define LINQ query - no data yet!
var subset = from p in people
             where p.FirstName.StartsWith("J")
                && p.IsMilitaryVeteran
                && p.Age > 40
             select new {
                 LastName = p.LastName,
                 FirstName = p.FirstName,
                 Age = p.Age,
                 DOB = p.DOB
             };
// "Subset" holds no data at this point

// Add new Person after LINQ query
people.Add(new Person() { FirstName = "Jill", LastName = "Upthehill", DOB = "12/31/1944", IsMilitaryVeteran = true, MaritalStatus = 'S' });Console.WriteLine("Over 40:");

// References LINQ query: NOW the data is gathered, so it includes the new Person
foreach (var person in subset)
{
    Console.WriteLine("{0} {1} is {2} years old. DOB: {3}", person.FirstName, person.LastName, person.Age, person.DOB);
}

Because the data itself is not read at LINQ definition time, it also means you can outsource the LINQ statements themselves to other classes and methods. Just be sure you return IEnumerable of Person:

public IEnumerable<Person> JVets(List<Person> people)
{
    var subset = from p in people
                 where p.FirstName.StartsWith("J")
                    && p.IsMilitaryVeteran
                    && p.Age > 40
                 select p;

    return subset;
}

Build your own Sequence

So far, we have been consuming known Sequences (such as List<Person>), but what if you wanted to perform LINQ over one of your own custom collection types? Hopefully by now the answer is obvious: you need to inherit from IEnumerable<T> or one of its children. The easiest way by far to do this is to inherit from one of the generic collections, such as List<T>. This would be my recommendation, but if you feel like getting into the nuts and bolts a little more, then you can implement IEnumerable<T> directly:

class PersonCollection : IEnumerable<Person>
{
    #region IEnumerable<Person> Members
    public IEnumerator<Person> GetEnumerator()
    {
        foreach (Person p in this)
        {
            yield return p;
        }
    }
#endregion

#region IEnumerable Members
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
#endregion
}

This is back to creating custom Iterators, which is beyond the scope of this article, but the magic here happens in yield return. This allows the collection to halt at each iteration and retain its position in the collection.

But here is the bad news: if you do this, you also have to implement all the Collection goodies, like Add(). This would be tedious and time consuming, and largely unnecessary since you could simply inherit from one of the Generic classes.

Conclusion

As stated at the beginning, this is not meant to be a LINQ primer. What you should get out of this is the following: you should use LINQ whenever you want to query collections. You can use LINQ to create collections of anonymous types, limiting the data and properties you have to deal with to just the ones you want. You can use LINQ to logically separate your queries from your actions.

And there is much more you can do with LINQ. You can aggregate results, join collections together, and so much more. Try it the next time you need to loop through a colleciton and pull out just certain objects. I think if you use it a couple of times you’ll find it becomes useful in a hurry.

Advertisement
Categories: .NET 3.5, C# 3.0, LINQ
  1. Justin
    January 19, 2010 at 6:54 pm

    thanks! This was very useful and easy to understand as I wanted something a little snazzier than doing a foreach loop over my collection.

  2. January 13, 2011 at 8:27 am

    Nice…New approach for linq to Objects.

  3. May 6, 2011 at 1:07 pm

    sdfsdfsdfsdfsdf

  1. January 7, 2008 at 12:15 pm
  2. January 18, 2008 at 11:31 pm
  3. February 7, 2008 at 1:21 am
  4. June 27, 2008 at 6:41 am
Comments are closed.
%d bloggers like this: