Reviewing my data access layer using the Entity Framework Profiler

Inspired by Oren Eini’s series of blog post code reviews and hearing him talk on This Developers Life Life I decided to review the data access of my latest application using the Entity Framework Profiler.

This is the second application I’ve used the Entity Framework for, and I’ve found it a joy to work with. I’d heard the arguments against ORMs that said they can’t produce good SQL, but I’d also heard the argument that ORMs are made by very smart people and they can probably make better SQL than most people. What I hadn’t heard till I heard Oren’s interview was that a smart ORM will still make stupid SQL if you tell it to do stupid things.

I knew the data access of my application needed some attention and this gave me the kick I needed to actually do something. It also seemed like a great chance to given Oren’s Entity Framework Profiler a test drive to see how much it could help my development process.

Setting up the EF Profiler to work with my .NET MVC project was very straight forward. I just needed to reference a DLL and add a line of code to my Application_Start method. I then ran the profiler while visiting a few pages of my site running on my local machine’s test server.

I was expecting the profiler to spot quite a lot of small problems and one or too big ones. What I actually discovered came as quite a shock. The EF profiler makes it very clear how many SQL queries each page is creating, and my pages were creating a lot.

In my defence I knew the code needed attention, but still it’s still quite embarrassing to have written the code that did this. Here is my screenshot of shame:

Screenshot of my apps data access

In the top left you can see the number of object contexts used on a page and how many queries were made from each one. I’m using one context per page, which I think is a good thing, but probably the only good thing that this screenshot tells us.

Skimming over the simple pages we very quickly come to ones that are creating hundreds of SQL queries and in one particularly horrible case over a thousand. How on earth has this happened? I haven’t written that many queries; I’m not even showing that many records. On the positive side, I’ve spotted this early on and at the perfect time to fix it.

You can see the actual queries produced and how long they took to run in the top right of the profiler. Clicking on one of the queries allows you to see extra details about it, such as the stack trace. In the screenshot above you can see that the highlighted query is getting run from the view, which in itself is a bad thing and something that will need to be fixed.

At this stage the most useful feature was the red dots appearing next to many of the queries. These were telling me that I had something called a SELECT N+1 problem. In fact my application was riddled with them.

What is SELECT N+1

This is something you’d never do if you were writing the SQL yourself because you’d quickly see how much extra work you were creating for yourself. However when the framework is doing the work for you and you don’t see the SQL, it’s surprisingly easy to do.

Here’s an example using the data model of my app:

var members = context.Members;
@foreach(var member in members)
{
  <p>@member.Address.Line1</p>
}

This is where I’d naively assumed the magic of Entity Framework was far smarter than it really is. As the query is not executed till the foreach loop I thought EF would peer inside the loop and figure out that it needs to join with the address table so it can get also show the addresses. Now I’m using the EF Profiler I can see what that it actually does. First it queries to get a list of all the members. Then it runs an extra query for each member to get their address. If there were 100 members, it would run 101 queries, rather than doing a join and doing just one.

In this case it’s very easy to fix. You just need to let it know what extra properties you’ll be querying when you get the members from the context, like this:

var members = context.Members.Include(“Address”);

This lets it know up front what it needs to query and the foreach loop will result in just one query with a very sensible join in it.

Without that include method Entity Framework will just do exactly what you tell it and make lots of queries. Combined with nested loops and a complex object model the number of queries soon start to build up.

SELECT N+1 and my Application

I studied the output from the EF profiler on the queries from my application. Very few pages were as simple as the example above, but I was able to spot several patterns that had created this problem and which I’m not yet completely sure of the best way to fix.

Pattern 1: Starting from the Entity

There are a section of pages which all show information about the currently active Club, such as its members and its events. As all the pages had this in common I put this in a base controller for those pages so all the action methods could access it without me having to repeat the query.

So in an action method for displaying a Club’s Members all I needed to say was:

var members = ActiveClub.Members;
return View(members);

As all my queries start from an Entity rather than the context I’ve lost the ability to use the include method. After asking this question on Stack Overflow I found that I can use CreateSourceQuery like this:

var members = ActiveClub.Members.CreateSourceQuery().Include(“Address”);

This solves the problem in this case, but I’m not convinced it’s the best solution. Every page is still making an extra query for the club when in most cases we only need to know the club’s name and id.

Pattern 2: Filtered collections

There are many cases in my app when I don’t want to see all the entities in a collection. For example I may only want to see a list of members in a club who haven’t paid their yearly subscription. This is essentially just a query, but to make these queries easy to use I added them to my model as a property, like this one on the Club model:

public IEnumerable<Member> UnpaidMembers()
{
  return Members.Where(t => !t.Paid && t.DateResigned == null);
}

This pattern combined with the first one means that I have no control in my action method code to say which entities should be included in this query. You cannot call Include or CreateSourceQuery on IEnumerable.

Possible solutions

Repositories

I could create repositories for the entities being shown. We could have a Members repository with methods like this:

MemberRepository.GetMembersForClub(PianolaEntities context, int clubId, String include = null);
MemberRepository.GetUnpaidMembersForClub(PianolaEntities context, int clubId, String include = null);

Having the option to pass include parameters in should solve my SELECT N+1 problem and I can have different methods for my different filtered collections. I’d also wanted to introduce a repository before to help me with testing.

This seemed like a good idea till I read several posts by Oren which say that putting layers of abstractions over an ORM is a pointless exercise:

I completely see his point, but I’m not sure how and if this applies to the Entity Framework. I’ve not been able to find a good example yet of unit testing the Entity Framework without using a repository pattern. I also think this may be more difficult for me because I’ve generated my model from the database. Perhaps this is one of the reasons code first is better.

Filters

Assuming I can find a way to not use a repository then creating filter extension methods seems like a good way to reuse and label the queries I use.

Here’s an example filter extension:

public static IQueryable<Member> WhereUnpaid(this IQueryable<Member> members)
{
  return members.Where(t.Paid && t.DateResigned == null);
}

I could use this as part of a query from the context and still include anything I needed to:

var unpaid = context.Members.Include(“Address”).WhereUnpaid().ToList();

I really like this idea. Running it through EF Profiler shows that it only makes one query when it loops through and shows addresses. It’s neat, gives me access to the context and should let me chain queries like this:

var unpaidClubMembers = context.Members
 .Include(“Address”)
 .WhereUnpaid()
 .WhereMemberOfClub(clubId)
 .ToList();

I’m just not sure how to unit test it. More research is still needed.

Conclusion

If you’re going to use an ORM then you better keep track of what SQL it’s producing. I’ve still got 26 days left on my trial copy of the Entity Framework Profiler, but this has already become a vital tool in my toolbox. So I will definitely buy it and I suggest you give it a go. As you can see I’ve made some big mistakes in my application and these would have had big consequences if these had lasted through to production. I don’t think you can afford not to use the EF Profiler.

I’ve not decided which of my two possible solutions I will use. Once I do and hopefully fix all my problems I will come back with a follow up post showing the improvements and hopefully an EF Profiler screenshot I can be proud of.

Kashyap said

Give LinqPad a go with your LINQ queries. You can view SQL output of the LINQ queries written against EF Models. Its a cheaper less efficient solution but it also allows you to experiment with the linq queries to see the output.