CopyPastor

Detecting plagiarism made easy.

Score: 1.7200849056243896; Reported for: String similarity, Exact paragraph match Open both answers

Possible Plagiarism

Plagiarized on 2018-08-09
by Gauravsa

Original Post

Original - Posted on 2014-05-13
by Iain Galloway



            
Present in both answers; Present only in the new answer; Present only in the old answer;

**Step 1: Inspect** So what's going on? Why is our query so slow?
To answer that question, I'm going to need to make some assumptions about your model:-
public class Foo { public int Id { get; set; } public int BarId { get; set; } public virtual Bar Bar { get; set; } } public class Bar { public int Id { get; set; } public string Value { get; set; } public virtual ICollection<Foo> Foos { get; set; } }
Now that we've done that, we can have a look at the horrible query that Entity Framework is making for us:-
using (var context = new FooContext()) { context.Database.Log = s => Console.WriteLine(s); var query = context.Foos.FirstOrDefault(x => x.Id == 1).Bar.Value; }
I can see from the log that TWO queries are being run:-
SELECT TOP (1) [Extent1].[Id] AS [Id], [Extent1].[BarId] AS [BarId] FROM [dbo].[Foos] AS [Extent1] WHERE 1 = [Extent1].[Id] SELECT [Extent1].[Id] AS [Id], [Extent1].[Value] AS [Value] FROM [dbo].[Bars] AS [Extent1] WHERE [Extent1].[Id] = @EntityKeyValue1
Wait, what? Why is stupid Entity Framework making two round-trips to the database when all we need is one string?
**Step 2: Analyze** Let's take a step back and look at our query again:-
var query = context.Foos.FirstOrDefault(x => x.Id == 1).Bar.Value;
Given what we know about **Deferred Execution** what can we deduce is going on here?
What deferred execution basically means is that as long as you're working with an `IQueryable`, nothing actually happens - the query is built up in memory and not actually executed until later. This is useful for a number of reasons - in particular it lets us build up our queries in a modular fashion then run the composed query once. Entity Framework would be pretty useless if context.Foos loaded the entire Foo table into memory immediately!
Our queries only get run when we ask for something other than an `IQueryable`, e.g. with `.AsEnumerable(), .ToList(),` or especially `.GetEnumerator()` etc. In this case `.FirstOrDefault()` doesn't return an IQueryable, so this triggers the database call much earlier than we presumably intended.
The query we've made is basically saying:-
Get the first Foo with Id == 1 (or null if there aren't any) Now Lazy Load that Foo's Bar Now tell me that Bar's Value Wow! So not only are we making two round-trips to the database, we're also sending the entire Foo and Bar across the wire! That's not so bad when our entities are tiny like the contrived ones here, but what if they were larger realistic ones?
**Step 3: Optimize** As you've hopefully gleaned from the above, the first two rules of optimisation are 1) "Don't" and 2) "Measure first" The third rule of optimisation is "Avoid unnecessary work". An extra round-trip and a whole bunch of spurious data definitely counts as "unnecessary", so let's do something about that:-
**Attempt 1**
The first thing we want to do is try the declarative approach. "Find me the value of the first Bar that has a Foo with Id == 1".
This is usually the clearest option from a maintainability point of view; the intent of the programmer is obviously captured. However, remembering that we want to delay execution as long as possible, let's pop the .FirstOrDefault() after the .Select():-
var query = context.Bars.Where(x => x.Foos.Any(y => y.Id == 1)) .Select(x => x.Value) .FirstOrDefault();
SELECT TOP (1) [Extent1].[Value] AS [Value] FROM [dbo].[Bars] AS [Extent1] WHERE EXISTS (SELECT 1 AS [C1] FROM [dbo].[Foos] AS [Extent2] WHERE ([Extent1].[Id] = [Extent2].[BarId]) AND (1 = [Extent2].[Id]) )
**Attempt 2**
In both SQL and most O/RMs, a useful trick is to make sure you're querying from the correct "end" of any given relationship. Sure, we're looking for a Bar, but we've got the Id of a Foo, so we can rewrite the query with that as a starting point: "Find me the Value of the Bar of the Foo with Id == 1":-
var query = context.Foos.Where(x => x.Id == 1) .Select(x => x.Bar.Value) .FirstOrDefault(); SELECT TOP (1) [Extent2].[Value] AS [Value] FROM [dbo].[Foos] AS [Extent1] INNER JOIN [dbo].[Bars] AS [Extent2] ON [Extent1].[BarId] = [Extent2].[Id] WHERE 1 = [Extent1].[Id]
Much better. Prima Facie these look preferable to both the original Entity-Framework-generated mess and the original stored procedure. Done!
**Lastly, for There is one helpful thing about LINQ that every developer should know. It is about performance of Join vs Where.**
The full discussion can be seen here [why is join so much faster than where][1]

[1]: https://stackoverflow.com/questions/5551264/why-is-linq-join-so-much-faster-than-linking-with-where
Step 1: Establish a business case ---------------------------------
The first thing we need to do is ask "**How fast does it need to be?**", because if we don't know how fast it needs to be we can't know when we're done. This isn't a technical decision, it's a business one. You need a stakeholder-centric measure of "Fast Enough" to aim for, and you need to bear in mind that Fast Enough is fast enough. We aren't looking for "As Fast As Possible" unless there's a business reason for it. Even then, we're normally looking for "As Fast As Possible Within Budget".
Since you're my stakeholder, and you don't seem to be too upset about the performance of your stored procedure, let's use that as a benchmark!
Step 2: Measure ---------------
The next thing we need to do is measure our system to see if we're Fast Enough.
Thankfully you've already measured (though we'll talk more about this later). **Your stored procedure runs in 0.5 seconds! Is that Fast Enough? Yes it is! <a href="http://ericlippert.com/2012/12/17/performance-rant/">Job done!</a>**
There is no justification for continuing to spend your time (and your boss' money) fixing something that isn't broken. You probably have something better to be doing, so go do that! :D
------
Still here? Ok then. <a href="http://xkcd.com/386/">I'm not on the clock, people are badmouthing tech I like, and optimising Entity Framework queries is fun</a>. **Challenge Accepted!**
Step 3: Inspect ---------------
So what's going on? Why is our query so slow?
To answer that question, I'm going to need to make some assumptions about your model:-
public class Foo { public int Id { get; set; }
public int BarId { get; set; }
public virtual Bar Bar { get; set; } }
public class Bar { public int Id { get; set; }
public string Value { get; set; }
public virtual ICollection<Foo> Foos { get; set; } }
Now that we've done that, we can have a look at the horrible query that Entity Framework is making for us:-
using (var context = new FooContext()) { context.Database.Log = s => Console.WriteLine(s);
var query = context.Foos.FirstOrDefault(x => x.Id == 1).Bar.Value; }
I can see from the log that **TWO** queries are being run:-
SELECT TOP (1) [Extent1].[Id] AS [Id], [Extent1].[BarId] AS [BarId] FROM [dbo].[Foos] AS [Extent1] WHERE 1 = [Extent1].[Id]
SELECT [Extent1].[Id] AS [Id], [Extent1].[Value] AS [Value] FROM [dbo].[Bars] AS [Extent1] WHERE [Extent1].[Id] = @EntityKeyValue1
**Wait, what?** Why is stupid Entity Framework making two round-trips to the database when all we need is one string?
Step 4: Analyze ---------------
Let's take a step back and look at our query again:-
var query = context.Foos.FirstOrDefault(x => x.Id == 1).Bar.Value;
Given what we know about <a href="http://blogs.msdn.com/b/charlie/archive/2007/12/09/deferred-execution.aspx">Deferred Execution</a> what can we deduce is going on here?
What deferred execution *basically* means is that as long as you're working with an `IQueryable`, nothing actually happens - the query is built up in memory and not actually executed until later. This is useful for a number of reasons - in particular it lets us build up our queries in a modular fashion then run the composed query once. Entity Framework would be pretty useless if `context.Foos` loaded the entire `Foo` table into memory immediately!
Our queries only get run when we ask for something other than an `IQueryable`, e.g. with `.AsEnumerable()`, `.ToList()`, or especially `.GetEnumerator()` etc. In this case `.FirstOrDefault()` doesn't return an `IQueryable`, so this triggers the database call much earlier than we presumably intended.
The query we've made is basically saying:-
- Get the first `Foo` with `Id == 1` (or `null` if there aren't any) - Now <a href="http://msdn.microsoft.com/en-us/data/jj574232.aspx">Lazy Load</a> that `Foo`'s `Bar` - Now tell me that `Bar`'s `Value`
Wow! So not only are we making two round-trips to the database, we're also sending the entire `Foo` and `Bar` across the wire! That's not *so* bad when our entities are tiny like the contrived ones here, but what if they were larger realistic ones?
Step 5: Optimize ----------------
As you've hopefully gleaned from the above, the first two rules of optimisation are 1) "*Don't*" and 2) "*Measure first*" The third rule of optimisation is "*Avoid unnecessary work*". An extra round-trip and a whole bunch of spurious data definitely counts as "unnecessary", so let's do something about that:-
**Attempt 1**
The first thing we want to do is try the declarative approach. "Find me the value of the first `Bar` that has a `Foo` with `Id == 1`".
This is usually the clearest option from a maintainability point of view; the intent of the programmer is obviously captured. However, remembering that we want to delay execution as long as possible, let's pop the `.FirstOrDefault()` after the `.Select()`:-
var query = context.Bars.Where(x => x.Foos.Any(y => y.Id == 1)) .Select(x => x.Value) .FirstOrDefault();
SELECT TOP (1) [Extent1].[Value] AS [Value] FROM [dbo].[Bars] AS [Extent1] WHERE EXISTS (SELECT 1 AS [C1] FROM [dbo].[Foos] AS [Extent2] WHERE ([Extent1].[Id] = [Extent2].[BarId]) AND (1 = [Extent2].[Id]) )
**Attempt 2**
In both SQL and most O/RMs, a useful trick is to make sure you're querying from the correct "end" of any given relationship. Sure, we're looking for a `Bar`, but we've *got* the `Id` of a `Foo`, so we can rewrite the query with that as a starting point: "Find me the `Value` of the `Bar` of the `Foo` with `Id == 1`":-
var query = context.Foos.Where(x => x.Id == 1) .Select(x => x.Bar.Value) .FirstOrDefault();
SELECT TOP (1) [Extent2].[Value] AS [Value] FROM [dbo].[Foos] AS [Extent1] INNER JOIN [dbo].[Bars] AS [Extent2] ON [Extent1].[BarId] = [Extent2].[Id] WHERE 1 = [Extent1].[Id]
Much better. Prima Facie these look preferable to both the original Entity-Framework-generated mess *and* the original stored procedure. Done!
Step 6: Measure ---------------
No! Just wait a minute! How do we know if we're Fast Enough? How do we even know if we're faster?
We measure!
And unfortunately you'll have to do this bit on your own. I can tell you that on *my* machine, on *my* network, simulating a realistic load for *my* application, the `INNER JOIN` is the fastest, followed by the two round-trips version ***(!!)***, followed by the `WHERE EXISTS` version, followed by the stored procedure. *I can't tell you which will be fastest on* your *hardware, on* your *network, under a realistic load for* your *application*.
I *can* tell you that I've made this *exact* performance optimization over a dozen times and depending on the characteristics of the network, database server, and schema I've seen all three of `INNER JOIN`, `WHERE EXISTS`, and two round-trips give the best performance.
However, ***I can't even tell you if* any *of these are Fast Enough***. Depending on your needs you might need to hand-roll some hyper-optimized SQL and invoke a stored procedure. You might even need to go further and use a denormalised read-optimized read store. What about using an in-memory cache for your database results? What about using an output cache for your webserver? What if this query isn't even the bottleneck?
Good performance isn't about speeding up Entity Framework queries. ***Good performance, like just about anything in our industry, is about knowing what's important to your customer, and figuring out the best way to get it.***

        
Present in both answers; Present only in the new answer; Present only in the old answer;