Archive for April, 2012

Optimize SQL Queries (Theory an Practice)

This article assumes you already know SQL and want to optimize queries.

The reasons to optimize

Time is money and people don’t like to wait so programs are expected to be fast.

In Internet time and client/server programming, it’s even more true because suddenly a lot of people are waiting for the DB to give them an answer which makes response time even longer.

Even if you use faster servers, this has been proven to be a small factor compared to the speed of the algorithm used. Therefore, the solution lies in optimization.

Theory of optimization

There are many ways to optimize Databases and queries. My method is the following.

Look at the DB Schema and see if it makes sense

Most often, Databases have bad designs and are not normalized. This can greatly affect the speed of your Database. As a general case, learn the 3 Normal Forms and apply them at all times. The normal forms above 3rd Normal Form are often called de-normalization forms but what this really means is that they break some rules to make the Database faster.

What I suggest is to stick to the 3rd normal form except if you are a DBA (which means you know subsequent forms and know what you’re doing). Normalization after the 3rd NF is often done at a later time, not during design.

 Only query what you really need

Filter as much as possible

Your Where Clause is the most important part for optimization.

Select only the fields you need

Never use “Select *” — Specify only the fields you need; it will be faster and will use less bandwidth.

Be careful with joins

Joins are expensive in terms of time. Make sure that you use all the keys that relate the two tables together and don’t join to unused tables — always try to join on indexed fields. The join type is important as well (INNER, OUTER,… ).

Optimize queries and stored procedures (Most Run First)

Queries are very fast. Generally, you can retrieve many records in less than a second, even with joins, sorting and calculations. As a rule of thumb, if your query is longer than a second, you can probably optimize it.

Start with the Queries that are most often used as well as the Queries that take the most time to execute.

Add, remove or modify indexes

If your query does Full Table Scans, indexes and proper filtering can solve what is normally a very time-consuming process. All primary keys need indexes because they makes joins faster. This also means that all tables need a primary key. You can also add indexes on fields you often use for filtering in the Where Clauses.

You especially want to use Indexes on Integers, Booleans, and Numbers. On the other hand, you probably don’t want to use indexes on Blobs, VarChars and Long Strings.

Be careful with adding indexes because they need to be maintained by the database. If you do many updates on that field, maintaining indexes might take more time than it saves.

In the Internet world, read-only tables are very common. When a table is read-only, you can add indexes with less negative impact because indexes don’t need to be maintained (or only rarely need maintenance).

Move Queries to Stored Procedures (SP)

Stored Procedures are usually better and faster than queries for the following reasons:

Stored Procedures are compiled (SQL Code is not), making them faster than SQL code.
SPs don’t use as much bandwidth because you can do many queries in one SP. SPs also stay on the server until the final results are returned.
Stored Procedures are run on the server, which is typically faster.
Calculations in code (VB, Java, C++, …) are not as fast as SP in most cases.
It keeps your DB access code separate from your presentation layer, which makes it easier to maintain (3 tiers model).

Remove unneeded Views

Views are a special type of Query — they are not tables. They are logical and not physical so every time you run select * from MyView, you run the query that makes the view and your query on the view.

If you always need the same information, views could be good.

If you have to filter the View, it’s like running a query on a query — it’s slower.

Tune DB settings

You can tune the DB in many ways. Update statistics used by the optimizer, run optimization options, make the DB read-only, etc… That takes a broader knowledge of the DB you work with and is mostly done by the DBA.

Using Query Analysers

In many Databases, there is a tool for running and optimizing queries. SQL Server has a tool called the Query Analyser, which is very useful for optimizing. You can write queries, execute them and, more importantly, see the execution plan. You use the execution to understand what SQL Server does with your query.


Optimization in Practice


Example 1:

I want to retrieve the name and salary of the employees of the R&D department.


Query : Select * From Employees

In Program : Add a filter on Dept or use command : if Dept = R&D–

Corrected :

Select Name, Salary From Employees Where Dept = R&D–

In the corrected version, the DB filters data because it filters faster than the program.

Also, you only need the Name and Salary, so only ask for that.

The data that travels on the network will be much smaller, and therefore your performances will improve.

Example 2 (Sorting):


Select Name, Salary

From Employees

Where Dept = ‘R&D’

Order By Salary

Do you need that Order By Clause? Often, people use Order By in development to make sure returned data are ok; remove it if you don’t need it.

If you need to sort the data, do it in the query, not in the program.

Example 3:


For i = 1 to 2000

Call Query : Select salary From Employees Where EmpID = Parameter(i)


Select salary From Employees Where EmpID >= 1 and EmpID <= 2000

The original Query involves a lot of network bandwidth and will make your whole system slow.

You should do as much as possible in the Query or Stored Procedure. Going back and forth is plain stupid.

Although this example seems simple, there are more complex examples on that theme.

Sometimes, the processing is so great that you think it’s better to do it in the code but it’s probably not.

Sometimes, your Stored Procedure will be better off creating a temporary table, inserting data in it and returning it than going back and forth 10,000 times. You might have a slower query that saves time on a greater number of records or that saves bandwidth.

Example 4 (Weak Joins):

You have two tables Orders and Customers. Customers can have many orders.


Select O.ItemPrice, C.Name

From Orders O, Customers C


Select O.ItemPrice, C.Name

From Orders O, Customers C

Where O.CustomerID = C.CustomerID

In that case, the join was not there at all or was not there on all keys. That would return so many records that your query might take hours. It’s a common mistake for beginners.

Corrected 2:

Depending on the DB you use, you will need to specify the Join type you want in different ways.

In SQL Server, the query would need to be corrected to:

Select O.ItemPrice, C.Name

From Orders O INNER JOIN Customers C ON O.CustomerID = C.CustomerID

Choose the good join type (INNER, OUTER, LEFT, …).

Note that in SQL Server, Microsoft suggests you use the joins like in the Corrected 2 instead of the joins in the Where Clause because it will be more optimized.

Example 5 (Weak Filters):

This is a more complicated example, but it illustrates filtering at its best.

We have two tables — Products (ProductID, DescID, Price) and Description(DescID, LanguageID, Text). There are 100,000 Products and unfortunately we need them all.

There are 100 languages (LangID = 1 = English). We only want the English descriptions for the products.

We are expecting 100 000 Products (ProductName, Price).

First try:

Select D.Text As ProductName, P.Price

From Products P INNER JOIN Description D On P.DescID = D.DescID

Where D.LangID = 1

That works but it will be really slow because your DB needs to match 100,000 records with 10,000,000 records and then filter that Where LangID = 1.

The solution is to filter On LangID = 1 before joining the tables.


Select D.Text As ProductName, P.Price

From (Select DescID, Text From Description Where D.LangID = 1) D

INNER JOIN Products P On D.DescID = P.DescID

Now, that will be much faster. You should also make that query a Stored Procedure to make it faster.

Example 6 (Views):

Create View v_Employees AS

Select * From Employees

Select * From v_Employees

This is just like running Select * From Employees twice.

You should not use the view in that case.

If you were to always use the data for employees of R&D and would not like to give the rights to everyone on that table because of salaries being confidential, you could use a view like that:

Create View v_R&DEmployees AS

Select Name, Salary From Employees Where Dept = 1

(Dept 1 is R&D).

You would then give the rights to View v_R&DEmployees to some people and would restrict the rights to Employees table to the DBA only.

That would be a possibly good use of views.


I hope this will help you make your queries faster and your databases more optimized. This should make your program look better and can possibly mean money, especially for high load web applications where it means your program can serve more transactions per hour and you often get paid by transaction.

While you can put the above examples to practice in your database of choice, the preceding tips are especially true for major Databases like Oracle or SQL Server.


SQL Tuning/SQL Optimization Techniques:

1) The sql query becomes faster if you use the actual columns names in SELECT statement instead of than ‘*’.

For Example: Write the query as

SELECT id, first_name, last_name, age, subject FROM student_details;

Instead of:

SELECT * FROM student_details;


2) HAVING clause is used to filter the rows after all the rows are selected. It is just like a filter. Do not use HAVING clause for any other purposes.
For Example: Write the query as

SELECT subject, count(subject)
FROM student_details
WHERE subject != ‘Science’
AND subject != ‘Maths’
GROUP BY subject;

Instead of:

SELECT subject, count(subject)
FROM student_details
GROUP BY subject
HAVING subject!= ‘Vancouver’ AND subject!= ‘Toronto’;


3) Sometimes you may have more than one subqueries in your main query. Try to minimize the number of subquery block in your query.
For Example: Write the query as

FROM employee
WHERE (salary, age ) = (SELECT MAX (salary), MAX (age)
FROM employee_details)
AND dept = ‘Electronics’;

Instead of:

FROM employee
WHERE salary = (SELECT MAX(salary) FROM employee_details)
AND age = (SELECT MAX(age) FROM employee_details)
AND emp_dept = ‘Electronics’;


4) Use operator EXISTS, IN and table joins appropriately in your query.
a) Usually IN has the slowest performance.
b) IN is efficient when most of the filter criteria is in the sub-query.
c) EXISTS is efficient when most of the filter criteria is in the main query.

For Example: Write the query as

Select * from product p
where EXISTS (select * from order_items o
where o.product_id = p.product_id)

Instead of:

Select * from product p
where product_id IN
(select product_id from order_items


5) Use EXISTS instead of DISTINCT when using joins which involves tables having one-to-many relationship.
For Example: Write the query as

SELECT d.dept_id, d.dept
FROM dept d
WHERE EXISTS ( SELECT ‘X’ FROM employee e WHERE e.dept = d.dept);

Instead of:

SELECT DISTINCT d.dept_id, d.dept
FROM dept d,employee e
WHERE e.dept = e.dept;


6) Try to use UNION ALL in place of UNION.
For Example: Write the query as

SELECT id, first_name
FROM student_details_class10
SELECT id, first_name
FROM sports_team;

Instead of:

SELECT id, first_name, subject
FROM student_details_class10
SELECT id, first_name
FROM sports_team;


7) Be careful while using conditions in WHERE clause.
For Example: Write the query as

SELECT id, first_name, age FROM student_details WHERE age > 10;

Instead of:

SELECT id, first_name, age FROM student_details WHERE age != 10;

Write the query as

SELECT id, first_name, age
FROM student_details
WHERE first_name LIKE ‘Chan%’;

Instead of:

SELECT id, first_name, age
FROM student_details
WHERE SUBSTR(first_name,1,3) = ‘Cha’;

Write the query as

SELECT id, first_name, age
FROM student_details
WHERE first_name LIKE NVL ( :name, ‘%’);

Instead of:

SELECT id, first_name, age
FROM student_details
WHERE first_name = NVL ( :name, first_name);

Write the query as

SELECT product_id, product_name
FROM product
WHERE unit_price BETWEEN MAX(unit_price) and MIN(unit_price)

Instead of:

SELECT product_id, product_name
FROM product
WHERE unit_price >= MAX(unit_price)
and unit_price <= MIN(unit_price)

Write the query as

SELECT id, name, salary
FROM employee
WHERE dept = ‘Electronics’
AND location = ‘Bangalore’;

Instead of:

SELECT id, name, salary
FROM employee
WHERE dept || location= ‘ElectronicsBangalore’;

Use non-column expression on one side of the query because it will be processed earlier.

Write the query as

SELECT id, name, salary
FROM employee
WHERE salary < 25000;

Instead of:

SELECT id, name, salary
FROM employee
WHERE salary + 10000 < 35000;

Write the query as

SELECT id, first_name, age
FROM student_details
WHERE age > 10;

Instead of:

SELECT id, first_name, age
FROM student_details
WHERE age NOT = 10;

8) Use DECODE to avoid the scanning of same rows or joining the same table repetitively. DECODE can also be made used in place of GROUP BY or ORDER BY clause.
For Example: Write the query as

SELECT id FROM employee
WHERE name LIKE ‘Ramesh%’
and location = ‘Bangalore’;

Instead of:

SELECT DECODE(location,’Bangalore’,id,NULL) id FROM employee
WHERE name LIKE ‘Ramesh%’;

9) To store large binary objects, first place them in the file system and add the file path in the database.

10) To write queries which provide efficient performance follow the general SQL standard rules.

a) Use single case for all SQL verbs
b) Begin all SQL verbs on a new line
c) Separate all words with a single space
d) Right or left aligning verbs within the initial SQL verb

Categories: SQL Tags:

Lock, Monitor, Mutex, Semaphore

April 17, 2012 1 comment


Exclusive locking is used to ensure that only one thread can enter particular sections of code at a time. The two main exclusive locking constructs are lock and Mutex. Of the two, the lock construct is faster and more convenient.Mutex, though, has a niche in that its lock can span applications in different processes on the computer.

Let’s start with the following class:

class ThreadUnsafe
  static int _val1 = 1, _val2 = 1;

  static void Go()
    if (_val2 != 0) Console.WriteLine (_val1 / _val2);
    _val2 = 0;

This class is not thread-safe: if Go was called by two threads simultaneously, it would be possible to get a division-by-zero error, because _val2 could be set to zero in one thread right as the other thread was in between executing the if statement and Console.WriteLine.

Here’s how lock can fix the problem:

class ThreadSafe
  static readonly object _locker = new object();
  static int _val1, _val2;

  static void Go()
    lock (_locker)
      if (_val2 != 0) Console.WriteLine (_val1 / _val2);
      _val2 = 0;

Only one thread can lock the synchronizing object (in this case, _locker) at a time, and any contending threads areblocked until the lock is released. If more than one thread contends the lock, they are queued on a “ready queue” and granted the lock on a first-come, first-served basis (a caveat is that nuances in the behavior of Windows and the CLR mean that the fairness of the queue can sometimes be violated). Exclusive locks are sometimes said to enforceserialized access to whatever’s protected by the lock, because one thread’s access cannot overlap with that of another. In this case, we’re protecting the logic inside the Go method, as well as the fields _val1 and _val2.

Monitor.Enter and Monitor.Exit

C#’s lock statement is in fact a syntactic shortcut for a call to the methods Monitor.Enter and Monitor.Exit, with atry/finally block. Here’s (a simplified version of) what’s actually happening within the Go method of the preceding example:

Monitor.Enter (_locker);
  if (_val2 != 0) Console.WriteLine (_val1 / _val2);
  _val2 = 0;
finally { Monitor.Exit (_locker); }

Calling Monitor.Exit without first calling Monitor.Enter on the same object throws an exception.


Mutex is like a C# lock, but it can work across multiple processes. In other words, Mutex can be computer-wideas well as application-wide.

Acquiring and releasing an uncontended Mutex takes a few microseconds — about 50 times slower than a lock.

With a Mutex class, you call the WaitOne method to lock and ReleaseMutex to unlock. Closing or disposing aMutex automatically releases it. Just as with the lock statement, a Mutex can be released only from the same thread that obtained it.

A common use for a cross-process Mutex is to ensure that only one instance of a program can run at a time. Here’s how it’s done:

class OneAtATimePlease
  static void Main()
    // Naming a Mutex makes it available computer-wide. Use a name that's
    // unique to your company and application (e.g., include your URL).

    using (var mutex = new Mutex (false, " OneAtATimeDemo"))
      // Wait a few seconds if contended, in case another instance
      // of the program is still in the process of shutting down.

      if (!mutex.WaitOne (TimeSpan.FromSeconds (3), false))
        Console.WriteLine ("Another app instance is running. Bye!");

  static void RunProgram()
    Console.WriteLine ("Running. Press Enter to exit");

If running under Terminal Services, a computer-wide Mutex is ordinarily visible only to applications in the same terminal server session. To make it visible to all terminal server sessions, prefix its name with Global\.


A semaphore is like a nightclub: it has a certain capacity, enforced by a bouncer. Once it’s full, no more people can enter, and a queue builds up outside. Then, for each person that leaves, one person enters from the head of the queue. The constructor requires a minimum of two arguments: the number of places currently available in the nightclub and the club’s total capacity.

A semaphore with a capacity of one is similar to a Mutex or lock, except that the semaphore has no “owner” — it’sthread-agnostic. Any thread can call Release on a Semaphore, whereas with Mutex and lock, only the thread that obtained the lock can release it.

There are two functionally similar versions of this class: Semaphore and SemaphoreSlim. The latter was introduced in Framework 4.0 and has been optimized to meet the low-latency demands of parallel programming. It’s also useful in traditional multithreading because it lets you specify acancellation token when waiting. It cannot, however, be used for interprocess signaling.

Semaphore incurs about 1 microsecond in calling WaitOne or ReleaseSemaphoreSlim incurs about a quarter of that.

Semaphores can be useful in limiting concurrency — preventing too many threads from executing a particular piece of code at once. In the following example, five threads try to enter a nightclub that allows only three threads in at once:

class TheClub      // No door lists!
  static SemaphoreSlim _sem = new SemaphoreSlim (3);    // Capacity of 3

  static void Main()
    for (int i = 1; i <= 5; i++) new Thread (Enter).Start (i);

  static void Enter (object id)
    Console.WriteLine (id + " wants to enter");
    Console.WriteLine (id + " is in!");           // Only three threads
    Thread.Sleep (1000 * (int) id);               // can be here at
    Console.WriteLine (id + " is leaving");       // a time.

1 wants to enter
1 is in!
2 wants to enter
2 is in!
3 wants to enter
3 is in!
4 wants to enter
5 wants to enter
1 is leaving
4 is in!
2 is leaving
5 is in!

If the Sleep statement was instead performing intensive disk I/O, the Semaphore would improve overall performance by limiting excessive concurrent hard-drive activity.

Semaphore, if named, can span processes in the same way as a Mutex.

Categories: C# Tags: , , ,


A thread pool takes away all the need to manage your threads – all you have to do is essentially say “hey! someone should go do this work!”, and a thread in the process’ thread pool will pick up the task and go execute it. And that is all there is to it. Granted, you still have to keep threads from stepping on each other’s toes, and you probably care about when these ‘work items’ are completed – but it is at least a really easy way to queue up a work item.

In fact, working with the ThreadPool is so easy, I’m going to throw all the code at you at once. Below is a pretty simple test app that gives 5 (or NumThreads) work items to the ThreadPool, waits for them all to complete, and then prints out all the answers. I will walk through the code step by step below:


using System;
using System.Threading;namespace ThreadPoolTest
class Program
private const int NumThreads = 5;

private static int[] inputArray;
private static double[] resultArray;
private static ManualResetEvent[] resetEvents;

private static void Main(string[] args)
inputArray = new int[NumThreads];
resultArray = new double[NumThreads];
resetEvents = new ManualResetEvent[NumThreads];

Random rand = new Random();
for (int s = 0; s < NumThreads; s++)
inputArray[s] = rand.Next(1,5000000);
resetEvents[s] = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork), (object)s);



Console.WriteLine(“And the answers are: “);
for (int i = 0; i < NumThreads; i++)
Console.WriteLine(inputArray[i] + ” -> ” + resultArray[i]);

private static void DoWork(object o)
int index = (int)o;

for (int i = 1; i < inputArray[index]; i++)
resultArray[index] += 1.0 / (i * (i + 1));



We have three arrays at the top of the program: one for input to the work items (inputArray), one for the results (resultArray), and one for the ManualResetEvents (resetEvents). The first two are self explanatory, but what is aManualResetEvent? Well, it is an object that allows one thread to signal another thread when something happens. In the case of this code, we use these events to signal the main thread that a work item has been completed.

So we initialize these arrays, and then we get to a for loop, which is where we will be pushing out these work items. First, we make a random value for the initial input (cause random stuff is always more fun!), then we create aManualResetEvent with its signaled state initially set to false, and then we queue the work item. Thats right, all you have to do to push a work item out for the ThreadPool to do is call ThreadPool.QueueUserWorkItem.

So what are we queuing here? Well, we are saying that a thread in the thread pool should run the method DoWork, with the argument s. Any method that you want to queue up for the thread pool to run needs to take one argument, an object, and return void. The argument will end up being whatever you passed in as the second argument to theQueueUserWorkItem call – and in this case is the ‘index’ of this work item (the index in the various arrays that it needs to work with). And it makes sense that the method would have to return void – because it isn’t actually returning ‘to’ anything, it is running out there all on its own as a separate thread.

So what are we doing in this DoWork function? Not that much in this case, just a simple summation. The important part is the very last call of the function, which is hit when all the work for this work item is done –resetEvents[index].Set(). This triggers the ManualResetEvent for this work item – signaling the main thread that the work is all done here.

Back up in main thread land, after it has pushed all these work items onto the ThreadPool queue, we hit the very important call WaitHandle.WaitAll(resetEvents). This causes the main thread to block here until all theManualResetEvent objects in the resetEvents array signal. When all of them have signaled, that means that all the work units have been completed, and so we continue on and print out all the results. The results change because we are seeding with random values, but here is one example output:


And the answers are:
3780591 -> 0.991001809831479
3555614 -> 0.991163782231558
2072717 -> 0.989816715560308
2264396 -> 0.989982111762391
544144 -> 0.99066981542858


Pretty simple, eh? There are a couple things to note, though. The default thread pool size for a process is 25 threads, and while you can change this number, this resource is not infinite. If all of the threads in the pool are currently occupied with other tasks, new work items will be queued up, but they won’t get worked on until one of the occupied threads finishes its current task. This generally isn’t a problem unless you are giving the pool very large quantities of work. And really, you should never assume that a task is executed immediately after you queue it, because there is no guarantee of that at all.

That’s it for this intro to thread pools in C#. If there are any questions, leave them below – especially if they push on the more advanced aspects of threads and thread pools (cause then I’ll have an excuse to write some more threading tutorials!).

Categories: C# Tags:

EventWaitHandler: AutoResetEvent vs. ManualResetEvent

Threads can communicate using WaitHandlers by signaling. Mutex, Semapore and EventWaitHandle are derived from WaitHandle class.

There are two types of EventWaitHandlers. AutoResetEvent and ManualResetEvent. AutoResetEvent lets one waiting thread at a time when Set() is called but ManualResetEvent lets all waiting threads to pass by when Set() is called. ManualResetEvent starts blocking when Reset() is called.

This acts like a turnstile which lets one at a time. When a thread hits WaitOne(), it waits till some other thread calls Set(). Take a look at the following picture. Thread1, Thread2 and Thread3 are waiting after calling WaitOne(). For every Set() call from another thread, one thread will pass the turnstile.

I have created a simple application to test this. There are two buttons to span a thread DoWork. DoWork has WaitOne call and it blocks threads. Third button calls Set() to release one thread at a time. Click first two buttons to span thread and click third button twice to release blocked threads.

private EventWaitHandle wh = new AutoResetEvent(false);
private void DoWork()
    Console.WriteLine(Thread.CurrentThread.Name + ": Waiting for Set() notification");
    // Wait for notification
    Console.WriteLine(Thread.CurrentThread.Name + ": Notified");
private void buttonCreateThreadOne_Click(object sender, EventArgs e)
    Thread a = new Thread(DoWork);
    // You can name the thread!.. for debugging purpose
    a.Name = "A";
private void buttonCreateSecondThread_Click(object sender, EventArgs e)
    Thread b = new Thread(DoWork);
    // You can name the thread!.. for debugging purpose
    b.Name = "B";
private void buttonReleaseOneThread_Click(object sender, EventArgs e)


Please note that the code after WaitOne call in DoWork is not thread safe. A call to Set will let next waiting thread to enter even the first thread is still executing the code.


This is like a gate which lets more than one at a time. When a thread hits WaitOne(), it waits till someother thread calls Set(). Take a look at the following picture. Thread1, Thread2 and Thread3 are waiting after calling WaitOne(). When Set is called from another thread, all waiting thereads will pass the gate.

Code snippet to illustrate the above.

private void buttonFirstThread_Click(object sender, EventArgs e)
    Thread a = new Thread(DoWork);
    // You can name the thread!.. for debugging purpose
    a.Name = "A";
private void buttonSecondThread_Click(object sender, EventArgs e)
    Thread b = new Thread(DoWork);
    // You can name the thread!.. for debugging purpose
    b.Name = "B";
private void buttonCallSet_Click(object sender, EventArgs e)
private void buttonCallReset_Click(object sender, EventArgs e)
Categories: C#, WPF Tags: ,

User-defined function in SQL

Functions in programming languages are subroutines used to encapsulate frequently performed logic. Any code that must perform the logic incorporated in a function can call the function rather than having to repeat all of the function logic

— Input dimensions in centimeters.
(@CubeLength decimal(4,1), @CubeWidth decimal(4,1),
@CubeHeight decimal(4,1) )
RETURNS decimal(12,3) — Cubic Centimeters.
RETURN ( @CubeLength * @CubeWidth * @CubeHeight )

A user-defined function that returns a table can also replace stored procedures that return a single result set. The table returned by a user-defined function can be referenced in the FROM clause of a Transact-SQL statement, whereas stored procedures that return result sets cannot. For example, fn_EmployeesInDept is a user-defined function that returns a table and can be invoked by a SELECT statement:

FROM tb_Employees AS E,
dbo.fn_EmployeesInDept(‘shipping’) AS EID
WHERE E.EmployeeID = EID.EmployeeID
This is an example of a statement that creates a function in the Northwind database that will return a table:

CREATE FUNCTION LargeOrderShippers ( @FreightParm money )
RETURNS @OrderShipperTab TABLE
ShipperID int,
ShipperName nvarchar(80),
OrderID int,
ShippedDate datetime,
Freight money
INSERT @OrderShipperTab
SELECT S.ShipperID, S.CompanyName,
O.OrderID, O.ShippedDate, O.Freight
FROM Shippers AS S
INNER JOIN Orders AS O ON (S.ShipperID = O.ShipVia)
WHERE O.Freight > @FreightParm

 Difference between Stored Procedure and Functions

1. UDF can be used in the SQL statements anywhere in the WHERE/HAVING/SELECT section where as Stored procedures cannot be.
2. UDFs that return tables can be treated as another rowset. This can be used in JOINs with other tables.

3. Inline UDF’s can be though of as views that take parameters and can be used in JOINs and other Rowset operations.

4. Stored Procedure retuns more than one value at a time while funtion returns only one value at a time.

5. We can call the functions in sql statements (select max(sal) from emp). where as sp is not so.

6. Function do not return the images,text whereas sp returns all.

7. Function and sp both can return the values. But function returns 1 value only. Procedure can return multiple values(max. 1024) we can select the fields from function. in the case of procdure we cannot select the fields.

8. Functions MUST return a value, procedures need not be.

9. You can have DML(insert, update, delete) statements in a function. But, you cannot call such a function in a SQL suppose, if u have a function that is updating a table.. you can’t call that function in any sql query.

10. SP can call function but vice-versa not possible.

Categories: SQL Tags:

Clustered and Non-Clustered Index in SQL

1. Introduction

We all know that data entered in the tables are persisted in the physical drive in the form of database files. Think about a table, say Customer (For any leading bank India), that has around 16 million records. When we try to retrieve records for two or three customers based on their customer id, all 16 million records are taken and comparison is made to get a match on the supplied customer ids. Think about how much time that will take if it is a web application and there are 25 to 30 customers that want to access their data through internet. Does the database server do 16 million x 30 searches? The answer is no because all modern databases use the concept of index.

2. What is an Index

Index is a database object, which can be created on one or more columns (16 Max column combination). When creating the index will read the column(s) and forms a relevant data structure to minimize the number of data comparisons. The index will improve the performance of data retrieval and adds some overhead on data modification such as create, delete and modify. So it depends on how much data retrieval can be performed on table versus how much of DML (InsertDelete and Update) operations.

In this article, we will see creating the Index.

3. First Create Two Tables

To explain these constraints, we need two tables. First, let us create these tables. Run the below scripts to create the tables. Copy paste the code on the new Query Editor window, then execute it.

CREATE TABLE Student(StudId smallint, StudName varchar(50), Class tinyint);
CREATE TABLE TotalMarks(StudentId smallint, TotalMarks smallint);

Note that there are no constraints at present on these tables. We will add the constraints one by one.

4. Primary Key Constraint

A table column with this constraint is called as the key column for the table. This constraint helps the table to make sure that the value is not repeated and also no null entries. We will mark the StudId column of the Studenttable as primary key. Follow these steps:

  1. Right click the student table and click on the modify button.
  2. From the displayed layout, select the StudId row by clicking the Small Square like button on the left side of the row.
  3. Click on the Set Primary Key toolbar button to set the StudId column as primary key column.


Now this column does not allow null values and duplicate values. You can try inserting values to violate these conditions and see what happens. A table can have only one Primary key. Multiple columns can participate on the primary key column. Then, the uniqueness is considered among all the participant columns by combining their values.

5. Clustered Index

The primary key created for the StudId column will create a clustered index for the Studid column. A table can have only one clustered index on it.

When creating the clustered index, SQL server 2005 reads the Studid column and forms a Binary tree on it. This binary tree information is then stored separately in the disc. Expand the table Student and then expand theIndexes. You will see the following index created for you when the primary key is created:


With the use of the binary tree, now the search for the student based on the studid decreases the number of comparisons to a large amount. Let us assume that you had entered the following data in the table student:


The index will form the below specified binary tree. Note that for a given parent, there are only one or two Childs. The left side will always have a lesser value and the right side will always have a greater value when compared to parent. The tree can be constructed in the reverse way also. That is, left side higher and right side lower.


Now let us assume that we had written a query like below:

Select * from student where studid = 103;
Select * from student where studid = 107;

Execution without index will return value for the first query after third comparison.
Execution without index will return value for the second query at eights comparison.

Execution of first query with index will return value at first comparison.
Execution of second query with index will return the value at the third comparison. Look below:

  1. Compare 107 vs 103 : Move to right node
  2. Compare 107 vs 106 : Move to right node
  3. Compare 107 vs 107 : Matched, return the record

If numbers of records are less, you cannot see a different one. Now apply this technique with a Yahoo email user accounts stored in a table called say YahooLogin. Let us assume there are 33 million users around the world that have Yahoo email id and that is stored in the YahooLogin. When a user logs in by giving the user name and password, the comparison required is 1 to 25, with the binary tree that is clustered index.

Look at the above picture and guess yourself how fast you will reach into the level 25. Without Clustered index, the comparison required is 1 to 33 millions.

Got the usage of Clustered index? Let us move to Non-Clustered index.

6. Non Clustered Index

A non-clustered index is useful for columns that have some repeated values. Say for example, AccountTypecolumn of a bank database may have 10 million rows. But, the distinct values of account type may be 10-15. Aclustered index is automatically created when we create the primary key for the table. We need to take care of the creation of the non-clustered index.

Follow the steps below to create a Non-clustered index on our table Student based on the column class.

  1. After expanding the Student table, right click on the Indexes. And click on the New Index.Pic05.jpg
  2. From the displayed dialog, type the index name as shown below and then click on the Add button to select the column(s) that participate in the index. Make sure the Index type is Non-Clustered.Pic06.jpg
  3. In the select column dialog, place a check mark for the column class. This tells that we need a non-clusteredindex for the column Student.Class. You can also combine more than one column to create the Index. Once the column is selected, click on the OK button. You will return the dialog shown above with the selected column marked in blue. Our index has only one column. If you selected more than one column, using the MoveUpand MoveDown button, you can change order of the indexed columns. When you are using the combination of columns, always use the highly repeated column first and more unique columns down in the list. For example, let use assume the correct order for creating the Non-clustered index is: ClassDateOfBirth,PlaceOfBirth.Pic07.jpg
  4. Click on the Index folder on the right side and you will see the non-clustered index based on the column class is created for you.Pic08.jpg

7. How Does a Non-Clustered Index Work?

A table can have more than one Non-Clustered index. But, it should have only one clustered index that works based on the Binary tree concept. Non-Clustered column always depends on the Clustered column on the database.

This can be easily explained with the concept of a book and its index page at the end. Let us assume that you are going to a bookshop and found a big 1500 pages of C# book that says all about C#. When you glanced at the book, it has all beautiful color pages and shiny papers. But, that is not only the eligibility for a good book right? One you are impressed, you want to see your favorite topic of Regular Expressions and how it is explained in the book. What will you do? I just peeped at you from behind and recorded what you did as below:

  1. You went to the Index page (it has total 25 pages). It is already sorted and hence you easily picked up Regular Expression that comes on page Number 17.
  2. Next, you noted down the number displayed next to it which is 407, 816, 1200-1220.
  3. Your first target is Page 407. You opened a page in the middle, the page is greater than 500.
  4. Then you moved to a somewhat lower page. But it still reads 310.
  5. Then you moved to a higher page. You are very lucky you exactly got page 407. [Yes man you got it. Otherwise I need to write more. OK?]
  6. That’s all, you started exploring what is written about Regular expression on that page, keeping in mind that you need to find page 816 also.

In the above scenario, the Index page is Non-Clustered index and the page numbers are clustered index arranged in a binary tree. See how you came to the page 407 very quickly. Your mind actually traversed the binary tree way left and right to reach the page 407 quickly.

Here, the class column with distinct values 1,2,3..12 will store the clustered index columns value along with it. Say for example; Let us take only class value of 1. The Index goes like this:

1: 100, 104, 105

So here, you can easily get all the records that have value for class = 1. Map this with the Book index example now.

Inversion Of Control(IOC) or Dependency Injection(DI)?

In designing an object-oriented application, a major tenet of design is “loose coupling”. Loosely, not meant for the pun, “loose coupling” means that objects should only have as many dependencies as is needed to do their job – and the dependencies should be few. Furthermore, an object’s dependencies should be on interfaces and not on “concrete” objects, when possible. (A concrete object is any object created with the keyword new.) Loose coupling promotes greater reusability, easier maintainability, and allows you to easily provide “mock” objects in place of expensive services, such as a socket-communicator. “Dependency Injection” (DI), also more cryptically known as “Inversion of Control” (IoC), can be used as a technique for encouraging this loose coupling. There are two primary approaches to implementing DI: constructor injection and setter injection. Obviously, at some point, something must be responsible for creating the concrete objects that will be injected into another object. The injector can be a parent object, which I’ll call the “DI controller”, or can be externalized and handled by a “DI container” framework. What follows is a brief overview of the various approaches for using dependency injection techniques.

Constructor Injection

Constructor Injection is the DI technique of passing an object’s dependencies to its constructor. The below example includes a class, Customer, that exposes a method for retrieving every sales-order that the customer made on a particular date. Consequently, the Customer class needs a data-access object for communicating with the database. Assume, an OrderDao (“order data-access object”) exists which implements the interface IOrderDao. One way that a Customer object can get this dependency is by executing the following within the: IOrderDao orderDao = new OrderDao();. The primary disadvantage of this is two-fold:

1. the benefit of having the interface in the first place has been negated since the concrete instance was created locally, and
2. OrderDao cannot easily be replaced by a mock object for testing purposes. (Mock objects will be discussed shortly.)

The aforementioned example follows:
public class Customer {
public Customer(IOrderDao orderDao) {
if (orderDao == null)
throw new ArgumentNullException(“orderDao may not be null”);

this.orderDao = orderDao;

public IList GetOrdersPlacedOn(DateTime date) {
// … code that uses the orderDao member
// get orders from the datasource …

private IOrderDao orderDao;
In the example, note that the constructor accepts an interface; it does not accept a concrete object. Also, note that an exception is thrown if the orderDao parameter is null. This emphasizes the importance of receiving a valid dependency. Constructor Injection is, in my opinion, the preferred mechanism for giving an object its dependencies. It is clear to the developer invoking the object which dependencies need to be given to the Customer object for proper execution. But consider the following example… Suppose you have a class with ten methods that have no dependencies, but you’re adding a new method that does have a dependency on IOrderDao. You could change the constructor to use Constructor Injection, but this may force you to change constructor calls all over the place. Alternatively, you could just add a new constructor that takes the dependency, but then how does a developer easily know when to use one constructor over the other. Finally, if the dependency is very expensive to create, why should it be created and passed to the constructor when it may only be used rarely? “Setter Injection” is another DI technique that can be used in situations such as this.

Setter Injection

Setter Injection does not force dependencies to be passed to the constructor. Instead, the dependencies are set onto public properties exposed by the object in need. As implied previously, the primary motivators for doing this include:

1. supporting dependency injection without having to modify the constructor of a legacy class, and
2. allowing expensive resources or services to be created as late as possible and only when needed.
The code below modifies the Constructor Injection example to use

Setter Injection instead:
public class Customer {
public Customer() {}

public IOrderDao OrderDao {
set { orderDao = value; }
get {
if (orderDao == null)
throw new MemberAccessException(“orderDao” +
” has not been initialized”);
return orderDao;

public IList GetOrdersPlacedOn(DateTime date) {
//… code that uses the OrderDao public
//… property to get orders from the datasource …

// Should not be called directly;
// use the public property instead
private IOrderDao orderDao;
In the above example, the constructor accepts no arguments. Instead, the invoking object is responsible for setting the IOrderDao dependency before the method GetOrdersPlacedOn is called. With Constructor Injection, an exception is thrown if the dependency is not set immediately, i.e., upon creation. With Setter Injection, an exception isn’t thrown until a method actually attempts to use the dependency. Make note of the fact that GetOrdersPlacedOn uses the public OrderDao property; it does not call the private orderDao directly. This is so that the getter method has an opportunity to validate if the dependency has yet been initialized. Setter Injection should be used sparingly in place of Constructor Injection, because it:
1. does not make it clear to the developer which dependencies are needed when, at least until a “has not been initialized” exception is thrown, and
2. makes it a bit more difficult to track down where the exception came from and why it got thrown. With this said, Setter Injection can save on modifying a lot of legacy code when introducing new methods, and can provide a performance boost if the dependency is expensive or not easily accessible.
The Injectors

The next logical question is, what actually creates the dependencies that are to be injected into “injectees”? There are two appropriate places for adding creation logic: controllers and containers.

DI Controllers

The “DI controller” approach is the simpler to understand and implement. In a properly tiered architecture, an application has distinct layers for handling logic. The simplest layering usually consists of a data-layer for talking to the database, a presentation-layer for displaying the UI, and a domain-logic layer for performing business logic. A “controller” layer always exists, even if not well defined, for coordinating UI events to the domain and data layers, and vice versa. For example, in ASP.NET, the code-behind page acts as a rudimentary controller layer. More formalized controller-layer approaches exist: Struts and Spring for Java; Front Controller and Spring .NET for .NET. All of these approaches follow some form of variant of the Model-View-Controller pattern. Regardless of what you use as your controller, the controller is an appropriate location for performing Dependency Injection “wiring”. This is where concrete objects are created and injected as dependencies. What follows are two examples of DI performed by a controller. The first is an illustrative example of “production code” – code that you’d end up deploying. The second is an example of “test code” – code that’s used to test the application, but is not deployed and does not have the need to have a live database. Controller code performing the dependency injection (e.g., from an ASP.NET code-behind page):
//… code performed when the controller is loaded …

IOrderDao orderDao = new OrderDao();
// Using Setter Injection on a pre-existing customer
someCustomer.OrderDao = orderDao;
IList ordersPlacedToday =

Unit-test code performing dependency injection:
IOrderDao orderDao = new MockOrderDao();
// Using Setter Injection on a pre-existing customer
someCustomer.OrderDao = orderDao;
IList ordersPlacedToday =
One of the major benefits of using a DI-controller to inject dependencies is that it’s straightforward and easy to point to where the creation is occurring. The drawback to using DI-controllers is that the dependencies are still hard-coded somewhere; albeit, they’re hard-coded in a location that is often subject to frequent changes anyway. Another drawback is that now the DI-controllers themselves can’t be easily unit-tested with mock objects. (Granted, a powerful tool such as TypeMock can do just about anything when it comes to injecting mock objects. But a tool such as TypeMock should be used only when absolutely necessary as it can lead to habits of not programming-to-interface. In fact, I’d recommend only considering the use of it on very difficult to test, legacy applications.) In ASP.NET, I prefer to use the Model-View-Presenter (MVP) pattern, and have the ASP.NET code-behind page create dependencies and inject them to the presenter via Construction Injection. Additionally, I use UserControls as the View part of the pattern, so the ASP.NET code-behind acts purely as an MVP “dependency initializer” between the UserControls (View) and their presenters. Another option to implementing constructor or setter DI is the use of an application container…

DI Containers

Inversion-of-Control/Dependency-Injection “containers” can be used to watch an application and inject dependencies whenever a particular event occurs. For example, whenever a Customer instance is created, it automatically gets injected with its needed dependencies. It’s a strange concept at first, but can be useful for managing large applications with many service dependencies. Different container providers each have their own mechanism for managing dependency injection settings.

A very good example can be found at :

Categories: C# Tags: ,
%d bloggers like this: