Archive

Archive for the ‘SQL’ Category

Select Find Nth Highest Salary Record In Sql Server


1st Method :

SELECT TOP 1 [Salary]
FROM
(
SELECT DISTINCT TOP N [Salary]
FROM [dbo].[Employee]
ORDER BY [Salary] DESC
) temp
ORDER BY [Salary]

 
2nd Method :

SELECT * FROM
(
SELECT DENSE_RANK() OVER(ORDER BY [Salary] DESC)AS RowId, *
FROM [dbo].[Employee]
) AS e1
WHERE e1.RowId = N

 
http://csharpdotnetfreak.blogspot.com/2011/09/select-nth-highest-record-sql-server.html
Advertisements
Categories: SQL Tags:

Database Normalization


Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database.

What is an “inconsistent dependency”? While it is intuitive for a user to look in the Customers table for the address of a particular customer, it may not make sense to look there for the salary of the employee who calls on that customer. The employee’s salary is related to, or dependent on, the employee and thus should be moved to the Employees table. Inconsistent dependencies can make data difficult to access because the path to find the data may be missing or broken.

There are a few rules for database normalization. Each rule is called a “normal form.” If the first rule is observed, the database is said to be in “first normal form.” If the first three rules are observed, the database is considered to be in “third normal form.” Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications.

As with many formal rules and specifications, real world scenarios do not always allow for perfect compliance. In general, normalization requires additional tables and some customers find this cumbersome. If you decide to violate one of the first three rules of normalization, make sure that your application anticipates any problems that could occur, such as redundant data and inconsistent dependencies.

The following descriptions include examples.

First Normal Form

  • Eliminate repeating groups in individual tables.
  • Create a separate table for each set of related data.
  • Identify each set of related data with a primary key.

Do not use multiple fields in a single table to store similar data. For example, to track an inventory item that may come from two possible sources, an inventory record may contain fields for Vendor Code 1 and Vendor Code 2.

What happens when you add a third vendor? Adding a field is not the answer; it requires program and table modifications and does not smoothly accommodate a dynamic number of vendors. Instead, place all vendor information in a separate table called Vendors, then link inventory to vendors with an item number key, or vendors to inventory with a vendor code key.

Second Normal Form

  • Create separate tables for sets of values that apply to multiple records.
  • Relate these tables with a foreign key.

Records should not depend on anything other than a table’s primary key (a compound key, if necessary). For example, consider a customer’s address in an accounting system. The address is needed by the Customers table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and Collections tables. Instead of storing the customer’s address as a separate entry in each of these tables, store it in one place, either in the Customers table or in a separate Addresses table.

Third Normal Form

  • Eliminate fields that do not depend on the key.

Values in a record that are not part of that record’s key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.

For example, in an Employee Recruitment table, a candidate’s university name and address may be included. But you need a complete list of universities for group mailings. If university information is stored in the Candidates table, there is no way to list universities with no current candidates. Create a separate Universities table and link it to the Candidates table with a university code key.

EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical. If you have a Customers table and you want to eliminate all possible interfield dependencies, you must create separate tables for cities, ZIP codes, sales representatives, customer classes, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursing. However, many small tables may degrade performance or exceed open file and memory capacities.

It may be more feasible to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to require the user to verify all related fields when any one is changed.


Other Normalization Forms

Fourth normal form, also called Boyce Codd Normal Form (BCNF), and fifth normal form do exist, but are rarely considered in practical design. Disregarding these rules may result in less than perfect database design, but should not affect functionality.

Normalizing an Example Table

These steps demonstrate the process of normalizing a fictitious student table.

  1. Unnormalized table:
    Student# Advisor Adv-Room Class1 Class2 Class3
    1022 Jones 412 101-07 143-01 159-02
    4123 Smith 216 201-01 211-02 214-01

     

  2. First Normal Form: No Repeating GroupsTables should have only two dimensions. Since one student has several classes, these classes should be listed in a separate table. Fields Class1, Class2, and Class3 in the above records are indications of design trouble.Spreadsheets often use the third dimension, but tables should not. Another way to look at this problem is with a one-to-many relationship, do not put the one side and the many side in the same table. Instead, create another table in first normal form by eliminating the repeating group (Class#), as shown below:
    Student# Advisor Adv-Room Class#
    1022 Jones 412 101-07
    1022 Jones 412 143-01
    1022 Jones 412 159-02
    4123 Smith 216 201-01
    4123 Smith 216 211-02
    4123 Smith 216 214-01

     

  3. Second Normal Form: Eliminate Redundant DataNote the multiple Class# values for each Student# value in the above table. Class# is not functionally dependent on Student# (primary key), so this relationship is not in second normal form.The following two tables demonstrate second normal form:Students:
    Student# Advisor Adv-Room
    1022 Jones 412
    4123 Smith 216

    Registration:

    Student# Class#
    1022 101-07
    1022 143-01
    1022 159-02
    4123 201-01
    4123 211-02
    4123 214-01

     

  4. Third Normal Form: Eliminate Data Not Dependent On KeyIn the last example, Adv-Room (the advisor’s office number) is functionally dependent on the Advisor attribute. The solution is to move that attribute from the Students table to the Faculty table, as shown below:Students:
    Student# Advisor
    1022 Jones
    4123 Smith

    Faculty:

    Name Room Dept
    Jones 412 42
    Smith 216 42
Categories: SQL Tags:

Optimize SQL Queries (Theory an Practice)


This article assumes you already know SQL and want to optimize queries.

The reasons to optimize

Time is money and people don’t like to wait so programs are expected to be fast.

In Internet time and client/server programming, it’s even more true because suddenly a lot of people are waiting for the DB to give them an answer which makes response time even longer.

Even if you use faster servers, this has been proven to be a small factor compared to the speed of the algorithm used. Therefore, the solution lies in optimization.

Theory of optimization

There are many ways to optimize Databases and queries. My method is the following.

Look at the DB Schema and see if it makes sense

Most often, Databases have bad designs and are not normalized. This can greatly affect the speed of your Database. As a general case, learn the 3 Normal Forms and apply them at all times. The normal forms above 3rd Normal Form are often called de-normalization forms but what this really means is that they break some rules to make the Database faster.

What I suggest is to stick to the 3rd normal form except if you are a DBA (which means you know subsequent forms and know what you’re doing). Normalization after the 3rd NF is often done at a later time, not during design.

 Only query what you really need

Filter as much as possible

Your Where Clause is the most important part for optimization.

Select only the fields you need

Never use “Select *” — Specify only the fields you need; it will be faster and will use less bandwidth.

Be careful with joins

Joins are expensive in terms of time. Make sure that you use all the keys that relate the two tables together and don’t join to unused tables — always try to join on indexed fields. The join type is important as well (INNER, OUTER,… ).

Optimize queries and stored procedures (Most Run First)

Queries are very fast. Generally, you can retrieve many records in less than a second, even with joins, sorting and calculations. As a rule of thumb, if your query is longer than a second, you can probably optimize it.

Start with the Queries that are most often used as well as the Queries that take the most time to execute.

Add, remove or modify indexes

If your query does Full Table Scans, indexes and proper filtering can solve what is normally a very time-consuming process. All primary keys need indexes because they makes joins faster. This also means that all tables need a primary key. You can also add indexes on fields you often use for filtering in the Where Clauses.

You especially want to use Indexes on Integers, Booleans, and Numbers. On the other hand, you probably don’t want to use indexes on Blobs, VarChars and Long Strings.

Be careful with adding indexes because they need to be maintained by the database. If you do many updates on that field, maintaining indexes might take more time than it saves.

In the Internet world, read-only tables are very common. When a table is read-only, you can add indexes with less negative impact because indexes don’t need to be maintained (or only rarely need maintenance).

Move Queries to Stored Procedures (SP)

Stored Procedures are usually better and faster than queries for the following reasons:

Stored Procedures are compiled (SQL Code is not), making them faster than SQL code.
SPs don’t use as much bandwidth because you can do many queries in one SP. SPs also stay on the server until the final results are returned.
Stored Procedures are run on the server, which is typically faster.
Calculations in code (VB, Java, C++, …) are not as fast as SP in most cases.
It keeps your DB access code separate from your presentation layer, which makes it easier to maintain (3 tiers model).

Remove unneeded Views

Views are a special type of Query — they are not tables. They are logical and not physical so every time you run select * from MyView, you run the query that makes the view and your query on the view.

If you always need the same information, views could be good.

If you have to filter the View, it’s like running a query on a query — it’s slower.

Tune DB settings

You can tune the DB in many ways. Update statistics used by the optimizer, run optimization options, make the DB read-only, etc… That takes a broader knowledge of the DB you work with and is mostly done by the DBA.

Using Query Analysers

In many Databases, there is a tool for running and optimizing queries. SQL Server has a tool called the Query Analyser, which is very useful for optimizing. You can write queries, execute them and, more importantly, see the execution plan. You use the execution to understand what SQL Server does with your query.

 

Optimization in Practice

 

Example 1:

I want to retrieve the name and salary of the employees of the R&D department.

Original:

Query : Select * From Employees

In Program : Add a filter on Dept or use command : if Dept = R&D–

Corrected :

Select Name, Salary From Employees Where Dept = R&D–

In the corrected version, the DB filters data because it filters faster than the program.

Also, you only need the Name and Salary, so only ask for that.

The data that travels on the network will be much smaller, and therefore your performances will improve.

Example 2 (Sorting):

Original:

Select Name, Salary

From Employees

Where Dept = ‘R&D’

Order By Salary

Do you need that Order By Clause? Often, people use Order By in development to make sure returned data are ok; remove it if you don’t need it.

If you need to sort the data, do it in the query, not in the program.

Example 3:

Original:

For i = 1 to 2000

Call Query : Select salary From Employees Where EmpID = Parameter(i)

Corrected:

Select salary From Employees Where EmpID >= 1 and EmpID <= 2000

The original Query involves a lot of network bandwidth and will make your whole system slow.

You should do as much as possible in the Query or Stored Procedure. Going back and forth is plain stupid.

Although this example seems simple, there are more complex examples on that theme.

Sometimes, the processing is so great that you think it’s better to do it in the code but it’s probably not.

Sometimes, your Stored Procedure will be better off creating a temporary table, inserting data in it and returning it than going back and forth 10,000 times. You might have a slower query that saves time on a greater number of records or that saves bandwidth.

Example 4 (Weak Joins):

You have two tables Orders and Customers. Customers can have many orders.

Original:

Select O.ItemPrice, C.Name

From Orders O, Customers C

Corrected:

Select O.ItemPrice, C.Name

From Orders O, Customers C

Where O.CustomerID = C.CustomerID

In that case, the join was not there at all or was not there on all keys. That would return so many records that your query might take hours. It’s a common mistake for beginners.

Corrected 2:

Depending on the DB you use, you will need to specify the Join type you want in different ways.

In SQL Server, the query would need to be corrected to:

Select O.ItemPrice, C.Name

From Orders O INNER JOIN Customers C ON O.CustomerID = C.CustomerID

Choose the good join type (INNER, OUTER, LEFT, …).

Note that in SQL Server, Microsoft suggests you use the joins like in the Corrected 2 instead of the joins in the Where Clause because it will be more optimized.

Example 5 (Weak Filters):

This is a more complicated example, but it illustrates filtering at its best.

We have two tables — Products (ProductID, DescID, Price) and Description(DescID, LanguageID, Text). There are 100,000 Products and unfortunately we need them all.

There are 100 languages (LangID = 1 = English). We only want the English descriptions for the products.

We are expecting 100 000 Products (ProductName, Price).

First try:

Select D.Text As ProductName, P.Price

From Products P INNER JOIN Description D On P.DescID = D.DescID

Where D.LangID = 1

That works but it will be really slow because your DB needs to match 100,000 records with 10,000,000 records and then filter that Where LangID = 1.

The solution is to filter On LangID = 1 before joining the tables.

Corrected:

Select D.Text As ProductName, P.Price

From (Select DescID, Text From Description Where D.LangID = 1) D

INNER JOIN Products P On D.DescID = P.DescID

Now, that will be much faster. You should also make that query a Stored Procedure to make it faster.

Example 6 (Views):

Create View v_Employees AS

Select * From Employees

Select * From v_Employees

This is just like running Select * From Employees twice.

You should not use the view in that case.

If you were to always use the data for employees of R&D and would not like to give the rights to everyone on that table because of salaries being confidential, you could use a view like that:

Create View v_R&DEmployees AS

Select Name, Salary From Employees Where Dept = 1

(Dept 1 is R&D).

You would then give the rights to View v_R&DEmployees to some people and would restrict the rights to Employees table to the DBA only.

That would be a possibly good use of views.

Conclusion

I hope this will help you make your queries faster and your databases more optimized. This should make your program look better and can possibly mean money, especially for high load web applications where it means your program can serve more transactions per hour and you often get paid by transaction.

While you can put the above examples to practice in your database of choice, the preceding tips are especially true for major Databases like Oracle or SQL Server.


****************************************************************************************************

SQL Tuning/SQL Optimization Techniques:

1) The sql query becomes faster if you use the actual columns names in SELECT statement instead of than ‘*’.

For Example: Write the query as

SELECT id, first_name, last_name, age, subject FROM student_details;

Instead of:

SELECT * FROM student_details;

 

2) HAVING clause is used to filter the rows after all the rows are selected. It is just like a filter. Do not use HAVING clause for any other purposes.
For Example: Write the query as

SELECT subject, count(subject)
FROM student_details
WHERE subject != ‘Science’
AND subject != ‘Maths’
GROUP BY subject;

Instead of:

SELECT subject, count(subject)
FROM student_details
GROUP BY subject
HAVING subject!= ‘Vancouver’ AND subject!= ‘Toronto’;

 

3) Sometimes you may have more than one subqueries in your main query. Try to minimize the number of subquery block in your query.
For Example: Write the query as

SELECT name
FROM employee
WHERE (salary, age ) = (SELECT MAX (salary), MAX (age)
FROM employee_details)
AND dept = ‘Electronics’;

Instead of:

SELECT name
FROM employee
WHERE salary = (SELECT MAX(salary) FROM employee_details)
AND age = (SELECT MAX(age) FROM employee_details)
AND emp_dept = ‘Electronics’;

 

4) Use operator EXISTS, IN and table joins appropriately in your query.
a) Usually IN has the slowest performance.
b) IN is efficient when most of the filter criteria is in the sub-query.
c) EXISTS is efficient when most of the filter criteria is in the main query.

For Example: Write the query as

Select * from product p
where EXISTS (select * from order_items o
where o.product_id = p.product_id)

Instead of:

Select * from product p
where product_id IN
(select product_id from order_items

 

5) Use EXISTS instead of DISTINCT when using joins which involves tables having one-to-many relationship.
For Example: Write the query as

SELECT d.dept_id, d.dept
FROM dept d
WHERE EXISTS ( SELECT ‘X’ FROM employee e WHERE e.dept = d.dept);

Instead of:

SELECT DISTINCT d.dept_id, d.dept
FROM dept d,employee e
WHERE e.dept = e.dept;

 

6) Try to use UNION ALL in place of UNION.
For Example: Write the query as

SELECT id, first_name
FROM student_details_class10
UNION ALL
SELECT id, first_name
FROM sports_team;

Instead of:

SELECT id, first_name, subject
FROM student_details_class10
UNION
SELECT id, first_name
FROM sports_team;

 

7) Be careful while using conditions in WHERE clause.
For Example: Write the query as

SELECT id, first_name, age FROM student_details WHERE age > 10;

Instead of:

SELECT id, first_name, age FROM student_details WHERE age != 10;

Write the query as

SELECT id, first_name, age
FROM student_details
WHERE first_name LIKE ‘Chan%’;

Instead of:

SELECT id, first_name, age
FROM student_details
WHERE SUBSTR(first_name,1,3) = ‘Cha’;

Write the query as

SELECT id, first_name, age
FROM student_details
WHERE first_name LIKE NVL ( :name, ‘%’);

Instead of:

SELECT id, first_name, age
FROM student_details
WHERE first_name = NVL ( :name, first_name);

Write the query as

SELECT product_id, product_name
FROM product
WHERE unit_price BETWEEN MAX(unit_price) and MIN(unit_price)

Instead of:

SELECT product_id, product_name
FROM product
WHERE unit_price >= MAX(unit_price)
and unit_price <= MIN(unit_price)

Write the query as

SELECT id, name, salary
FROM employee
WHERE dept = ‘Electronics’
AND location = ‘Bangalore’;

Instead of:

SELECT id, name, salary
FROM employee
WHERE dept || location= ‘ElectronicsBangalore’;

Use non-column expression on one side of the query because it will be processed earlier.

Write the query as

SELECT id, name, salary
FROM employee
WHERE salary < 25000;

Instead of:

SELECT id, name, salary
FROM employee
WHERE salary + 10000 < 35000;

Write the query as

SELECT id, first_name, age
FROM student_details
WHERE age > 10;

Instead of:

SELECT id, first_name, age
FROM student_details
WHERE age NOT = 10;

8) Use DECODE to avoid the scanning of same rows or joining the same table repetitively. DECODE can also be made used in place of GROUP BY or ORDER BY clause.
For Example: Write the query as

SELECT id FROM employee
WHERE name LIKE ‘Ramesh%’
and location = ‘Bangalore’;

Instead of:

SELECT DECODE(location,’Bangalore’,id,NULL) id FROM employee
WHERE name LIKE ‘Ramesh%’;

9) To store large binary objects, first place them in the file system and add the file path in the database.

10) To write queries which provide efficient performance follow the general SQL standard rules.

a) Use single case for all SQL verbs
b) Begin all SQL verbs on a new line
c) Separate all words with a single space
d) Right or left aligning verbs within the initial SQL verb

Categories: SQL Tags:

User-defined function in SQL


Functions in programming languages are subroutines used to encapsulate frequently performed logic. Any code that must perform the logic incorporated in a function can call the function rather than having to repeat all of the function logic

CREATE FUNCTION CubicVolume
— Input dimensions in centimeters.
(@CubeLength decimal(4,1), @CubeWidth decimal(4,1),
@CubeHeight decimal(4,1) )
RETURNS decimal(12,3) — Cubic Centimeters.
AS
BEGIN
RETURN ( @CubeLength * @CubeWidth * @CubeHeight )
END

A user-defined function that returns a table can also replace stored procedures that return a single result set. The table returned by a user-defined function can be referenced in the FROM clause of a Transact-SQL statement, whereas stored procedures that return result sets cannot. For example, fn_EmployeesInDept is a user-defined function that returns a table and can be invoked by a SELECT statement:

SELECT *
FROM tb_Employees AS E,
dbo.fn_EmployeesInDept(‘shipping’) AS EID
WHERE E.EmployeeID = EID.EmployeeID
This is an example of a statement that creates a function in the Northwind database that will return a table:

CREATE FUNCTION LargeOrderShippers ( @FreightParm money )
RETURNS @OrderShipperTab TABLE
(
ShipperID int,
ShipperName nvarchar(80),
OrderID int,
ShippedDate datetime,
Freight money
)
AS
BEGIN
INSERT @OrderShipperTab
SELECT S.ShipperID, S.CompanyName,
O.OrderID, O.ShippedDate, O.Freight
FROM Shippers AS S
INNER JOIN Orders AS O ON (S.ShipperID = O.ShipVia)
WHERE O.Freight > @FreightParm
RETURN
END

 Difference between Stored Procedure and Functions

1. UDF can be used in the SQL statements anywhere in the WHERE/HAVING/SELECT section where as Stored procedures cannot be.
2. UDFs that return tables can be treated as another rowset. This can be used in JOINs with other tables.

3. Inline UDF’s can be though of as views that take parameters and can be used in JOINs and other Rowset operations.

4. Stored Procedure retuns more than one value at a time while funtion returns only one value at a time.

5. We can call the functions in sql statements (select max(sal) from emp). where as sp is not so.

6. Function do not return the images,text whereas sp returns all.

7. Function and sp both can return the values. But function returns 1 value only. Procedure can return multiple values(max. 1024) we can select the fields from function. in the case of procdure we cannot select the fields.

8. Functions MUST return a value, procedures need not be.

9. You can have DML(insert, update, delete) statements in a function. But, you cannot call such a function in a SQL query.eg: suppose, if u have a function that is updating a table.. you can’t call that function in any sql query.

10. SP can call function but vice-versa not possible.

Categories: SQL Tags:

Clustered and Non-Clustered Index in SQL


1. Introduction

We all know that data entered in the tables are persisted in the physical drive in the form of database files. Think about a table, say Customer (For any leading bank India), that has around 16 million records. When we try to retrieve records for two or three customers based on their customer id, all 16 million records are taken and comparison is made to get a match on the supplied customer ids. Think about how much time that will take if it is a web application and there are 25 to 30 customers that want to access their data through internet. Does the database server do 16 million x 30 searches? The answer is no because all modern databases use the concept of index.

2. What is an Index

Index is a database object, which can be created on one or more columns (16 Max column combination). When creating the index will read the column(s) and forms a relevant data structure to minimize the number of data comparisons. The index will improve the performance of data retrieval and adds some overhead on data modification such as create, delete and modify. So it depends on how much data retrieval can be performed on table versus how much of DML (InsertDelete and Update) operations.

In this article, we will see creating the Index.

3. First Create Two Tables

To explain these constraints, we need two tables. First, let us create these tables. Run the below scripts to create the tables. Copy paste the code on the new Query Editor window, then execute it.

CREATE TABLE Student(StudId smallint, StudName varchar(50), Class tinyint);
CREATE TABLE TotalMarks(StudentId smallint, TotalMarks smallint);
Go

Note that there are no constraints at present on these tables. We will add the constraints one by one.

4. Primary Key Constraint

A table column with this constraint is called as the key column for the table. This constraint helps the table to make sure that the value is not repeated and also no null entries. We will mark the StudId column of the Studenttable as primary key. Follow these steps:

  1. Right click the student table and click on the modify button.
  2. From the displayed layout, select the StudId row by clicking the Small Square like button on the left side of the row.
  3. Click on the Set Primary Key toolbar button to set the StudId column as primary key column.

Pic01.JPG

Now this column does not allow null values and duplicate values. You can try inserting values to violate these conditions and see what happens. A table can have only one Primary key. Multiple columns can participate on the primary key column. Then, the uniqueness is considered among all the participant columns by combining their values.

5. Clustered Index

The primary key created for the StudId column will create a clustered index for the Studid column. A table can have only one clustered index on it.

When creating the clustered index, SQL server 2005 reads the Studid column and forms a Binary tree on it. This binary tree information is then stored separately in the disc. Expand the table Student and then expand theIndexes. You will see the following index created for you when the primary key is created:

Pic02.jpg

With the use of the binary tree, now the search for the student based on the studid decreases the number of comparisons to a large amount. Let us assume that you had entered the following data in the table student:

Pic03.jpg

The index will form the below specified binary tree. Note that for a given parent, there are only one or two Childs. The left side will always have a lesser value and the right side will always have a greater value when compared to parent. The tree can be constructed in the reverse way also. That is, left side higher and right side lower.

Pic04.JPG

Now let us assume that we had written a query like below:

Select * from student where studid = 103;
Select * from student where studid = 107;

Execution without index will return value for the first query after third comparison.
Execution without index will return value for the second query at eights comparison.

Execution of first query with index will return value at first comparison.
Execution of second query with index will return the value at the third comparison. Look below:

  1. Compare 107 vs 103 : Move to right node
  2. Compare 107 vs 106 : Move to right node
  3. Compare 107 vs 107 : Matched, return the record

If numbers of records are less, you cannot see a different one. Now apply this technique with a Yahoo email user accounts stored in a table called say YahooLogin. Let us assume there are 33 million users around the world that have Yahoo email id and that is stored in the YahooLogin. When a user logs in by giving the user name and password, the comparison required is 1 to 25, with the binary tree that is clustered index.

Look at the above picture and guess yourself how fast you will reach into the level 25. Without Clustered index, the comparison required is 1 to 33 millions.

Got the usage of Clustered index? Let us move to Non-Clustered index.

6. Non Clustered Index

A non-clustered index is useful for columns that have some repeated values. Say for example, AccountTypecolumn of a bank database may have 10 million rows. But, the distinct values of account type may be 10-15. Aclustered index is automatically created when we create the primary key for the table. We need to take care of the creation of the non-clustered index.

Follow the steps below to create a Non-clustered index on our table Student based on the column class.

  1. After expanding the Student table, right click on the Indexes. And click on the New Index.Pic05.jpg
  2. From the displayed dialog, type the index name as shown below and then click on the Add button to select the column(s) that participate in the index. Make sure the Index type is Non-Clustered.Pic06.jpg
  3. In the select column dialog, place a check mark for the column class. This tells that we need a non-clusteredindex for the column Student.Class. You can also combine more than one column to create the Index. Once the column is selected, click on the OK button. You will return the dialog shown above with the selected column marked in blue. Our index has only one column. If you selected more than one column, using the MoveUpand MoveDown button, you can change order of the indexed columns. When you are using the combination of columns, always use the highly repeated column first and more unique columns down in the list. For example, let use assume the correct order for creating the Non-clustered index is: ClassDateOfBirth,PlaceOfBirth.Pic07.jpg
  4. Click on the Index folder on the right side and you will see the non-clustered index based on the column class is created for you.Pic08.jpg

7. How Does a Non-Clustered Index Work?

A table can have more than one Non-Clustered index. But, it should have only one clustered index that works based on the Binary tree concept. Non-Clustered column always depends on the Clustered column on the database.

This can be easily explained with the concept of a book and its index page at the end. Let us assume that you are going to a bookshop and found a big 1500 pages of C# book that says all about C#. When you glanced at the book, it has all beautiful color pages and shiny papers. But, that is not only the eligibility for a good book right? One you are impressed, you want to see your favorite topic of Regular Expressions and how it is explained in the book. What will you do? I just peeped at you from behind and recorded what you did as below:

  1. You went to the Index page (it has total 25 pages). It is already sorted and hence you easily picked up Regular Expression that comes on page Number 17.
  2. Next, you noted down the number displayed next to it which is 407, 816, 1200-1220.
  3. Your first target is Page 407. You opened a page in the middle, the page is greater than 500.
  4. Then you moved to a somewhat lower page. But it still reads 310.
  5. Then you moved to a higher page. You are very lucky you exactly got page 407. [Yes man you got it. Otherwise I need to write more. OK?]
  6. That’s all, you started exploring what is written about Regular expression on that page, keeping in mind that you need to find page 816 also.

In the above scenario, the Index page is Non-Clustered index and the page numbers are clustered index arranged in a binary tree. See how you came to the page 407 very quickly. Your mind actually traversed the binary tree way left and right to reach the page 407 quickly.

Here, the class column with distinct values 1,2,3..12 will store the clustered index columns value along with it. Say for example; Let us take only class value of 1. The Index goes like this:

1: 100, 104, 105

So here, you can easily get all the records that have value for class = 1. Map this with the Book index example now.

%d bloggers like this: