Indices contribute to faster query output retrievals. This sect1 will explain the different types of indices available, as well as when you should use one type over the other.
Indices are used to enhance database performance, but if overly used, then it could inhibit database performance. Upon performing a search query, the database needs to check every row in the table. If you have large tables, then this can take a long time for it to perform the check. An index allows the database to quickly thumb through the table rows and quickly find the rows that you're looking for.
A common database has tables which often need to be queried based on a certain column. For example, the booktown database has an invoice table with the following structure:
Table 5-1. Invoice Table
| Col Name | Description | Type |
|---|---|---|
| prod_id | product identifier | integer |
| descript | description of product | text |
| amt_stock | number of products in stock | integer |
The booktown company daily performs a search to find products based on the product identifier (prod_id). One of the most commonly used search conditions are:
SELECT amt_stock FROM invoice WHERE prod_id = 7280;
When the database performs the above command, it scans every row in the invoice table to find a result that returns true. As you can see, this can take a long time if the table is large, and contain only a few entries with product id 7280.
The solution to this problem is to use an index. The logic behind an index is similar to an index within a book. If you were looking for a particular term in a book, the best way to search for it is to look in the index. This way, you can find the term without having to read the entire book. All you need to do is scan through the index and skip to the place where the term is listed.
When the author writes that book, they have to anticipate which terms must be included in the index, and which term should be excluded from the index. Likewise, as the database programmer or administrator, you will need to anticipate which table columns should be indexed, and which shouldn't.
Therefore, the best way to perform faster searches using product identifier is to create an index on the prod_id column of the invoice table. Use the following command:
This index is automatically used by the system. The system may decide to use this index if it is more efficient than a sequential table scan. It makes this decision based on the VACUUM ANALYZE command. You can run the VACUUM command to update the statistics stored.
You can also remove an index from the database using the DROP INDEX command. Using our product identifier index, we can drop it with:
![]() | Key Difference |
|---|---|
This is where indices and primary keys differ. Indices can be created or destroyed at any time. Primary keys can only be created during table creation and destroyed when the table is removed from the database. |
There are three main index types. Each type is used for a specific query because they rely upon different algorithm calculations. The three types of indices are:
The B-tree index originated with Lehman-Yao's high-concurrency B-trees. This index is used to calculate comparison queries which involve the operators in the following chart.
B-tree is also the default index created by the CREATE INDEX command.
This type of index performs searches for special data. It uses Guttman's quadratic split algorithm to perform an R-tree search. Creating an R-tree index requires you to specify USING RTREE in the command. To create an R-tree index, use the following syntax:
The table below shows the operators used in a query that would initialize the system to use this R-tree index:
A hash index uses the implementaion of Litwin's linear hashing. The only operator that requires a hash index to perform a comparison is the equal to (=) operator. Similar to an R-tree index, a hash index requires that you specify hash in the command statement. Use the following to create a hash index:
Hash indices are rarely used. You may have noticed that B-tree indices are also used when the equal to (=) operator is specified in a query. Currently, there is no evidence to support that a hash index is faster than using a B-tree index.
An index is not limited to one per table. It can be defined for more than one column in a table. For instance, the shipped orders table follows the specifications listed in the chart below:
Table 5-4. Shipped Orders
| Col Name | Description | Data type |
|---|---|---|
| invoice_num | the number which identifies the invoice | integer |
| cust_id | customer identification number | integer |
| subtotal | the total price of the order, include shipping and handling charges | money |
| ship_date | the date the order was shipped on | date |
The Book Town store typically uses these queries on the shipped orders table to find out the subtotal for orders placed by customers:
SELECT subtotal FROM shipped_orders
WHERE invoice_num = 272 AND cust_id = 28;
If a company had a similar situation to the Book Town store and frequently made queries which involves a comparison of two columns using the AND operator, then we can create a multiple column index for these transactions. To create the index, use:
Example 5-5. Creating an Index
CREATE INDEX custid_ordnum_index
ON shipped_orders (cust_id, invoice_num);
The query optimizer will decide to use a multi-column index when the query involves only the first n consecutive columns in the index. For instance, an index with column x, y, and z can be used in queries which involve these combinations:
all of x, y, and z
both x and y
only x
If a query was used involving x and z, then the optimizer might choose to use an index for only x and treat z like an ordinary unindexed column.
Some rules about indices that you should keep in mind are:
Multiple column indices can only be implemented using the B-tree index type.
Multiple column indices can only be used with queries joining the indexed columns with an AND operator.
For instance, the following does not use a multi-column index to perform the search:
The number of columns a multi-column index can have is limited to 16. You can change this limit when building PostgreSQL. It is in the config.h file.
Multiple column indices should be used whenever necessary. Usually, a single column index is adequate to perform a search which saves space and time. Indices with more than three columns can cause problems and are unnecessary.
Similar to keys, indices can be specified as unique. This means that the specified column does not contain rows with a repeating value. Multi-columns can also be made unique. When using unique multi-columns, the combined values of the columns in the index cannot have a repeating combined value.
![]() | Note: NULL |
|---|---|
NULLs are a special case. An index with several NULL values still constitute it unique, even though there are repeated NULL values in the table. This is because a NULL is not equal to another NULL. Since NULL is unknown, there is no way to compare an unknown value with another unknown value and conclude that both values are equal to each other. |
To create a unique index, use the following command:
CREATE UNIQUE INDEX name
ON table (column [, ...]);
Unique indices only works for the B-tree index type. Once a table is declared with the optional values: unique or primary key, a unique index is automatically created by PostgreSQL. If a table was created without a primary key or the unique option, then a unique index could be added at a later time. But a primary key cannot be added after the table has been created.
A functional index is defined as an index that points to the result of a function. The function is a calculation that was applied to one or more columns of a single table. Functional indices allows the user to search quickly through the calculated results of a function.
Using the shipped_orders table, we can create a functional index which indices the months that the orders were shipped on. Use the extract() function to retrieve only the month from the date string. If we were to use a select to perform this operation, it will look like this:
SELECT extract (month FROM ship_date)
FROM shipped_orders;
The original dates were:
2001-02-15 2001-03-01 2001-04-20After having executed the above SELECT statement, the output was:
date_part
-----------
2
3
4
(3 rows)
This is only a small part of a large database filled with dates that we often perform searches for the months in which orders were shipped. If you have a similar situation, then it is a good idea to create a functional index. In order to create a functional index with the example above, use the following command:
Example 5-7. Creating a Simple Functional Index
CREATE INDEX month_shipped_index
ON shipped_orders(extract(month FROM ship_date));
If you wanted to use more than one function to achieve a specific result and then have an index pointing to that result, then you can define a user-created function.
For instance, the customer table contains a list of customer identification numbers, their last name, and their first name. The customer first names all have a leading Mr, Mrs, or Dr, followed by a period. We want to drop the leading title and only display the names. If we were to perform a select statement, it outputs this:
SELECT substr
(
firstname, strpos(firstname, '.')+1,
length(firstname)-strpos(firstname, '.')
)
FROM customer;
To create an index of first names without their titles will require creating a function and then creating an index which points to the result of that function. To create this type of function, insert the SQL statement into the body of a function statement. Using our previous example, we can create a function called dropTitle.
Example 5-8. Creating a Function
CREATE FUNCTION dropTitle(text)
RETURNS text AS
'SELECT substr
(
firstname, strpos(firstname, \'.\')+1,
length(firstname)-strpos(firstname, \'.\')
)
FROM customer WHERE firstname =$1'
LANGUAGE 'SQL';
This function uses another function called substring to search for the position of the period (.) in the string, then extract that period along with any string located to the left of the period (.). We need to use the backslash when quoting the periods because the system may think it is the end quote for the SQL SELECT statement if the backslash is omitted. For more information on functions, refer to the where functions and operators are explained in detail.
To create an index on the resulting column requires calling the dropTitle function. The command below creates a functional index on the results of the dropTitle function:
Example 5-9. Creating a Complex Functional Index
CREATE INDEX non_titled_names_index
ON customer(dropTitle(firstname));
![]() | Index Names |
|---|---|
The functional index created with the name "non_titled_names_index" is long, but it describes to you what it is. When naming indices, you need to give it meaningful names. Don't worry about having to use the index names, because indices are used automatically by PostgreSQL whenever it thinks that the index is the faster route to returning a result for a query. The only time you are required to use the index name after creating it is during destruction of the index from the database. |
A primary key is a field used to identify a row. Unique specifies that there cannot be any repeating values. A table can contains columns which have the following values:
be unique
have a primary key
be unique and have a primary key
The difference is that a unique column only contains non-repeating values, but a value does not exist for indexing to occur. A primary key column will have indexing capability, but the column may have repeating data values.
There are cases where you may want to have a unique primary key column. For instance, the inventory table contains the product identification numbers for each product offered. The product number should be unique because if two products had the same number, it is hard to decipher which product a customer was ordering. It also needs to be a primary key because this column is used to perform searches on queries and to distinguish this row from other rows. Some differences could be summed up in the table below:
Table 5-5. Primary/Unique Difference
| Primary Key | Unique |
|---|---|
| identifies the row | alternative access to the row |
| difficult to update | can easily be updated, if the new value is also unique |
| Does not allow NULL values | Allows NULL values |
An index or primary key functionality is provided to aid the system to become more efficient. Some queries may require an index or primary key to perform a faster search, but others may just be as fast using a sequential search. As the database administrator or manager, you may need to anticipate for these situations.