Indices

Indices contribute to faster query output retrievals. This sect1 will explain the different types of indices available, as well as when you should use one type over the other.

Indices are used to enhance database performance, but if overly used, then it could inhibit database performance. Upon performing a search query, the database needs to check every row in the table. If you have large tables, then this can take a long time for it to perform the check. An index allows the database to quickly thumb through the table rows and quickly find the rows that you're looking for.

Using an Index

A common database has tables which often need to be queried based on a certain column. For example, the booktown database has an invoice table with the following structure:

Table 5-1. Invoice Table

Col NameDescriptionType
prod_idproduct identifierinteger
descriptdescription of producttext
amt_stocknumber of products in stockinteger

The booktown company daily performs a search to find products based on the product identifier (prod_id). One of the most commonly used search conditions are:

  SELECT amt_stock FROM invoice WHERE prod_id = 7280;
   

When the database performs the above command, it scans every row in the invoice table to find a result that returns true. As you can see, this can take a long time if the table is large, and contain only a few entries with product id 7280.

The solution to this problem is to use an index. The logic behind an index is similar to an index within a book. If you were looking for a particular term in a book, the best way to search for it is to look in the index. This way, you can find the term without having to read the entire book. All you need to do is scan through the index and skip to the place where the term is listed.

When the author writes that book, they have to anticipate which terms must be included in the index, and which term should be excluded from the index. Likewise, as the database programmer or administrator, you will need to anticipate which table columns should be indexed, and which shouldn't.

Therefore, the best way to perform faster searches using product identifier is to create an index on the prod_id column of the invoice table. Use the following command:

Example 5-1. Creating an Index

  CREATE INDEX prod_id_index ON invoice(prod_id);
   

This index is automatically used by the system. The system may decide to use this index if it is more efficient than a sequential table scan. It makes this decision based on the VACUUM ANALYZE command. You can run the VACUUM command to update the statistics stored.

You can also remove an index from the database using the DROP INDEX command. Using our product identifier index, we can drop it with:

Example 5-2. Dropping an Index

  DROP INDEX prod_id_index;
   

NoteKey Difference
 

This is where indices and primary keys differ. Indices can be created or destroyed at any time. Primary keys can only be created during table creation and destroyed when the table is removed from the database.

Index Types

There are three main index types. Each type is used for a specific query because they rely upon different algorithm calculations. The three types of indices are:

B-tree Index

The B-tree index originated with Lehman-Yao's high-concurrency B-trees. This index is used to calculate comparison queries which involve the operators in the following chart.

Table 5-2. Operators which require B-tree

B-tree is also the default index created by the CREATE INDEX command.

R-tree Index

This type of index performs searches for special data. It uses Guttman's quadratic split algorithm to perform an R-tree search. Creating an R-tree index requires you to specify USING RTREE in the command. To create an R-tree index, use the following syntax:

Example 5-3. RTREE index

        CREATE INDEX name ON table USING RTREE (column); 
        

The table below shows the operators used in a query that would initialize the system to use this R-tree index:

Table 5-3. Operators which require R-tree

Hash Index

A hash index uses the implementaion of Litwin's linear hashing. The only operator that requires a hash index to perform a comparison is the equal to (=) operator. Similar to an R-tree index, a hash index requires that you specify hash in the command statement. Use the following to create a hash index:

Example 5-4. Creating a Hash Index

   CREATE INDEX name 
             ON table 
     USING HASH (column); 
        

Hash indices are rarely used. You may have noticed that B-tree indices are also used when the equal to (=) operator is specified in a query. Currently, there is no evidence to support that a hash index is faster than using a B-tree index.

Multi Column Indices

An index is not limited to one per table. It can be defined for more than one column in a table. For instance, the shipped orders table follows the specifications listed in the chart below:

Table 5-4. Shipped Orders

Col NameDescriptionData type
invoice_numthe number which identifies the invoiceinteger
cust_idcustomer identification numberinteger
subtotalthe total price of the order, include shipping and handling chargesmoney
ship_datethe date the order was shipped ondate

The Book Town store typically uses these queries on the shipped orders table to find out the subtotal for orders placed by customers:

   SELECT subtotal FROM shipped_orders 
    WHERE invoice_num = 272 AND cust_id = 28; 
   

If a company had a similar situation to the Book Town store and frequently made queries which involves a comparison of two columns using the AND operator, then we can create a multiple column index for these transactions. To create the index, use:

Example 5-5. Creating an Index

   CREATE INDEX custid_ordnum_index 
       ON shipped_orders (cust_id, invoice_num);
   

The query optimizer will decide to use a multi-column index when the query involves only the first n consecutive columns in the index. For instance, an index with column x, y, and z can be used in queries which involve these combinations:

If a query was used involving x and z, then the optimizer might choose to use an index for only x and treat z like an ordinary unindexed column.

Some rules about indices that you should keep in mind are:

Unique Indices

Similar to keys, indices can be specified as unique. This means that the specified column does not contain rows with a repeating value. Multi-columns can also be made unique. When using unique multi-columns, the combined values of the columns in the index cannot have a repeating combined value.

NoteNote: NULL
 

NULLs are a special case. An index with several NULL values still constitute it unique, even though there are repeated NULL values in the table. This is because a NULL is not equal to another NULL. Since NULL is unknown, there is no way to compare an unknown value with another unknown value and conclude that both values are equal to each other.

To create a unique index, use the following command:

       CREATE UNIQUE INDEX name 
           ON table (column [, ...]);
   

Unique indices only works for the B-tree index type. Once a table is declared with the optional values: unique or primary key, a unique index is automatically created by PostgreSQL. If a table was created without a primary key or the unique option, then a unique index could be added at a later time. But a primary key cannot be added after the table has been created.

Functional Indices

A functional index is defined as an index that points to the result of a function. The function is a calculation that was applied to one or more columns of a single table. Functional indices allows the user to search quickly through the calculated results of a function.

Using the shipped_orders table, we can create a functional index which indices the months that the orders were shipped on. Use the extract() function to retrieve only the month from the date string. If we were to use a select to perform this operation, it will look like this:

  SELECT extract (month FROM ship_date) 
    FROM shipped_orders;   
   

The original dates were:

   2001-02-15
   2001-03-01
   2001-04-20
   
After having executed the above SELECT statement, the output was:
 date_part
-----------
         2
         3
         4
(3 rows)
   

This is only a small part of a large database filled with dates that we often perform searches for the months in which orders were shipped. If you have a similar situation, then it is a good idea to create a functional index. In order to create a functional index with the example above, use the following command:

Example 5-7. Creating a Simple Functional Index

  CREATE INDEX month_shipped_index 
      ON shipped_orders(extract(month FROM ship_date)); 
   

If you wanted to use more than one function to achieve a specific result and then have an index pointing to that result, then you can define a user-created function.

For instance, the customer table contains a list of customer identification numbers, their last name, and their first name. The customer first names all have a leading Mr, Mrs, or Dr, followed by a period. We want to drop the leading title and only display the names. If we were to perform a select statement, it outputs this:

  SELECT substr
           (
           firstname, strpos(firstname, '.')+1, 
           length(firstname)-strpos(firstname, '.')
           ) 
           FROM customer;
   

To create an index of first names without their titles will require creating a function and then creating an index which points to the result of that function. To create this type of function, insert the SQL statement into the body of a function statement. Using our previous example, we can create a function called dropTitle.

Example 5-8. Creating a Function

  CREATE FUNCTION dropTitle(text)
           RETURNS text AS
           'SELECT substr
           (
           firstname, strpos(firstname, \'.\')+1, 
           length(firstname)-strpos(firstname, \'.\')
           ) 
           FROM customer WHERE firstname =$1'
           LANGUAGE 'SQL';
   

This function uses another function called substring to search for the position of the period (.) in the string, then extract that period along with any string located to the left of the period (.). We need to use the backslash when quoting the periods because the system may think it is the end quote for the SQL SELECT statement if the backslash is omitted. For more information on functions, refer to the where functions and operators are explained in detail.

To create an index on the resulting column requires calling the dropTitle function. The command below creates a functional index on the results of the dropTitle function:

Example 5-9. Creating a Complex Functional Index

  CREATE INDEX non_titled_names_index
      ON customer(dropTitle(firstname)); 
   

NoteIndex Names
 

The functional index created with the name "non_titled_names_index" is long, but it describes to you what it is. When naming indices, you need to give it meaningful names. Don't worry about having to use the index names, because indices are used automatically by PostgreSQL whenever it thinks that the index is the faster route to returning a result for a query. The only time you are required to use the index name after creating it is during destruction of the index from the database.

Primary Keys

A primary key is a field used to identify a row. Unique specifies that there cannot be any repeating values. A table can contains columns which have the following values:

The difference is that a unique column only contains non-repeating values, but a value does not exist for indexing to occur. A primary key column will have indexing capability, but the column may have repeating data values.

There are cases where you may want to have a unique primary key column. For instance, the inventory table contains the product identification numbers for each product offered. The product number should be unique because if two products had the same number, it is hard to decipher which product a customer was ordering. It also needs to be a primary key because this column is used to perform searches on queries and to distinguish this row from other rows. Some differences could be summed up in the table below:

Table 5-5. Primary/Unique Difference

Primary KeyUnique
identifies the rowalternative access to the row
difficult to updatecan easily be updated, if the new value is also unique
Does not allow NULL valuesAllows NULL values

An index or primary key functionality is provided to aid the system to become more efficient. Some queries may require an index or primary key to perform a faster search, but others may just be as fast using a sequential search. As the database administrator or manager, you may need to anticipate for these situations.