Always Update: The Importance of Updating Sybase Statistics

By: Jeff Garbus, Lyle Fernquist


When we go to a performance and tuning gig, one of the first questions we always ask is, “When is the last time you updated your statistics?” You might be surprised at how frequently our assignment is cut short by simply updating the client’s statistics. If you’ve ever had a straightforward application with small tables, but terrible performance, or a complex application with erratic performance, or a seemingly well-performing application that suddenly became very slow, there’s a good chance the problem was out-of-date statistics.

SAP Sybase Adaptive Server Enterprise (ASE) uses a cost-based optimizer which balances the cost of finding the best query execution plan with the costs of running the query in different ways. To do this, it checks a histogram that identifies the data distribution, and uses this information to determine join orders, index selection, and more. You might think that the server would keep this information up-to-date, but because of the resource cost, it does not. You will need to perform this critical task (updating statistics) as often as your data skews.

NOTE

In high-volume environments, you can update statistics on a restored database, and then import them into the production server using the optdiag utility, but that’s a lot more work.

Options for updating statistics

ASE provides a number of options for updating statistics, depending on the specific needs you have.

To update information about the distribution of key values in specified indexes, for all columns in an index, table, or partition, the syntax is:

update statistics table_name

              [[ partitiondata_partition_name ] [ (column_list ) ] |

              index_name[ partitionindex_partition_name ] ]

              [ using step values ]

              [ with consumers = consumers ][, sampling=Npercent ]

 

To update the statistics for all columns in an index, the syntax is:

update index statistics             

table_name [[ partitiondata_partition_name ] |             

[ index_name [ partition index_partition_name ] ] ]             

[ usingstep values ]             

[ with consumers = consumers ] [, sampling=N percent]

 

To update all statistics information for a given table, the syntax is:

 

update all statistics table_name [ partition data_partition_name]

 

NOTE

You can run update all statistics on a single data partition.

To update statistics that are stored in sys­tabstats table, such as rowcount, cluster ratios, and so on, the syntax is:

update table statistics

              table_name [ partition data_partition_name ]

NOTE

Performing update table statistics does not affect column statistics stored in sys­statistics.

Which update statistics option is best?

We normally use update index statistics to get current statistics for all the columns in the indexes; we typically aren’t so concerned about statistics for non-indexes columns (if you’re going to search for something, index it, and if you’re not going to search for something, why use the resources to collect the statistics?)

Your situation, of course, may be different. Here are a few considerations when choosing which option will work best in your circumstances:

  1. You can update statistics for a specific partition. This can be really useful if you’re using semantic partitioning. Why? Consider this: if you are partitioned by date, and have your most recent data on the current partition, you can choose to update statistics for ONLY the more recent data, rather than the entire table of >1TB.
  2. You can update statistics for either a specific index or all of the indexes on a specific table. In general, you would run this on an entire table; for very large tables, you can update statistics on just the columns that skew the most.
  3. You can change the number of histogram steps to create (the default is 20 steps). You can configure this at the server level (new in late ASE 15). We typically build it into the scripts as a parameter so that changes of server-wide defaults at up­grade time don’t undo our decision-making processes.

When you increase the steps, it changes the number of histogram points that the optimizer has to work with. When would you do this? If you ever find your performance to be erratic, increasing the number of steps may be a good place to start.

Be sure to note that the random distribution of data doesn’t imply a steady distribution of data; in practice, it often promises the opposite.

  1. Using the consumers option, you can control the degree of parallelism, which can dramatically speed up the scans (depending on whether you’re already swamping your I/O subsystem). Of course, this is limited to the level of parallelism you’ve already set up on your server.
  2. The sampling allows you to reduce the number of rows you have to scan in order to build the histogram. This can save a lot of time if you have multi-terabyte tables or databases.

You can examine the statistics (including time since the last update) by querying the systabstats and sysstatistics table, or by using the optdiag utility from the OS prompt.

 

Does updating statistics cause locking and contention?

Some have claimed the update statistics command causes blocking and contention. In earlier versions of ASE, this may have sometime happened; however, in the newer releases of ASE, locking is minimal and this should not be a significant concern.

In general, if you are updating statistics for a table that has allpages locking defined, the server places an intent lock at the table level, and a page lock on the page currently being scanned. (Locking level 1). Otherwise, it does not lock (that is, with dirty reads / locking level 0).

 

It’s important to be up-to-date

As noted at the beginning, out-of-date statistics is a frequent cause of poor performance. Assess your circumstances and develop a plan for keeping your statistics up to date, so that the query optimizer can do its job. It’s more than worth the time you’ll invest.

Comments

  1. Hi I have a set of sybase questions.Any one review it and answer those questions pls

    1)
    Choose the steps followed by optimization
    to analyze a query (choose 3)

    a.
    the final access method is determined

    b.
    Each table is analyzed

    c.
    Adds optimized search arguments and join
    clauses

    d.
    the cost of using each index that matches a
    search a argument or join columns

    e.
    The query is prased and normalized

    2) Which
    of the following is not true about this query?

    Select
    * from sales_order

    Where
    order_amt/12 > 15000

    And
    salesman = ?Pablo?

    a. The
    query optimizer will consider performing a table scan.

    b. The
    query optimizer will consider using the index on salesman

    c. The
    query optimizer will consider using the index on order_amt

    d. The
    query optimizer will choose an access method based on cost

    3) The
    optimizer considers the size of the table in determining which aspects of a
    query

    Plan?
    (Choose 3)

    a. Serial
    Scan versus Parallel Scan

    b. Inner
    table versus Outer Table

    c. Index
    selection versus table scan

    d. Partition
    based versus hash based scan

    4) Given
    the following query:

    Select T.title, P.pub_name from
    titles T, Publishers P

    Where T.price > $10.00 and
    T.pub_id = P.pub_id abd P.pub_state = ?NY?

    Which of the following indexes will
    the ASE optimizer consider using? (Choose 3)

    a. Create
    unique clustered index idx4 on publishers (pub_id)

    b. Create
    clustered index idx5 on publishers (pub_name)

    c. Create
    index idx1 on titles (price)

    d. Create
    unique index idx3 on titles (title_id)

    e. Create
    index idx2 on titles (pub_id)

    5) Create
    table T1

    (Col1
    int NULL)

    There
    is an insert trigger on the table T1. What will happen if we fire the following
    statement?

    Insert
    into T1 values (“abc”)

    a. Adaptive
    server will give an error and the trigger will not fire.

    b. “abc”
    will not be inserted but the trigger will be fired without giving any error.

    c. “abc”
    will not be inserted and trigger will not be fired without giving any error.

    d. Adaptive
    server will give and error but the trigger will be fired.

    6) How
    many times will the loop execute?

    Declare
    @ int

    While
    @ b))

    It
    is then decided that the check constraint should be changed to ‘(a>b)’

    Which
    commands and/ or stored procedure must be executed, and in which order, to
    accomplish this change?

    a. Sp_help;
    alter table; alter table

    b. Sp_help;
    alter table

    c. alter
    table; alter table

    d. alter
    table

    e. sp_helpconstraint;
    alter table

    f. sp_helpconstraint;
    alter table; alter table

    9) Which
    of the following statements about optimizer statistics are true? (Choose 3)

    a. By
    default, dropping then recreating an index on a table with data overwrites any
    statistics on its leading column.

    b. By
    default creating an index on an empty table generates statistics on its leading
    column

    c. Dropping
    an index does not delete any column statistics

    By
    default creating an index on a table with data generates statistics on its
    leading column

Leave a Reply