SQL Monitor

When faced with a particularly slow-running query it can be difficult to figure out the cause of the problem. Obtaining the relevant execution plan is a significant step in diagnosing where the problem lies but at times this is not enough. Skewed data distributions and the use of bind variables mean that a plan suitable for one set of variable values may be totally inappropriate for others. To help with this type of problem Oracle introduced SQL monitoring in 11g, which tells us specifically what a query is up to and where the plan calculations may have gone astray.

You’ll find details about SQL monitoring in the Performance Tuning Guide but essentially Oracle will automatically start monitoring a query if it’s run in parallel or if its CPU or I/O time exceeds 5 seconds. Alternatively, Oracle will monitor a query if it’s hinted with the MONITOR hint:

SELECT /*+ monitor */
...

Once Oracle is monitoring a query it populates two internal views, v$sql_monitor and v$sql_plan_monitor, with details of the execution. Note, the details pertain to a specific execution of a query and are not aggregate figures across multiple executions, so we can get the actual details of what Oracle was doing for our query execution. Of course, in a busy system Oracle may be monitoring many statement executions. Oracle will retain the details in the views for at least 1 minute after execution finishes so it’s important to grab the details as soon as possible. Keep in mind that once monitoring starts the views are populated in real time (well, every second) and not only at the end of the execution. This is what makes the monitoring capability so well suited to diagnosing a currently executing slow query.
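To get a feel for what’s being monitored right now, a simple sketch along these lines will list the monitored executions (the column names are as documented for 11g; STATUS shows whether an execution is still in flight):

```sql
-- List monitored executions, most recent first.
-- STATUS is typically EXECUTING, DONE, DONE (ALL ROWS) or DONE (ERROR).
SELECT sql_id
,      sql_exec_id
,      status
,      sql_exec_start
,      elapsed_time
,      cpu_time
,      user_io_wait_time
FROM   v$sql_monitor
ORDER  BY
       sql_exec_start DESC
/
```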

The v$sql_monitor view contains one row for each statement that is being monitored. Identifying the specific query that you are interested in is generally a matter of locating the session executing your query from v$session, obtaining details such as the SID and SQL_ID and querying v$sql_monitor with these details. Of course, if your application is properly instrumented with things like CLIENT_IDENTIFIER, MODULE and ACTION then tracking down your query in v$sql_monitor will be even easier. v$sql_monitor contains a number of time-based statistics such as elapsed time, CPU time, application wait time, concurrency wait time, IO wait time and more. These statistics should help you figure out the type of activity your query is spending its time on. It also contains logical and physical I/O details that can help point to whether you have a query plan issue.
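As a sketch of that lookup, assuming you’ve already obtained the SID of the session of interest from v$session (the :sid bind is a placeholder):

```sql
-- Locate the monitored execution for a given session and
-- pull out the time-based and I/O statistics mentioned above.
SELECT m.key
,      m.sql_id
,      m.status
,      m.elapsed_time
,      m.cpu_time
,      m.application_wait_time
,      m.concurrency_wait_time
,      m.user_io_wait_time
,      m.buffer_gets
,      m.disk_reads
FROM   v$sql_monitor m
WHERE  m.sid = :sid
ORDER  BY
       m.sql_exec_start DESC
/
```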

In my opinion, the real attraction of SQL monitoring comes from v$sql_plan_monitor. Once you’ve identified your query within v$sql_monitor, you can use the KEY column to query the v$sql_plan_monitor view. For the statement being monitored, this view contains one entry for each line of the execution plan. The view contains details you will find in v$sql_plan, such as the plan operation, object owner, object name and position. It also contains the plan cost and cardinality. These details you would normally obtain from the execution plan, but what v$sql_plan_monitor also gives you is the actual execution details for each step: the number of times the operation was executed, the rows output, the first and last time a row was generated, the physical I/O details and the workarea memory consumption.

Using v$sql_plan_monitor, a simple comparison of the plan cardinality to the actual rows output will tell you if Oracle has made reasonable assumptions about the plan it generated and consequently whether the operations it selected are appropriate. Let’s take a look at an example, using a table of 1,000,000 rows with a category column containing 5 distinct values with the following distribution:

  • value 1: 1,000 rows
  • value 2: 5,000 rows
  • value 3: 50,000 rows
  • value 4: 100,000 rows
  • value 5: 844,000 rows

First we create and populate our table, making sure that our values are distributed throughout the table:

CREATE TABLE monitor_test
  (id       NUMBER (10) NOT NULL
  ,category NUMBER(1)   NOT NULL
  ,padding  CHAR(30)    NOT NULL
  ,CONSTRAINT monitor_test_pk PRIMARY KEY (id))
/

INSERT INTO monitor_test
SELECT rownum
,      CASE 
          WHEN MOD(rownum,1000)  = 0                 THEN 1
          WHEN MOD(rownum,1000)  BETWEEN 1  AND 5    THEN 2
          WHEN MOD(rownum,1000)  BETWEEN 6  AND 55   THEN 3
          WHEN MOD(rownum,1000)  BETWEEN 56 AND 155  THEN 4
          ELSE 5
       END
,      'X'
FROM   dual
CONNECT BY level <= 1000000
/

COMMIT
/

CREATE INDEX monitor_test_i1
ON monitor_test (category)
/

EXEC dbms_stats.gather_table_stats ('','monitor_test',estimate_percent=>NULL,method_opt=>'FOR ALL INDEXED COLUMNS SIZE AUTO')

-- check the distribution of CATEGORY values
SELECT category
,      count(*) AS no_entries
FROM   monitor_test
GROUP  BY
       category
ORDER  BY
       category
/

When we run an EXPLAIN PLAN for queries that select from the table and filter on the CATEGORY column using values of:

  • 1
  • 5
  • bind variable

we get:

SQL>EXPLAIN PLAN
  2  FOR
  3  SELECT *
  4  FROM   monitor_test
  5  WHERE  category = 1
  6  /

Explained.

SQL>
SQL>SELECT *
  2  FROM   TABLE(dbms_xplan.display())
  3  /
Plan hash value: 1473390912

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |  1000 | 39000 |    15   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MONITOR_TEST    |  1000 | 39000 |    15   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | MONITOR_TEST_I1 |  1000 |       |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CATEGORY"=1)

14 rows selected.

SQL>
SQL>EXPLAIN PLAN
  2  FOR
  3  SELECT *
  4  FROM   monitor_test
  5  WHERE  category = 5
  6  /

Explained.

SQL>
SQL>SELECT *
  2  FROM   TABLE(dbms_xplan.display())
  3  /
Plan hash value: 2944410690

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |   844K|    31M|  1657   (1)| 00:00:20 |
|*  1 |  TABLE ACCESS FULL| MONITOR_TEST |   844K|    31M|  1657   (1)| 00:00:20 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CATEGORY"=5)

13 rows selected.

SQL>
SQL>EXPLAIN PLAN
  2  FOR
  3  SELECT *
  4  FROM   monitor_test
  5  WHERE  category = :cat
  6  /

Explained.

SQL>
SQL>SELECT *
  2  FROM   TABLE(dbms_xplan.display())
  3  /
Plan hash value: 2944410690

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |   200K|  7617K|  1661   (2)| 00:00:20 |
|*  1 |  TABLE ACCESS FULL| MONITOR_TEST |   200K|  7617K|  1661   (2)| 00:00:20 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CATEGORY"=TO_NUMBER(:CAT))

13 rows selected.

We can see that Oracle chooses to do an index lookup into the table when we select a CATEGORY value of 1 and a full table scan when we select a CATEGORY value of 5. When we provide a bind variable instead of a hard-coded literal for the CATEGORY filter, Oracle calculates it will return 200,000 rows and chooses to do a full table scan. Obviously no CATEGORY value would actually return this number of rows but it’s the best Oracle can do with the data available to it.

So, without knowing the value of the bind variable, Oracle has shown that it will choose to scan the table. In a production system this kind of optimiser assumption may manifest as a query that takes far too long to execute. It may be picked up through an examination of the execution plan, but monitoring allows us to look into the query execution itself to see exactly where Oracle has made its optimisation error.

Let’s see what SQL monitoring tells us if we actually run the query, providing bind variable values of 5 and then again with a value of 1 (using AUTOTRACE to suppress the actual output):

SQL>SET LINESIZE 132
SQL>SET PAGESIZE 9999
SQL>SET ARRAYSIZE 1000
SQL>
SQL>COLUMN plan_line_id     FORMAT 90
SQL>COLUMN plan_operation   FORMAT A20
SQL>COLUMN plan_options     FORMAT A20
SQL>COLUMN plan_object_name FORMAT A15
SQL>COLUMN plan_object_type FORMAT A10
SQL>COLUMN plan_cost        FORMAT 9999990
SQL>COLUMN plan_cardinality FORMAT 9999990
SQL>COLUMN starts           FORMAT 9999990
SQL>COLUMN output_rows      FORMAT 9999990
SQL>
SQL>VARIABLE cat NUMBER
SQL>VARIABLE client_id VARCHAR2(30)
SQL>
SQL>EXEC :cat := 5

PL/SQL procedure successfully completed.

SQL>EXEC :client_id := 'monitor test cat 5'

PL/SQL procedure successfully completed.

SQL>
SQL>EXEC dbms_session.set_identifier (:client_id)

PL/SQL procedure successfully completed.

SQL>
SQL>SET AUTOTRACE TRACEONLY STATISTICS
SQL>
SQL>SELECT /*+ MONITOR */
  2  	    *
  3  FROM   monitor_test
  4  WHERE  category = :cat
  5  /

844000 rows selected.

Statistics
----------------------------------------------------------
         79  recursive calls
          0  db block gets
       6988  consistent gets
       6066  physical reads
          0  redo size
    8578594  bytes sent via SQL*Net to client
       9792  bytes received via SQL*Net from client
        845  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
     844000  rows processed

SQL>
SQL>SET AUTOTRACE OFF
SQL>
SQL>EXEC dbms_session.set_identifier (NULL)

PL/SQL procedure successfully completed.

SQL>
SQL>SELECT pm.plan_line_id
  2  ,	    pm.plan_operation
  3  ,	    pm.plan_options
  4  ,	    pm.plan_object_name
  5  ,	    pm.plan_object_type
  6  ,	    pm.plan_cost
  7  ,	    pm.plan_cardinality
  8  ,	    pm.starts
  9  ,	    pm.output_rows
 10  FROM   v$sql_monitor      m
 11  ,	    v$sql_plan_monitor pm
 12  WHERE  m.key = pm.key
 13  AND    m.client_identifier = :client_id
 14  ORDER  BY
 15  	    m.sql_exec_start
 16  ,	    pm.key
 17  ,	    pm.sql_id
 18  ,	    pm.plan_line_id
 19  /

PLAN_LINE_ID PLAN_OPERATION       PLAN_OPTIONS         PLAN_OBJECT_NAM PLAN_OBJEC PLAN_COST PLAN_CARDINALITY   STARTS OUTPUT_ROWS
------------ -------------------- -------------------- --------------- ---------- --------- ---------------- -------- -----------
           0 SELECT STATEMENT                                                                                       1      844000
           1 TABLE ACCESS         FULL                 MONITOR_TEST    TABLE           1656           200000        1      844000

SQL>
SQL>EXEC :cat := 1

PL/SQL procedure successfully completed.

SQL>EXEC :client_id := 'monitor test cat 1'

PL/SQL procedure successfully completed.

SQL>
SQL>EXEC dbms_session.set_identifier (:client_id)

PL/SQL procedure successfully completed.

SQL>
SQL>SET AUTOTRACE TRACEONLY STATISTICS
SQL>
SQL>SELECT /*+ MONITOR */
  2  	    *
  3  FROM   monitor_test
  4  WHERE  category = :cat
  5  /

1000 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6064  consistent gets
       6060  physical reads
          0  redo size
       9656  bytes sent via SQL*Net to client
        519  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed

SQL>
SQL>SET AUTOTRACE OFF
SQL>
SQL>EXEC dbms_session.set_identifier (NULL)

PL/SQL procedure successfully completed.

SQL>
SQL>SELECT pm.plan_line_id
  2  ,	    pm.plan_operation
  3  ,	    pm.plan_options
  4  ,	    pm.plan_object_name
  5  ,	    pm.plan_object_type
  6  ,	    pm.plan_cost
  7  ,	    pm.plan_cardinality
  8  ,	    pm.starts
  9  ,	    pm.output_rows
 10  FROM   v$sql_monitor      m
 11  ,	    v$sql_plan_monitor pm
 12  WHERE  m.key = pm.key
 13  AND    m.client_identifier = :client_id
 14  ORDER  BY
 15  	    m.sql_exec_start
 16  ,	    pm.key
 17  ,	    pm.sql_id
 18  ,	    pm.plan_line_id
 19  /

PLAN_LINE_ID PLAN_OPERATION       PLAN_OPTIONS         PLAN_OBJECT_NAM PLAN_OBJEC PLAN_COST PLAN_CARDINALITY   STARTS OUTPUT_ROWS
------------ -------------------- -------------------- --------------- ---------- --------- ---------------- -------- -----------
           0 SELECT STATEMENT                                                                                       1        1000
           1 TABLE ACCESS         FULL                 MONITOR_TEST    TABLE           1656           200000        1        1000

SQL>
SQL>SPOOL OFF

The last column of the queries against v$sql_monitor and v$sql_plan_monitor shows exactly what went on with our executions. The first query, with a bind variable value of 5, returned 844,000 rows from the full table scan of the table and the second query returned 1,000 rows. In both scenarios, Oracle’s calculated cardinality was 200,000.

Obviously the example used here is quite trivial. In a more complex query the same technique can be used to detect differences in particular steps of the overall execution plan, which helps pinpoint where a query may be “going wrong”. The difference between the FIRST_CHANGE_TIME and LAST_CHANGE_TIME columns is another way of gauging which activities within the query execution are taking a long time.
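A sketch of that comparison, assuming you’ve obtained the KEY of the monitored execution from v$sql_monitor (the :key bind is a placeholder); FIRST_CHANGE_TIME and LAST_CHANGE_TIME are DATE columns so the difference is in days, hence the multiplication to get seconds:

```sql
-- Approximate how long each plan step was active
-- for a monitored execution identified by its KEY.
SELECT pm.plan_line_id
,      pm.plan_operation
,      pm.plan_options
,      pm.first_change_time
,      pm.last_change_time
,      (pm.last_change_time - pm.first_change_time) * 86400 AS active_secs
FROM   v$sql_plan_monitor pm
WHERE  pm.key = :key
ORDER  BY
       pm.plan_line_id
/
```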

One point to bear in mind when using v$sql_plan_monitor is that if you are relying on SQL monitoring kicking in after 5 seconds of CPU or I/O activity then some details of those first 5 seconds are lost and don’t get reported. It may be best to explicitly use the MONITOR hint for queries that you believe will be problematic.
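Rather than querying the views directly, you can also have Oracle format the monitoring data for you via DBMS_SQLTUNE.REPORT_SQL_MONITOR. A minimal sketch, assuming the statement’s SQL_ID is known (the :sql_id bind is a placeholder; TYPE can also be 'HTML'):

```sql
-- Produce a text report of the most recent monitored
-- execution of the given statement.
SELECT dbms_sqltune.report_sql_monitor
          (sql_id => :sql_id
          ,type   => 'TEXT')
FROM   dual
/
```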


Download scripts shown in this post.
