Some Relational Algebra

Today a developer sent a SQL piece doing some insert select into task suffering from performance problem. A batch job has crashed. The problem was that SQL performing FILTER operation rather than a HASH_AJ. Suddenly I see the problem and I have rewrittenn the SQL:

Initial SQL was

insert into A select SUBSCRIBER_ID, REVENUE, 200710 year_month
from B t1
where (t1.subscriber_id, 200710) not in
(select /*+ hash_aj*/ t2.subscriber_id, t2.year_month from A t2) ;

Intiutively I’ve rewrite the query as

insert into A select SUBSCRIBER_ID, REVENUE, 200710 year_month
from B t1
where (t1.subscriber_id) not in
(select t2.subscriber_id from A t2 where t2.year_month = 200710) ;

Naturally Oracle change the execution plan to perform HASH_AJ and the query terminated with in 6 seconds or so. And developer thanked me and close the phone. Suddenly I asked myself “Yes it was obvious that the query did the right thing but can you PROVE it?” Gooosh. There was the real problem. How can you prove that two queries with totally different execution plans do the same thing. Let’s turn back to relational algebra and prove it maybe this improves a better understanding about anti hash join. If we can prove that where clauses of two queries will produce the same result set we are done. So I will be acting where clauses as actual datasets. Here is the proof-like illustration:

Proof By Relational Algebra

I feel confortable now 🙂

 Note: I beg pardon of Mathematicians (as being the brother of one) for the informal(ambigious, unrigirous etc.) parts of the proof.


About kocakahin

Just a computer engineer

Posted on November 27, 2007, in Oracle. Bookmark the permalink. 2 Comments.

  1. Also NULL’s may be problematic with anti-join access paths since some believe “never use constraints with olap” –

    -– original query
    SELECT /*+ parallel (q1,16) */ count(*)
    FROM party PARTITION(party_01) q1
    WHERE party_id NOT IN (SELECT /*+ hash_aj */ party_id
    FROM party_new PARTITION(party_01))

    .. no respond after several hours ..

    -– show CBO there won’t be any NULL coming from the sub-query
    SELECT /*+ parallel (a 8) */ COUNT(*)
    FROM party PARTITION(party_01) a
    WHERE party_id NOT IN (SELECT /*+ hash_aj parallel(b 8) */ party_id
    FROM party_new PARTITION(party_01) b
    WHERE party_id IS NOT NULL) ;

    Elapsed: 00:02:29.00

  2. Yes Yoda. If you notice the proof, it explains the reason why NOT NULL constraint or predicate is crucial for ANTI type of joins. In the proof we require to make an assumption about beta3(one of the disjoint subsets) of beta that it’s an empty set. In order to make the same assumption Oracle CBO needs some proof that that set is empty. This can be achive either by a metadata set on required column(NOT NULL constrint) or an explicit predicate statement “column is not null”. It’s not a Oracle inferiority as some may think, it is the Oracle way of doing things because {x|x is an element of A and x = s} union {x|x is an element of A and x != s} doesn’t span the set A. You need to union {x|x is an element of A and x is NULL} or state that this is an empty set.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: