Category Archives: Oracle

Open World 2012 Session

It is Oracle Open World time once again. Next week I fly to San Francisco to join the largest technology event in the world. This will be the Complex Event Processing, Advanced Analytics, and Data Warehousing year for me at Open World.

After a non-speaker year in 2011, this year I will be presenting on Oracle Enterprise R and ODM, going over several use cases. Here are the session details:

  • What: Database Data Mining: Practical Enterprise R and Oracle Advanced Analytics
  • When: 1st of October 2012 16:45 – 17:45
  • Where: Moscone West – 3016

Join me and learn that BI is not just filtering, rotating, combining, and aggregating tables without knowing the actual reason for doing so :)

Book Review: Oracle Warehouse Builder 11gR2: Getting Started 2011

As the embedded option of the Oracle relational database engine, OWB is still my favourite ETL (or, to be correct, ELT) tool. Although Oracle positions ODI as its strategic tool for this purpose, it keeps investing in OWB, at least in 11g Release 2.

I recently had the chance to review a new book by Packt Publishing, namely Oracle Warehouse Builder 11gR2: Getting Started 2011. It is a good introductory book for newbies willing to learn more about OWB by playing with it in a playground.

The book is organised to let you build your own OWB environment and work on a toy problem, namely ACME Toys and Gizmos. You design and create your source and target structures and implement your ELT.

The book will help those who wish to learn more about OWB and are looking for a practical guide.

Book Review: Oracle GoldenGate 11g Implementer’s Guide


As a consultant in the VLDB domain, one of the most popular questions I hear these days is “How can I feed my data warehouse/reporting environment in real time?” Another one is “How can I offload my reporting activity from my OLTP environment without generating any time gap between reality and what is being reported?” In addition, I am a bit depressed to see people treating CDC (Change Data Capture) products as DR (Disaster Recovery) solutions.

It was two years ago, in an ACED briefing, when Thomas Kurian declared that GoldenGate is Oracle’s strategic real-time integration solution. Since then, I have spent quite some time trying to understand the merits and drawbacks of this product. I have talked to customers who want to use it, who are already using it, and who are suffering from it. Almost all sites are annoyed by the same problems:

  • It is very hard to configure GoldenGate.
  • It is very hard to monitor and manage GoldenGate.
  • Oracle documentation is still not sufficient.

To be honest, it is hard to say that they are wrong.

Last week I read John P. Jeffries’s Oracle GoldenGate 11g Implementer’s Guide, and I can easily say that it is a nice piece of material built just to make the reader a successful GoldenGate implementer. There are no dictionary-style definitions of GoldenGate concepts like Extract, Trail File, and Data Pump as there are in the formal Oracle documentation; the book is structured in a “let me show you what I just defined” fashion and is full of details that show you the way to implement up-and-running GoldenGate systems. Below are my favorite sections.

Chapter 6: Configuring GoldenGate for HA covers how to configure GoldenGate on a RAC database, including how to integrate GoldenGate with the clusterware software to enable automatic failover. I have seen customer sites still writing custom scripts for this, so this chapter is a good how-to for RAC implementers.

Chapter 8: Managing Oracle GoldenGate is mainly about reporting the performance metrics and errors of functional GoldenGate systems. The chapter explains how to report and interpret your GoldenGate errors, latency, and throughput.

Chapter 9: Performance Tuning covers tuning GoldenGate with parallel Extract and Replicat processes and tuning GoldenGate storage and network for best performance.

To sum up, if you are a GoldenGate newbie who is responsible for implementing a new GoldenGate environment or maintaining an existing one, this book might be a good resource for you.

An English Man in Istanbul

A gentleman… An Oracle expert… An esteemed community figure…
Yes, as a reader of this Oracle-focused blog, you should be able to figure out whom I am talking about. Last week (on Thursday), we managed to hold the first Turkey Oracle User Group (TROUG) Day at Bahcesehir University.

As the Turkish community members, we discussed the keynote speaker and agreed on one name: Jonathan Lewis.
The problem was that, being such a popular figure in the community, it was almost impossible to find a proper gap in Jonathan’s schedule for our event. When I sent him the invitation and asked for his attendance one Sunday afternoon, Jonathan simply replied that he would be with us and deliver the keynote speech. Almost a month after our mailing, Jonathan was on stage in the Bahcesehir University auditorium as our keynote speaker, talking “just about joining two tables”.
It was amazing that we finally pulled off this user group day, and that the man whose books are the masterpieces of my Oracle library was on stage.
Thank you, Jonathan, personally and as a founding member of TROUG, for honoring us. Hope to see you again soon…

The First Oracle Exadata Certified Implementation Specialist of Turkey

After migrating more than 120 TB of data to Exadata v2 and delivering 4 Exadata hands-on courses in Europe (Germany, Russia, and Belgium), my friend Zekeriya Besiroglu told me that an Exadata certification exam had recently been opened up. I took the exam last week and passed with a 93% score.

Just to guide those of you who will take this exam: it seems that I/O Resource Manager is the most important topic on the exam, although it is not the most important topic for Exadata customers in my region :)

Create Your Own Oracle TPC-H Playground on Linux

I believe all of us sometimes suffer from the limitations of playing with Oracle’s SH, SCOTT, etc. schemas when we need a sufficiently large playground for our tests. In this post you will find out how to create your own TPC-H playground database on Linux.

Download TPC-H Data Generator (dbgen)

TPC, the council behind the TPC-H benchmark, delivers a standardized data generation tool for all of its benchmarks. You can download this tool from http://www.tpc.org/tpch/default.asp (the version I will be using can be downloaded from http://www.tpc.org/tpch/spec/tpch_2_12_0_b5.zip). This bundle contains a bunch of C files to be compiled to form dbgen. Copy the zip file into a folder of your choice and ensure that your Linux environment has the necessary toolkit to compile C code (gcc, make, etc.)
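If you are unsure whether the toolchain is present, a quick check such as the one below will tell you (the yum package names are an assumption for RPM-based distributions like Oracle Linux; use your own package manager elsewhere):

# Verify that the C toolchain and unzip are available
gcc --version
make --version
unzip -v
# If any of these are missing on an RPM-based system:
# sudo yum install -y gcc make unzip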

[oracle@localhost ~]$ mkdir tpch
[oracle@localhost ~]$ mv tpch_2_12_0_b5.zip ./tpch
[oracle@localhost ~]$ cd tpch/
[oracle@localhost tpch]$ unzip tpch_2_12_0_b5.zip
	Archive: tpch_2_12_0_b5.zip
	inflating: build.c
	inflating: driver.c
	inflating: bm_utils.c
	inflating: rnd.c
...
[oracle@localhost tpch]$

Build DBGEN

The next thing you should do is copy the makefile.suite file in the tpch directory to makefile and change the following parameters in the copied file:

  • CC
  • DATABASE
  • MACHINE
  • WORKLOAD
[oracle@localhost tpch]$ cp makefile.suite makefile

[oracle@localhost tpch]$ vi makefile
...
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are:  TPCH
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH
...

Now run the make command:

[oracle@localhost tpch]$ make
chmod 755 update_release.sh
./update_release.sh 2 12 0
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o qgen.o qgen.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o varsub.o varsub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm

After a successful compilation you should have the dbgen executable in your current working directory:

[oracle@localhost tpch]$ ./dbgen -h
TPC-H Population Generator (Version 2.12.0 build 5)
Copyright Transaction Processing Performance Council 1994 - 2010
USAGE:
dbgen [-{vf}][-T {pcsoPSOL}]
[-s <scale>][-C <procs>][-S <step>]
dbgen [-v] [-O m] [-s <scale>] [-U <updates>]

Basic Options
===========================
-C <n> -- separate data set into <n> chunks (requires -S, default: 1)
-f     -- force. Overwrite existing files
-h     -- display this message
-q     -- enable QUIET mode
-s <n> -- set Scale Factor (SF) to  <n> (default: 1)
-S <n> -- build the <n>th step of the data/update set (used with -C or -U)
-U <n> -- generate <n> update sets
-v     -- enable VERBOSE mode

Advanced Options
===========================
-b <s> -- load distributions for <s> (default: dists.dss)
-d <n> -- split deletes between <n> files (requires -U)
-i <n> -- split inserts between <n> files (requires -U)
-T c   -- generate cutomers ONLY
-T l   -- generate nation/region ONLY
-T L   -- generate lineitem ONLY
-T n   -- generate nation ONLY
-T o   -- generate orders/lineitem ONLY
-T O   -- generate orders ONLY
-T p   -- generate parts/partsupp ONLY
-T P   -- generate parts ONLY
-T r   -- generate region ONLY
-T s   -- generate suppliers ONLY
-T S   -- generate partsupp ONLY

To generate the SF=1 (1GB), validation database population, use:
       dbgen -vf -s 1

To generate updates for a SF=1 (1GB), use:
       dbgen -v -U 1 -s 1

Generate Your Playground Data

Now you are ready to create your playground database. For those of you who are not familiar with the TPC-H model, refer to the relational model below to get an idea of what it looks like.

TPC-H Data Model

Use dbgen to generate 4 GB of TPC-H benchmark data. To be able to load the generated files in parallel using Oracle external tables, we will use the file split feature of dbgen. Remember that this step might take some time and can be parallelized depending on your CPU & I/O capacity:

[oracle@localhost tpch]$ ./dbgen -s 4 -S 1 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 – 2010
Starting to load stage 1 of 8 for suppliers table…/
Preloading text … 100%
done.
Starting to load stage 1 of 8 for customers table…done.
Starting to load stage 1 of 8 for orders/lineitem tables…done.
Starting to load stage 1 of 8 for part/partsupplier tables…done.
Generating data for nation tabledone.
Generating data for region tabledone.

[oracle@localhost tpch]$ ./dbgen -s 4 -S 2 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 – 2010
Starting to load stage 2 of 8 for suppliers table…/
Preloading text … 100%
done.
Starting to load stage 2 of 8 for customers table…done.
Starting to load stage 2 of 8 for orders/lineitem tables…done.
Starting to load stage 2 of 8 for part/partsupplier tables…done.
Generating data for nation tableDo you want to overwrite ./nation.tbl ? [Y/N]: y
done.
Generating data for region tableDo you want to overwrite ./region.tbl ? [Y/N]: y
done.

[oracle@localhost tpch]$ ./dbgen -s 4 -S 8 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 – 2010
Starting to load stage 8 of 8 for suppliers table…/
Preloading text … 100%
done.
Starting to load stage 8 of 8 for customers table…done.
Starting to load stage 8 of 8 for orders/lineitem tables…done.
Starting to load stage 8 of 8 for part/partsupplier tables…done.
Generating data for nation tableDo you want to overwrite ./nation.tbl ? [Y/N]: y
done.
Generating data for region tableDo you want to overwrite ./region.tbl ? [Y/N]: y
done.

When you are done with all 8 executions, you will have *tbl* files in your current working directory. These are pipe-separated files that you will load into your database.

DBGEN Options

-s 4 specifies a scale factor of 4, meaning that we generate approximately 4 GB of benchmark data. -S 1 instructs dbgen to generate the first of the 8 chunks. -C 8 is the total number of chunks for each large table (excluding the nation and region tables). -v enables verbose output. A small loop like the one below saves you from typing all eight commands by hand.
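Here is a minimal bash sketch for running all eight steps in sequence; it assumes you are in the tpch directory and uses the documented -f flag so that dbgen overwrites nation.tbl and region.tbl without prompting:

#!/bin/bash
# Generate all 8 chunks of the SF=4 TPC-H data set, one step at a time.
# -f forces dbgen to overwrite existing files (nation.tbl, region.tbl)
# instead of asking "Do you want to overwrite ...?" on every step.
for step in 1 2 3 4 5 6 7 8; do
    ./dbgen -f -s 4 -C 8 -S "$step" -v
done

If you have the CPU and I/O capacity, you can also run several steps concurrently, for example by backgrounding each ./dbgen invocation with & and collecting them with wait.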

DBGEN Output

In total, you will see that the *tbl* files are approximately 4 GB in size:

[oracle@localhost tpch]$ du -ch *.tbl* | tail -1
4.2G       total

It is a good idea to compress all those files with gzip so that they consume minimal disk space and read I/O is reduced when CPU power is abundant:

[oracle@localhost tpch]$ gzip -4 -v *.tbl*
customer.tbl.1:     61.4% -- replaced with customer.tbl.1.gz
customer.tbl.2:     61.6% -- replaced with customer.tbl.2.gz
customer.tbl.3:     61.6% -- replaced with customer.tbl.3.gz
…

[oracle@localhost tpch]$ du -hc *.tbl.* | tail -1
1.3G    total

As you can see above, I have achieved more than a 3:1 compression ratio for the external files on average.

Create Database Objects & Load External Files

Follow these guidelines for creating the permanent and auxiliary database objects:

  1. Create a TPCH user and give it the necessary grants.
  2. Create an Oracle DIRECTORY object (TPCH_DIR) pointing to the directory that contains your *.tbl*.gz files.
  3. Create the external tables that you will use as the read source for your load process:
    • Note that, for the sake of simplicity, I have used VARCHAR2 columns for the DATE columns; these will be cast to DATE during the load operation.
    • Use inline gzip decompression with the PREPROCESSOR option.
    • Create your external tables with proper parallelism to read the external files concurrently.
    • Remember to use the APPEND hint for INSERT … SELECT statements.
    • Don’t create PK & FK constraints yet, so that you can load in parallel.
  4. Load your data from the auxiliary external tables into your permanent heap tables.
  5. Create all necessary constraints on your permanent tables.
  6. Gather schema statistics to guide the CBO correctly.

Start with Auxiliary Loading Objects

We will start by creating the TPCH user and the TPCH_DIR directory:

create tablespace tpch_ts datafile size 100m autoextend on next 100m nologging;

create user tpch identified by tpch temporary tablespace temp default tablespace tpch_ts ;

grant connect,resource to tpch;

grant create any directory to tpch;

conn tpch/tpch

create directory tpch_dir as '/home/oracle/tpch';

create directory zcat_dir as '/bin';
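Before creating the external tables, it is worth verifying that the directory objects resolve to the paths you expect. This is just a sanity-check query against the standard dictionary view; also note that zcat may live in /usr/bin instead of /bin on some distributions:

-- Both directory objects should be listed with the paths created above
select directory_name, directory_path
  from all_directories
 where directory_name in ('TPCH_DIR', 'ZCAT_DIR');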

Now create your external tables and ensure that everything works as expected:

set timing on

spool load_object.log

drop table region_ext;
drop table nation_ext;
drop table supplier_ext;
drop table customer_ext;
drop table order_ext;
drop table part_ext;
drop table partsupp_ext;
drop table lineitem_ext;

CREATE TABLE region_ext (r_regionkey  NUMBER(10),
                         r_name varchar2(25),
                         r_comment varchar(152))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
       LOCATION ('region.tbl.gz'))
NOPARALLEL
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE nation_ext (n_nationkey  NUMBER(10),
                         n_name varchar2(25),
                         n_regionkey number(10),
                         n_comment varchar(152))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
       LOCATION ('nation.tbl.gz'))
NOPARALLEL
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE supplier_ext (S_SUPPKEY NUMBER(10),
                           S_NAME VARCHAR2(25),
                           S_ADDRESS VARCHAR2(40),
                           S_NATIONKEY NUMBER(10),
                           S_PHONE VARCHAR2(15),
                           S_ACCTBAL NUMBER,
                           S_COMMENT VARCHAR2(101))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('supplier.tbl.1.gz','supplier.tbl.2.gz','supplier.tbl.3.gz','supplier.tbl.4.gz',
          'supplier.tbl.5.gz','supplier.tbl.6.gz','supplier.tbl.7.gz','supplier.tbl.8.gz') )
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE customer_ext (C_CUSTKEY NUMBER(10),
                           C_NAME VARCHAR2(25),
                           C_ADDRESS VARCHAR2(40),
                           C_NATIONKEY NUMBER(10),
                           C_PHONE VARCHAR2(15),
                           C_ACCTBAL NUMBER,
                           C_MKTSEGMENT VARCHAR2(10),
                           C_COMMENT VARCHAR2(117))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('customer.tbl.1.gz','customer.tbl.2.gz','customer.tbl.3.gz','customer.tbl.4.gz',
          'customer.tbl.5.gz','customer.tbl.6.gz','customer.tbl.7.gz','customer.tbl.8.gz') )
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE order_ext (O_ORDERKEY NUMBER(10),
                        O_CUSTKEY NUMBER(10),
                        O_ORDERSTATUS CHAR(1),
                        O_TOTALPRICE NUMBER,
                        O_ORDERDATE VARCHAR2(10),
                        O_ORDERPRIORITY VARCHAR2(15),
                        O_CLERK VARCHAR2(15),
                        O_SHIPPRIORITY NUMBER(38),
                        O_COMMENT VARCHAR2(79))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('orders.tbl.1.gz','orders.tbl.2.gz','orders.tbl.3.gz','orders.tbl.4.gz',
          'orders.tbl.5.gz','orders.tbl.6.gz','orders.tbl.7.gz','orders.tbl.8.gz'))
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE part_ext (P_PARTKEY NUMBER(10),
                       P_NAME VARCHAR2(55),
                       P_MFGR VARCHAR2(25),
                       P_BRAND VARCHAR2(10),
                       P_TYPE VARCHAR2(25),
                       P_SIZE NUMBER(38),
                       P_CONTAINER VARCHAR2(10),
                       P_RETAILPRICE NUMBER,
                       P_COMMENT VARCHAR2(23))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('part.tbl.1.gz','part.tbl.2.gz','part.tbl.3.gz','part.tbl.4.gz',
          'part.tbl.5.gz','part.tbl.6.gz','part.tbl.7.gz','part.tbl.8.gz') )
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE partsupp_ext (PS_PARTKEY NUMBER(10),
                           PS_SUPPKEY NUMBER(10),
                           PS_AVAILQTY NUMBER(38),
                           PS_SUPPLYCOST NUMBER,
                           PS_COMMENT VARCHAR2(199))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('partsupp.tbl.1.gz','partsupp.tbl.2.gz','partsupp.tbl.3.gz','partsupp.tbl.4.gz',
          'partsupp.tbl.5.gz','partsupp.tbl.6.gz','partsupp.tbl.7.gz','partsupp.tbl.8.gz'))
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE lineitem_ext (L_ORDERKEY  NUMBER(10),
                           L_PARTKEY NUMBER(10),
                           L_SUPPKEY NUMBER(10),
                           L_LINENUMBER  NUMBER(38),
                           L_QUANTITY NUMBER,
                           L_EXTENDEDPRICE   NUMBER,
                           L_DISCOUNT NUMBER,
                           L_TAX  NUMBER,
                           L_RETURNFLAG  CHAR(1),
                           L_LINESTATUS CHAR(1),
                           L_SHIPDATE  VARCHAR2(10),
                           L_COMMITDATE VARCHAR2(10),
                           L_RECEIPTDATE  VARCHAR2(10),
                           L_SHIPINSTRUCT  VARCHAR2(25),
                           L_SHIPMODE VARCHAR2(10),
                           L_COMMENT VARCHAR2(44))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('lineitem.tbl.1.gz','lineitem.tbl.2.gz','lineitem.tbl.3.gz','lineitem.tbl.4.gz',
          'lineitem.tbl.5.gz','lineitem.tbl.6.gz','lineitem.tbl.7.gz','lineitem.tbl.8.gz'))
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

-- Count all
-- 5 rows
select count(*) region_count from region_ext;
-- 25 rows
select count(*) nation_count from nation_ext;
-- 40000 rows
select  count(*) supplier_count from supplier_ext;
-- 600000 rows
select count(*) customer_count from customer_ext;
-- 6000000 rows
select count(*) order_count from order_ext;
-- 800000 rows
select count(*) part_count from part_ext;
-- 3200000 rows
select count(*) partsupp_count from partsupp_ext;
--23996604 rows
select count(*) lineitem_count from lineitem_ext;

spool off

Continue with Persistent Ones

The next step is to create your persistent objects (heap tables) without PK and FK constraints on them:

set timing on
spool heap_objects.log
DROP TABLE H_CUSTOMER CASCADE CONSTRAINTS ;
DROP TABLE H_LINEITEM CASCADE CONSTRAINTS ;
DROP TABLE H_NATION CASCADE CONSTRAINTS ;
DROP TABLE H_ORDER CASCADE CONSTRAINTS ;
DROP TABLE H_PART CASCADE CONSTRAINTS ;
DROP TABLE H_PARTSUPP CASCADE CONSTRAINTS ;
DROP TABLE H_REGION CASCADE CONSTRAINTS ;
DROP TABLE H_SUPPLIER CASCADE CONSTRAINTS ;

CREATE TABLE H_CUSTOMER (c_custkey NUMBER(10) NOT NULL,
                         c_name VARCHAR2(25) NOT NULL,
                         c_address VARCHAR2(40) NOT NULL,
                         c_nationkey NUMBER(10) NOT NULL ,
                         c_phone VARCHAR2(15) NOT NULL,
                         c_acctbal NUMBER NOT NULL,
                         c_mktsegment VARCHAR2(10) NOT NULL,
                         c_comment VARCHAR2(117) NOT NULL)
PARALLEL 2;

CREATE TABLE H_LINEITEM (l_orderkey NUMBER(10) NOT NULL,
                         l_partkey NUMBER(10) NOT NULL,
                         l_suppkey NUMBER(10) NOT NULL ,
                         l_linenumber INTEGER  NOT NULL ,
                         l_quantity NUMBER NOT NULL,
                         l_extendedprice NUMBER NOT NULL,
                         l_discount NUMBER NOT NULL,
                         l_tax NUMBER NOT NULL,
                         l_returnflag CHAR(1) NOT NULL ,
                         l_linestatus CHAR(1) NOT NULL,
                         l_shipdate DATE NOT NULL,
                         l_commitdate DATE NOT NULL,
                         l_receiptdate DATE NOT NULL,
                         l_shipinstruct VARCHAR2(25) NOT NULL,
                         l_shipmode VARCHAR2(10) NOT NULL,
                         l_comment VARCHAR2(44) NOT NULL)
PARALLEL 2;

CREATE TABLE H_NATION (n_nationkey NUMBER(10) NOT NULL,
                       n_name VARCHAR2(25) NOT NULL,
                       n_regionkey NUMBER (10) NOT NULL,
                       n_comment VARCHAR2 (152) NOT NULL)
NOPARALLEL;

CREATE TABLE H_ORDER (o_orderkey NUMBER (10)  NOT NULL,
                      o_custkey NUMBER(10)  NOT NULL,
                      o_orderstatus CHAR(1) NOT NULL,
                      o_totalprice NUMBER NOT NULL,
                      o_orderdate DATE NOT NULL,
                      o_orderpriority VARCHAR2(15) NOT NULL,
                      o_clerk VARCHAR2(15) NOT NULL,
                      o_shippriority INTEGER NOT NULL,
                      o_comment VARCHAR2(79) NOT NULL)
PARALLEL 2;

CREATE TABLE H_PART (p_partkey NUMBER(10)  NOT NULL,
                     p_name VARCHAR2(55) NOT NULL,
                     p_mfgr VARCHAR2(25) NOT NULL,
                     p_brand VARCHAR2(10) NOT NULL,
                     p_type VARCHAR2(25) NOT NULL,
                     p_size INTEGER NOT NULL,
                     p_container VARCHAR2(10) NOT NULL,
                     p_retailprice NUMBER NOT NULL,
                     p_comment VARCHAR2(23) NOT NULL)
PARALLEL 2;
CREATE TABLE H_PARTSUPP (ps_partkey NUMBER (10)  NOT NULL ,
                         ps_suppkey NUMBER (10)  NOT NULL ,
                         ps_availqty INTEGER NOT NULL,
                         ps_supplycost NUMBER NOT NULL,
                         ps_comment VARCHAR2 (199) NOT NULL)
PARALLEL 2;

CREATE TABLE H_REGION (r_regionkey NUMBER (10)  NOT NULL ,
                       r_name VARCHAR2 (25) NOT NULL,
                       r_comment VARCHAR2 (152) NOT NULL)
NOPARALLEL;

CREATE TABLE H_SUPPLIER (s_suppkey NUMBER (10)  NOT NULL ,
                         s_name VARCHAR2 (25) NOT NULL,
                         s_address VARCHAR2 (40) NOT NULL,
                         s_nationkey NUMBER (10)  NOT NULL ,
                         s_phone VARCHAR2 (15) NOT NULL,
                         s_acctbal NUMBER NOT NULL,
                         s_comment VARCHAR2 (101) NOT NULL)
PARALLEL 2;

spool off

Load your Data using INSERT … SELECT

Now load your data from the external tables into your heap tables using parallel INSERT … SELECT statements:

set timing on
spool load_data.log

truncate table h_lineitem;
truncate table h_order;
truncate table h_part;
truncate table h_customer;
truncate table h_nation;
truncate table h_region;
truncate table h_partsupp;
truncate table h_supplier;
alter session enable parallel dml;
insert /*+ append */ into h_lineitem
select L_ORDERKEY,
       L_PARTKEY,
       L_SUPPKEY,
       L_LINENUMBER,
       L_QUANTITY,
       L_EXTENDEDPRICE,
       L_DISCOUNT,
       L_TAX,
       L_RETURNFLAG,
       L_LINESTATUS,
       to_date(L_SHIPDATE, 'YYYY-MM-DD'),
       to_date(L_COMMITDATE, 'YYYY-MM-DD'),
       to_date(L_RECEIPTDATE, 'YYYY-MM-DD'),
       L_SHIPINSTRUCT,
       L_SHIPMODE,
       L_COMMENT
from lineitem_ext;
insert /*+append*/ into h_partsupp  select * from partsupp_ext;
insert /*+append*/ into h_part  select * from part_ext;
insert /*+append*/ into h_order
select o_orderkey,
       o_custkey,
       o_orderstatus,
       o_totalprice,
       to_date(o_orderdate, 'YYYY-MM-DD'),
       O_ORDERPRIORITY,
       o_clerk,
       O_SHIPPRIORITY,
       o_comment
from order_ext;
insert /*+append*/ into h_customer  select * from customer_ext;
insert /*+append*/ into h_supplier  select * from supplier_ext;
insert  /*+append*/ into h_nation  select * from nation_ext;
insert /*+append*/ into h_region  select * from region_ext;
commit;
spool off
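As a final sanity check on the load (a sketch; the expected values are the external-table counts listed earlier, e.g. roughly 24 million rows in H_LINEITEM for SF=4), compare the heap-table row counts against them:

-- Row counts after the load; these should match the external-table counts
select 'H_REGION'   table_name, count(*) row_count from h_region
union all select 'H_NATION',   count(*) from h_nation
union all select 'H_SUPPLIER', count(*) from h_supplier
union all select 'H_CUSTOMER', count(*) from h_customer
union all select 'H_ORDER',    count(*) from h_order
union all select 'H_PART',     count(*) from h_part
union all select 'H_PARTSUPP', count(*) from h_partsupp
union all select 'H_LINEITEM', count(*) from h_lineitem;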

Build your Constraints

The next step is to build your primary and foreign key constraints:

set timing on
spool constraints.log
-- PK Constraints
ALTER TABLE H_REGION ADD CONSTRAINT REGION_PK PRIMARY KEY (r_regionkey);
ALTER TABLE H_NATION ADD CONSTRAINT NATION_PK PRIMARY KEY (n_nationkey);
ALTER TABLE H_SUPPLIER ADD CONSTRAINT SUPPLIER_PK PRIMARY KEY (s_suppkey);

create unique index partsupp_pk on h_partsupp(ps_partkey,ps_suppkey) parallel 2;
ALTER TABLE H_PARTSUPP ADD CONSTRAINT PARTSUPP_PK PRIMARY KEY(ps_partkey,ps_suppkey) using index PARTSUPP_PK;

create unique index PART_PK on H_PART(p_partkey) parallel 2;
ALTER TABLE H_PART ADD CONSTRAINT PART_PK PRIMARY KEY (p_partkey) using index PART_PK;

create unique index ORDERS_PK on H_ORDER(o_orderkey) parallel 2;
ALTER TABLE H_ORDER ADD CONSTRAINT ORDERS_PK PRIMARY KEY (o_orderkey) using index ORDERS_PK;

create unique index LINEITEM_PK on H_LINEITEM(l_linenumber, l_orderkey) parallel 2;
ALTER TABLE H_LINEITEM ADD CONSTRAINT LINEITEM_PK PRIMARY KEY (l_linenumber, l_orderkey)  using index LINEITEM_PK;

create unique index CUSTOMER_PK on H_CUSTOMER(c_custkey) parallel 2;
ALTER TABLE H_CUSTOMER ADD CONSTRAINT CUSTOMER_PK PRIMARY KEY (c_custkey) using index CUSTOMER_PK;

-- FK Constraints
ALTER TABLE H_LINEITEM
ADD CONSTRAINT LINEITEM_PARTSUPP_FK FOREIGN KEY (l_partkey, l_suppkey)
REFERENCES H_PARTSUPP(ps_partkey, ps_suppkey) NOT DEFERRABLE;

ALTER TABLE H_ORDER
ADD CONSTRAINT ORDER_CUSTOMER_FK FOREIGN KEY (o_custkey)
REFERENCES H_CUSTOMER (c_custkey) NOT DEFERRABLE;

ALTER TABLE H_PARTSUPP
ADD CONSTRAINT PARTSUPP_PART_FK FOREIGN KEY (ps_partkey)
REFERENCES H_PART (p_partkey) NOT DEFERRABLE;

ALTER TABLE H_PARTSUPP
ADD CONSTRAINT PARTSUPP_SUPPLIER_FK FOREIGN KEY (ps_suppkey)
REFERENCES H_SUPPLIER (s_suppkey) NOT DEFERRABLE;

ALTER TABLE H_SUPPLIER
ADD CONSTRAINT SUPPLIER_NATION_FK FOREIGN KEY (s_nationkey)
REFERENCES H_NATION (n_nationkey) NOT DEFERRABLE;

ALTER TABLE H_CUSTOMER
ADD CONSTRAINT CUSTOMER_NATION_FK FOREIGN KEY (c_nationkey)
REFERENCES H_NATION (n_nationkey) NOT DEFERRABLE;

ALTER TABLE H_NATION
ADD CONSTRAINT NATION_REGION_FK FOREIGN KEY (n_regionkey)
REFERENCES H_REGION (r_regionkey) NOT DEFERRABLE;

ALTER TABLE H_LINEITEM
ADD CONSTRAINT LINEITEM_ORDER_FK FOREIGN KEY (l_orderkey)
REFERENCES H_ORDER (o_orderkey) NOT DEFERRABLE;

spool off

Gather Statistics

The final step is to gather CBO statistics so that you will, hopefully, get better execution plans:

begin
    dbms_stats.gather_schema_stats(ownname => 'TPCH',
                                   degree  => 2,
                                   cascade =>  true);
end;
/
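To confirm that the statistics are in place, a quick look at the dictionary is enough (a sketch using USER_TABLES; run it as the TPCH user):

-- NUM_ROWS and LAST_ANALYZED should now be populated for every H_* table
select table_name, num_rows, last_analyzed
  from user_tables
 where table_name like 'H\_%' escape '\'
 order by table_name;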

Conclusion

Now you are done. You can either execute the well-known TPC-H queries (which you can find in the TPC-H documentation) or run your own test cases; a sample query is shown below.
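For instance, TPC-H Query 1 (the pricing summary report) maps directly onto this schema; the only changes to the official query text in this sketch are the table name (H_LINEITEM instead of LINEITEM) and the substitution parameter fixed at 90 days:

-- TPC-H Q1: pricing summary report (substitution parameter DELTA = 90)
select l_returnflag,
       l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
  from h_lineitem
 where l_shipdate <= date '1998-12-01' - 90
 group by l_returnflag, l_linestatus
 order by l_returnflag, l_linestatus;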

Exadata v2 Fast Track Session Slides in RAC SIG Turkey

You can find below my Exadata v2 Fast Track slides from my one-hour session at the Oracle RAC SIG last Saturday:
Oracle Exadata v2 Fast Track

My Open World 2010 Session

Hi there,

You can find my joint presentation, Solidify RMAN Backup with Oracle Open Storage 7000, delivered yesterday at Open World 2010 with my esteemed friend Orhan Bıyıklıoglu.

Oracle Sweden User Group (ORCAN) Event

I was in Sweden (in a Japanese spa hotel near Stockholm) from Monday to Wednesday to join the ORCAN event. Thanks to Patrik Norlander and his friends, the event was really perfect. I gave two presentations and attended the presentations of other ACEs and experts.

I spent my time talking with Dan Morgan about a possible Turkey Oracle User Group event, with Joze Senegacnik about Oracle and planes, with Dimitri Gielis about whether APEX 4.0 is sufficiently mature for large-scale applications, and finally with Luca Canali about recent Oracle Streams projects at CERN.

Thanks guys, it was a perfect time for me.

My Presentations

Book Review: Oracle 11g Streams Implementer’s Guide

After attending the CERN team’s CDC implementation session at UKOUG 2009, the new features of Oracle Streams technology introduced with 11g got my attention. While searching for a suitable resource, I came across this extremely helpful book.

The problem with many Oracle books is that they either paraphrase Tahiti (or My Oracle Support notes, although that is illegal) or are built on pseudo examples generated just to create a problem to find a solution for. This book is definitely an exception, and it is not for my bookshelf but for my briefcase.

Authors Ann L. R. McKinnell and Eric Yen start with a warm-up chapter (Chapter 1) explaining the underlying concepts of the streaming idea and its proper usage, the Oracle CDC components, and a brief introduction to XStreams, which is detailed in Chapter 6.

Chapter 2 is for database architects who are responsible for designing the replication system so that it works smoothly for their business. There is an invaluable checklist covering almost everything that should be taken into consideration before starting.

Chapter 3 is the implementation counterpart of Chapter 2: to satisfy the checklist given in the previous chapter, it lays out the necessary configuration details.

Chapters 4, 5, and 6 explain the different ways of replication in detail. Keep in mind that you can read Chapter 5 online.

Chapters 7 and 8 are my favorites, and I believe they are the reason this book is an excellent reference for all implementers. Chapter 7 explains the importance of documentation in a replication environment and shows how you can automatically generate a map of your environment and gather performance data with Oracle utility packages. Chapter 8 is all about troubleshooting an Oracle Streams environment. I think this is the most important part, because people keep changing what you have implemented. The methodology and toolkit to track, diagnose, and solve a problem in your streaming environment are laid out very clearly in two sub-chapters and 13 bullet points.

To sum up, Oracle 11g Streams Implementer’s Guide is a really nice reference not only for those trying to implement an Oracle CDC environment but also for those who wish to understand the essence of replication concepts (almost all of which are the same across products, with slight changes in terminology and implementation).
