Create Your Own Oracle TPC-H Playground on Linux


I believe sometimes all of us suffer from the limitations of playing with Oracle’s SH, SCOTT, etc. schemas to generate a sufficiently large playground for our tests. In this post you will find how to create your own TPC-H playground database on Linux.

Download TPC-H Data Generator (dbgen)

TPC as being the council for TPC-H benchmarks delivers a standardized data generation tool for all benchmarks. You can download this tool from http://www.tpc.org/tpch/default.asp (The version I will be using can be downloaded from http://www.tpc.org/tpch/spec/tpch_2_12_0_b5.zip). This bundle contains a bunch of C files to be compiled to form dbgen. Copy the zip file into one of your folders and ensure that your Linux environment has the necessary toolkit to compile C language (gcc, make, etc.)

[oracle@localhost ~]$ mkdir tpch
[oracle@localhost ~]$ mv tpch_2_12_0_b5.zip ./tpch
[oracle@localhost ~]$ cd tpch/
[oracle@localhost tpch]$ unzip tpch_2_12_0_b5.zip
	Archive: tpch_2_12_0_b5.zip
	inflating: build.c
	inflating: driver.c
	inflating: bm_utils.c
	inflating: rnd.c
...
[oracle@localhost tpch]$

Build DBGEN

Next thing you should do is to cp makefile.suite file in tpch directory and  change some parameters in copied file.

  • CC
  • DATABASE
  • MACHINE
  • WORKLOAD
[oracle@localhost tpch]$ cp makefile.suite makefile

[oracle@localhost tpch]$ vi make
...
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are:  TPCH
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH
...

Now run make command

[oracle@localhost tpch]$ make
chmod 755 update_release.sh
./update_release.sh 2 12 0
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o qgen.o qgen.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o varsub.o varsub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm

After a successful compilation you should have dbgen executable in your current working director:

[oracle@localhost tpch]$ ./dbgen -h
TPC-H Population Generator (Version 2.12.0 build 5)
Copyright Transaction Processing Performance Council 1994 - 2010
USAGE:
dbgen [-{vf}][-T {pcsoPSOL}]
[-s <scale>][-C <procs>][-S <step>]
dbgen [-v] [-O m] [-s <scale>] [-U <updates>]

Basic Options
===========================
-C <n> -- separate data set into <n> chunks (requires -S, default: 1)
-f     -- force. Overwrite existing files
-h     -- display this message
-q     -- enable QUIET mode
-s <n> -- set Scale Factor (SF) to  <n> (default: 1)
-S <n> -- build the <n>th step of the data/update set (used with -C or -U)
-U <n> -- generate <n> update sets
-v     -- enable VERBOSE mode

Advanced Options
===========================
-b <s> -- load distributions for <s> (default: dists.dss)
-d <n> -- split deletes between <n> files (requires -U)
-i <n> -- split inserts between <n> files (requires -U)
-T c   -- generate cutomers ONLY
-T l   -- generate nation/region ONLY
-T L   -- generate lineitem ONLY
-T n   -- generate nation ONLY
-T o   -- generate orders/lineitem ONLY
-T O   -- generate orders ONLY
-T p   -- generate parts/partsupp ONLY
-T P   -- generate parts ONLY
-T r   -- generate region ONLY
-T s   -- generate suppliers ONLY
-T S   -- generate partsupp ONLY

To generate the SF=1 (1GB), validation database population, use:
       dbgen -vf -s 1

To generate updates for a SF=1 (1GB), use:
       dbgen -v -U 1 -s 1

Generate Your Playgroun Data

Now you are ready to create your playground database. For those of you who are not familiar with TPC-H model, refer to below Relational Model to have an idea of what it looks like.

TPC-H Data Model

Use dbgen to generate a 4G of TPC-H benchmark data. In order to be able to load generated files in parallel by using Oracle External tables, we will be using file split feature of dbgen (Remember that this step might take some time and can be parallelized depending on your CPU & I/O capacity):

[oracle@localhost tpch]$ ./dbgen -s 4 -S 1 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 – 2010
Starting to load stage 1 of 8 for suppliers table…/
Preloading text … 100%
done.
Starting to load stage 1 of 8 for customers table…done.
Starting to load stage 1 of 8 for orders/lineitem tables…done.
Starting to load stage 1 of 8 for part/partsupplier tables…done.
Generating data for nation tabledone.
Generating data for region tabledone.

[oracle@localhost tpch]$ ./dbgen -s 4 -S 2 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 – 2010
Starting to load stage 2 of 8 for suppliers table…/
Preloading text … 100%
done.
Starting to load stage 2 of 8 for customers table…done.
Starting to load stage 2 of 8 for orders/lineitem tables…done.
Starting to load stage 2 of 8 for part/partsupplier tables…done.
Generating data for nation tableDo you want to overwrite ./nation.tbl ? [Y/N]: y
done.
Generating data for region tableDo you want to overwrite ./region.tbl ? [Y/N]: y
done.

[oracle@localhost tpch]$ ./dbgen -s 4 -S 8 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 – 2010
Starting to load stage 8 of 8 for suppliers table…/
Preloading text … 100%
done.
Starting to load stage 8 of 8 for customers table…done.
Starting to load stage 8 of 8 for orders/lineitem tables…done.
Starting to load stage 8 of 8 for part/partsupplier tables…done.
Generating data for nation tableDo you want to overwrite ./nation.tbl ? [Y/N]: y
done.
Generating data for region tableDo you want to overwrite ./region.tbl ? [Y/N]: y
done.

When you are done with all 8 executions you will have *tbl* files in your current working directory. Those are pipe separated files which you will be loading into your database.

DBGEN Options

–s 4 specifies that we are using a scale factor of 4 meaning that we are generating approximately 4GB of benchmark data. –S 1 instructs dbgen to generate first of 8 chunks. –C 8 is the total number of files for each large dataset (excluding nation and region tables). –v is setting the verbosity for dbgen.

DBGEN Output

In total you will see that all *tbl* files will be approximately 4 GB in size.

[oracle@localhost tpch]$ du -ch *.tbl* | tail -1
4.2G       total

A good idea is to compress all those files with gzip so that they will consume minimum disk space and optimize read I/O in case of CPU power abundance.

[oracle@localhost tpch]$ gzip -4 -v *.tbl*
customer.tbl.1:     61.4% -- replaced with customer.tbl.1.gz
customer.tbl.2:     61.6% -- replaced with customer.tbl.2.gz
customer.tbl.3:     61.6% -- replaced with customer.tbl.3.gz
…

[oracle@localhost tpch]$ du -hc *.tbl.* | tail -1
1.3G    total

As you see above I have achieved more than 1:3 compression ratio for all external files on the average.

Create Database Objects & Load External Files

Follow the guide lines for creating permanent and auxiliary database objects.

  1. Create a TPCH user and give necessary grant to it.
  2. Create an Oracle DIRECTORY (TPCH_DIR) object to point the directory that contains your *.tbl*.gz files.
  3. Create the External tables that you will be using as a read source for your load process
    • See that for the sake of simplicity I have used VARCHAR2 column for DATE columns those will be casted to DATE type during load operation.
    • Use inline gzip decompression with PREPROCESSOR option.
    • Create your external tables with proper parallelism to read external files concurrently.
    • Remember to use APPEND hint for INSERT … SELECT statements.
    • Don’t create PK&FK constraints so that you can load in parallel.
  4. Load your data from auxiliary External tables to your permanent heap tables.
  5. Create all necessary constraints on your permanent tables.
  6. Gather a schema statistics to guide CBO correctly.

Start with Auxiliary Loading Objects

We will start by creating TPCH user and TPCH_DIR directory

create tablespace tpch_ts datafile size 100m autoextend on next 100m nologging;

create user tpch identified by tpch temporary tablespace temp default tablespace tpch_ts ;

grant connect,resource to tpch;

grant create any directory to tpch;

conn tpch/tpch

create directory tpch_dir as '/home/oracle/tpch';

create directory zcat_dir as '/bin';

Now create your external tables and ensure that all works perfectly:

set timing on

spool load_object.log

drop table region_ext;
drop table nation_ext;
drop table supplier_ext;
drop table customer_ext;
drop table order_ext;
drop table part_ext;
drop table partsupp_ext;
drop table lineitem_ext;

CREATE TABLE region_ext (r_regionkey  NUMBER(10),
                         r_name varchar2(25),
                         r_comment varchar(152))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
       LOCATION ('region.tbl.gz'))
NOPARALLEL
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE nation_ext (n_nationkey  NUMBER(10),
                         n_name varchar2(25),
                         n_regionkey number(10),
                         n_comment varchar(152))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
       LOCATION ('nation.tbl.gz'))
NOPARALLEL
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE supplier_ext (S_SUPPKEY NUMBER(10),
                           S_NAME VARCHAR2(25),
                           S_ADDRESS VARCHAR2(40),
                           S_NATIONKEY NUMBER(10),
                           S_PHONE VARCHAR2(15),
                           S_ACCTBAL NUMBER,
                           S_COMMENT VARCHAR2(101))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('supplier.tbl.1.gz','supplier.tbl.2.gz','supplier.tbl.3.gz','supplier.tbl.4.gz',
          'supplier.tbl.5.gz','supplier.tbl.6.gz','supplier.tbl.7.gz','supplier.tbl.8.gz') )
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE customer_ext (C_CUSTKEY NUMBER(10),
                           C_NAME VARCHAR2(25),
                           C_ADDRESS VARCHAR2(40),
                           C_NATIONKEY NUMBER(10),
                           C_PHONE VARCHAR2(15),
                           C_ACCTBAL NUMBER,
                           C_MKTSEGMENT VARCHAR2(10),
                           C_COMMENT VARCHAR2(117))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('customer.tbl.1.gz','customer.tbl.2.gz','customer.tbl.3.gz','customer.tbl.4.gz',
          'customer.tbl.5.gz','customer.tbl.6.gz','customer.tbl.7.gz','customer.tbl.8.gz') )
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE order_ext (O_ORDERKEY NUMBER(10),
                        O_CUSTKEY NUMBER(10),
                        O_ORDERSTATUS CHAR(1),
                        O_TOTALPRICE NUMBER,
                        O_ORDERDATE VARCHAR2(10),
                        O_ORDERPRIORITY VARCHAR2(15),
                        O_CLERK VARCHAR2(15),
                        O_SHIPPRIORITY NUMBER(38),
                        O_COMMENT VARCHAR2(79))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('orders.tbl.1.gz','orders.tbl.2.gz','orders.tbl.3.gz','orders.tbl.4.gz',
          'orders.tbl.5.gz','orders.tbl.6.gz','orders.tbl.7.gz','orders.tbl.8.gz'))
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE part_ext (P_PARTKEY NUMBER(10),
                       P_NAME VARCHAR2(55),
                       P_MFGR VARCHAR2(25),
                       P_BRAND VARCHAR2(10),
                       P_TYPE VARCHAR2(25),
                       P_SIZE NUMBER(38),
                       P_CONTAINER VARCHAR2(10),
                       P_RETAILPRICE NUMBER,
                       P_COMMENT VARCHAR2(23))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('part.tbl.1.gz','part.tbl.2.gz','part.tbl.3.gz','part.tbl.4.gz',
          'part.tbl.5.gz','part.tbl.6.gz','part.tbl.7.gz','part.tbl.8.gz') )
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE partsupp_ext (PS_PARTKEY NUMBER(10),
                           PS_SUPPKEY NUMBER(10),
                           PS_AVAILQTY NUMBER(38),
                           PS_SUPPLYCOST NUMBER,
                           PS_COMMENT VARCHAR2(199))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('partsupp.tbl.1.gz','partsupp.tbl.2.gz','partsupp.tbl.3.gz','partsupp.tbl.4.gz',
          'partsupp.tbl.5.gz','partsupp.tbl.6.gz','partsupp.tbl.7.gz','partsupp.tbl.8.gz'))
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE lineitem_ext (L_ORDERKEY  NUMBER(10),
                           L_PARTKEY NUMBER(10),
                           L_SUPPKEY NUMBER(10),
                           L_LINENUMBER  NUMBER(38),
                           L_QUANTITY NUMBER,
                           L_EXTENDEDPRICE   NUMBER,
                           L_DISCOUNT NUMBER,
                           L_TAX  NUMBER,
                           L_RETURNFLAG  CHAR(1),
                           L_LINESTATUS CHAR(1),
                           L_SHIPDATE  VARCHAR2(10),
                           L_COMMITDATE VARCHAR2(10),
                           L_RECEIPTDATE  VARCHAR2(10),
                           L_SHIPINSTRUCT  VARCHAR2(25),
                           L_SHIPMODE VARCHAR2(10),
                           L_COMMENT VARCHAR2(44))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('lineitem.tbl.1.gz','lineitem.tbl.2.gz','lineitem.tbl.3.gz','lineitem.tbl.4.gz',
          'lineitem.tbl.5.gz','lineitem.tbl.6.gz','lineitem.tbl.7.gz','lineitem.tbl.8.gz'))
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

-- Count all
-- 5 rows
select count(*) region_count from region_ext;
-- 25 rows
select count(*) nation_count from nation_ext;
-- 40000 rows
select  count(*) supplier_count from supplier_ext;
-- 600000 rows
select count(*) customer_count from customer_ext;
-- 6000000 rows
select count(*) order_count from order_ext;
-- 800000 rows
select count(*) part_count from part_ext;
-- 3200000 rows
select count(*) partsupp_count from partsupp_ext;
--23996604 rows
select count(*) lineitem_count from lineitem_ext;

spool off

Continue with Persistent Ones

Next step is to create your persistent objects  (heap tables) without PK and FK constraints on them.

set timing on
spool heap_objects.log
DROP TABLE H_CUSTOMER CASCADE CONSTRAINTS ;
DROP TABLE H_LINEITEM CASCADE CONSTRAINTS ;
DROP TABLE H_NATION CASCADE CONSTRAINTS ;
DROP TABLE H_ORDER CASCADE CONSTRAINTS ;
DROP TABLE H_PART CASCADE CONSTRAINTS ;
DROP TABLE H_PARTSUPP CASCADE CONSTRAINTS ;
DROP TABLE H_REGION CASCADE CONSTRAINTS ;
DROP TABLE H_SUPPLIER CASCADE CONSTRAINTS ;

CREATE TABLE H_CUSTOMER (c_custkey NUMBER(10) NOT NULL,
                         c_name VARCHAR2(25) NOT NULL,
                         c_address VARCHAR2(40) NOT NULL,
                         c_nationkey NUMBER(10) NOT NULL ,
                         c_phone VARCHAR2(15) NOT NULL,
                         c_acctbal NUMBER NOT NULL,
                         c_mktsegment VARCHAR2(10) NOT NULL,
                         c_comment VARCHAR2(117) NOT NULL)
PARALLEL 2;

CREATE TABLE H_LINEITEM (l_orderkey NUMBER(10) NOT NULL,
                         l_partkey NUMBER(10) NOT NULL,
                         l_suppkey NUMBER(10) NOT NULL ,
                         l_linenumber INTEGER  NOT NULL ,
                         l_quantity NUMBER NOT NULL,
                         l_extendedprice NUMBER NOT NULL,
                         l_discount NUMBER NOT NULL,
                         l_tax NUMBER NOT NULL,
                         l_returnflag CHAR(1) NOT NULL ,
                         l_linestatus CHAR(1) NOT NULL,
                         l_shipdate DATE NOT NULL,
                         l_commitdate DATE NOT NULL,
                         l_receiptdate DATE NOT NULL,
                         l_shipinstruct VARCHAR2(25) NOT NULL,
                         l_shipmode VARCHAR2(10) NOT NULL,
                         l_comment VARCHAR2(44) NOT NULL)
PARALLEL 2;

CREATE TABLE H_NATION (n_nationkey NUMBER(10) NOT NULL,
                       n_name VARCHAR2(25) NOT NULL,
                       n_regionkey NUMBER (10) NOT NULL,
                       n_comment VARCHAR2 (152) NOT NULL)
NOPARALLEL;

CREATE TABLE H_ORDER (o_orderkey NUMBER (10)  NOT NULL,
                      o_custkey NUMBER(10)  NOT NULL,
                      o_orderstatus CHAR(1) NOT NULL,
                      o_totalprice NUMBER NOT NULL,
                      o_orderdate DATE NOT NULL,
                      o_orderpriority VARCHAR2(15) NOT NULL,
                      o_clerk VARCHAR2(15) NOT NULL,
                      o_shippriority INTEGER NOT NULL,
                      o_comment VARCHAR2(79) NOT NULL)
PARALLEL 2;

CREATE TABLE H_PART (p_partkey NUMBER(10)  NOT NULL,
                     p_name VARCHAR2(55) NOT NULL,
                     p_mfgr VARCHAR2(25) NOT NULL,
                     p_brand VARCHAR2(10) NOT NULL,
                     p_type VARCHAR2(25) NOT NULL,
                     p_size INTEGER NOT NULL,
                     p_container VARCHAR2(10) NOT NULL,
                     p_retailprice NUMBER NOT NULL,
                     p_comment VARCHAR2(23) NOT NULL)
PARALLEL 2;
CREATE TABLE H_PARTSUPP (ps_partkey NUMBER (10)  NOT NULL ,
                         ps_suppkey NUMBER (10)  NOT NULL ,
                         ps_availqty INTEGER NOT NULL,
                         ps_supplycost NUMBER NOT NULL,
                         ps_comment VARCHAR2 (199) NOT NULL)
PARALLEL 2;

CREATE TABLE H_REGION (r_regionkey NUMBER (10)  NOT NULL ,
                       r_name VARCHAR2 (25) NOT NULL,
                       r_comment VARCHAR2 (152) NOT NULL)
NOPARALLEL;

CREATE TABLE H_SUPPLIER (s_suppkey NUMBER (10)  NOT NULL ,
                         s_name VARCHAR2 (25) NOT NULL,
                         s_address VARCHAR2 (40) NOT NULL,
                         s_nationkey NUMBER (10)  NOT NULL ,
                         s_phone VARCHAR2 (15) NOT NULL,
                         s_acctbal NUMBER NOT NULL,
                         s_comment VARCHAR2 (101) NOT NULL)
PARALLEL 2;

spool off

Load your Data using INSERT … SELECT

Now load your data using INSERT … SELECT statements in parallel from external tables to your heap tables

set timing on
spool load_data.log

truncate table h_lineitem;
truncate table h_order;
truncate table h_part;
truncate table h_customer;
truncate table h_nation;
truncate table h_region;
truncate table h_partsupp;
truncate table h_supplier;
alter session enable parallel dml;
insert /*+append*/into h_lineitem
select L_ORDERKEY,
       L_PARTKEY,
       L_SUPPKEY,
       L_LINENUMBER,
       L_QUANTITY,
       L_EXTENDEDPRICE,
       L_DISCOUNT,
       L_TAX,
       L_RETURNFLAG,
       L_LINESTATUS,
       to_date(L_SHIPDATE, 'YYYY-MM-DD'),
       to_date(L_COMMITDATE, 'YYYY-MM-DD'),
       to_date(L_RECEIPTDATE, 'YYYY-MM-DD'),
       L_SHIPINSTRUCT,
       L_SHIPMODE,
       L_COMMENT
from lineitem_ext;
insert /*+append*/ into h_partsupp  select * from partsupp_ext;
insert /*+append*/ into h_part  select * from part_ext;
insert /*+append*/ into h_order
select o_orderkey,
       o_custkey,
       o_orderstatus,
       o_totalprice,
       to_date(o_orderdate, 'YYYY-MM-DD'),
       O_ORDERPRIORITY,
       o_clerk,
       O_SHIPPRIORITY,
       o_comment
from order_ext;
insert /*+append*/ into h_customer  select * from customer_ext;
insert /*+append*/ into h_supplier  select * from supplier_ext;
insert  /*+append*/ into h_nation  select * from nation_ext;
insert /*+append*/ into h_region  select * from region_ext;
commit;
spool off

Build you Constraints

Next step is to build your Primary & Foreign Key contraints

set timing on
spool constraints.log
-- PK Constraints
ALTER TABLE H_REGION ADD CONSTRAINT REGION_PK PRIMARY KEY (r_regionkey);
ALTER TABLE H_NATION ADD CONSTRAINT NATION_PK PRIMARY KEY (n_nationkey);
ALTER TABLE H_SUPPLIER ADD CONSTRAINT SUPPLIER_PK PRIMARY KEY (s_suppkey);

create unique index partsupp_pk on h_partsupp(ps_partkey,ps_suppkey) parallel 2;
ALTER TABLE H_PARTSUPP ADD CONSTRAINT PARTSUPP_PK PRIMARY KEY(ps_partkey,ps_suppkey) using index PARTSUPP_PK;

create unique index PART_PK on H_PART(p_partkey) parallel 2;
ALTER TABLE H_PART ADD CONSTRAINT PART_PK PRIMARY KEY (p_partkey) using index PART_PK;

create unique index ORDERS_PK on H_ORDER(o_orderkey) parallel 2;
ALTER TABLE H_ORDER ADD CONSTRAINT ORDERS_PK PRIMARY KEY (o_orderkey) using index ORDERS_PK;

create unique index LINEITEM_PK on H_LINEITEM(l_linenumber, l_orderkey) parallel 2;
ALTER TABLE H_LINEITEM ADD CONSTRAINT LINEITEM_PK PRIMARY KEY (l_linenumber, l_orderkey)  using index LINEITEM_PK;

create unique index CUSTOMER_PK on H_CUSTOMER(c_custkey) parallel 2;
ALTER TABLE H_CUSTOMER ADD CONSTRAINT CUSTOMER_PK PRIMARY KEY (c_custkey) using index CUSTOMER_PK;

-- FK Constraints
ALTER TABLE H_LINEITEM
ADD CONSTRAINT LINEITEM_PARTSUPP_FK FOREIGN KEY (l_partkey, l_suppkey)
REFERENCES H_PARTSUPP(ps_partkey, ps_suppkey) NOT DEFERRABLE;

ALTER TABLE H_ORDER
ADD CONSTRAINT ORDER_CUSTOMER_FK FOREIGN KEY (o_custkey)
REFERENCES H_CUSTOMER (c_custkey) NOT DEFERRABLE;

ALTER TABLE H_PARTSUPP
ADD CONSTRAINT PARTSUPP_PART_FK FOREIGN KEY (ps_partkey)
REFERENCES H_PART (p_partkey) NOT DEFERRABLE;

ALTER TABLE H_PARTSUPP
ADD CONSTRAINT PARTSUPP_SUPPLIER_FK FOREIGN KEY (ps_suppkey)
REFERENCES H_SUPPLIER (s_suppkey) NOT DEFERRABLE;

ALTER TABLE H_SUPPLIER
ADD CONSTRAINT SUPPLIER_NATION_FK FOREIGN KEY (s_nationkey)
REFERENCES H_NATION (n_nationkey) NOT DEFERRABLE;

ALTER TABLE H_CUSTOMER
ADD CONSTRAINT CUSTOMER_NATION_FK FOREIGN KEY (c_nationkey)
REFERENCES H_NATION (n_nationkey) NOT DEFERRABLE;

ALTER TABLE H_NATION
ADD CONSTRAINT NATION_REGION_FK FOREIGN KEY (n_regionkey)
REFERENCES H_REGION (r_regionkey) NOT DEFERRABLE;

ALTER TABLE H_LINEITEM
ADD CONSTRAINT LINEITEM_ORDER_FK FOREIGN KEY (l_orderkey)
REFERENCES H_ORDER (o_orderkey) NOT DEFERRABLE;

spool off

Gather Statistics

Final step is to gather CBO statistics so that you will have, hopefully, better execution plans

begin
    dbms_stats.gather_schema_stats(ownname => 'TPCH',
                                   degree  => 2,
                                   cascade =>  true);
end;
/

Conclusion

Now you are done you. You can either execute infamous TPC-H queries (you can find in TPC-H documentation) or your own test cases.

About kocakahin

Just a computer engineer

Posted on October 22, 2010, in Oracle and tagged , , . Bookmark the permalink. 9 Comments.

  1. Husnu herzamanki gibi inanılmaz faydalı bir kaynak olmus. eline saglik

  2. Nice!!!
    Congratulations by the article.

    Do you know if the same can be applied for TPC-E and TPC-C ?

    Thanks,
    Daniel

  3. Grand!

    Is it possible to get a Windows version of this generator?

    Regards,
    Tunde

    • set the MACHINE option to be WIN32 in makefile so that it will perform Windows specific changes during compile. Then download your favorite C compiler on Windows (for exampple, Miracle) and rest is the same process you will follow.

  4. Marie Hoffmann

    Hi,
    I’m using 11.2g XE where each user can be assigned to only one instance (might not be relevant). I followed the first steps and got hung on > “conn tpch/tpch” which produces the errors “ORA-01034: ORACLE not available” and “ORA-27101: shared memory realm does not exist”. ORACLE_HOME is currently set to “/u01/app/oracle/product/11.2.0/xe/” and ORACLE_SID=XE. XE is the preinstalled database. Therefore, I guess I have to reset ORACLE_HOME and ORACLE_SID(=TPCH?). My problem is on which directory do I set ORACLE_HOME? Do I create a directory parallel to …/11.2.0/xe or must it already exist?

    • Marie Hoffmann

      One day later: it works! I restored the environment variables for ORACLE_HOME, etc. they must not be edited after creating a new user. SID is still XE!
      Another pitfall: the *:tbl.gz files must be stored in the created directory “/home/oracle/tpch/” and you need write acces to this directory!

      Thanks for your detailed installation description!

      best,
      Marie

  1. Pingback: M-A-O-L » Create Your Own Oracle TPC-H Playground on Linux

  2. Pingback: [TPC-H] リポジトリの作成と とりあえず測ってみた | Insight Technology, Inc.

Leave a reply to Tunde Cancel reply