The great grandson of Husnu Sensoy

September 24, 2011

Line of Sight (LoS) Analysis: Optimizing the Observers for Best Coverage (Part 4)

Filed under: Uncategorized — kocakahin @ 10:29 pm
Tags: ,
After my last post on multiple observers, I will discuss another topic related to multiple observers. The question is ”What should be the optimal choice of N observers on terrain such that visible region (as many green points as possible by our convention) is maximized ?”.

We first define a pseudo code in order to find the optimal (Not guaranteed. Keep in mind that optimization problems are usually NP-complete by their nature) layout of N observers. For simplicity we will assume that all observers have the same height (7 units) which can be relaxed later.

We will implement a constructive way of finding optimal layout for N observers. Here is the pseudo code:

  1. Find the optimal layout for 1 observer and compute coverage ratio (best coverage for one observer)
  2. Add another random observer ((uniform(-8,8),uniform(-8,8))) and compute the coverage for those two observers (random one and the best observer from Step 1).
    1. If the new coverage is better than the coverage in Step 1, use this as the input of optimization solver
    2. Otherwise repeat Step 2 to find a better coverage.
  3. For number of observers greater than 2 apply the idea in Step 2 recursively.

There are some blur points in this pseudo code. We will define those before moving further with the implementation.

Coverage

The very first thing to be defined is the coverage idea. As you will remember from second post, we have defined our 3D terrain by evaluating our height function over outer product of x & y values varying over [-8,8] with a step size of 0.1 units. We have 1681 different (x,y) tuples. Here is the definition of coverage based on our conventions:

  • Coverage Ratio is the ratio of points within LoS of a given observer/group of observers (at least one of the observers mark those set of points as green) to the total number of points (1681)

How to Find Optimal Coordinates of Observers ?

Optimality is a very common word used in place of many different concepts  in real life or engineering. Let me define it once more for our purpose:

  • Optimization is the process of searching for an N-dimensional vector using a technique to maximize/minimize a function of that N-dimensional vector.

Now let’s substitute three italic words of definition for our problem:

  • N-dimensional vector in our problem is the vector of first to components of observer dimensions. Such as, (x1,y1,x2,y2,…,xn,yn).
  • Technique to be used is the Nelder and Mead Technique (A version of it implemented in R).
  • Function to be maximized is the coverage function which we have defined for a given set of observers.

Implementation

Let’s start by defining the function to be optimized that is coverage of terrain for a given set of observers.


targetfunc<-function(observer){
m <- matrix(data=observer,ncol=2,byrow=TRUE)

# Compute merged status of all observers
mergedstatus <- rep("red",length(terrain$height))
for(oidx in seq(1:dim(m)[1])){
terrain$dist2observer <- distance(terrain, c(m[oidx,],7))

status <- LoS(terrain,c(m[oidx,],7),maxVisibleDistance)

mergedstatus <- updatestatus(mergedstatus,status)
}

sum(mergedstatus=="green")/1681
}

matrix routine allows us to create a table of two columns(first two dimensions of observers) and length(observer)/2 rows. We have used the technique discussed in part 3 to compute merged status of observers.  sum(mergedstatus==”green”) call is used to count number of green points on terrain with respect to observers.

Next is the computation of first input to be given to optimization solver. That’s because for any optimization technique starting point is critical. Without any formal definition we will use our pseudo code to choose a “good starting point/vector”.


n <- 2
baselineValue <- 0.541344
previousObserver <- c(1.15861411217711, 1.1499851362913)
observers <- c(previousObserver,runif(2,-8,8))
while(targetfunc(observers) <= baselineValue){
observers <- c(previousObserver,runif(2,-8,8))
}
print(observers)

Above code is an example to initialize observers vector for searching best 2 observer layout. It uses the best coverage ratio for 1 observer case (54.1344%) and adds a new random observer next to best observer found for single observer case.

Final point is the optimization solver which is very simple and totally handled by R


optim <- optim(observers, targetfunc,
control=list(fnscale=-1,trace=5,REPORT=1))

First parameter is the initial value for input vector (prepared by previous code piece). Second parameter is the name of the function to be maximized. optim function is implemented to solve minimization problems by default. Setting fnscale attribute of control parameter turns it to a maximization problem solver.

Now we can combine all to have our final script


library(rgl)
##################
# Functions
##################
# 3D Terrain Function
height <- function (point) {
sin(point$x)+0.125*point$y*sin(2*point$x)+sin(point$y)+0.125*point$x*sin(2*point$y)+3
}

# Linear Function
linear <- function (px, observer, target) {
v <- observer - target

y <- ((px - observer[1])/v[1])*v[2]+observer[2]
z <- ((px - observer[1])/v[1])*v[3]+observer[3]

data.frame(x=px,y=y, z=z)
}

# Linear Function
distance <- function (terrain, observer) {
sqrt((terrain$x-observer[1])^2+(terrain$y-observer[2])^2+(terrain$height-observer[3])^2)
}

LoS <- function(terrain, observer, maxVisibleDistance){
status = c()
for (i in seq(1:nrow(terrain))) {
if (observer[1] == terrain$x[i] && observer[2] == terrain$y[i]){
if(observer[3] >= terrain$height[i]){
if (terrain$dist2observer[i] > maxVisibleDistance){
status <- c(status,"yellow")
}else{
status <- c(status,"green")
}
}else{
status <- c(status,"red")
}
}else{
# All points on line
line <- linear(seq(from=min(observer[1],terrain$x[i]),
to=max(observer[1],terrain$x[i]),
by=0.1),
observer,
c(terrain$x[i],terrain$y[i],terrain$height[i]))
# Terrain Height
h <- height(line)

# LoS Analysis
aboveTerrain <- round((line$z-h),2) >= 0.00

visible <- !is.element(FALSE,aboveTerrain)
if (visible){
# Second Rule
if(terrain$dist2observer[i] <= maxVisibleDistance){
status <- c(status,"green")
}else{
status <- c(status,"yellow")
}
}else{
status <- c(status,"red")
}
}
}

status
}

updatestatus <- function(status1,status2){
mergedstatus<-c()

for(i in seq(length(status1))){
if (status1[i] == "green" || status2[i] == "green"){
mergedstatus <- c(mergedstatus,"green")
}else if (status1[i] == "yellow" || status2[i] == "yellow"){
mergedstatus <- c(mergedstatus,"yellow")
}
else{
mergedstatus <- c(mergedstatus,"red")
}
}

mergedstatus

}

##################
# Input
##################

# Max visible distance
maxVisibleDistance = 8

# Generate points with a step size of 0.1
x <- seq(from=-8,to=8,by=0.4)

xygrid <- expand.grid(x=x,
y=x)

terrain <- data.frame(xygrid,
height=height(xygrid)
)

targetfunc<-function(observer){
#print(observer)
m <- matrix(data=observer,ncol=2,byrow=TRUE)

# Compute merged status of all observers
mergedstatus <- rep("red",length(terrain$height))
for(oidx in seq(1:dim(m)[1])){
terrain$dist2observer <- distance(terrain, c(m[oidx,],7))

status <- LoS(terrain,c(m[oidx,],7),maxVisibleDistance)

mergedstatus <- updatestatus(mergedstatus,status)
}

sum(mergedstatus=="green")/1681
}

n <- 3
baselineValue <- 0.541344
previousObserver <- c(-1.32661956044593, 2.18870625357827)

# List of observers (x1,y1,z1,x2,y2,z2)
observers <- c(previousObserver,runif(2,-8,8))
while(targetfunc(observers) <= baselineValue){
observers <- c(previousObserver,runif(2,-8,8))
}
print(observers)

optim <- optim(observers, targetfunc,
control=list(fnscale=-1,trace=5,REPORT=1))

Results

Here is the coverage ratio for different number of observers after optimization

You can test covering more than 98% of whole terrain is not trivial by using only 6 random observers but requires “careful” choice of their layout.

Finally let’s check step by step improvement in coverage as we add more optimal observers.

Single Observer

Two Observers

Three Observers

Four Observers

Five Observers

Six Observers

September 16, 2011

Line of Sight (LoS) Analysis: Multiple Observers (Part 3)

Filed under: Uncategorized — kocakahin @ 6:42 pm
Tags: ,

In this part of my LoS Analysis series, I will try to extend 3D LoS analysis for multiple observers. Assume that you drop multiple observers into a terrain with the aim of covering it perfectly (100% green).

We will reuse R codes used in Part 2. However we need to add a simple code piece to be used to merge Line of Sight results of multiple observers. If a point on terrain is visible by any of the observers that means point is visible, if the point is visible but far from all observers that means point is out of LoS due to distance (marked with yellow), for all other conditions point on terrain is red. updatestatus function is implemented for this purpose.

library(rgl)
##################
# Functions
##################
# 3D Terrain Function
height <- function (point) {
sin(point$x)+0.125*point$y*sin(2*point$x)+sin(point$y)+0.125*point$x*sin(2*point$y)+3
}

# Linear Function
linear <- function (px, observer, target) {
v <- observer - target

y <- ((px - observer[1])/v[1])*v[2]+observer[2]
z <- ((px - observer[1])/v[1])*v[3]+observer[3]

data.frame(x=px,y=y, z=z)
}

# Linear Function
distance <- function (terrain, observer) {
sqrt((terrain$x-observer[1])^2+(terrain$y-observer[2])^2+(terrain$height-observer[3])^2)
}

LoS <- function(terrain, observer, maxVisibleDistance){
status = c()
for (i in seq(1:nrow(terrain))) {
if (observer[1] == terrain$x[i] && observer[2] == terrain$y[i]){
if(observer[3] >= terrain$height[i]){
if (terrain$dist2observer[i] > maxVisibleDistance){
status <- c(status,"yellow")
}else{
status <- c(status,"green")
}
}else{
status <- c(status,"red")
}
}else{
# All points on line
line <- linear(seq(from=min(observer[1],terrain$x[i]),
to=max(observer[1],terrain$x[i]),
by=0.1),
observer,
c(terrain$x[i],terrain$y[i],terrain$height[i]))
# Terrain Height
h <- height(line)

# LoS Analysis
aboveTerrain <- round((line$z-h),2) >= 0.00

visible <- !is.element(FALSE,aboveTerrain)
if (visible){
# Second Rule
if(terrain$dist2observer[i] <= maxVisibleDistance){
status <- c(status,"green")
}else{
status <- c(status,"yellow")
}
}else{
status <- c(status,"red")
}
}
}

status
}

updatestatus <- function(status1,status2){
mergedstatus<-c()

for(i in seq(length(status1))){
if (status1[i] == "green" || status2[i] == "green"){
mergedstatus <- c(mergedstatus,"green")
}else if (status1[i] == "yellow" || status2[i] == "yellow"){
mergedstatus <- c(mergedstatus,"yellow")
}
else{
mergedstatus <- c(mergedstatus,"red")
}
}

mergedstatus

}

##################
# Input
##################
# Observer location
#observers<-c(0,0, 6,1,1,6)

# Max visible distance
maxVisibleDistance = 8

# Generate points with a step size of 0.1
x <- seq(from=-8,to=8,by=0.4)

xygrid <- expand.grid(x=x,
y=x)

terrain <- data.frame(xygrid,
height=height(xygrid)
)

# List of observers (x1,y1,z1,x2,y2,z2)
observers <- c(runif(2,-8,8),6,runif(2,-8,8),6,
runif(2,-8,8),6,runif(2,-8,8),6,
runif(2,-8,8),6,runif(2,-8,8),6,
runif(2,-8,8),6,runif(2,-8,8),6)
m <- matrix(data=observers,ncol=3,byrow=TRUE)

# Compute merged status of all observers
mergedstatus <- rep("red",length(terrain$height))
for(oidx in seq(1:dim(m)[1])){
terrain$dist2observer <- distance(terrain, m[oidx,])

status <- LoS(terrain,m[oidx,],maxVisibleDistance)

mergedstatus <- updatestatus(mergedstatus,status)
}

# Set merged status as the ultimate status
terrain <- data.frame(terrain,status = mergedstatus)

rgl.open()
rgl.surface(x, x,
matrix(data=terrain$height,nrow=length(x),ncol=length(x)),
col=matrix(data=mergedstatus,nrow=length(x),ncol=length(x))
)
bg3d("gray")

# Mark all observers
for(oidx in seq(1:dim(m)[1])){
spheres3d(c(m[oidx,1]),
c(m[oidx,3]),
c(m[oidx,2]),
radius=0.25,
color="white"
)
}

rgl.viewpoint(-60,30)

A Few Examples

Here are a few examples. All those observers are uniformly distributed over terrain using runif function

Trivial Case: Single Observer

Trivial Case with One Observer

Two Observers

Two Random Observers

Four Observers

Four Random Observers

Eight Observers

Eight Random Observers

September 8, 2011

Line of Sight (LoS) Analysis: 3D Terrain Analysis (Part 2)

Filed under: Uncategorized — kocakahin @ 12:35 am
Tags: ,

In my previous post on LoS Analysis, I have tried to explain briefly the basics of LoS in two dimensional space. Obviously real life problems are based on three dimensional terrains although basic concepts are all the same. In this second part I will try to adapt the same techniques with a few modifications for three dimensional terrains.

3D Terrain Visualization with R

One of the first differences in 3D LoS analysis is the terrain visualization. We can not use plot function for proper visualization is 3D. Fortunately R has all packages you need for any type of problem. I will use rgl package which can be downloaded using install.packages("rgl") command.

Once you have the rgl package, generating pseudo 3D terrains as we did for 2D is a trivial thing.

3D Terrain with RGL

You can use the following R script to generate your 3D terrains like above.

library(rgl)

# 3D Terrain Function
height <- function (x,y) {
  sin(x)+0.125*y*sin(2*x)+sin(y)+0.125*x*sin(2*y)+0.25
}

# Terrain boundaries -8<=x<=8 and -8<=y<=8
boundary <- c(-8,8)

# Terrain grid with a step size of 0.1 units
xy<-seq(from=boundary[1],to=boundary[2],by=0.1)

# Evaluate all heights for all grid points
z<-outer(xy,xy,height)

# A few visualization staff
zlim <- range(z)
zlen <- zlim[2] - zlim[1] + 1
colorlut <- terrain.colors(zlen) # height color lookup table
col <- colorlut[ z-zlim[1]+1 ] # assign colors to heights for each point

# Draw the terrain
rgl.open()
bg3d("gray")
rgl.surface(xy, xy, z,
            color=col)

A new function in this script is outer function which generates the product of a vector and a row-vector to have a matrix (product of a row-vector with a vector/column-vector is obviously a scalar value and named to be dot/inner product). The third parameter of the function provides us the mechanism to apply a given function (height in our case) for each element of this matrix. Obviously you can play with height function to have fancier 3D terrains and to have best visualization you may need viewpoint routine in rgl package .

LoS in 3D Terrain

Line of Sight analysis on 3D terrain uses the same principles as it does in 2D. Use the following R script to decide on status of a point (invisible, visible, visible but far away)

library(rgl)
##################
# Functions
##################
# 3D Terrain Function
height <- function (x,y) {
  sin(x)+0.125*y*sin(2*x)+sin(y)+0.125*x*sin(2*y)+0.25
}

# Linear Function
linear <- function (x, observer, target) {
  v <- observer - target

  y <- ((x - observer[1])/v[1])*v[2]+observer[2]
  z <- ((x - observer[1])/v[1])*v[3]+observer[3]

  data.frame(x=x,y=y, z=z)
}

# Linear Function
distance <- function (p0,p1) {
  sqrt(sum((p0-p1)^2))
}

##################
# Input
##################
# Observer location
observer<-c(10,10,1)

# Target on terrain
target <- c(5, 5, height(5,5))

# Max visible distance
maxVisibleDistance = 4

# Generate points with a step size of 0.1
x <- seq(from=min(observer[1],target[1]),
         to=max(observer[1],target[1]),
         by=0.1)

# All points on line
line <- linear(x, observer, target)

# Terrain Height
h <- height(line$x,line$y)

# LoS Analysis
aboveTerrain <- round((line$z-h),2) >= 0.1

# First Rule
visible <- !is.element(FALSE,aboveTerrain)
if (visible){
  # Second Rule
  d <- distance(observer, target)
  if(d <= maxVisibleDistance){
    status <- "LoS"
  }else{
    status <- "non-LoS due to Distance"
  }
}else{
  status <- "non-LoS due to Blocking"
}

Obviously there are a few changes in the script with compared to 2D version.  The first one is linear function(Code Lines 10-18). New version not only evaluates second (y) but also the third dimension (z). Notice that z is our height dimension by convention. We have also utilized data.frame function to concatenate all dimensions to form a table of point dimensions

The second difference is on height function (Code Lines 5-8). It is no longer a mapping from x to y but a mapping from x,y to z.

Rest of the 3D version of script is pretty much the same or trivial to discuss more.

Visualizing LoS on 3D Terrain

Until this point we have analyzed LoS of a single point on 2D-3D terrains. But usually network analists wish to know LoS map of the terrain with respect to a given observer.  In other words we need to visually understand which regions on 3D terrain are visible by the observer, invisible by the observer due to blocking, or further than the limit from the observer.

Here the LoS map of our pseudo 3D terrain with respect to an observer with a given set of coordinates and maximum service range(green vs yellow regions).

3D LoS Analysis

You can obtain this visualization using following R script.

library(rgl)
##################
# Functions
##################
# 3D Terrain Function
height <- function (point) {
  sin(point$x)+0.125*point$y*sin(2*point$x)+sin(point$y)+0.125*point$x*sin(2*point$y)+3
}

# Linear Function
linear <- function (px, observer, target) {
  v <- observer - target

  y <- ((px - observer[1])/v[1])*v[2]+observer[2]
  z <- ((px - observer[1])/v[1])*v[3]+observer[3]

  data.frame(x=px,y=y, z=z)
}

# Linear Function
distance <- function (terrain, observer) {
  sqrt((terrain$x-observer[1])^2+(terrain$y-observer[2])^2+(terrain$height-observer[3])^2)
}

LoS <- function(terrain, observer, maxVisibleDistance){
  status = c()
  for (i in seq(1:nrow(terrain))) {
    if (observer[1] == terrain$x[i] && observer[2] == terrain$y[i]){
      if(observer[3] >= terrain$height[i]){
        if (terrain$dist2observer[i] > maxVisibleDistance){
          status <- c(status,"yellow")
        }else{
          status <- c(status,"green")
        }
      }else{
        status <- c(status,"red")
      }
    }else{
      # All points on line
      line <- linear(seq(from=min(observer[1],terrain$x[i]),
                         to=max(observer[1],terrain$x[i]),
                         by=0.1),
                     observer,
                     c(terrain$x[i],terrain$y[i],terrain$height[i]))
      # Terrain Height
      h <- height(line)

      # LoS Analysis
      aboveTerrain <- round((line$z-h),2) >= 0.00

      visible <- !is.element(FALSE,aboveTerrain)
      if (visible){
        # Second Rule
        if(terrain$dist2observer[i] <= maxVisibleDistance){
          status <- c(status,"green")
        }else{
          status <- c(status,"yellow")
        }
      }else{
        status <- c(status,"red")
      }
    }
  }

  status
}

##################
# Input
##################
# Observer location
observer<-c(0.835597146302462, -1.71025141328573, 6)

# Max visible distance
maxVisibleDistance = 8

# Generate points with a step size of 0.1
x <- seq(from=-8,to=8,by=0.4)

xygrid <- expand.grid(x=x,
                      y=x)

terrain <- data.frame(xygrid,
                      height=height(xygrid)
                      )

terrain <- data.frame(terrain,
                      dist2observer=distance(terrain, observer)
                      )

terrain <- data.frame(terrain,
                      status = LoS(terrain, observer, maxVisibleDistance))

rgl.open()
rgl.surface(x, x,
            matrix(data=terrain$height,nrow=length(x),ncol=length(x)),
            col=matrix(data=terrain$status,nrow=length(x),ncol=length(x))
            )
bg3d("gray")
# Mark the observer
spheres3d(c(observer[1]),
          c(observer[3]),
          c(observer[2]),
          radius=0.5,
          color="white"
          )

rgl.viewpoint(-60,30)

For a better visualization R allows you to implement spinning 3D terrains using play3d function and record it in gif format using movie3d function as I did below.

3D LoS Analysis 360

September 2, 2011

Line of Sight (LoS) Analysis: Basics (Part 1)

Filed under: Uncategorized — kocakahin @ 3:02 am
Tags: ,

Introduction

Line of Sight analysis is a commonly used technique in telecommunication industry for A/I (Air Interface) equipment planning and allocation. With the simplest terms LoS is the question whether a point on N-dimensional space is visible by an other observer point. The question can be used to answer where to locate a transceiver on terrain so that it can serve customers on some region A.

Before relatively more complicated problems, let’s start with an easy example focusing on two dimensional terrains. Throughout the post, we will use R for coding which is my favorite option for any mathematical problem (statistics, plotting, linear algebra, optimization, etc.). But you can easily adapt coding material to Mathlab, Python,or your favorite language.

We will start by defining a mathematical function to be used to generate our pseudo terrains. For this purpose trigonometric functions (sin, cos) and polynomial functions are the best ones because of their wavy shapes. Here is an example of trigonometric terrain

Figure 1 Trigonometric Terrain

In order to generate this two dimensional one use the following code piece

x <- seq(from=4,to=10,by=0.01)
y <- sin(x)+cos(2*x)+sin(3*x)+cos(4*x)+3

windows()
plot(x,y,'l',
      main="y=sin(x)+cos(2x)+sin(3x)+cos(4x)+3",
      ylab="height",col="blue")

You may choose to use a polynomial terrain also

Figure 2 Polynomial Terrain

To obtain this terrain, use the following R script piece

x <- seq(from=0,to=6,by=0.01)
y <- x*(x-1)*(x-2)*(x-3)*(x-4)*(x-5)*(x-6)+100

windows()
plot(x,y,'l',
     main="y=x(x-1)(x-2)(x-3)(x-4)(x-5)(x-6)+100",
     ylab="height",col="blue")

Combining polynomial terrain functions with trigonometric ones will give you fancier ones.

What is LoS ?

You can think that we have already answered this question but this was an informal try which is not very useful for solving the problem. In order to solve this problem methodically we need to understand what makes a target visible (within LoS) by the observer.

As you see on Figure 3, green point is within line of sight of observer (blue point). However there is pseudo hill between red point and observer. The difference is that the line connecting observer and green point is always greater than the terrain function whereas this is not valid for the line connecting observer and red point (for x ε [~2.5, ~3.5] red line is under the terrain curve).

Figure 3 LoS vs non-LoS

This was the first point (blocking) we should define. The second one is an easier one related with maximum Euclidean distance between observer and target. The distance between observer and target may cause a phase shift in signal if the distance is sufficiently long or depending on weather conditions and terrain properties you may observer diffraction problems (actually there might be more than those). In return this will cause signal quality issues or call drops.  On Figure 3, although blocking is not an issue between observer and  yellow point, target is out of  visible range (say 8 units) of observer.

You can generate Figure 3 using the following R script

# Terrain Function
height <- function (x) {
  x*x/3+sin(x)+cos(2*x)+sin(3*x)+cos(4*x)+sin(5*x)+cos(6*x)+3
}

# Observer location
observer<-c(1.5,8.9)

# Generate terrain points with a tolerance of 0.1
x<-seq(from=-0.1,to=6.1,by=0.1)
terrainHeight<-height(x)

windows()
# Draw terrain
plot(x,terrainHeight,type='b',
     xlim=range(x),ylim=range(terrainHeight),
     main="Line of Sight (LoS)",
     ylab="Height",xlab="")

# Not LoS
points(x=c(observer[1],x[41]),
       y=c(observer[2],terrainHeight[41]),
       col="red",type='b')

# LoS
points(x=c(observer[1],x[5]),
       y=c(observer[2],terrainHeight[5]),
       col="green",type='b')

# LoS but far
points(x=c(observer[1],x[length(x)]),
       y=c(observer[2],terrainHeight[length(x)]),
       col="yellow",type='b')

# Draw Observer
points(x=c(observer[1]),
       y=c(observer[2]),
       col="blue",pch=10)

Method to Decide LoS

Finally let’s define a method to find all visible, invisible, and “far” points on any terrain. Since it is not “easy” to decide analytically whether the line connecting observer and target “is above” the terrain for any terrain function, we will use a simple numeric method.

We will define a step size small enough (around Spatial Tolerance) to generate all x values between observer and target. seq function is a good choice for doing this (Code Lines 33-36).  Evaluate these x values for line function connecting observer and target and terrain function. Evaluation is simple for terrain function using height function (Code Lines 4-7). Evaluation of line function is held by function linear  using parametric definition of line function (Code Lines 9-14) . Next step is to search for any x value having a line evaluation less than terrain evaluation (Code Line 44-28). The rest is simple as to evaluate euclidean distance and assigning values to status variable.

##################
# Functions
##################
# Terrain Function
height <- function (x) {
  x*x/3+sin(x)+cos(2*x)+sin(3*x)+cos(4*x)+sin(5*x)+cos(6*x)+3
}

# Linear Function
linear <- function (x, observer, target) {
  v <- observer - target

  ((x - observer[1])/v[1])*v[2]+observer[2]
}

# Linear Function
distance <- function (p0,p1) {
  sqrt(sum((p0-p1)^2))
}

##################
# Input
##################
# Observer location
observer<-c(1.5,9)

# Target on terrain
target <- c(5, height(5))

# Max visible distance
maxVisibleDistance = 4

# Generate points with a step size of 0.1
x <- seq(from=min(observer[1],target[1]),
         to=max(observer[1],target[1]),
         by=0.1)

# Terrain Height
h <- height(x)

# y Values
y <- linear(x, observer, target)

# LoS Analysis
aboveTerrain <- round((y-h),2) >= 0.00

# First Rule
visible <- !is.element(FALSE,aboveTerrain)
if (visible){
  # Second Rule
  d <- distance(observer, target)
  if(d <= maxVisibleDistance){
    status <- "LoS"
  }else{
    status <- "non-LoS due to Distance"
  }
}else{
  status <- "non-LoS due to Blocking"
}

August 16, 2011

Book Review: Oracle Warehouse Builder 11gR2: Getting Started 2011

Filed under: Oracle — kocakahin @ 8:07 am
Tags: , , ,

As being the embedded option of Oracle relational database engine, OWB is still my favourite ETL (or to correct ELT) tool. In addition, although Oracle positions ODI as their strategic tool for this purpose, they still keep investing on OWB  at least in 11g Release 2.

I have recently had the chance to review a new book by Packt Publishing, namely Oracle Warehouse Builder 11gR2: Getting Started 2011. Book is  a good introductory book for newbies willingness to learn more on OWB by playing with it in a playground.

Book is organised to let you build your own OWB environment and work on a toy problem namely ACME Toy and Gizmos. You design and create your source and target structures and implement you ELT.

The book will help those who wish to learn more on OWB and look for a practical guide.

May 15, 2011

Book Review: Oracle GoldenGate 11g Implementer’s Guide

Filed under: Oracle — kocakahin @ 9:48 pm
Tags: , , ,

goldengate

As being a consultant in VLDB domain, one of the most popular questions of today is “How can I feed my data warehouse/reporting environment in real-time?” Yet another one is “How can I offload my reporting activity over my OTP environment without generating any time gap between reality and what is being reported?” In addition, I am a bit depressed to see people taking CDC (Change Data Capture) products as DR (Disaster Recovery) solutions.

It was two years ago in ACED briefing when Thomas Kurain declared that GoldenGate is Oracle’s strategic real-time integration solution. After that, I have spent quite a few time to understand merits and drawbacks of this product. I have talked customers want to use it, already using it, and suffering it. Almost all sites annoyed with the same problems

  • It is very hard to configure GoldenGate
  • It is very hard to monitor&manage GoldenGate
  • Oracle documentation is still not sufficient for them.

To be honest it is hard to say that they are wrong.

Last week I have read John P. Jeffries’s Oracle GoldenGate 11g Implementer’s Guide and I can easily say that it is a nice piece of material built just to make reader a successful GoldenGate implementer. There is no dictionary-based definition of GoldenGate concepts like Extract, Trail File, Data Pump, etc. as it is in Oracle formal documentations. The book is structured in “Let me show what I define above” fashion. The book is full of details to show you the way of implementing up and running GoldenGate systems. However, I will continue to write on my favorite sections.

Chapter 6: Configuring Golden Gate for HA is on how to configure GoldenGate on a RAC database. Chapter covers to integrate GoldenGate with clusterware software to enable automatic failover. I have seen customer sites still writing custom scripts for this. Therefore, this chapter is a good how to for RAC implementers.

Chapter 8: Managing Oracle GoldenGate is a chapter mainly about reporting performance metrics and errors of functional GoldenGate systems. Chapter explains to report your GoldenGate errors, latency, and throughput and interpret them.

Chapter 9: Performance Tuning is on tuning GoldenGate using parallel extracts and replicates and tuning GoldenGate storage and network for best performance.

To sum up, if you are a newbie in GoldenGate but responsible with implementing a new GoldenGate environment or maintaining an existing one, this book might be a good resource for you.

May 1, 2011

An English Man in Istanbul

Filed under: Oracle — kocakahin @ 10:33 pm
Tags: , ,

A gentle man… An Oracle expert … An esteemed community figure…
Yes. As the reader of this Oracle focused blog, you should figure out about whom I am talking. Last week (on Thursday), we have achieved to set first Turkey Oracle User Group (TROUG) Day at Bahcesehir University.

We have just discussed on the keynote speaker as the Turkish community members and agreed on one name: Jonathan Lewis
The problem was that as being such a popular figure in community, it was almost impossible to find a proper gap for our event in Jonathan’s schedule. As I have sent the invitation to him and asked for his attendance on one Sunday afternoon, Jonathan just replied that he would be with us and perform the keynote speech. Almost a month after our mailing, Jonathan was on stage at Bahcesehir University auditorium as our keynote speaker talking “just about joining two tables”.
It was amazing that we have finally achieved with this user group day and the man whose books are my Oracle library’s masterpiece was on stage.
Thank you Jonathan personally and as the founding member of TROUG for honoring us and hope to see you soon again…

March 20, 2011

The First Oracle Exadata Certified Implementation Specialist of Turkey

Filed under: Oracle — kocakahin @ 6:14 pm
Tags: ,

After migrating more than 120 TB of data on Exadata v2 and delivering 4 Exadata Handson Courses in Europe (Germany, Russia, and Belgium), last week my friend Zekeriya Besiroglu told me that they have recently opened up an Exadata certification exam. I have taken the exam last week and pass with 93% score.

Just to guide you guys who will take this exam, it seems that I/O Resource Manager is the most important topic of the exam although this is not true for Exadata customers in my region :)

October 22, 2010

Create Your Own Oracle TPC-H Playground on Linux

Filed under: Oracle — kocakahin @ 7:48 pm
Tags: , ,

I believe sometimes all of us suffer from the limitations of playing with Oracle’s SH, SCOTT, etc. schemas to generate a sufficiently large playground for our tests. In this post you will find how to create your own TPC-H playground database on Linux.

Download TPC-H Data Generator (dbgen)

TPC as being the council for TPC-H benchmarks delivers a standardized data generation tool for all benchmarks. You can download this tool from http://www.tpc.org/tpch/default.asp (The version I will be using can be downloaded from http://www.tpc.org/tpch/spec/tpch_2_12_0_b5.zip). This bundle contains a bunch of C files to be compiled to form dbgen. Copy the zip file into one of your folders and ensure that your Linux environment has the necessary toolkit to compile C language (gcc, make, etc.)

[oracle@localhost ~]$ mkdir tpch
[oracle@localhost ~]$ mv tpch_2_12_0_b5.zip ./tpch
[oracle@localhost ~]$ cd tpch/
[oracle@localhost tpch]$ unzip tpch_2_12_0_b5.zip
	Archive: tpch_2_12_0_b5.zip
	inflating: build.c
	inflating: driver.c
	inflating: bm_utils.c
	inflating: rnd.c
...
[oracle@localhost tpch]$

Build DBGEN

Next thing you should do is to cp makefile.suite file in tpch directory and  change some parameters in copied file.

  • CC
  • DATABASE
  • MACHINE
  • WORKLOAD

[oracle@localhost tpch]$ cp makefile.suite makefile

[oracle@localhost tpch]$ vi make
...
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are:  TPCH
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH
...

Now run make command

[oracle@localhost tpch]$ make
chmod 755 update_release.sh
./update_release.sh 2 12 0
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o qgen.o qgen.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o varsub.o varsub.c
gcc -g -DDBNAME=\"dss\" -DLINUX -DORACLE -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm

After a successful compilation you should have dbgen executable in your current working director:

[oracle@localhost tpch]$ ./dbgen -h
TPC-H Population Generator (Version 2.12.0 build 5)
Copyright Transaction Processing Performance Council 1994 - 2010
USAGE:
dbgen [-{vf}][-T {pcsoPSOL}]
[-s <scale>][-C <procs>][-S <step>]
dbgen [-v] [-O m] [-s <scale>] [-U <updates>]

Basic Options
===========================
-C <n> -- separate data set into <n> chunks (requires -S, default: 1)
-f     -- force. Overwrite existing files
-h     -- display this message
-q     -- enable QUIET mode
-s <n> -- set Scale Factor (SF) to  <n> (default: 1)
-S <n> -- build the <n>th step of the data/update set (used with -C or -U)
-U <n> -- generate <n> update sets
-v     -- enable VERBOSE mode

Advanced Options
===========================
-b <s> -- load distributions for <s> (default: dists.dss)
-d <n> -- split deletes between <n> files (requires -U)
-i <n> -- split inserts between <n> files (requires -U)
-T c   -- generate cutomers ONLY
-T l   -- generate nation/region ONLY
-T L   -- generate lineitem ONLY
-T n   -- generate nation ONLY
-T o   -- generate orders/lineitem ONLY
-T O   -- generate orders ONLY
-T p   -- generate parts/partsupp ONLY
-T P   -- generate parts ONLY
-T r   -- generate region ONLY
-T s   -- generate suppliers ONLY
-T S   -- generate partsupp ONLY

To generate the SF=1 (1GB), validation database population, use:
       dbgen -vf -s 1

To generate updates for a SF=1 (1GB), use:
       dbgen -v -U 1 -s 1

Generate Your Playgroun Data

Now you are ready to create your playground database. For those of you who are not familiar with TPC-H model, refer to below Relational Model to have an idea of what it looks like.

TPC-H Data Model

Use dbgen to generate a 4G of TPC-H benchmark data. In order to be able to load generated files in parallel by using Oracle External tables, we will be using file split feature of dbgen (Remember that this step might take some time and can be parallelized depending on your CPU & I/O capacity):

[oracle@localhost tpch]$ ./dbgen -s 4 -S 1 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 – 2010
Starting to load stage 1 of 8 for suppliers table…/
Preloading text … 100%
done.
Starting to load stage 1 of 8 for customers table…done.
Starting to load stage 1 of 8 for orders/lineitem tables…done.
Starting to load stage 1 of 8 for part/partsupplier tables…done.
Generating data for nation tabledone.
Generating data for region tabledone.

[oracle@localhost tpch]$ ./dbgen -s 4 -S 2 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 – 2010
Starting to load stage 2 of 8 for suppliers table…/
Preloading text … 100%
done.
Starting to load stage 2 of 8 for customers table…done.
Starting to load stage 2 of 8 for orders/lineitem tables…done.
Starting to load stage 2 of 8 for part/partsupplier tables…done.
Generating data for nation tableDo you want to overwrite ./nation.tbl ? [Y/N]: y
done.
Generating data for region tableDo you want to overwrite ./region.tbl ? [Y/N]: y
done.

[oracle@localhost tpch]$ ./dbgen -s 4 -S 8 -C 8 -v
TPC-H Population Generator (Version 2.12.0)
Copyright Transaction Processing Performance Council 1994 – 2010
Starting to load stage 8 of 8 for suppliers table…/
Preloading text … 100%
done.
Starting to load stage 8 of 8 for customers table…done.
Starting to load stage 8 of 8 for orders/lineitem tables…done.
Starting to load stage 8 of 8 for part/partsupplier tables…done.
Generating data for nation tableDo you want to overwrite ./nation.tbl ? [Y/N]: y
done.
Generating data for region tableDo you want to overwrite ./region.tbl ? [Y/N]: y
done.

When you are done with all 8 executions you will have *tbl* files in your current working directory. Those are pipe separated files which you will be loading into your database.

DBGEN Options

–s 4 specifies that we are using a scale factor of 4 meaning that we are generating approximately 4GB of benchmark data. –S 1 instructs dbgen to generate first of 8 chunks. –C 8 is the total number of files for each large dataset (excluding nation and region tables). –v is setting the verbosity for dbgen.

DBGEN Output

In total you will see that all *tbl* files will be approximately 4 GB in size.

[oracle@localhost tpch]$ du -ch *.tbl* | tail -1
4.2G       total

A good idea is to compress all those files with gzip so that they will consume minimum disk space and optimize read I/O in case of CPU power abundance.

[oracle@localhost tpch]$ gzip -4 -v *.tbl*
customer.tbl.1:     61.4% -- replaced with customer.tbl.1.gz
customer.tbl.2:     61.6% -- replaced with customer.tbl.2.gz
customer.tbl.3:     61.6% -- replaced with customer.tbl.3.gz
…

[oracle@localhost tpch]$ du -hc *.tbl.* | tail -1
1.3G    total

As you see above I have achieved more than 1:3 compression ratio for all external files on the average.

Create Database Objects & Load External Files

Follow the guide lines for creating permanent and auxiliary database objects.

  1. Create a TPCH user and give necessary grant to it.
  2. Create an Oracle DIRECTORY (TPCH_DIR) object to point the directory that contains your *.tbl*.gz files.
  3. Create the External tables that you will be using as a read source for your load process
    • See that for the sake of simplicity I have used VARCHAR2 column for DATE columns those will be casted to DATE type during load operation.
    • Use inline gzip decompression with PREPROCESSOR option.
    • Create your external tables with proper parallelism to read external files concurrently.
    • Remember to use APPEND hint for INSERT … SELECT statements.
    • Don’t create PK&FK constraints so that you can load in parallel.
  4. Load your data from auxiliary External tables to your permanent heap tables.
  5. Create all necessary constraints on your permanent tables.
  6. Gather a schema statistics to guide CBO correctly.

Start with Auxiliary Loading Objects

We will start by creating TPCH user and TPCH_DIR directory

create tablespace tpch_ts datafile size 100m autoextend on next 100m nologging;

create user tpch identified by tpch temporary tablespace temp default tablespace tpch_ts ;

grant connect,resource to tpch;

grant create any directory to tpch;

conn tpch/tpch

create directory tpch_dir as '/home/oracle/tpch';

create directory zcat_dir as '/bin';

Now create your external tables and ensure that all works perfectly:

set timing on

spool load_object.log

drop table region_ext;
drop table nation_ext;
drop table supplier_ext;
drop table customer_ext;
drop table order_ext;
drop table part_ext;
drop table partsupp_ext;
drop table lineitem_ext;

CREATE TABLE region_ext (r_regionkey  NUMBER(10),
                         r_name varchar2(25),
                         r_comment varchar(152))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
       LOCATION ('region.tbl.gz'))
NOPARALLEL
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE nation_ext (n_nationkey  NUMBER(10),
                         n_name varchar2(25),
                         n_regionkey number(10),
                         n_comment varchar(152))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
       LOCATION ('nation.tbl.gz'))
NOPARALLEL
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE supplier_ext (S_SUPPKEY NUMBER(10),
                           S_NAME VARCHAR2(25),
                           S_ADDRESS VARCHAR2(40),
                           S_NATIONKEY NUMBER(10),
                           S_PHONE VARCHAR2(15),
                           S_ACCTBAL NUMBER,
                           S_COMMENT VARCHAR2(101))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('supplier.tbl.1.gz','supplier.tbl.2.gz','supplier.tbl.3.gz','supplier.tbl.4.gz',
          'supplier.tbl.5.gz','supplier.tbl.6.gz','supplier.tbl.7.gz','supplier.tbl.8.gz') )
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE customer_ext (C_CUSTKEY NUMBER(10),
                           C_NAME VARCHAR2(25),
                           C_ADDRESS VARCHAR2(40),
                           C_NATIONKEY NUMBER(10),
                           C_PHONE VARCHAR2(15),
                           C_ACCTBAL NUMBER,
                           C_MKTSEGMENT VARCHAR2(10),
                           C_COMMENT VARCHAR2(117))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('customer.tbl.1.gz','customer.tbl.2.gz','customer.tbl.3.gz','customer.tbl.4.gz',
          'customer.tbl.5.gz','customer.tbl.6.gz','customer.tbl.7.gz','customer.tbl.8.gz') )
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE order_ext (O_ORDERKEY NUMBER(10),
                        O_CUSTKEY NUMBER(10),
                        O_ORDERSTATUS CHAR(1),
                        O_TOTALPRICE NUMBER,
                        O_ORDERDATE VARCHAR2(10),
                        O_ORDERPRIORITY VARCHAR2(15),
                        O_CLERK VARCHAR2(15),
                        O_SHIPPRIORITY NUMBER(38),
                        O_COMMENT VARCHAR2(79))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('orders.tbl.1.gz','orders.tbl.2.gz','orders.tbl.3.gz','orders.tbl.4.gz',
          'orders.tbl.5.gz','orders.tbl.6.gz','orders.tbl.7.gz','orders.tbl.8.gz'))
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE part_ext (P_PARTKEY NUMBER(10),
                       P_NAME VARCHAR2(55),
                       P_MFGR VARCHAR2(25),
                       P_BRAND VARCHAR2(10),
                       P_TYPE VARCHAR2(25),
                       P_SIZE NUMBER(38),
                       P_CONTAINER VARCHAR2(10),
                       P_RETAILPRICE NUMBER,
                       P_COMMENT VARCHAR2(23))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('part.tbl.1.gz','part.tbl.2.gz','part.tbl.3.gz','part.tbl.4.gz',
          'part.tbl.5.gz','part.tbl.6.gz','part.tbl.7.gz','part.tbl.8.gz') )
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE partsupp_ext (PS_PARTKEY NUMBER(10),
                           PS_SUPPKEY NUMBER(10),
                           PS_AVAILQTY NUMBER(38),
                           PS_SUPPLYCOST NUMBER,
                           PS_COMMENT VARCHAR2(199))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('partsupp.tbl.1.gz','partsupp.tbl.2.gz','partsupp.tbl.3.gz','partsupp.tbl.4.gz',
          'partsupp.tbl.5.gz','partsupp.tbl.6.gz','partsupp.tbl.7.gz','partsupp.tbl.8.gz'))
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

CREATE TABLE lineitem_ext (L_ORDERKEY  NUMBER(10),
                           L_PARTKEY NUMBER(10),
                           L_SUPPKEY NUMBER(10),
                           L_LINENUMBER  NUMBER(38),
                           L_QUANTITY NUMBER,
                           L_EXTENDEDPRICE   NUMBER,
                           L_DISCOUNT NUMBER,
                           L_TAX  NUMBER,
                           L_RETURNFLAG  CHAR(1),
                           L_LINESTATUS CHAR(1),
                           L_SHIPDATE  VARCHAR2(10),
                           L_COMMITDATE VARCHAR2(10),
                           L_RECEIPTDATE  VARCHAR2(10),
                           L_SHIPINSTRUCT  VARCHAR2(25),
                           L_SHIPMODE VARCHAR2(10),
                           L_COMMENT VARCHAR2(44))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
       RECORDS DELIMITED BY NEWLINE
       PREPROCESSOR zcat_dir:'zcat'
       BADFILE 'bad_%a_%p.bad'
       LOGFILE 'log_%a_%p.log'
       FIELDS TERMINATED BY '|'
       MISSING FIELD VALUES ARE NULL)
LOCATION ('lineitem.tbl.1.gz','lineitem.tbl.2.gz','lineitem.tbl.3.gz','lineitem.tbl.4.gz',
          'lineitem.tbl.5.gz','lineitem.tbl.6.gz','lineitem.tbl.7.gz','lineitem.tbl.8.gz'))
PARALLEL 2
REJECT LIMIT 0
NOMONITORING;

-- Count all
-- 5 rows
select count(*) region_count from region_ext;
-- 25 rows
select count(*) nation_count from nation_ext;
-- 40000 rows
select  count(*) supplier_count from supplier_ext;
-- 600000 rows
select count(*) customer_count from customer_ext;
-- 6000000 rows
select count(*) order_count from order_ext;
-- 800000 rows
select count(*) part_count from part_ext;
-- 3200000 rows
select count(*) partsupp_count from partsupp_ext;
--23996604 rows
select count(*) lineitem_count from lineitem_ext;

spool off

Continue with Persistent Ones

Next step is to create your persistent objects  (heap tables) without PK and FK constraints on them.

set timing on
spool heap_objects.log
DROP TABLE H_CUSTOMER CASCADE CONSTRAINTS ;
DROP TABLE H_LINEITEM CASCADE CONSTRAINTS ;
DROP TABLE H_NATION CASCADE CONSTRAINTS ;
DROP TABLE H_ORDER CASCADE CONSTRAINTS ;
DROP TABLE H_PART CASCADE CONSTRAINTS ;
DROP TABLE H_PARTSUPP CASCADE CONSTRAINTS ;
DROP TABLE H_REGION CASCADE CONSTRAINTS ;
DROP TABLE H_SUPPLIER CASCADE CONSTRAINTS ;

CREATE TABLE H_CUSTOMER (c_custkey NUMBER(10) NOT NULL,
                         c_name VARCHAR2(25) NOT NULL,
                         c_address VARCHAR2(40) NOT NULL,
                         c_nationkey NUMBER(10) NOT NULL ,
                         c_phone VARCHAR2(15) NOT NULL,
                         c_acctbal NUMBER NOT NULL,
                         c_mktsegment VARCHAR2(10) NOT NULL,
                         c_comment VARCHAR2(117) NOT NULL)
PARALLEL 2;

CREATE TABLE H_LINEITEM (l_orderkey NUMBER(10) NOT NULL,
                         l_partkey NUMBER(10) NOT NULL,
                         l_suppkey NUMBER(10) NOT NULL ,
                         l_linenumber INTEGER  NOT NULL ,
                         l_quantity NUMBER NOT NULL,
                         l_extendedprice NUMBER NOT NULL,
                         l_discount NUMBER NOT NULL,
                         l_tax NUMBER NOT NULL,
                         l_returnflag CHAR(1) NOT NULL ,
                         l_linestatus CHAR(1) NOT NULL,
                         l_shipdate DATE NOT NULL,
                         l_commitdate DATE NOT NULL,
                         l_receiptdate DATE NOT NULL,
                         l_shipinstruct VARCHAR2(25) NOT NULL,
                         l_shipmode VARCHAR2(10) NOT NULL,
                         l_comment VARCHAR2(44) NOT NULL)
PARALLEL 2;

CREATE TABLE H_NATION (n_nationkey NUMBER(10) NOT NULL,
                       n_name VARCHAR2(25) NOT NULL,
                       n_regionkey NUMBER (10) NOT NULL,
                       n_comment VARCHAR2 (152) NOT NULL)
NOPARALLEL;

CREATE TABLE H_ORDER (o_orderkey NUMBER (10)  NOT NULL,
                      o_custkey NUMBER(10)  NOT NULL,
                      o_orderstatus CHAR(1) NOT NULL,
                      o_totalprice NUMBER NOT NULL,
                      o_orderdate DATE NOT NULL,
                      o_orderpriority VARCHAR2(15) NOT NULL,
                      o_clerk VARCHAR2(15) NOT NULL,
                      o_shippriority INTEGER NOT NULL,
                      o_comment VARCHAR2(79) NOT NULL)
PARALLEL 2;

CREATE TABLE H_PART (p_partkey NUMBER(10)  NOT NULL,
                     p_name VARCHAR2(55) NOT NULL,
                     p_mfgr VARCHAR2(25) NOT NULL,
                     p_brand VARCHAR2(10) NOT NULL,
                     p_type VARCHAR2(25) NOT NULL,
                     p_size INTEGER NOT NULL,
                     p_container VARCHAR2(10) NOT NULL,
                     p_retailprice NUMBER NOT NULL,
                     p_comment VARCHAR2(23) NOT NULL)
PARALLEL 2;
CREATE TABLE H_PARTSUPP (ps_partkey NUMBER (10)  NOT NULL ,
                         ps_suppkey NUMBER (10)  NOT NULL ,
                         ps_availqty INTEGER NOT NULL,
                         ps_supplycost NUMBER NOT NULL,
                         ps_comment VARCHAR2 (199) NOT NULL)
PARALLEL 2;

CREATE TABLE H_REGION (r_regionkey NUMBER (10)  NOT NULL ,
                       r_name VARCHAR2 (25) NOT NULL,
                       r_comment VARCHAR2 (152) NOT NULL)
NOPARALLEL;

CREATE TABLE H_SUPPLIER (s_suppkey NUMBER (10)  NOT NULL ,
                         s_name VARCHAR2 (25) NOT NULL,
                         s_address VARCHAR2 (40) NOT NULL,
                         s_nationkey NUMBER (10)  NOT NULL ,
                         s_phone VARCHAR2 (15) NOT NULL,
                         s_acctbal NUMBER NOT NULL,
                         s_comment VARCHAR2 (101) NOT NULL)
PARALLEL 2;

spool off

Load your Data using INSERT … SELECT

Now load your data using INSERT … SELECT statements in parallel from external tables to your heap tables

set timing on
spool load_data.log

truncate table h_lineitem;
truncate table h_order;
truncate table h_part;
truncate table h_customer;
truncate table h_nation;
truncate table h_region;
truncate table h_partsupp;
truncate table h_supplier;
alter session enable parallel dml;
insert /*+append*/into h_lineitem
select L_ORDERKEY,
       L_PARTKEY,
       L_SUPPKEY,
       L_LINENUMBER,
       L_QUANTITY,
       L_EXTENDEDPRICE,
       L_DISCOUNT,
       L_TAX,
       L_RETURNFLAG,
       L_LINESTATUS,
       to_date(L_SHIPDATE, 'YYYY-MM-DD'),
       to_date(L_COMMITDATE, 'YYYY-MM-DD'),
       to_date(L_RECEIPTDATE, 'YYYY-MM-DD'),
       L_SHIPINSTRUCT,
       L_SHIPMODE,
       L_COMMENT
from lineitem_ext;
insert /*+append*/ into h_partsupp  select * from partsupp_ext;
insert /*+append*/ into h_part  select * from part_ext;
insert /*+append*/ into h_order
select o_orderkey,
       o_custkey,
       o_orderstatus,
       o_totalprice,
       to_date(o_orderdate, 'YYYY-MM-DD'),
       O_ORDERPRIORITY,
       o_clerk,
       O_SHIPPRIORITY,
       o_comment
from order_ext;
insert /*+append*/ into h_customer  select * from customer_ext;
insert /*+append*/ into h_supplier  select * from supplier_ext;
insert  /*+append*/ into h_nation  select * from nation_ext;
insert /*+append*/ into h_region  select * from region_ext;
commit;
spool off

Build you Constraints

Next step is to build your Primary & Foreign Key contraints

set timing on
spool constraints.log
-- PK Constraints
ALTER TABLE H_REGION ADD CONSTRAINT REGION_PK PRIMARY KEY (r_regionkey);
ALTER TABLE H_NATION ADD CONSTRAINT NATION_PK PRIMARY KEY (n_nationkey);
ALTER TABLE H_SUPPLIER ADD CONSTRAINT SUPPLIER_PK PRIMARY KEY (s_suppkey);

create unique index partsupp_pk on h_partsupp(ps_partkey,ps_suppkey) parallel 2;
ALTER TABLE H_PARTSUPP ADD CONSTRAINT PARTSUPP_PK PRIMARY KEY(ps_partkey,ps_suppkey) using index PARTSUPP_PK;

create unique index PART_PK on H_PART(p_partkey) parallel 2;
ALTER TABLE H_PART ADD CONSTRAINT PART_PK PRIMARY KEY (p_partkey) using index PART_PK;

create unique index ORDERS_PK on H_ORDER(o_orderkey) parallel 2;
ALTER TABLE H_ORDER ADD CONSTRAINT ORDERS_PK PRIMARY KEY (o_orderkey) using index ORDERS_PK;

create unique index LINEITEM_PK on H_LINEITEM(l_linenumber, l_orderkey) parallel 2;
ALTER TABLE H_LINEITEM ADD CONSTRAINT LINEITEM_PK PRIMARY KEY (l_linenumber, l_orderkey)  using index LINEITEM_PK;

create unique index CUSTOMER_PK on H_CUSTOMER(c_custkey) parallel 2;
ALTER TABLE H_CUSTOMER ADD CONSTRAINT CUSTOMER_PK PRIMARY KEY (c_custkey) using index CUSTOMER_PK;

-- FK Constraints
ALTER TABLE H_LINEITEM
ADD CONSTRAINT LINEITEM_PARTSUPP_FK FOREIGN KEY (l_partkey, l_suppkey)
REFERENCES H_PARTSUPP(ps_partkey, ps_suppkey) NOT DEFERRABLE;

ALTER TABLE H_ORDER
ADD CONSTRAINT ORDER_CUSTOMER_FK FOREIGN KEY (o_custkey)
REFERENCES H_CUSTOMER (c_custkey) NOT DEFERRABLE;

ALTER TABLE H_PARTSUPP
ADD CONSTRAINT PARTSUPP_PART_FK FOREIGN KEY (ps_partkey)
REFERENCES H_PART (p_partkey) NOT DEFERRABLE;

ALTER TABLE H_PARTSUPP
ADD CONSTRAINT PARTSUPP_SUPPLIER_FK FOREIGN KEY (ps_suppkey)
REFERENCES H_SUPPLIER (s_suppkey) NOT DEFERRABLE;

ALTER TABLE H_SUPPLIER
ADD CONSTRAINT SUPPLIER_NATION_FK FOREIGN KEY (s_nationkey)
REFERENCES H_NATION (n_nationkey) NOT DEFERRABLE;

ALTER TABLE H_CUSTOMER
ADD CONSTRAINT CUSTOMER_NATION_FK FOREIGN KEY (c_nationkey)
REFERENCES H_NATION (n_nationkey) NOT DEFERRABLE;

ALTER TABLE H_NATION
ADD CONSTRAINT NATION_REGION_FK FOREIGN KEY (n_regionkey)
REFERENCES H_REGION (r_regionkey) NOT DEFERRABLE;

ALTER TABLE H_LINEITEM
ADD CONSTRAINT LINEITEM_ORDER_FK FOREIGN KEY (l_orderkey)
REFERENCES H_ORDER (o_orderkey) NOT DEFERRABLE;

spool off

Gather Statistics

Final step is to gather CBO statistics so that you will have, hopefully, better execution plans

begin
    dbms_stats.gather_schema_stats(ownname => 'TPCH',
                                   degree  => 2,
                                   cascade =>  true);
end;
/

Conclusion

Now you are done you. You can either execute infamous TPC-H queries (you can find in TPC-H documentation) or your own test cases.

October 16, 2010

Exadata v2 Fast Track Session Slides in RAC SIG Turkey

Filed under: Oracle — kocakahin @ 11:37 pm
Tags:

You can find my Exadata v2 Fast track slides for my 1 hour session in Oracle RAC SIG on last Saturday
Oracle Exadata v2 Fast Track

Next Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 29 other followers