COMPEC Oracle Show: Day 2: Part 1


First of all thanks to all attendees, especially our special guest my esteemed friend Anil Chalil. We’ve talked about many things today also. Wow it was fantastic for me. Hopefully for you also…

Column & Row

A table may be though to as a spreadsheet in the sense that both have rows and columns. But relational table columns have a domain for each column. That is each column has to have a specific type applied to all rows. There is a common misunderstating that it is the relation modelling enforcing a column to store data of a fixed context such as name, surname, etc. There is nothing enforcing you to this. The following is a valid relation table

Column1 Column2
Istanbul Beşiktaş
Bostancı Istanbul
Ankara Kızılay
Alsancak Izmir

Table 1

It is not the relational model itself enforcing fix context in one field it is the domain of the column if you define the domain of the column1 as {for all x|x is a String and length(x) < 30}. The above table is a correct data sample. The domain of the first column is defined to be a sub-domain of a predefined domain String (length is also a predefined function in String domain) .

If you want your model to store city data in column one and country data in column two, then you have to change the domains of fields (attributes). That is the domain of the first column may be defined as {for all x|x is an element of the set City names of Turkiye and length(x) < 30} and the second column domain may be defined as {for all x|x is an element of the set County names of Turkiye and length(x) < 30}. This domain definition makes Table 1 data an invalid sample whereas Table 2 data is a valid one.

Column1 Column2
Istanbul Beşiktaş
Istanbul Bostancı
Ankara Kızılay
Izmir Alsancak

Table 2

To sum up, in a spreadsheet you are allowed to write any type of data into any cell but in a relational model the concept of domain of a field restricts you while choosing a value for a column of a row.

Correct Typing

This week we’ve looked at three basic classes of Oracle SQL types

  • VARCHAR2 is used for strings(character arrays) of variable size meaning that if you know an upper bound for the length of a string field but not the exact size of it you use VARCHAR2(<upper_bound>)
  • NUMBER (n,p) is the basic numeric type in Oracle. Remember that writing only NUMBER as column type yields a high precision numeric type (38 digits after point) for the related field. Be sure that this is the amount of precision you desire.
  • DATE is the last type of this week used for calendar dating. A question may arise asking why not to use VARCHAR2 instead of a DATE field for simplicity. Good idea but look at the ordering of the same field when it is DATE and VARCHAR2.

Jim was born in 31.01.1980 and John was born 01.02.1987. Which one is older?

I now that question is so silly that anyone will give an immediate answer that Jim is elder. But let’s order them in SQL way:

Using DATE

Using VARCHAR2

Row # Name Birth date   Row # Name Birth date
1 Jim 31.01.1980   1 John 01.02.1987
2 John 01.02.1987   2 Jim 31.01.1980

Table 3

In its simplest terms this is a catastrophe. You sort the data wrong. While DATE column uses chronological ordering VARCHAR2 uses alphabetic ordering which yields an erroneous result that John is the elder brother of two.

Correct choice for valid modeling

We’ve already talked that invalid typing my cause programming errors (wrong sorting of rows). But assume that you model a physical database model but on the production system it causes no errors at all. So it’s ok? No. Let’s see the problem:

 

ModelErr

For many of you this illustration may be totally weird. Let’s explain. Assume that the ideal model that you have to design is A. Moreover for simplicity let’s assume that only parameter in our modeling game is the SQL variable types. If you use wrong or imprecise variable types during your modeling process, you either,

  • On the right Venn diagram you’ve over-modeled the requirement. That is you include some details that your requirement doesn’t entail.
  • On the left Venn diagram you’ve under-modeled the requirement. That is you exclude some details that your requirement entails, indeed.

Let me exemplify them; assume that your requirement team told you that they need to store balance of a customer. You said ok and add a column to some table as NUMBER. Now tell me about a monetary system requiring 38 digit decimal precision. You have just over-modeled. This is mad even if there is no bug for 6 years-old production system.

Same requirement team told you that you also need to keep customer names/surnames in you database. And by under estimating the maximum length of a person name you use VARCHAR2 (30) for the field. Wow you’ve under-modeled your world. if a customer whose name is longer than 30 characters your model will also fail. This is also bad.

Under-modeling is usually rare in comparison with over-modeling. Do you see why? People usually prefer to over estimate column ranges so that they solve the problem of under-modeling by over-modeling.

Correct choice for performance

Wrong type modeling also causes various performance implications. Some of them are

Computational burden due to type casting

Type casting is sometimes issued in systems for overcoming the problems due to wrong typing while DML/query writing steps of development process. Those types of casting requirements consume unnecessary CPU cycles on the server side. This consumption may be so huge that you may need to perform billions of casting function calls in a data warehouse environment for a single query.

Computational burden due to unnecessarily high type precision

If you’ve taken a course on computational sciences (or computer architecture), you’ve most probably learned that decimal computation is more aggressive than integer computation. Or as the precision requirement increases, the computation cost will also increase. In case that you unnecessarily utilize a high precision (over-model) for a number column you start to use more CPU and indeed you are consuming a common resource. In return this causes slow motion SQL executions. So if your requirement enforces never feel reluctant to use NUMBER, but if not NEVER use higher precision than it requires.

Inefficient memory usage due to unnecessarily high type precision

As all you may know CPU access of disk is not allowed in most of today’s modern architectures directly. There has to be a memory caching first and than CPU can read data from there or CPU will first write into memory and then it will be copied on disk. Using longer VARCHAR2 columns than required will cause the same problem as well. You use more memory space for your session variables. And again you consume another common resource.

 

About kocakahin

Just a computer engineer

Posted on December 9, 2007, in Oracle. Bookmark the permalink. Leave a comment.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: