Wednesday, December 24, 2008

HAPPY CHRISTMAS

I hope that at this time of year you are able to have some fun, see the family and get some rest from those projects!

Lets keep our fingers crossed that 2009 will turn out to be a better year for the world, and in particular the hard working IT consultants :)

Have some fun at http://www.speedysanta.com/


I'll post part 3 of the Schema discussion next week


Adrian

Wednesday, December 10, 2008

Stars and Snowflakes

Part 2

Continuing on the discussion about OBIEE (Oracle BI EE) structure of the Logical Layer - please read Part 1 before this chapter.

To set the context, The discussion is around your choices in table design in the warehouse and I am only referring to the reporting in OBIEE on the datawarehouse, and your design considerations for Logical Tables, logical Table Sources and Physical Joins in the Physical Layer. Another point to note is that I am not too concerned about the theoretical difference between Snowflakes and Normalised, in the example shown in Part 1 you could call that a Snowflake or a Normalised design. Some have described Snowflakes as semi-normalised stars which is a fair decription, but the important point for the discussion is that there is at least one join between dimensions, which instantly makes it a non-star design, and is on the road to normalisation.

In part three I will look at a mixed model - Logical stars with normalised physical tables.

I do believe the actual dataWarehouse design and ETL routines should follow what you decide in your logical design considerations. Having said that though, I don't think that the whole reporting system can be developed in a linear path, you often have to loop back at each stage when more clarity is found.

And finally, Please join in the discussion by posting your comments.

Now, on with the blog...

From Part 1 some of you may be thinking that I'm advocating that Stars are not always required in the warehouse. Well, I tried to put together a fair case for the use of normalised design as best as I could. Now, lets have a look at the case for a pure star.

Continuing with the Contact example used in part 1, we can redraw the model in a logical and physical pure star.


















Running the same report.















And obviously it works, but look at the SQL:


SELECT

T3489."ATTRIB_11" as c1,

sum(T3517."BLAH_VALUE") as c2

from "W_PERSON_DX" T3489,

"W_PERSON_F" T3517

where ( T3489."ROW_WID" = T3517."CONTACT_WID" )

group by T3489."ATTRIB_11"order by 1



So this time the join is direct in the physical layer, and therefore the server does not add in another table to get the answer.
Is this reduction in joins THE main reason why you design a pure star? Or are there other reasons why star = good?


So on the face of it there is no problem with either method. The star method was very efficient in the query, minimising the number of joins. But at what cost to the ETL process? Here, in this example, none because it joined to an existing key, however if you were talking about say, Account Id, then the mapping would have to look it up and add it to the target table. Of course the Star Transformation in the db will struggle with the normalised schema (or will it – answers on a postcard).

But. I cheated. Twice. Firstly, this is a very simple model, with little data in, so design is not vital.
Secondly, what if I want to do this …..


Part 2A

Let's build a hybrid. Surely we can use a star and introduce some level of normalisation and all will be well? Consider the following design.





The above 'star' has all the physical dimension tables in the single dimension logical table.

We have a logical star. With Physical table sources arranged in a physical star, except we have a few ‘spokes’ around some of the physical dimension tables (this is essentially what mosst people would call a Snowflake!). A common use here would be a ‘List of Values’.

Does it work? and is it 'better'?

There are two considerations

  1. Errors is Analytics generating SQL
  2. Duplication of data.

I'll explore these in part 3.

Monday, December 08, 2008

Stars and Snowflakes

PART 1

We like to talk about the weather in England, and today it is fine - Sunny and Cold. Here in the West at night you can see all the stars and over in the East they are having light snow falls.
Nature has not yet figured out how to combine a clear, star lit night with snow fall.
You can have one but not the other at the same time.
But can OBIEE defy the laws of nature?

A
recent debate got me re-thinking about why I design like I do in OBIEE / Siebel Analytics.


Let’s look at the arguments:
There are two main contenders, Star Schema verses Normalised. Inmon vs Kimball. (I’ll not go into the details of these, I’ll assume you know what each looks like.)

Bert Scalzo says in his book on Warehousing "You should not have a star schema with snowflakes in", but he does say 'AVOID snowflakes' which suggests they are not banned completely.

However, many clients I have worked for actively encourage snowflakes. They're not actually requesting them, but they can't see the point of spending lots of money waiting for the warehouse to be ready for reporting. "The data is in there, so just report on it", they say! This is not helped by sales men and women telling clients how easy it is to develop on your existing structure. One client I worked at was convinced that the layers in the database with complex ETL was purely Intellectual stimulus for bored contractors (Who also got paid by the day!).

Then there is the real killer, the POC – Proof of Concept. Certainly proves that you can report on their data without the hassle of ETL, schedules, downtime, etc.!!!

Another project I know of is full of snowflakes, and just like the real thing, each one is unique! Again the client is happy with the design because works for the current reports.

Up until now I have been a convert to the Star Schema, slavishly making sure every table fitted the philosophy, but I need to keep the client happy, which effectively means developing cheaper. When they send their development offshore they also tend to abdicate from significant design decisions, they just want the system up and running as quickly and cheaply as possible. But that’s life, this is the new game, get over it. Compete.

Before I revisit my design principles, Can Siebel Analytics/OBIEE really cope with snowflakes? Or is it smoke and mirrors? The answer depends upon whether you are looking at a Logical Level or Physical, and how well you implement a snowflake.

On first glance you look at the Business Model and Mapping Layer as a pure Logical Level. You normally have logical tables, arranged in a logical star. And in this logical layer the training documents have always decreed that you have a star.

But can you use another logical model? What would happen if you tried to organise your logical tables in a snowflake? Do you get an error when you check for consistency? No? Why not? Surely OBIEE does not understand snowflakes and it should tell you if there is a bad design!?.
Lets have a look at a snowflake, or normalised, logical layer.


Here we have one dimension joining to the fact, and two other dimension joining to it.









So, running a report,
First the list from W_PERSON_D on its own,








Now add the Fact,










NO problem!

Including normal aggregation




Now we add one of the ‘flakes’







And still no problem.

But what if we only query the Flake (W_PERSON_DX) and the fact?







The result is …..





It works !!!


Here is the SQL being produced.


select T519."ATTRIB_11" as c1,
sum(T547."BLAH_VALUE") as c2
from
"W_PERSON_DX" T519,
"W_PERSON_D" T301,
"W_PERSON_F" T547
where ( T301."ROW_WID" = T519."ROW_WID" and T301."ROW_WID" = T547."CONTACT_WID" )
group by T519."ATTRIB_11"
order by 1



It added in the W_PERSON_D table all by itself.
Remember there is no join between PERSON_DX and PERSON_F, not physically and not logically.

So, a normalised set of tables arranged with the joins in normal form does work.

So why bother with stars?

(part2 to follow)



The Cowes

The Cowes
Cowes Racing