jump to navigation

Degenerate dimensions in SSAS December 20, 2009

Posted by willem42195 in Business Intelligence.
Tags: ,
9 comments

This post is about a quirk in Analysis Services (version 2008 is used for this article) that can be quite annoying (and difficult to debug) if you’re not aware of it. The next steps describe how you can reproduce the issue with a simple example.

First, create a new table:

CREATE TABLE sales (product VARCHAR(50), product_nr INT, price MONEY)

We will use this table for creating a fact table (measure = price) with a degenerate dimension (product) in SSAS. The key for the degenerate dimension is product. The column product_nr is only added to illustrate more clearly what is happening.

Add some data:

INSERT INTO sales (product, product_nr, price)
VALUES (‘Car’, 1, 99.99),
(‘Car’ + CHAR(9) + CHAR(10) + CHAR(13), 2, 199.99)

Note that in T-SQL, both products are different.

SELECT DISTINCT product FROM sales

returns the following 2 rows

product
Car
Car

but it’s hard to see the difference. The following statement will make this more clear:

SELECT REPLACE(REPLACE(REPLACE([product],CHAR(9),’*TAB*’),CHAR(10),’*LF*’),CHAR(13),’*CR*’) AS product FROM sales

returns

product
Car
Car*TAB**LF**CR*

Now create a dimension in SSAS from the Sales table with the following 2 attributes: product (key) and product_nr. When you process the dimension, you will get the following error:

Internal error: The operation terminated unsuccessfully.
Server: The operation has been cancelled.
Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: ‘dbo_Sales’, Column: ‘product’, Value: ‘Car’. The attribute is ‘Product’.

Now, how is that possible? We just proved that there are no duplicate values (you can add a primary key or unique constraint to be 100% sure if you want)!

The reason is that when processing the dimension, SSAS by default does a right trim and this eliminates not only the spaces, but also any of the three special characters (tab, line feed and carriage return) we added! Note how this differs from T-SQL where these characters are not impacted by RTRIM as can be seen here:

SELECT REPLACE(REPLACE(REPLACE(RTRIM([product]),CHAR(9),’*TAB*’),CHAR(10),’*LF*’),CHAR(13),’*CR*’) AS product FROM sales

still returns

product
Car
Car*TAB**LF**CR*

Now, let’s change the default behaviour. Change the trimming from Right to None by going to the properties of the product attribute. Use the following screen shot as a guide to find where this setting is hiding.


Now the dimension processes smoothly. Problem solved? Not quite!
To illustrate this, create a cube with the following options:
measure group table = sales, measure = price, dimension = product, do not add new dimensions. Process the cube. Still looking fine? Have a look at the contents of the cube.


Where is product_nr 2? What we learn from this is that when processing the cube, the RTRIM is applied again! but this time there is no setting to turn it off!

For the finale, let me show you what can go wrong (and did in my ‘real life’ case).

Add an extra row to the table:

INSERT INTO Sales (product, product_nr, price)
VALUES (‘Bike’ + CHAR(9) + CHAR(10) + CHAR(13), 3, 49.99)

Note that for this product, the variant without the special characters does not exist! Because trimming is set to none for the dimension, but trimming will occur when processing the cube, no match will be found and you will get the following error:

Server: The operation has been cancelled.
Errors in the OLAP storage engine: The attribute key cannot be found when processing: Table: ‘dbo_Sales’, Column: ‘product’, Value: ‘Bike’. The attribute is ‘Product’.
Errors in the OLAP storage engine: The record was skipped because the attribute key was not found. Attribute: Product of Dimension: Product from Database: Test SSAS, Cube: Sales, Measure Group: Sales, Partition: Sales, Record: 3.

Conclusion:
If you run into this situation, you have 2 options: either you solve it in the data source view (create a named query where you remove the special characters) or remove the special characters in your ETL. I found the former to give really bad performance on large tables so my recommendation would be to do the latter.

SSIS Package Configurations August 5, 2009

Posted by willem42195 in Business Intelligence.
Tags: ,
add a comment

A quick post about two (minor) issues I ran into when defining package configurations in SQL Server. Note: I’m using SSIS 2008 but I guess they equally apply to SSIS 2005.

I store all my configurations in SQL Server, which by default suggests a table called dbo.SSIS Configurations. Working on multiple projects for different clients, they all have their own preference on how to call this table and which schema to put it in. Since I have developed a template (including auditing, error logging, performance monitoring, etc.) that I want to be universally deployable, I created a synonym dbo.SSIS Configurations pointing to wherever the physical table is. This seems to be working well, but not when adding a new configuration or modifying an existing one! Suddenly BIDS starts complaining that the table is not in the format it expects it to be.

config error

So there is no other option than to select the physical table for adding a configuration. Aftwerwards, you can edit the package in XML to use the synonym and it will work. If you want to change the configuration, you either have to delete and recreate it or change the XML directly. Because this is a template, this approach is acceptable. The template itself doesn’t change very often. And when using the template in a development project, the location of the physical table is known (and fixed) and there is no need for the synonym anymore.

As a sidenote: when copying the template package, BIDS 2008 automatically changes the ID of the package. Unfortunately this is still not true for the indiviudal components in the package.

The second issue comes forth from a long connection string. When trying to add a a configuration for a ConnectionString, I got the following very clear error message: 

connection string

The actual problem is that the ConfiguredValue is longer than the 255 characters that are foreseen in the table. This is (partially) caused by the fact that BIDS adds a very long Application Name attribute to the ConnectionString. I solved this issue by first saving the configuration to an XML file. I copied the ConnectionString and manually created a record in the SSISConfigurations table with a shortened Application Name. Then in BIDS  I added the configuration (from SQL Server) by selecting the ConfigurationFilter I just created and choosing ‘Reuse Existing’.

Probably making the size of the column larger would have worked as well although I’m not 100% sure there won’t be any side effects. Since in my case, shortening the ConnectionString was easy to do, I preferred to stay on the safe side.

Comparing apples to oranges May 11, 2009

Posted by willem42195 in Business Intelligence.
Tags:
add a comment

Setting: requirements gathering meeting.

User: (firm) My data cannot be aggregated.

Analyst: Cannot or should not?

User: (puzzled) Cannot.

Analyst: Why not?

User: Because you cannot compare apples to oranges.

Analyst: No? Why can’t you compare their weight, volume, taste etc.?

User: Hmm. (thinking) OK, but you cannot add them.

Analyst: Well, in a way you can: 5 apples and 3 oranges are 8 pieces of fruit.

User: Hmm. (thinking) Maybe you’re right. (enthusiastic) That would give me some of those key figures I have to report every month. Never thought I would be able to get that information from the data warehouse. Now, can we define the rules when (not) to aggregate data and what aggregations to apply?

Analyst: Yes we can!

When things go wrong May 4, 2009

Posted by willem42195 in Business Intelligence.
Tags: , ,
2 comments

This article has been triggered by some recent posts on the role of IT. See for example Peter Thomas’s The scope of IT’s responsibility when businesses go bad or Jill Dyché’s Dear IT: A Letter from Your Business Users.

Let me jump immediately to my conclusion: the success of any business intelligence initiative is a shared responsibility.

In Competing on Analytics, Tom Davenport writes the best information and analytics aren’t very useful unless good decisions are made and the right actions taken.

While it is IT’s job to make sure that the business has everything at its disposal to make the right decisions, it is not IT’s task to run the business.

If you know a little about statistics or game theory, chances are (pun intended) that you do not engage in gambling. Yet, so many people do. The same is true in business.

There can be many reasons why business men and women deliberately choose to ignore factual information and follow their instincts:

  • they have been successful before and started to believe they are unbeatable
  • the incentive is so big that they simply want to try their luck
  • they fear they will no longer be around when credit is given for the hard work they have to do now, whereas for the same reason it will be too late to hold them accountable for their negligence

As IT you may have access to someone with more power than the offender (either an executive in your organization or an external actor like e.g. a regulator) and prevent disaster from happening. But sometimes you don’t or you are not prepared to pay the price for blowing the whistle. In that case, the business intelligence initiative will fail. Is it IT’s responsibility? Maybe, but no more than 50%. You can lead a horse to water, but you cannot make it drink.


Tweet about this article on twitter_logo

To contribute or not to contribute April 27, 2009

Posted by willem42195 in Social networks.
Tags: ,
1 comment so far

What a week!

Less than a month ago I didn’t have blog, I didn’t care about Twitter (I hadn’t even tried it) and – as far as I know – Twibe groups didn’t exist yet.

And now, in exactly one week, I’m writing my second blog post, more than 165 people (and still counting!) have registered with the BusinessIntelligence Twibe and it’s hard to think of something for which there is no Twibe group yet.

Agreed, Twibes is not that stable yet and some people seem to struggle to make it work so it can improve on user-friendliness. But the Twibes team is very responsive (thank you Adam & C°!) and they are adding great features all the time.

A feature that I’m really looking forward to is the possibility to tweet directly from the Twibe. In the first week, only 5 persons or so actually actively participated by publishing content in the BusinessIntelligence Twibe. My hope is that this new feature will have a positive impact and get more people involved.

In 17 Things we Used to Do Andrew McAfee wrote:

I perceive myself to be part of a single network of friends on Facebook, but I’m part of two very different networks on Twitter: the people I follow (I select these people because I want to get information from them), and those who follow me (these people select me because they want to get information from me).

Sure, people can be part of both networks (you follow them and they follow you), but often they’re not. Likewise, in the Twibe group there will be members that join only with the intention of “listening” and not being “heard” – a position they can of course change at any time.

But since you don’t have to be a member of the Twibe group to see its content, it makes me wonder what their incentive is to join?

Personally, I believe it is about being part of a group and having something in common (after all we call them social networks). It also clearly relates to the last two use cases that Andrew described: Finding information on topics of interest and Finding people who share an interest.

I’m curious to hear what you think about it. Please leave a comment below.

On Twitter and Twibes April 17, 2009

Posted by willem42195 in Social networks.
Tags: ,
5 comments

Recently I discovered Twibe groups. It was love at first sight and I decided to immediately create a group about my passion: Business Intelligence (http://www.twibes.com/BusinessIntelligence).

What is it that I like so much about the concept?

While I do recognize that chitchatting is inherent to Twitter and part of its charm for many users, the reason why I’m on Twitter is purely professional. I use Twitter to unload my mailbox and get rid of all those newsletters and other marketing messages that I don’t read anyway. I prefer the short tweets and people’s personal comments. Mip described this very nicely in his post “How I Became a Twitter Convert”.

But, since people (luckily!) have many interests and a life next to work, I often have to wade through a lot of tweets that are of little interest to me to find those that are relevant to my area of expertise. Jeremiah Owyang predicts even worse in his post “What Happens When Twitter Gets Mainstream Attention”.

This is where I believe Twibe groups come to the rescue. All members of the group share a single (1) interest – in this case Business Intelligence. Twibes scans all tweets from members of the group for those that match any of up to 3 tags (which the group can freely define).

Sometimes, less is more. My idea of the Twibe group is like a Reader’s Digest: you only get to see tweets that are relevant to the group without you having to track down people and following them individually. The group is a powerful medium for content providers to reach a large and attentive audience. The win for the information consumers is efficiency: they get lots of needles and only little hay.

Rules of engagement for the BusinessIntelligence Twibe group:

  • to make your tweet show up in the group, tag it with bitwibe (if you prefer #bitwibe, that will work too)
  • only tag tweets that are directly related to BI (defined in its largest sense)
  • if you retweet something that was already tagged, remove the bitwibe tag unless you add a personal comment that is interesting for the group
  • no spam (this includes commercial messages and information about seminars, trainings and webinars)
  • the group is international so tweet in English and about topics of general interest

If you have any suggestions or would like to share what you expect from the group, then please enter your comments below.

Follow

Get every new post delivered to your Inbox.