Degenerate dimensions in SSAS December 20, 2009Posted by willem42195 in Business Intelligence.
Tags: Degenerate Dimension, SSAS
This post is about a quirk in Analysis Services (version 2008 is used for this article) that can be quite annoying (and difficult to debug) if you’re not aware of it. The next steps describe how you can reproduce the issue with a simple example.
First, create a new table:
CREATE TABLE sales (product VARCHAR(50), product_nr INT, price MONEY)
We will use this table for creating a fact table (measure = price) with a degenerate dimension (product) in SSAS. The key for the degenerate dimension is product. The column product_nr is only added to illustrate more clearly what is happening.
Add some data:
INSERT INTO sales (product, product_nr, price)
VALUES (‘Car’, 1, 99.99),
(‘Car’ + CHAR(9) + CHAR(10) + CHAR(13), 2, 199.99)
Note that in T-SQL, both products are different.
SELECT DISTINCT product FROM sales
returns the following 2 rows
but it’s hard to see the difference. The following statement will make this more clear:
SELECT REPLACE(REPLACE(REPLACE([product],CHAR(9),’*TAB*’),CHAR(10),’*LF*’),CHAR(13),’*CR*’) AS product FROM sales
Now create a dimension in SSAS from the Sales table with the following 2 attributes: product (key) and product_nr. When you process the dimension, you will get the following error:
Internal error: The operation terminated unsuccessfully.
Server: The operation has been cancelled.
Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: ‘dbo_Sales’, Column: ‘product’, Value: ‘Car’. The attribute is ‘Product’.
Now, how is that possible? We just proved that there are no duplicate values (you can add a primary key or unique constraint to be 100% sure if you want)!
The reason is that when processing the dimension, SSAS by default does a right trim and this eliminates not only the spaces, but also any of the three special characters (tab, line feed and carriage return) we added! Note how this differs from T-SQL where these characters are not impacted by RTRIM as can be seen here:
SELECT REPLACE(REPLACE(REPLACE(RTRIM([product]),CHAR(9),’*TAB*’),CHAR(10),’*LF*’),CHAR(13),’*CR*’) AS product FROM sales
Now, let’s change the default behaviour. Change the trimming from Right to None by going to the properties of the product attribute. Use the following screen shot as a guide to find where this setting is hiding.
Now the dimension processes smoothly. Problem solved? Not quite!
To illustrate this, create a cube with the following options:
measure group table = sales, measure = price, dimension = product, do not add new dimensions. Process the cube. Still looking fine? Have a look at the contents of the cube.
For the finale, let me show you what can go wrong (and did in my ‘real life’ case).
Add an extra row to the table:
INSERT INTO Sales (product, product_nr, price)
VALUES (‘Bike’ + CHAR(9) + CHAR(10) + CHAR(13), 3, 49.99)
Note that for this product, the variant without the special characters does not exist! Because trimming is set to none for the dimension, but trimming will occur when processing the cube, no match will be found and you will get the following error:
Server: The operation has been cancelled.
Errors in the OLAP storage engine: The attribute key cannot be found when processing: Table: ‘dbo_Sales’, Column: ‘product’, Value: ‘Bike’. The attribute is ‘Product’.
Errors in the OLAP storage engine: The record was skipped because the attribute key was not found. Attribute: Product of Dimension: Product from Database: Test SSAS, Cube: Sales, Measure Group: Sales, Partition: Sales, Record: 3.
If you run into this situation, you have 2 options: either you solve it in the data source view (create a named query where you remove the special characters) or remove the special characters in your ETL. I found the former to give really bad performance on large tables so my recommendation would be to do the latter.
SSIS Package Configurations August 5, 2009Posted by willem42195 in Business Intelligence.
Tags: Package Configurations, SSIS
add a comment
A quick post about two (minor) issues I ran into when defining package configurations in SQL Server. Note: I’m using SSIS 2008 but I guess they equally apply to SSIS 2005.
I store all my configurations in SQL Server, which by default suggests a table called dbo.SSIS Configurations. Working on multiple projects for different clients, they all have their own preference on how to call this table and which schema to put it in. Since I have developed a template (including auditing, error logging, performance monitoring, etc.) that I want to be universally deployable, I created a synonym dbo.SSIS Configurations pointing to wherever the physical table is. This seems to be working well, but not when adding a new configuration or modifying an existing one! Suddenly BIDS starts complaining that the table is not in the format it expects it to be.
So there is no other option than to select the physical table for adding a configuration. Aftwerwards, you can edit the package in XML to use the synonym and it will work. If you want to change the configuration, you either have to delete and recreate it or change the XML directly. Because this is a template, this approach is acceptable. The template itself doesn’t change very often. And when using the template in a development project, the location of the physical table is known (and fixed) and there is no need for the synonym anymore.
As a sidenote: when copying the template package, BIDS 2008 automatically changes the ID of the package. Unfortunately this is still not true for the indiviudal components in the package.
The second issue comes forth from a long connection string. When trying to add a a configuration for a ConnectionString, I got the following very clear error message:
The actual problem is that the ConfiguredValue is longer than the 255 characters that are foreseen in the table. This is (partially) caused by the fact that BIDS adds a very long Application Name attribute to the ConnectionString. I solved this issue by first saving the configuration to an XML file. I copied the ConnectionString and manually created a record in the SSISConfigurations table with a shortened Application Name. Then in BIDS I added the configuration (from SQL Server) by selecting the ConfigurationFilter I just created and choosing ‘Reuse Existing’.
Probably making the size of the column larger would have worked as well although I’m not 100% sure there won’t be any side effects. Since in my case, shortening the ConnectionString was easy to do, I preferred to stay on the safe side.
Comparing apples to oranges May 11, 2009Posted by willem42195 in Business Intelligence.
Tags: Business Intelligence
add a comment
Setting: requirements gathering meeting.
User: (firm) My data cannot be aggregated.
Analyst: Cannot or should not?
User: (puzzled) Cannot.
Analyst: Why not?
User: Because you cannot compare apples to oranges.
Analyst: No? Why can’t you compare their weight, volume, taste etc.?
User: Hmm. (thinking) OK, but you cannot add them.
Analyst: Well, in a way you can: 5 apples and 3 oranges are 8 pieces of fruit.
User: Hmm. (thinking) Maybe you’re right. (enthusiastic) That would give me some of those key figures I have to report every month. Never thought I would be able to get that information from the data warehouse. Now, can we define the rules when (not) to aggregate data and what aggregations to apply?
Analyst: Yes we can!
When things go wrong May 4, 2009Posted by willem42195 in Business Intelligence.
Tags: bi-and-the-economic-crisis, Business Intelligence, Business IT Alignment
This article has been triggered by some recent posts on the role of IT. See for example Peter Thomas’s The scope of IT’s responsibility when businesses go bad or Jill Dyché’s Dear IT: A Letter from Your Business Users.
Let me jump immediately to my conclusion: the success of any business intelligence initiative is a shared responsibility.
In Competing on Analytics, Tom Davenport writes the best information and analytics aren’t very useful unless good decisions are made and the right actions taken.
While it is IT’s job to make sure that the business has everything at its disposal to make the right decisions, it is not IT’s task to run the business.
If you know a little about statistics or game theory, chances are (pun intended) that you do not engage in gambling. Yet, so many people do. The same is true in business.
There can be many reasons why business men and women deliberately choose to ignore factual information and follow their instincts:
- they have been successful before and started to believe they are unbeatable
- the incentive is so big that they simply want to try their luck
- they fear they will no longer be around when credit is given for the hard work they have to do now, whereas for the same reason it will be too late to hold them accountable for their negligence
As IT you may have access to someone with more power than the offender (either an executive in your organization or an external actor like e.g. a regulator) and prevent disaster from happening. But sometimes you don’t or you are not prepared to pay the price for blowing the whistle. In that case, the business intelligence initiative will fail. Is it IT’s responsibility? Maybe, but no more than 50%. You can lead a horse to water, but you cannot make it drink.
Tweet about this article on