jump to navigation

Degenerate dimensions in SSAS December 20, 2009

Posted by willem42195 in Business Intelligence.
Tags: ,
9 comments

This post is about a quirk in Analysis Services (version 2008 is used for this article) that can be quite annoying (and difficult to debug) if you’re not aware of it. The next steps describe how you can reproduce the issue with a simple example.

First, create a new table:

CREATE TABLE sales (product VARCHAR(50), product_nr INT, price MONEY)

We will use this table for creating a fact table (measure = price) with a degenerate dimension (product) in SSAS. The key for the degenerate dimension is product. The column product_nr is only added to illustrate more clearly what is happening.

Add some data:

INSERT INTO sales (product, product_nr, price)
VALUES (‘Car’, 1, 99.99),
(‘Car’ + CHAR(9) + CHAR(10) + CHAR(13), 2, 199.99)

Note that in T-SQL, both products are different.

SELECT DISTINCT product FROM sales

returns the following 2 rows

product
Car
Car

but it’s hard to see the difference. The following statement will make this more clear:

SELECT REPLACE(REPLACE(REPLACE([product],CHAR(9),’*TAB*’),CHAR(10),’*LF*’),CHAR(13),’*CR*’) AS product FROM sales

returns

product
Car
Car*TAB**LF**CR*

Now create a dimension in SSAS from the Sales table with the following 2 attributes: product (key) and product_nr. When you process the dimension, you will get the following error:

Internal error: The operation terminated unsuccessfully.
Server: The operation has been cancelled.
Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: ‘dbo_Sales’, Column: ‘product’, Value: ‘Car’. The attribute is ‘Product’.

Now, how is that possible? We just proved that there are no duplicate values (you can add a primary key or unique constraint to be 100% sure if you want)!

The reason is that when processing the dimension, SSAS by default does a right trim and this eliminates not only the spaces, but also any of the three special characters (tab, line feed and carriage return) we added! Note how this differs from T-SQL where these characters are not impacted by RTRIM as can be seen here:

SELECT REPLACE(REPLACE(REPLACE(RTRIM([product]),CHAR(9),’*TAB*’),CHAR(10),’*LF*’),CHAR(13),’*CR*’) AS product FROM sales

still returns

product
Car
Car*TAB**LF**CR*

Now, let’s change the default behaviour. Change the trimming from Right to None by going to the properties of the product attribute. Use the following screen shot as a guide to find where this setting is hiding.


Now the dimension processes smoothly. Problem solved? Not quite!
To illustrate this, create a cube with the following options:
measure group table = sales, measure = price, dimension = product, do not add new dimensions. Process the cube. Still looking fine? Have a look at the contents of the cube.


Where is product_nr 2? What we learn from this is that when processing the cube, the RTRIM is applied again! but this time there is no setting to turn it off!

For the finale, let me show you what can go wrong (and did in my ‘real life’ case).

Add an extra row to the table:

INSERT INTO Sales (product, product_nr, price)
VALUES (‘Bike’ + CHAR(9) + CHAR(10) + CHAR(13), 3, 49.99)

Note that for this product, the variant without the special characters does not exist! Because trimming is set to none for the dimension, but trimming will occur when processing the cube, no match will be found and you will get the following error:

Server: The operation has been cancelled.
Errors in the OLAP storage engine: The attribute key cannot be found when processing: Table: ‘dbo_Sales’, Column: ‘product’, Value: ‘Bike’. The attribute is ‘Product’.
Errors in the OLAP storage engine: The record was skipped because the attribute key was not found. Attribute: Product of Dimension: Product from Database: Test SSAS, Cube: Sales, Measure Group: Sales, Partition: Sales, Record: 3.

Conclusion:
If you run into this situation, you have 2 options: either you solve it in the data source view (create a named query where you remove the special characters) or remove the special characters in your ETL. I found the former to give really bad performance on large tables so my recommendation would be to do the latter.

Advertisements