PowerPivot Inside Out (Part 2/3)

This is the second article of three. If you hadn’t read the first part yet, it is recommended that you do so. It talked about the basics of PowerPivot and already covered the first three of the ten things you need to know. So we will continue with number four…

4. Sorting values of one column by values in another

In Excel this is more or less science fiction. But in PowerPivot this works beautifully. So imagine this, you have a column with descriptions that go something like “beginner”, “intermediate” and “master”. When you put these in a pivot table, you actually get a right order, but only since that is the alphabetical order of things. But what if our descriptions were “Rookie”, “Seems to know his way around” and “Almost god”. Now if you were to insert these descriptions into a pivot table, the order of things would be anything but natural. This is a very rudimentary example of things, but it can go much further. In PowerPivot, time intelligence is a big thing. You can read more about it in the sixth section entitled Time Intelligence. But in order for many of the brilliant time intelligence functions you can use to work you have to use what’s called a Calendar Table. It’s a simple table with continuous dates. So before long you will have to deal with dates in PowerPivot. When it comes to dates, Pivot tables created from a Power Pivot data model do not act as one would expect. For Example if you add dates to the Rows and right click on those dates…

…there is no ability to Group or Ungroup them. So if you want to get data by months, you have to get month names to a separate column. Nothing wrong with that, you either use Text or Format, but when you add those to a Pivot table, you get something like this

So the months are not in the logical order. The alphabetical order of the months is no surprise, but it is a hassle. Luckily we have a way to solve this. All you need is a column in the PowerPivot table with numbers that correspond to the correct sorting order. With months this is a simple MONTH function.

Now here it is, in PowerPivot you have a Sort By Column Command. And here you can say

Sort the Month Names as you would Month Numbers. And right away we get

A great article about sorting months chronologically here.

5. Disconnected tables

Usually in a PowerPivot Data Model, you would see something like this…

…so a Fact Table and Dimensions as connected tables. This is a very simple star schema as opposed to a far more complex snowflake schema where the dimensions have their own connected tables and so on. These connections are part of the magic behind PowerPivot. But sometimes you want to have a table that is not connected to any other table. We call this tables disconnected tables. They may sometimes do havoc, as they almost inevitably lead to the “Relationships between tables may be needed” warnings, but they can be a great asset.

Here’s an example. Our data will be as simple as this

Now based on this data, we have a following analysis.

So SUM Of Sales per Year per QTR. This is a very simple measure

[SumOfSales]
=SUM(Table1[Sales]) .

But now we want to take this a step further, we want to calculate a SumOfSales for our top 10 sales in that period. Again a fairly simple measure Sales – TOP 10…

Now here comes the tricky part, what if we want to change that 10 dynamically. Top 3, 20, 50… Here we will create a new table with the values we want to use for the N.

We add this table to our data model…

And since we will not add a connection, this is now a disconnected table in our Data Model.

Now in the Pivot table we add a Slicer based on the TOPN field of the disconnected table.

Next step is to write a measure, which will return the selected value. This can be a simple MAX like so…

With this we can modify the TOP10 measure we wrote earlier, to include this selection…

With this measure, we can now use the slicer to modify the N counter and get a SumOfTopNSales. On the image below is the Sum of top 12 sales.

It’s ingenuous.

6. Time Intelligence

As mentioned in section 4, Time Intelligence is a big thing in PowerPivot and DAX. It can do wonders for you, but one condition has to be met, for you to be able to use Time Intelligence. You must have a Calendar Table! 99% of Data Models include a Calendar table. It’s a table of sequential dates (none can be missing!). Now you connect this Calendar Dimension to your Fact Table Date field and you are set to go! Now you can use functions like TOTALYTD, TOTALMTD, TOTALQTD…

As you can see, TOTALYTD expects as a second argument something called Dates. This is the Date field of your Calendar Table. As you can see it also offers you a [YearEndDate] option, where you can specify something different from 12/31/ and get YTD for a July – June year and so on.

Another great function is a DateAdd where you can say do a sum of sales, but go back 8 months or back 4 years and so on.

The 8 months back would be [Sales8MonthsBack] = CALCULATE([SUMOFSALES],DATEADD(Calendar[Date],-8,MONTH)) . This is a perfect FILTER function to use in your CALCULATE, but I’m getting ahead of myself. The goal here is to let you know that Time Intelligence is a big thing and deserves to be among the 10 things you need to know and use if you want to take your PowerPivot reports to the next level.

Now we are three steps closer to eternal happiness 🙂

PowerPivot Inside Out (Part 1/3)

10 things you need to know

The following article is kind of a baseline knowledge, one should possess, before attempting the creation of their first PowerPivot workbook in Excel. I always believed that understanding is key, and therefore this article is written to deepen your understanding of how PowerPivot Add-Inn for Excel works and what’s actually happening under the hood. In the process of reading this article you will also be given many reasons for using PowerPivot in Excel. Although I borrowed the title from Microsoft books, this will be just as Informative but much shorter J.

  1. First thing’s first. What is PowerPivot?

It’s an Add-In for Excel 2010 and 2013, and it brings SQL Server Analysis Services (SSAS) to Excel, but in a way that is new to Excel users as it is for T-SQL Professionals. The revolution does not stop there. It must also be said that it stores data far more efficiently than Excel does. More about that in paragraph 3. Comparing the amount of data that can be stored in PowerPivot vs. the amount in Excel goes something like 999.999.999 vs. 1.048.576. That’s not even the limit, it’s just the most I’ve seen… But rather than portraying this as a choice of one or the other, the fact is one cannot exist without the other. If you want to do amazing things in Excel, sooner or later you will have to use PowerPivot. So to answer the initial question, Power Pivot is an Add-Inn for Excel, which allows you to store large amount of data in a compact form as a metadata of an Excel file and then analyze that data with Pivot Tables, Power View sheets or PowerMap projects.

Just as an extra point I would like to point out that Power Pivot V1, V2 and Power Pivot for Excel 2013 are not interchangeable. If you have Power Pivot data in Excel 2013 you cannot open that data in Excel 2010. In the opposite case, it does offer to upgrade your data from 2010 to 2013 however. But once you’ve done so, you cannot open it in 2010!

2. What are the space and RAM requirements

Here is a basic comparison of a file size when data is in an Excel sheet, (just for fun I added a difference between formatting your data as a Table rather than just leaving it as a List) or in a Power Pivot metadata of an Excel file. As an added value, there is a significant difference in the consumption of RAM as can be seen in the table and chart.

But leaving this comparison aside for a moment, let us explain how we can save space with the import of data into PowerPivot. The thing to understand is that when you import a table from any source into PowerPivot, it’s not the number of rows that should concern you, it’s the number of columns. The reason why this is so, is the way that PowerPivot indexing works. Now if you have a column called Company in a table that has hundreds of millions of rows but there are only thousand unique companies, than this column requires almost no space on disk and RAM. But if you have a field with many unique values (like ID) than that will take up a lot of resources. You can see the difference in section 3, but let us just conclude by saying this:

You can gain most (performance vise) at importing data, by only importing those columns(!) that you really need for your data model and analysis.

3. So where and how is the data stored?

As you know all Office files from 2007 on are basically a zip folder. So if you take your Excel file called test.xlsx and rename it to test.zip, you can browse through that file.

In the xl folder you will find a folder called model. Now in that folder you can find a file that has a .data extension. This is actually an offline OLAP cube that allows you to work with your data even if you cannot connect to a data source (being SQL server, Access file or an Excel file).

But when you open an Excel file that has PowerPivot data, first it will load as if no PowerPivot id present. The Pivot tables will be present and visible, but no Data Model will be loaded. That is until you either refresh a Pivot Table, change a filter or slicer value in a Pivot Table or open a PowerPivot window. Now that’s when you can see the RAM consumption go up, but there is a way that you can even see the memory consumption for each individual column or field. Here is how. You go to C:\Users\[USER NAME]\AppData\Local\Temp. In the Temp folder you will find at least one folder called VertiPaq_* where a star represents the following 20 symbols. VertiPaq has changed is identity as xVelocity or Officially as “xVelocity in-memory analytics engine (VertiPaq)” with SQL Server 2012. But Essentially the size of this VetiPaq folder (you can guess the right one by the creation date) is very close to the RAM consumption of that Data Model. But it gets even better. Inside the VertiPaq folder there is a folder with a .db at the end of it’s name. In this folder you will find folders representing the tables in your data model. Luckily here you can see the name of the table at the beginning of the folder name. And in this folder you have individual files representing the size of each column.

And it’s clear to see that most space and RAM consuming column is Date, which is to be expected as it has most unique values.

For the cherry on the cake, you can resolve many problems in PowerPivot by deleting the VertiPaq folders (when the file is not running of course).

Don’t forget, this is only part one of thid In-depth dive into PowerPivot. Two more parts are going to follow soon.

 

Does a PowerPivot Pivot Table beat a regular Pivot Table

Hands down, YES!

Before you read this, you should understand the difference between Power Pivot Add-In and a Pivot Table in Excel. You can read a short but very down to facts post on the two in this Pivot Table vs Power Pivot post

If you are new to the PowerPivot Add-In, or if you just heard it being mentioned in passing, what you should know is that PowerPivot is a game changer. Before Excel 2010, Microsoft was gently laying down the groundwork. Being in a strictness of “You should use Tables, and you can’t go wrong” (and you should) or being that they slowly brought the file format to the xlsx with the XML and ZIP and meta data… Everybody noticed the visual change from menus to the ribbon, but underneath something far greater was about to emerge.

It all started with Excel 2010 and the PowerPivot Add-In. Up to that point, Microsoft knew that Excel is the application that no company can go without, but it had serious limitations. The 216 or 220 of rows ( 65536 and 10480576) which might sound impressive is nothing when talking about Big Data or even smaller Databases. Even worse, those are the theoretical limitations of Excel whereas the practical ones are far smaller. With a few 100000 rows filled, the file becomes too slow and Excel can no longer do Automatic calculations. Knowing this, Microsoft was also aware that there were people doing serious magic in their SQL and SharePoint servers but there was a serious gap between MDX (MultiDimensional eXpressions) and OLAP (OnLine Analytical Processing) constructions and knowledge of Excel that normal Excel users possessed. They had no means to analyze the data without the help of a SQL and SharePoint masters. And let’s face it one data source per Pivot Table is a big stretch for the best analytical engine out there.

With Power Pivot (not a mistake, it should be Power SPACE Pivot now (I guess it’s something to do with registration)) they added a so called DAX (Data Analysis Expressions) language and leveled the playing field. It had some functions that Excel users were already familiar with and some Time intelligence functions that people writing SQL queries already knew. But for both it was a learning Experience. But what it did is it brought the two worlds together.

Just a short mention, in Excel 2010 and especially Excel 2013 the word Power is the key. Now you have Power Pivot, Power Query, Power View and Power Map who combined give an Excel 2013 user the analytical power never seen before. And also worth mentioning, PowerView or PowerMap cannot function without Power Pivot. Data has to be stored in a Data Model that is PowerPivot.

Ok, but this article is not about Power Pivot and DAX, it’s about Pivot Tables, which can now come from a simple Table in Excel or from a Power Pivot Data Model. In Excel 2010 you can actually see the difference in the Pivot Table field list because it adds the Slicers Vertical and Slicers Horizontal fields.

In Excel 2013 they were removed and the difference cannot be seen this way.

So if the difference cannot be seen, is there a difference?

Yes! There are two main differences just in Excel.

Data Sets

One of the best things that you can do when your data comes from an outside, SQL or better yet OLAP source. So why does it work with Power Pivot? Well Power Pivot is actually just and offline OLAP for Excel. If you change the .xlsx to .zip and browse that zip, you will find a .data file. That is your Power Pivot data Model with all the tables and so on. But let us look at what Data Sets are used for. Given the following sample…

Let’s say it’s the 3rd September 2012. In our Pivot we have two Years’ worth of Actual and Forecast data. But for 2011, we don’t need the Forecast, but for 2012 we need both, the Actual and the Forcast. Now if we were to turn of Forecast at a filter level,

then that would turn of Forecast for the entire Pivot Table.

But that is not the desired result. Whereas we would like to hide the Forecast for 2011, we would still like to have it visible for 2012. The word “hide” slipped out in the previous sentence, but as it’s now out there, I would like to point out, that hiding it is not the path to eternal happiness. Especially if we imagined we have more Pivot Tables where we would have to repeat the same process.

Now the screenshots above were taken with a Pivot Table created from a Table in Excel. From here on, we will be looking at the same Pivot Table, but this time created from Power Pivot data (same table but inserted into Power Pivot as a linked table). Here’s where things get really interesting.

First we will go to ANALYZE (Excel 2013) or Options (Excel 2010), then Fields, Items, & Sets and then choose Create Set Based on Column Items.

What you will get is the following window

Here you can give your set a name, and then basically say which columns should be visible and which not. You can select the 2011 Forecast Row and choose the Delete Row button on top. Now since you can remove whichever row you like, you therefor remove that column from the analysis. My final view of this window was

Now the moment you press OK you get

Which is good but the best thing is in the Pivot Table Field List you get

And this Sets folder will now be visible and more importantly usable in all Pivot Tables! So the same Set of Columns I another Pivot Table is just a click away. Very useful.

OLAP Tools

Now Sets were great but here is something even greater – OLAP Tools. A very important thing to understand is that Power Pivot works with all the data in RAM. But first it must pull all the data from the source. And that’s exactly what it does. If you create a Power Pivot project where you insert data from Access or SQL Server or MySQL etc. that data is stored in the file! So if you lose connectivity or you want to display your analysis somewhere where you can’t get to your SQL Server, it will work without a hitch. So all this data is in a .data file which is in the Excel files Meta Data. But that .data file is nothing else that an Offline OLAP Cube. So if you’re working on a Power Pivot data, you’re working on an OLAP Cube. And therefore you get OLAP tools. And amongst those tools, there is one unparticular, I would like to focus on. And that is Convert to Formulas.

Now many times we wanted to insert a column of our own calculations into the Pivot Table. But that cannot be done. Well it cannot be done in a plain Excel Pivot Table but it can be done in a Power Pivot Pivot Table. In regular Pivot you would select GETPIVOTDATA and get the entire Pivot on another sheet and there you would insert your columns. But now it can be done right there. All you have to do is to choose Convert a Pivot Table to formulas and you will no longer have a Pivot Table per say. So you cannot choose the fields anymore, but guess what all the calculations are still active and fresh. So refreshing your Power Pivot data, refreshes these cells, but now that’s exactly what they are, just plain old cells. You can insert your own columns and do whatever calculations you want.

And even the Filters from the Pivot Table are still there, still active and still functional! Such power can be derived from the use of this functionality.

As you can see, there is far more to Power Pivot than meets the eye. It’s not just that it gives you the power to work on more than 1048576 rows of data and on multiple data sources at once, it even makes the good old pivot in excel that much better. So to finish of exactly where I started. Does a Power Pivot Pivot Table beat a regular Pivot Table. Once again, hands down, YES!

Sorting months chronologically and not alphabetically in a Pivot Table report based on Power Pivot data

THIS POST IS OUT OF DATE! You can read the upgraded and up to date version of this article here

Here is our problem. When you create a Pivot Table in Excel that has a date field, you can Group that field by month and the sort will be logical (January, February, …). But when you create a Pivot Table based on Power Pivot, the grouping does not work! So you have to get to the month names and a correct sorting by using a different path. We do this by using the Format function in PowerPivot, but the problem is that when you put this field in a Pivot Table, it gets sorted alphabetically. This is logical since the values are text and have nothing to do with dates as far as that Pivot Table is concerned, but this is a problem since the months are not sorted chronologically. This article will tell you how to achieve that.

Let’s start with a simple table in Excel that has only two columns. One has Date values and the other has a number of visitors on that date. Now we would like to create a Pivot Table report to see how the number of visitors is spread through the months.

Case 1: Pivot Table report based on an Excel Table

First we create a Pivot Table based on an Excel Table


The Pivot Table will show the number of visitors by months. But to do this, since we only have Dates, we have to do Grouping by months on the Dates


And right away we get the desired result.


Case 2: Pivot Table report based on Power Pivot data.

First we add our Table data to Power Pivot the easiest way – by using the Add to Data Model command on the PowerPivot tab.


Now that we have the data in the Power Pivot we can create a Pivot Table report from Power Pivot window. But when we create a Pivot Table and want to see the analysis by months we see we just can’t select the Group command. It is grayed out…


So to get to months we use a different trick, we go back to the Power Pivot window and create a calculated column using a Format function. This is a PowerPivot rendition of the Text function from Excel. The syntax is the same, just the names differ.


Excel Version


Power Pivot version.

Using the Format function we now get the month names and a new field to create a Pivot Report by. But when we create it, it looks quite disappointing.


So the numbers are OK, but the sorting is alphabetical and not the kind we want. To get the sorting right, we have to go back to the PowerPivot window and create a new Calculated column using the Month function. This way we now get a month number along each date and month name.


Now just adding that to the Pivot Table report would get rid of our problem, but let’s not forget that we want the month names as they were, only the sorting is wrong. But now we have all we need.


In the Power Pivot window, we select a value in the month name column and then select a Sort by Column command on the home tab and hey, look at that. You can now say that the Month name column will be sorted by Month No. column.


Doing that has changed our Pivot Report instantly!


And we are one step closer to eternal happiness