Mincing Thoughts: Cognos

For beginning and advanced report developers alike, the biggest challenge I have witnessed is the challenge of Report Conceptualization. This is the process of translating often incomplete business requirements into a structured plan that will produce the result that the business needs. For operational reports based on known data structures and calculations this process can be simple, often operational reports are structured very simply and the job of the report developer is to simply put the right fields in the right place. The process becomes much more difficult for strategic reports which attempt to help the business define what they should be doing. Strategic reports are often poorly understood by the business, their needs are uncertain and difficult to communicate, and vision for the end product is amorphous. Taking vague, conceptual requirements into a concrete view of business data is a challenge for developers and analysts alike.

Recently I have been mentoring two Cognos Report Developers in their process. These developers are experienced at building operations-level reports in Cognos and have a good understanding of the data and the business, but they are challenged at providing insightful analysis to the business when the business users themselves only have vague ideas of what they want to see, or in some cases have too many ideas that all jumble together.

The first step in this conceptualization process was to focus on a particular element, a decision, or a comparison that the business needs to make and determine an appropriate way to visualize the data that is needed. This was covered in Part 1, Report Visualization.

Once a vision has been established, the second step is to plan the underlying structure of the report so that the end result is a simple data structure that can be directly mapped to the visualization. This is the process I use for breaking down a difficult report into simple steps, and have trained numerous developers who use it successfully.

The Process

We will start with the user story:

“As a Product Manager I need to know year-over-year sales trends of my products to determine which products I should assign to my premium display space for each retailer.”

Based on the story we have done further analysis and come up with a detailed business requirement. The business user needs a line graph that shows monthly sales of the product lines and products they select for the current year and the previous year. The graph also needs to show the average sales of the selected products. The graph should only include sales from selected retailers.

A mock-up of the chart is given below.

Identify Your Measures

Measures are all numerical fields that can be aggregated or calculated and that are connected to dimensions. These are generally (in Cognos) located in the Fact table, but for the purposes of this process also includes deduced values such as counts or distinct counts of measures. Ie count(distinct product_id) is a measure that can be connected to other dimensions through an appropriate fact table, this produces the deduced value “Count of products for each value of dimension X where sales exist.”

In our above example our only measure is “Sales $”.

Identify Your Summarizations/Calculations

List all unique summarizations of measures alongside your measure list. These will be treated as separate measures that may or may not be able to be grouped with the list of identified measures so far. Remember that your measures will already be summarized according to their default (usually total) aggregation, you need to identify any alternate summarization such as “average”, “max”, “min”, “count” etc in this list.

Calculations are any formula that combines measures (or applies to a single identified measure). Add these alongside your measure list.

In this example we also have “Average $” as a summarization of all products.

Identify Your Aggregations/Groups

Aggregations and groups are any dimension, attribute, column, or field that the defined measures are grouped by (sums, counts, averages, etc). These groups I am calling descriptors, because the column or attribute describes an aggregation, and any particular value (ie. “Product A”) describes the values associated with it.

In our example the aggregations are: “Product”, “Calendar Month”, “Calendar Year”.

Identify Your Filters/Restrictions

Filters and restrictions are any condition applied to the data that reduces the data set. These are usually another set of descriptors (dimensions, attributes, columns, or fields) that are limited or selected in some fashion, and usually overlap in some fashion with the already defined aggregations/groups. There may be additional filters/restrictions that are not represented in the list of descriptors so far, and there may be value filters applied that are based on aggregations of the defined measures.

In our example the filters are: “Product Line – by selection”, “Product – by selection”, “Calendar Year = Current Year”, “Calendar Year = Previous Year”, "Retailer - by selection".

Build a Mapping Table
Create a table and label each column with your Measures, Summarizations, and Calcuations and label each row with your Descriptors, Filters, and Restrictions. Place a mark in each cell where the Descriptor will be used to group or filter the Measure

Our example should produce a table like this:

	Sales $	Average $
Product	X
Calendar Month	X	X
Calendar Year	X	X
Filter - Product Line	X	X
Filter - Product	X	X
Filter - Calendar Year = Current Year	X	X
Filter - Calendar Year = Previous Year	X	X
Filter - Retailer	X	X

Note that Average $ is not grouped by Product but is impacted by the Filter - Product, this is because the Average $ calculation will be performed across all products in the filtered list.

Choose Your Pivot Descriptor

The Pivot is a critical component in the analysis, it is the descriptor that is at the core of the visualization. Generally you can find it at the centre of the business requirement, it is the thing that is being reported on, and that decisions are being made about. The purpose it serves is that all other pieces of data in the analysis are connected to it and revolve around it.

This pivot will generally be used as the join key in any inner/outer joins that will be created in this report, it may not be the only item used in a join, but it will be the common thread that is used in (nearly) every join.

On simpler reports it may be unclear which descriptor is actually serving as pivot, and this discussion often centers around the calendar, whether the date is the pivot object, or whether something else (like product or location) is it. The easiest way to decide is simply asking why are you using the report? What are you reporting on? Are you reporting on Sales by Month (but shown by product)? Or are you really concerned about Product Sales (but shown by month/year)?.

Another quick way to sometimes decide which descriptor is acting as pivot (not always) is whether or not it is being shown on the report in two or more different ways (or distinct sets)? If it is, then it’s probably not your pivot. You can also look at filters, which descriptors are being filtered by the business user’s choice? These are good candidates for the Pivot.

In the example above your Pivot is probably “Product”. The other option would be “Month”, but there are a couple of reasons why this is probably not the case. 1) Product is part of a user-driven filter, the business user is choosing which product lines/products to see and is thus reporting on the sales of particular products, not sales of particular months. 2) Calendar is being included in two distinct sets, Previous Year and Current Year, this is not a hard-and-fast rule but it can imply that this is not the central piece of the view but it is reinforces reason #1.
3) Although Average $ is not being grouped by the Product, it is performing a summary of Products and in the defined report the Average is being displayed in the same manner as a Product.

Group Descriptors into Hierarchies

Identify descriptors that are related to one another in a natural or structured hierarchy, these can be easily collected as “summary” and “detail” components of a single query.

In the example the natural hierarchies are “Month, Year, Filter - Current Year”, and “Month, Year, Filter - Previous Year”. Since month is included in two separate hierarchies it is the natural joining point to connect data from the “Current Year” hierarchy and the “Previous Year” hierarchy.

If we wanted to also display the Product Line on our report we would add it as a Descriptor and it would form a third hierarchy "Product Line, Product" but for simplicity we will leave it out.

Identify Independent or Disjoint Sets

Identify filter or restriction combinations that are independent of one another. These are filter combinations that are mutually exclusive or partially overlapping, and that if both were included in the same query would produce no results or fewer rows than intended.

Our example has the mutually exclusive "Filter - Current Year" and "Filter - Previous Year". If both filters were included in the same query, this would result in no data because a row cannot be both in the Current Year AND the Previous Year.

You may note that in this situation both could be combined into the same query using an OR condition, and due to the simplicity of this example that is true. This process is not intended to produce the most optimal or shortest path to a result, but is intended to produce a consistent, repeatable, and understandable path to a result. The counter-argument to using an OR clause is that if we decided to replace "Current Year" and "Previous Year" with arbitrary (potentially overlapping) date ranges, then using an OR clause would no longer be possible and at least 2 new queries would need to be created to perform the enhancement.

Group Measures/Calculations into Queries based on Descriptor Map

First lets review our updated mapping table.

	Sales $	Average $
Product	X
¹Calendar Month	X	X
- Calendar Year	X	X
- Filter - Calendar Year = Current Year	X	X
²Calendar Month	X	X
- Calendar Year	X	X
- Filter - Calendar Year = Previous Year	X	X
Filter - Product Line	X	X
Filter - Product	X	X
Filter - Retailer	X	X

1 & 2 - Independent sets

First rule of grouping into queries: If 2 measures have exactly the same mapping (and in Cognos they must come from the same Fact table) then they can be combined into the same queries.

Second rule: Split independent sets into different queries, and add all unrelated descriptors into both queries.

Third rule: Identify each combination of measures (with different mappings) and split descriptors as a separate query.

In our example we will produce 4 queries:

Sales $ [Product, Month/Year/Filter - Current Year, Filter - Product Line, Filter - Product, Filter - Retailer]
Sales $ [Product, Month/Year/Filter - Previous Year, Filter - Product Line, Filter - Product, Filter - Retailer]
Average $ [Month/Year/Filter - Current Year, Filter - Product Line, Filter - Product, Filter - Retailer]
Average $ [Month/Year/Filter - Previous Year, Filter - Product Line, Filter - Product, Filter - Retailer]

Combine Queries by Joining on Pivot

With each query defined, the only step left is to join them together in a fashion that produces our report. There are a couple of factors that go into deciding how to join queries together. A key point to remember is that there isn't a wrong way to join them, the resulting report will still be technically valid, but the data may not be represented (or show) the values that you are intending to show. Because we have each of the queries we need the raw data is available to produce our intended result.

Guidelines:

Start with the most-inclusive and most-detailed query and add to it. Additional filters can be added later to reduce the set of results being displayed. Start with the largest set of data and summarize/reduce the working set as you add more information.
Joins (outer/inner) are used to directly compare measures using a calculation (such as percent change or difference). They are not used to indirectly compare measures (such as grouping and plotting together on a graph). Unions are used to make indirect comparisons.
Outer joins are used to add additional reference information to your pivot (ie. adding cost of goods info to products that happen to have COG values).
Inner joins are used to intersect and add conditional information to your pivot (ie. only showing products that have both Sales and COG values).

Our example is quite simple in terms of joins, we are only going to use Union joins by creating a new column in our Average $ queries to replace the missing Product column and using it to label the rows as "AVG" and "AVG-PY".

If we were going to year-over-year differentials between products and averages we would have to change our joins as follows:

JOIN Sales $ (Previous Year) to Sales $ (Current Year) using an outer join on Product and Month
JOIN Average $ (Previous Year) to Average $ (Current Year) using an outer join on Month
UNION Sales $ (Year over year) to Average $ (Year over year) as normal

This would allow us to calculate the change in Sales $ from Previous year to Current year, and calculate the change in Average $ from Previous year to Current year and display both values on our chart.

Conclusion

This is an outline of my process when building complex reports in Cognos. As described above there are a few gaps in it, and I have only touched on some of the finer points of report development (joins, multiple pivots, and aggregations for example) but I have used it effectively, and through a series of workshops have taught it to both novice and experienced Cognos report developers so that they could apply it on their own and produce solid, maintainable reports that are currently being using in production within their own organizations.

I hope this can provide some small help to Cognos developers out there, and if you have any questions or need me to clarify or expand on any of my points in this article please feel free to contact me.

The first step is understanding which are the best visualization options to meet the business requirements. Often a table of numbers is what a business unit understands and asks for because they are used to dealing with operational reports where they need numbers. But sometimes a well-designed visualization of the data makes it easier to understand and gives the business the interpretation that they need in order to make a decision without having to spend time crunching numbers.

Visualization of a data set needs to be chosen carefully to properly communicate the information that needs to be understood. A poorly chosen visualization, especially if it is poorly documented, can provide confusing, useless, and sometimes misleading information.

A couple of my favourite visualizations that do an excellent job of communicating information are:

1) Florence Nightingale’s Diagram of the causes of mortality in the army in the East

http://upload.wikimedia.org/wikipedia/commons/1/17/Nightingale-mortality.jpg
http://en.wikipedia.org/wiki/File:Nightingale-mortality.jpg

The purpose of Nightingale’s chart was to illustrate that the primary causes of death amongst soldiers in the Crimean war were due to preventable diseases. The polar area diagram does a fantastic job of showing this, and by how much. As an aside, Nightingale could have exaggerated her diagram by choosing to use a polar radius diagram where the radial measurement is linear with the value instead of the area of the wedge. Since the eye naturally compares areas it would imply that the scale of deaths due to preventable causes was that much larger, but to her credit Nightingale strove for precise accuracy in her representations.

2) Charles Minard’s Flow map of Napoleon’s March

http://upload.wikimedia.org/wikipedia/commons/2/29/Minard.png
http://en.wikipedia.org/wiki/File:Minard.png

Minard’s graph combines a variety of pieces of information in a novel way, representing the course of Napolean’s march geographically, as well as including the number of soldiers on both the initial march and the retreat from Moscow and the successive losses incurred, but also the temperature on the return march showing the impact of the weather on casualties.

That being said, visualization is something that needs to be developed with cooperation from the business people who are going to be using it. Even if a 100% Stacked Bar Chart is a perfect representation of the information that needs to be communicated, it is worthless if the business users do not understand what it represents and how to use it, or are trying to interpret more information from the visualization than exists because of their preconceptions about what a bar chart is.

One example is the use of two types of bar charts: Stacked Bar Chart, and 100 Percent Stacked Bar Chart. The Stacked Bar Chart can often be confused with an Area Chart as the stacked bars are misinterpreted as “overlapping” by assuming the view is a projection of the normal Standard Bar Chart from the end. Likewise the 100 Percent Stacked Bar Chart is confused with the Stacked Bar Chart assuming that the height is representative of magnitude not share, this confusion arises because the business user does not see the 100 Percent Stacked Bar Chart as analogous to a series of Pie Charts.

Remember when designing charts for business use that creative interpretation can hinder clarity. A grid of Pie charts may feel clumsy, but may do a better job of communicating effectively with your business user.

Deciding

How do you decide when to use what kind of chart?

The first step in choosing an appropriate visualization is to understand what you are trying to communicate. The purpose of a chart is to convey information about some kind of relationship between pieces of data. There is always at least two pieces of information in a chart (otherwise it is a very boring chart) and the type of relationship between those pieces of information helps determine the most effective way to display it.

What kind of relationship do you want to illustrate?

Do you want to draw a comparative relationship?
Ex. How do the values of A compare to B, compare to C, over time?
Do you want to illustrate the existence of a relationship?
Ex. As the values of A increase, what happens to B or C?
Do you want to understand a composition, how pieces make up a whole?
Ex. How are the values of A broken down into groups B and C?
Do you want to show how values in a relationship are distributed?
Ex. How often does A occur by value?

One of the better examples I have seen on how to start understanding this process is by Extreme Presentation Method and their Chart Chooser (http://www.extremepresentation.com/design/charts/). This chooser presents a nice compact decision tree to understand how certain charts are best used to explain specific relationships. It is neither perfect, nor complete, but it is an excellent starting point.

Another good resource is the Periodic Table of Visualization Methods by Visual Literacy (http://www.visual-literacy.org/periodic_table/periodic_table.html#). This resource gives a wide spectrum of visualization options and groups them by What is being visualized (Data, Information, Concept, Strategy, Metaphor, Compound), whether the visualization is Process or Structure, whether the visualization is on the Overview or Detail level or both, and whether a visualization is geared towards Convergent or Divergent thinking. It does not do a good job of explaining when or how to use any particular visualization, but it is a nice complement to the Extreme Presentation decision tree by providing a bit of context around the purpose of certain visualizations.

Mincing Thoughts

Tuesday, April 02, 2013

Report Conceptualization Training Strategy – Part 2, Structured Report Planning

Monday, November 12, 2012

Report Conceptualization Training Strategy – Part 1, Report Visualization