Monday, November 25, 2013

Counterexample #1 - Performance Engineering of Healthcare.gov

For the past few months we have all had ringside seats to the spectacular failure of planning and communication that is Healthcare.gov - the personalized health insurance marketplace run by the United States Federal Government.

We now know that the project team and its managers were aware of problems with the application as early as March, and that insufficient testing, evolving requirements, and poor performance all contributed to the limitations seen at launch, when the system could only handle 1,100 users per day. Considering that initial estimates anticipated 50 to 60 thousand simultaneous users, and that in reality the site has seen upwards of 250,000 simultaneous users, this is a remarkable example of the impact of failing to engineer for performance.

On September 27th, four days before go-live, the Acting Director of the CMS Office of Enterprise Management, David Nelson, wrote the following illuminating line: "We cannot proactively find or replicate actual production capacity problems without an appropriately sized operational performance testing environment." By September 30th, the day before go-live, another email reported that "performance degradation started when there were around 1,100 to 1,200 users".

The Catalogue of Catastrophe, a list of failed or troubled projects around the world, has this to say about the project: "Healthcare.gov joins the list of projects that underestimated the volume of transactions they would be facing (see "Failure to address performance requirements" for further examples)."

If we examine the list of Classic Mistakes as to why projects fail, we can see that Healthcare.gov committed no fewer than 7 of the top 10. Clay Shirky has written a fabulous article titled Healthcare.gov and the Gulf Between Planning and Reality that explains the scope and magnitude of the communication failure that occurred on this project, as well as the inherent flaw in the statement "Failure is not an option."

Thursday, September 05, 2013

Application Security – Authorization Layers in Spring Security

Formerly the Acegi Security System for Spring, Spring Security is a powerful, flexible, and widely used framework for authentication and authorization in your Java applications. If you are just starting with Spring Security, the SpringSource1 getting-started documentation and tutorials are a great way to get your feet wet.

Once you understand the basics of implementing a security framework and the wealth of options at your fingertips, the questions usually arise: “Which parts of this framework do I need to use?”, “What are they for?”, and “When do I need to use them?”.

For many applications there are 3 layers of authorization that we typically need to be concerned about when implementing Spring Security.
  1. HTTP Request Authorization – verifying that a user is authenticated (if necessary) and authorized to access a specific URL.
  2. Service Layer Authorization – verifying that a user is authorized to access a specific method, class, or service.
  3. Component Authorization – verifying that a user is authorized to see or use a specific component, operation, logic, or data.
These are the core layers of the Spring Security framework, and together they are sufficient to provide reasonably complete authorization control for your application. Each layer serves a specific purpose and works best for that purpose. Attempting to shoe-horn all your authorization logic into a single layer, using it to do more than it is intended to do, will cause needless complication.2,3

HTTP Request Authorization
The basic tutorial example for security-app-context.xml4,5

<beans:beans xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.springframework.org/schema/security"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/security
        http://www.springframework.org/schema/security/spring-security-3.1.xsd">

    <http use-expressions="true">
        <intercept-url access="permitAll" pattern="/index.jsp"/>
        <intercept-url access="hasRole('supervisor')" pattern="/secure/extreme/**"/>
        <intercept-url access="isAuthenticated()" pattern="/secure/**"/>
        <intercept-url access="isAuthenticated()" pattern="/listAccounts.html"/>
        <intercept-url access="isAuthenticated()" pattern="/post.html"/>
        <intercept-url access="denyAll" pattern="/**"/>
        <form-login/>
        <logout/>
    </http>

    <authentication-manager>
        <authentication-provider>
            <user-service>
                <user authorities="supervisor, teller, user" name="rod" password="koala"/>
                <user authorities="teller, user" name="dianne" password="emu"/>
                <user authorities="user" name="scott" password="wombat"/>
                <user authorities="user" name="peter" password="opal"/>
            </user-service>
        </authentication-provider>
    </authentication-manager>
</beans:beans>

The basic example provides a simple template for setting up user accounts, roles, and URL-pattern permissions in your application. Although most real-world implementations will replace the authentication-provider due to the limitations of the example, the intercept-url approach is reasonable to use with almost any framework that serves different views based on the requested URL.

Purpose
The primary focus of the HTTP Request Authorization layer is to provide catch-all security for your application, preventing unauthorized users from directly linking to, and accessing, functions that they are not allowed to access. This removes the necessity of adding custom authentication code to every page of your application (depending on your framework and architecture) and gives you a universal way to limit the severity of access/authentication defects caused by forgetting to include, or making mistakes in, per-page authentication code.

Limitations
The usefulness of this layer drops dramatically as application complexity increases and each distinct URL provides a wealth of functions to the user. Monolithic application frameworks that are built entirely around a single URL may only find the basic authentication service useful, whereas applications designed to segment functionality into different URLs by role will get the most value out of it.

Service Layer Authorization
The basic tutorial example for security annotations in classes and methods:3

public interface BankService {
    public Account readAccount(Long id);
    public Account[] findAccounts();

    @PreAuthorize(
            "hasRole('supervisor') or " +
            "hasRole('teller') and (#account.balance + #amount >= -#account.overdraft)" )
    public Account post(Account account, double amount);
}    

The basic example demonstrates annotating a method with a @PreAuthorize Spring EL expression. This provides a powerful framework for building complex security rules around both methods and classes, ensuring your service operations are secure.
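One detail the tutorial snippet does not show: the @PreAuthorize annotation (and its @PostAuthorize counterpart) is only honoured once expression-based method security has been enabled. In the XML namespace configuration style used above, the Spring Security 3.1 schema does this with a single element in the security context file:

<global-method-security pre-post-annotations="enabled"/>

Without it, the annotations are silently ignored.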

Purpose
The primary purpose of Service Layer Authorization using annotations or interceptors is to safeguard access to services or operations that should only be accessed by certain roles. This allows you to ensure that only administrators can access administrative functions and that read-only users cannot access write operations, and it mitigates the chance that coding mistakes will provide accidental access to services and operations that a role should not have access to. It is best used as a safeguard to prevent unintentional access to sensitive services.

Limitations
Due to the nature of the class and method annotations, Service Layer Authorization does not provide a useful interface into the visibility of the services it protects. It provides reactive security that negates attempts to access a service; it does nothing to provide proactive information about which roles can access the service. Common questions about Service Layer Authorization often ask how to catch the security exceptions that occur or how to use the annotations to make control-flow decisions6,7. The answer to those questions is complicated, but more importantly it should be irrelevant. This layer is not intended to provide information for those decisions, and if the application is built well it should never be visible to the user. It is best used only as a safeguard against the consequences of mistakes made in the HTTP Request Authorization and Component Authorization layers.

Component Authorization
An example of JSP Taglib security:8

<security:authorize ifAnyGranted="ROLE_ADMIN">
    <tr>
    <td colspan="2">
        <input type="submit" value="<spring:message code="label.add"/>"/>
    </td>
    </tr>
</security:authorize>

An example of inline security:9

Authentication auth = SecurityContextHolder.getContext().getAuthentication();
if (auth != null) {
   if (auth.getPrincipal() instanceof UserDetails) {
      report.setUser(new User(((UserDetails) auth.getPrincipal()).getUsername()));
   } else {
      report.setUser(new User(auth.getPrincipal().toString()));
   }
}

These examples demonstrate two quick methods of applying Component Authorization: through the Spring Security JSP Taglib, and through the Spring Security Java API.

Purpose
This layer provides component-level security and allows you to make control-flow decisions based on role. It is the connecting layer between the page-based HTTP Request Authorization layer and the method- and class-level Service Layer Authorization, and it is vital for any application that provides heavyweight or multi-function URLs. This is the developer’s security layer: it allows you to turn components on and off, or make decisions at any point in your code, to provide access to specific functions, links, or workflows.
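As a minimal sketch of what an inline role check can look like (the RoleCheck helper below is my own illustration, not part of the Spring Security API, though the calls it makes are), a utility like this can drive those control-flow decisions anywhere in your code:

import org.springframework.security.core.Authentication;
import org.springframework.security.core.GrantedAuthority;
import org.springframework.security.core.context.SecurityContextHolder;

public final class RoleCheck {

    private RoleCheck() {
    }

    // Returns true if the current user holds the given authority, e.g. "ROLE_ADMIN"
    public static boolean hasAuthority(String authority) {
        Authentication auth = SecurityContextHolder.getContext().getAuthentication();
        if (auth == null) {
            return false;
        }
        for (GrantedAuthority granted : auth.getAuthorities()) {
            if (authority.equals(granted.getAuthority())) {
                return true;
            }
        }
        return false;
    }
}

A call such as RoleCheck.hasAuthority("ROLE_ADMIN") can then decide whether to render a link, enable a workflow, or branch the logic.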

Limitations
Using Component Authorization is repetitive and requires an intimate understanding of which roles have access to which operations and when. It is not optimal for providing page-based security and basic authentication, because that is better handled by the HTTP Request Authorization layer, which is easier, universal, and more reliable. It is not optimal for providing class- and method-level security, because that is better handled by Service Layer Authorization, which can annotate interfaces, abstract classes, and interceptors and does not require as much repetition or context-related knowledge to be applied effectively.

Final Thoughts
Spring Security is a useful and powerful tool, but it is at its best when each security layer it provides is used effectively and for the purpose it was designed for. A carefully considered multi-pronged approach to securing your application will produce a simpler, more elegant, and more secure solution.

References

Wednesday, August 14, 2013

Agile Performance Testing

Performance testing is an often-neglected component of application development and system replacement projects. It is frequently ignored, or relegated to the end of the project as a period of “performance tuning activities,” in favor of functional development. And when a project starts to go off the rails and targets get pushed, the QA and tuning time gets cut.

Why does this legacy of waterfall planning continue to exist within an Agile world?

Waiting until a system is functionally complete to start performance testing will not save money, and it increases risk that could be mitigated by incorporating performance testing into your cycle of sprints.

Failing to test the performance of your system risks an under-built environment, leading to delays, downtime, and unhappy users. When the system is up and running it may be slow and unresponsive, and extended wait queues can infuriate users.

Performance testing at the end of the project, during stabilization, means you may have time to build out your environment to meet initial demand, but specific performance problems caused by poor code or inefficient architecture will not have time to be resolved unless go-live is delayed. Performance testing too late in the project risks sluggish performance and intolerable wait times for some operations, and can mean dissatisfied users and cycles of emergency patches to improve performance.

Performance testing can be accomplished in an agile project by incorporating it into the agile process and by being willing to prioritize it appropriately.

Effective performance testing is planned for and included from the inception of the project and is part of a continuous cycle of QA. Every story must include, as part of its QA acceptance, a set of performance metrics that it must meet before it can be marked as complete.

The standard story card (“As an xxxx I want yyyy so that zzzz”) defines the typical requirements of a story, which are generally further refined by acceptance criteria (“x is working”, “x cannot do y without doing z”, “x is stored in y for use elsewhere”).

Acceptance criteria have the following benefits:
  • They get the team to think through how a feature or piece of functionality will work from the user’s perspective
  • They remove ambiguity from requirements
  • They form the tests that will confirm that a feature or piece of functionality is working and complete.

But acceptance criteria generally only define functional acceptance. You will rarely see an acceptance criterion such as: “x must return results in less than y seconds when the server is under z load 19 times out of 20, and in less than u seconds when the server is under w load 18 times out of 20.”
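Criteria like that can be automated alongside functional tests. The following is a minimal sketch of how a percentile-based criterion might be checked in plain Java; the sample count, threshold, and performOperation stub are hypothetical placeholders for your own system under test:

import java.util.Arrays;

public class LatencyCheck {

    public static void main(String[] args) throws Exception {
        int samples = 20;            // "19 times out of 20" is the 95th percentile
        long thresholdMillis = 2000; // stand-in for the "y seconds" target
        long[] timings = new long[samples];

        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            performOperation();      // the operation under test (stubbed below)
            timings[i] = (System.nanoTime() - start) / 1000000;
        }

        // Sort and take the value that 19 of the 20 samples fall at or below
        Arrays.sort(timings);
        long p95 = timings[samples - 1 - samples / 20];
        if (p95 > thresholdMillis) {
            throw new AssertionError("95th percentile " + p95
                    + "ms exceeds threshold " + thresholdMillis + "ms");
        }
        System.out.println("Passed: p95 = " + p95 + "ms");
    }

    private static void performOperation() throws Exception {
        Thread.sleep(50); // replace with a call into the system under test
    }
}

A check like this can run inside the sprint's continuous integration build, so a story's performance criteria are verified on every commit rather than at the end of the project.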

A good performance testing plan will define:
  1. The performance criteria that the system is required to meet
  2. An explanation of how the performance criteria will be measured and how it matches against business objectives
  3. Remediation steps to explain how failures will be prioritized, handled, and resolved.

A team that has a solid grasp of the importance of system performance can incorporate performance testing tasks and remediation into an agile project by defining performance objectives, systematically evaluating the system, and defining failures as remediation stories that get fed into the backlog and prioritized.

A common issue with many development teams is a shortage of resources with the depth of experience to conduct effective and efficient performance testing. One option to consider is hiring a team with the knowledge and specialization to analyze, manage, educate and implement a performance testing plan in a cost-effective manner.

The MNP Performance Testing team has the tools and experience to conduct thorough performance analysis while integrating seamlessly within the Agile process on projects both large and small.

Tuesday, June 11, 2013

On Problem Solving

Earlier this year I had the opportunity to attend an excellent workshop on problem solving given by Peter de Jager. Coming from a Mathematics and Computer Science background, I have sat through many lectures on problem-solving techniques: how to deconstruct a problem into its component parts, how to formalize statements using first-order logic, and so on. Although I enjoy those subjects, I was pleasantly surprised that Peter took a very different approach to his seminar. He took a hands-on approach, giving us dozens of physical and mental problems throughout the day to challenge us while illustrating specific points about the process of problem solving. In the end, what we took away from the session wasn't a set of specific tools for solving particular types of problems; it was a deeper understanding of problems in general, why they occur, and new perspectives on how to solve problems and ward them off pre-emptively before they can start.


The key to being a more effective problem solver in everyday life lies in understanding ourselves and human nature in general. The problem is literally one of perspective.

This is certainly not a new idea, and our culture is littered with references attempting to express it: “thinking outside the box”, “can’t see the forest for the trees”, “too close to the problem - what’s the bigger picture?”. Even the Socratic Method outlined in Plato’s dialogues in the 5th century BC is all about perspective: the key to understanding a problem is to change your perspective, question your assumptions, and modify your view of it.

One of the key tenets is understanding that you bring a lifetime of preconceptions with you when you look at a problem; questioning those preconceptions, and modifying them, is a fundamental skill.

Over the course of the day Peter led us through examples and problems of perspective, with tidbits of wisdom, real-world examples, common misconceptions, and traps to avoid.

One tool to help with problem solving is understanding labels. Labels are an essential tool for communication: when I refer to “a nail” it immediately conjures up all the things that a nail is and what it is used for. However, when we need to solve a problem, labels can be a hindrance because they come loaded with preconceptions. Sometimes we are so locked into that perspective that we cannot think of other ways a nail can be seen or used. If I called a nail an “awl”, or an “icepick”, or a “model train bridge support strut”, or even a “sharp metal cylinder with a lip”, it suddenly becomes an entirely different object. Labels are powerful, and sometimes we need to understand all the properties and attributes of an object independent of its label.

My favourite example of this power was the story of a group of Christian people attempting, and failing, to hang a small crucifix on a wall with a nail. During this process they tried every conceivable method and object at their disposal to secure the nail in the wall, except for the single object they had that was the same size, weight, and approximate shape as a hammer: the cross. To this group of people the crucifix had such importance, such value, that it was impossible for them to conceive of it as a hammer on their own.

The last topic I will bring up was arguably the most important and useful in everyday life of everything that was covered. Consider the statement “People are resistant to change.” Do you agree or don’t you? It’s something we hear often: people don’t like change, change is frightening; there’s even a whole management discipline called “change management”. But is it true?

Make yourself a list of checkmarks:
  • 1 for each time you’ve gone to a new school
  • 1 for each time you’ve taken a trip more than 1 hour away
  • 1 for each time you’ve gotten a new pet
  • 1 for each time you’ve moved
  • 1 more for each time you’ve moved to a new city
  • 5 for a new country
  • 1 for each time you’ve started a relationship
  • 1 for each time you’ve broken up a relationship
  • 2 for each time you’ve gotten married
  • 10 if you (intentionally) decided to have children
  • 5 more for any subsequent children
  • And add more for each time you’ve bought a car, gotten a new job, been promoted, changed positions in your job, switched banks, invested, flown on a plane, or done an extreme or potentially dangerous sport

Do you think you really resist change? Do you really think everyone else is much different than you?

These are all really big changes that are taken on willingly and in many cases enthusiastically. So why is there the misconception that people resist change?

The key is that people don’t resist change; we embrace change when we choose it. We change happily and often when we decide that the change is right for us. This is the essence of change management: answering the question “Why?”.

If you want to induce change in others, you want them to choose the change in order for it to be successful. Change by dictatorship rarely works well and is never easy (or well received). If you want to convince someone to embrace a change, there are 7 key questions they will have (that they might not even know they have); answering them effectively will go a long way towards making your change successful.
  1. Why?
  2. What’s in it for me?
  3. Monday – what am I going to start/need to do differently on Monday?
  4. What might go wrong?
  5. What won’t change?
  6. What will go wrong or be difficult?
  7. Signposts - How will you measure progress towards the change?  


Succinct answers to these questions can prevent many of the problems that lend credence to the statement “People resist change”. Answering them won’t guarantee success, or even approval, but without those answers change is far more likely to fail.

Tuesday, April 23, 2013

Hurdles #3 - Apache Pivot - Finding Request IP Address


Hurdles article number three, and we are still working with Apache Pivot; this time we are looking at the RESTful services provided by the Pivot libraries. The Pivot web service libraries are quite handy: they provide a nice, clean wrapper that abstracts away a lot of the complexity around serialization, HTTPServlet objects, and HTTPRequest objects.

Unfortunately, there is one nasty flaw in the v2.0.2 libraries that I have come across so far: the HTTPRequest object has been so thoroughly abstracted away that you cannot access it, or much of the information included within it, directly.

The provided QueryServlet implementation discards information such as the remoteAddr and remoteHost values, so there is no way to retrieve the requesting IP address if you want to do simple things like validating the source request location.

The reason I discovered this issue in the first place is that I wanted to add a nice little authenticated cache component to my service. Nothing complex: I had already established a verified sessionkey, but in order to negate simple man-in-the-middle attacks that involve sessionkey stealing, I wanted to add a secondary validation step that associated a sessionkey with a source IP address. Like I said, very simple (and certainly not foolproof), but my application doesn't involve national security. A quick two-factor validation routine is all I wanted.
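For illustration, the association I had in mind is nothing more than a map from sessionkey to source address; the class and method names below are hypothetical, not part of Pivot:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SessionIpValidator {

    // One entry per active sessionkey, bound to the address that created it
    private final ConcurrentMap<String, String> sessionIps =
            new ConcurrentHashMap<String, String>();

    // Record the source address when the sessionkey is first issued
    public void register(String sessionKey, String remoteAddr) {
        sessionIps.put(sessionKey, remoteAddr);
    }

    // Later requests must present the sessionkey from the same address
    public boolean isValid(String sessionKey, String remoteAddr) {
        String original = sessionIps.get(sessionKey);
        return original != null && original.equals(remoteAddr);
    }
}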

Well, no such luck. In order to resolve this issue I had to resort to extending the org.apache.pivot.web.server.QueryServlet class with my own modifications to the service method, and then extending that class with my actual servlet implementations.

Fortunately, GrepCode is kind enough to provide us with the source of the QueryServlet class to use as a starting point. So here is the QueryServletExt class I created, which copies the remoteAddr and remoteHost values into properties that can then be accessed via getRemoteAddr and getRemoteHost methods.

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.pivot.web.server.QueryServlet;

public abstract class QueryServletExt extends QueryServlet {
    // ThreadLocals so that concurrent requests each see their own values
    private transient ThreadLocal<String> remoteAddr = new ThreadLocal<String>();
    private transient ThreadLocal<String> remoteHost = new ThreadLocal<String>();

    @Override
    protected void service(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        // Capture the client's IP address and hostname before Pivot discards them
        remoteAddr.set(request.getRemoteAddr());
        remoteHost.set(request.getRemoteHost());
        super.service(request, response);
    }

    public String getRemoteAddr() {
        return remoteAddr.get();
    }

    public String getRemoteHost() {
        return remoteHost.get();
    }
}

This piece of code doesn't bring all the attributes of the HTTPRequest into your QueryServlet, but at least it provides a template for how to include whatever pieces you do need in your code.

Tuesday, April 02, 2013

Report Conceptualization Training Strategy – Part 2, Structured Report Planning


For beginning and advanced report developers alike, the biggest challenge I have witnessed is Report Conceptualization: the process of translating often-incomplete business requirements into a structured plan that will produce the result the business needs. For operational reports based on known data structures and calculations this process can be simple; operational reports are often structured plainly, and the job of the report developer is simply to put the right fields in the right place. The process becomes much more difficult for strategic reports, which attempt to help the business define what it should be doing. Strategic reports are often poorly understood by the business: their needs are uncertain and difficult to communicate, and the vision for the end product is amorphous. Turning vague, conceptual requirements into a concrete view of business data is a challenge for developers and analysts alike.

Recently I have been mentoring two Cognos report developers in their process. These developers are experienced at building operations-level reports in Cognos and have a good understanding of the data and the business, but they struggle to provide insightful analysis when the business users themselves have only vague ideas of what they want to see, or in some cases have too many ideas that all jumble together.

The first step in this conceptualization process was to focus on a particular element, decision, or comparison that the business needs to make, and to determine an appropriate way to visualize the data that is needed. This was covered in Part 1, Report Visualization.

Once a vision has been established, the second step is to plan the underlying structure of the report so that the end result is a simple data structure that can be mapped directly to the visualization. This is the process I use for breaking down a difficult report into simple steps, and I have trained numerous developers who use it successfully.

The Process
We will start with the user story:
“As a Product Manager I need to know year-over-year sales trends of my products to determine which products I should assign to my premium display space for each retailer.”

Based on the story we have done further analysis and come up with a detailed business requirement. The business user needs a line graph that shows monthly sales of the product lines and products they select for the current year and the previous year. The graph also needs to show the average sales of the selected products. The graph should only include sales from selected retailers.

A mock-up of the chart is given below.

Identify Your Measures
Measures are all numerical fields that can be aggregated or calculated and that are connected to dimensions. These are generally (in Cognos) located in the Fact table, but for the purposes of this process they also include deduced values such as counts or distinct counts. For example, count(distinct product_id) is a measure that can be connected to other dimensions through an appropriate fact table; it produces the deduced value “count of products for each value of dimension X where sales exist.”

In our above example our only measure is “Sales $”.

Identify Your Summarizations/Calculations
List all unique summarizations of measures alongside your measure list. These will be treated as separate measures that may or may not be able to be grouped with the measures identified so far. Remember that your measures will already be summarized according to their default (usually total) aggregation; in this list you need to identify any alternate summarization such as “average”, “max”, “min”, or “count”.

Calculations are any formula that combines measures (or applies to a single identified measure). Add these alongside your measure list.

In this example we also have “Average $” as a summarization of all products.

Identify Your Aggregations/Groups
Aggregations and groups are any dimension, attribute, column, or field that the defined measures are grouped by (sums, counts, averages, etc.). I call these groups descriptors, because the column or attribute describes an aggregation, and any particular value (e.g. “Product A”) describes the values associated with it.

In our example the aggregations are: “Product”, “Calendar Month”, “Calendar Year”.

Identify Your Filters/Restrictions
Filters and restrictions are any condition applied to the data that reduces the data set. These are usually another set of descriptors (dimensions, attributes, columns, or fields) that are limited or selected in some fashion, and they usually overlap with the already-defined aggregations/groups. There may be additional filters/restrictions that are not represented in the list of descriptors so far, and there may be value filters that are based on aggregations of the defined measures.

In our example the filters are: “Product Line – by selection”, “Product – by selection”, “Calendar Year = Current Year”, “Calendar Year = Previous Year”, "Retailer - by selection".

Build a Mapping Table
Create a table. Label each column with your Measures, Summarizations, and Calculations, and label each row with your Descriptors, Filters, and Restrictions. Place a mark in each cell where the Descriptor will be used to group or filter the Measure.

Our example should produce a table like this:

                                          Sales $   Average $
Product                                      X
Calendar Month                               X          X
Calendar Year                                X          X
Filter - Product Line                        X          X
Filter - Product                             X          X
Filter - Calendar Year = Current Year        X          X
Filter - Calendar Year = Previous Year       X          X
Filter - Retailer                            X          X

Note that Average $ is not grouped by Product but is impacted by Filter - Product; this is because the Average $ calculation is performed across all products in the filtered list.

Choose Your Pivot Descriptor
The Pivot is a critical component of the analysis: it is the descriptor at the core of the visualization. Generally you can find it at the centre of the business requirement; it is the thing being reported on, the thing decisions are being made about. Its purpose is that all other pieces of data in the analysis connect to it and revolve around it.

This pivot will generally be used as the join key in any inner/outer joins created in this report. It may not be the only item used in a join, but it will be the common thread used in (nearly) every join.

On simpler reports it may be unclear which descriptor is actually serving as the pivot, and this discussion often centres on the calendar: is the date the pivot object, or is it something else (like product or location)? The easiest way to decide is simply to ask why you are using the report. What are you reporting on? Are you reporting on Sales by Month (but shown by product)? Or are you really concerned about Product Sales (but shown by month/year)?

Another quick way to decide which descriptor is acting as the pivot (though not always conclusive) is to check whether it is being shown on the report in two or more different ways (or distinct sets). If it is, then it’s probably not your pivot. You can also look at the filters: which descriptors are being filtered by the business user’s choice? These are good candidates for the pivot.

In the example above your pivot is probably “Product”. The other option would be “Month”, but there are a few reasons why this is probably not the case:
  1. Product is part of a user-driven filter; the business user is choosing which product lines/products to see and is thus reporting on the sales of particular products, not the sales of particular months.
  2. Calendar is being included in two distinct sets, Previous Year and Current Year. This is not a hard-and-fast rule, but it can imply that Calendar is not the central piece of the view, which reinforces reason #1.
  3. Although Average $ is not grouped by Product, it performs a summary of Products, and in the defined report the Average is displayed in the same manner as a Product.

Group Descriptors into Hierarchies
Identify descriptors that are related to one another in a natural or structured hierarchy; these can easily be collected as “summary” and “detail” components of a single query.
In the example the natural hierarchies are “Month, Year, Filter - Current Year” and “Month, Year, Filter - Previous Year”. Since Month is included in both hierarchies, it is the natural joining point for connecting data from the “Current Year” hierarchy and the “Previous Year” hierarchy.

If we also wanted to display the Product Line on our report, we would add it as a Descriptor, and it would form a third hierarchy, "Product Line, Product"; but for simplicity we will leave it out.


Identify Independent or Disjoint Sets
Identify filter or restriction combinations that are independent of one another. These are filter combinations that are mutually exclusive or only partially overlapping, and that, if both were included in the same query, would produce no results or fewer rows than intended.

Our example has the mutually exclusive "Filter - Current Year" and "Filter - Previous Year". If both filters were included in the same query, no data would result, because a row cannot be in both the Current Year AND the Previous Year.

You may note that in this situation both could be combined into the same query using an OR condition, and due to the simplicity of this example that is true. This process is not intended to produce the most optimal or shortest path to a result; it is intended to produce a consistent, repeatable, and understandable path to a result. The counter-argument to using an OR clause is that if we replaced "Current Year" and "Previous Year" with arbitrary (potentially overlapping) date ranges, an OR clause would no longer be possible and at least 2 new queries would need to be created to perform the enhancement.

Group Measures/Calculations into Queries based on Descriptor Map
First, let's review our updated mapping table.

                                              Sales $   Average $
    Product                                      X
1   Calendar Month                               X          X
    Calendar Year                                X          X
    Filter - Calendar Year = Current Year        X          X
2   Calendar Month                               X          X
    Calendar Year                                X          X
    Filter - Calendar Year = Previous Year       X          X
    Filter - Product Line                        X          X
    Filter - Product                             X          X
    Filter - Retailer                            X          X

1 & 2 - Independent sets

First rule of grouping into queries: if two measures have exactly the same mapping (and, in Cognos, come from the same Fact table), they can be combined into the same query.

Second rule: split independent sets into different queries, and add all unrelated descriptors to both queries.

Third rule: identify each combination of a measure (with a different mapping) and its descriptors, and split each combination into a separate query.

In our example we will produce 4 queries:

  1. Sales $ [Product, Month/Year/Filter - Current Year, Filter - Product Line, Filter - Product, Filter - Retailer]
  2. Sales $ [Product, Month/Year/Filter - Previous Year, Filter - Product Line, Filter - Product, Filter - Retailer]
  3. Average $ [Month/Year/Filter - Current Year, Filter - Product Line, Filter - Product, Filter - Retailer]
  4. Average $ [Month/Year/Filter - Previous Year, Filter - Product Line, Filter - Product, Filter - Retailer]

Combine Queries by Joining on Pivot
With each query defined, the only step left is to join them together in a fashion that produces our report. A couple of factors go into deciding how to join queries together. A key point to remember is that there isn't a wrong way to join them; the resulting report will still be technically valid, but the data may not represent (or show) the values that you intend to show. Because we have each of the queries we need, the raw data is available to produce our intended result.

Guidelines:

  1. Start with the most-inclusive and most-detailed query and add to it. Additional filters can be added later to reduce the set of results being displayed. Start with the largest set of data and summarize/reduce the working set as you add more information.
  2. Joins (outer/inner) are used to directly compare measures using a calculation (such as percent change or difference). They are not used to indirectly compare measures (such as grouping and plotting together on a graph). Unions are used to make indirect comparisons.
  3. Outer joins are used to add additional reference information to your pivot (ie. adding cost of goods info to products that happen to have COG values).
  4. Inner joins are used to intersect and add conditional information to your pivot (ie. only showing products that have both Sales and COG values).

Our example is quite simple in terms of joins: we are only going to use Union joins, creating a new column in our Average $ queries to replace the missing Product column and using it to label the rows as "AVG" and "AVG-PY".

If we were going to calculate year-over-year differentials between products and averages, we would have to change our joins as follows:

  1. JOIN Sales $ (Previous Year) to Sales $ (Current Year) using an outer join on Product and Month
  2. JOIN Average $ (Previous Year) to Average $ (Current Year) using an outer join on Month
  3. UNION Sales $ (Year over year) to Average $ (Year over year) as normal
This would allow us to calculate the change in Sales $ from Previous year to Current year, and calculate the change in Average $ from Previous year to Current year and display both values on our chart.

Conclusion
This is an outline of the process I use when building complex reports in Cognos. As noted above, there are a few gaps in it, and I have only touched on some of the finer points of report development (joins, multiple pivots, and aggregations, for example), but I have used it effectively. Through a series of workshops I have taught it to both novice and experienced Cognos report developers, who have applied it on their own to produce solid, maintainable reports that are currently being used in production within their own organizations.

I hope this can provide some small help to Cognos developers out there, and if you have any questions or need me to clarify or expand on any of my points in this article please feel free to contact me.

Monday, January 21, 2013

Hurdles #2 - Apache Pivot - Finding Named Objects

My second Hurdles article continues with Apache Pivot. The topic of this article is finding named objects: if there is a container or a control somewhere in your window that has a name or bxml:id set, how can I find a reference to that object using the identifying attribute?

My specific example is a Dialog object that contains a TextInput. When the Dialog is closed I want to find that TextInput control, read the Text value that has been provided, and act on it. To do this I have a very simple setup: inside my dialog I have created a TablePane to structure my layout, and within the TablePane I have a Label, the TextInput (with an attached validator), and two buttons: a "Submit" button whose ButtonPressed event does "dialog.close(true)" and a "Cancel" button that does "dialog.close(false)". I have also configured a DialogCloseListener in code that will process the close event, check whether the Dialog has a result, and perform an action with the TextInput value.


I was eventually able to find two solutions to this problem. The preferred solution will depend on the situation and the specific implementation, but I will present both solutions here. There may be additional solutions to this particular problem that I am not aware of, but my goal here was to get an object reference with minimal code and in a generic fashion.

Option 1: Named Component Traversal
Unfortunately, container tree traversal in Apache Pivot is not as natural, convenient, or consistent as I expected. It is certainly not as powerful as an XML DOM parser or a Java File object. Unless you are using position-based object location, component traversal has a few requirements:
  1. Every component in the XML tree must have a name attribute set (although name is not a required attribute)
  2. The name attribute must be unique among the set of children for a common parent
  3. Each node must be traversed in sequence from parent to child to find the intended descendant; there does not appear to be any kind of path-definition or recursive lookup available (though see the sketch after the code listing below)
  4. getNamedComponent returns a Component object, which does not itself have a getNamedComponent method. That method is on the Container subclass of Component, so each traversal step requires at least a cast operation. Because there does not appear to be any kind of "getAllChildren" method, I do not know if there is any way to do a tree exploration or blind traversal (which would require reflection as well as a cast operation)
So given the following BXML structure:
<Dialog bxml:id="dialog" title="Dialog" modal="true"
    xmlns:bxml="http://pivot.apache.org/bxml"
    xmlns="org.apache.pivot.wtk">
    <TablePane name="table">
        <columns>
            <TablePane.Column width="1*"/>
        </columns>

        <TablePane.Row height="1*">
            <Label text="Enter number:"
                styles="{horizontalAlignment:'center', verticalAlignment:'center'}"/>
            <TextInput text="0" name="numberInput" bxml:id="
numberInput">
                <validator>
                    <IntValidator xmlns="org.apache.pivot.wtk.validation"/>
                </validator>
            </TextInput>
        </TablePane.Row>

        <TablePane.Row height="-1">
            <PushButton buttonData="Submit"
                ButtonPressListener.buttonPressed="dialog.close(true)"/>
            <PushButton buttonData="Cancel"
                ButtonPressListener.buttonPressed="dialog.close(false)"/>
        </TablePane.Row>
    </TablePane>
</Dialog>
The code required to locate the "numberInput" TextInput may look something like the following:
dialog.open(window,
    new DialogCloseListener() {
        public void dialogClosed(Dialog arg0, boolean arg1) {
            if (arg0.getResult()) {
                // Traverse by name, one casted step per level of the tree
                TablePane tp = (TablePane) arg0.getNamedComponent("table");
                TextInput ti = (TextInput) tp.getNamedComponent("numberInput");
                System.out.println(ti.getText());
            }
        }
    });
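As an aside, a blind traversal may be possible without reflection after all: Container implements Iterable<Component>, so the tree can be walked child by child. Here is an untested sketch of a recursive name lookup (the helper class and method are my own, not part of the Pivot API):

import org.apache.pivot.wtk.Component;
import org.apache.pivot.wtk.Container;

public final class Components {

    private Components() {
    }

    // Depth-first search for the first descendant whose name matches
    public static Component findByName(Container root, String name) {
        for (Component child : root) {
            if (name.equals(child.getName())) {
                return child;
            }
            if (child instanceof Container) {
                Component match = findByName((Container) child, name);
                if (match != null) {
                    return match;
                }
            }
        }
        return null;
    }
}

With a helper like this, the two-step cast-and-traverse code above could collapse into a single findByName(dialog, "numberInput") call, at the cost of a full tree walk.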
Option 2: BXMLSerializer Lookup
The BXMLSerializer approach is the polar opposite of the traversal approach. It also has a uniqueness constraint, but this one is supported by the framework, because a violation will result in a SerializationException being thrown.

The BXMLSerializer requires that your target component have a bxml:id attribute set. All components with a bxml:id attribute get deposited into the Namespace map of the definition file that was processed by the Serializer. However, a reference to the BXMLSerializer instance that was used to parse the BXML file must be kept, and it must also be accessible to the appropriate Handler/Listener that needs to use it.

Taking the example BXML file in Option 1, the following code could be used to access the TextInput control:
private BXMLSerializer bxmlSerializer;

public void startup(Display display, Map<String, String> properties)
        throws Exception {
    bxmlSerializer = new BXMLSerializer();

    Dialog dialog = (Dialog) bxmlSerializer.readObject(Main.class, "bxml/dialog.bxml");
    dialog.open(window,
        new DialogCloseListener() {
            public void dialogClosed(Dialog arg0, boolean arg1) {
                if (arg0.getResult()) {
                    // Look the control up by bxml:id in the serializer's namespace map
                    TextInput ti = (TextInput) bxmlSerializer.getNamespace().get("numberInput");
                    System.out.println(ti.getText());
                }
            }
        });
}

Note that in this sample there is no hierarchy connection between the Dialog itself and the "numberInput" control. However, Pivot provides a convenient way to reverse the process: the Component class provides both "getAncestor" and "getParent" methods that allow quick traversal up the tree once you have figured out how to get the child.

If you have an alternate method to access an arbitrary component within a window that improves on any of the methods described here, please send me an email. The approaches described above were learned through trial and error, because specific documentation on how to do this was lacking online; if there are any better approaches I will post them here as a follow-up.