Showing posts with label Training. Show all posts

Thursday, September 05, 2013

Application Security – Authorization Layers in Spring Security

Formerly Acegi Security System for Spring, Spring Security is a powerful, flexible, and widely used framework for authentication and authorization in your Java applications. If you are just starting with Spring Security then the Spring Source1 getting started documentation and tutorials are a great way to get your feet wet.

Once you understand the basics of how to implement a basic security framework and the wealth of options at your fingertips, the questions usually arise: “Which parts of this framework do I need to use?”, “What are they for?”, and “When do I need to use them?”.

For many applications there are 3 layers of authorization that we typically need to be concerned about when implementing Spring Security.
  1. HTTP Request Authorization – verifying that a user is authenticated (if necessary) and authorized to access a specific URL.
  2. Service Layer Authorization – verifying that a user is authorized to access a specific method, class, or service.
  3. Component Authorization – verifying that a user is authorized to see or use a specific component, operation, logic, or data.
These are the core components of the Spring Security framework and together are sufficient to provide reasonably complete authorization control for your application. Each layer serves a specific purpose and works best for that purpose. Attempting to shoe-horn all your authorization components into a single layer, using them to do more than they are intended to do, will cause needless complication.2,3

HTTP Request Authorization
The basic tutorial example for security-app-context.xml:4,5

<beans:beans xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.springframework.org/schema/security"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/security
        http://www.springframework.org/schema/security/spring-security-3.1.xsd">

    <http use-expressions="true">
        <intercept-url pattern="/index.jsp" access="permitAll"/>
        <intercept-url pattern="/secure/extreme/**" access="hasRole('supervisor')"/>
        <intercept-url pattern="/secure/**" access="isAuthenticated()"/>
        <intercept-url pattern="/listAccounts.html" access="isAuthenticated()"/>
        <intercept-url pattern="/post.html" access="isAuthenticated()"/>
        <intercept-url pattern="/**" access="denyAll"/>
        <form-login/>
        <logout/>
    </http>

    <authentication-manager>
        <authentication-provider>
            <user-service>
                <user name="rod" password="koala" authorities="supervisor, teller, user"/>
                <user name="dianne" password="emu" authorities="teller, user"/>
                <user name="scott" password="wombat" authorities="user"/>
                <user name="peter" password="opal" authorities="user"/>
            </user-service>
        </authentication-provider>
    </authentication-manager>
</beans:beans>

The basic example provides a simple template for setting up user accounts, roles, and permissions based on URL patterns in your application. Although most real-world implementations will replace the authentication-provider due to the limitations of the example, the intercept-url example is reasonable to use with almost any framework that provides different views based on the provided URL.

Purpose
The primary focus of the HTTP Request Authorization layer is to provide catch-all security for your application, preventing unauthorized users from directly linking to and accessing functions that they are not allowed to access. This removes the necessity of adding custom authentication code to every page of your application (depending on your framework and architecture) and gives you a universal way to limit the severity of defects caused by forgetting to include, or making mistakes in, page-level authentication code.

Limitations
The usefulness of this layer drops dramatically as application complexity increases and each distinct URL provides a wealth of functions to the user. Monolithic application frameworks that are built entirely around a single URL may only find the basic authentication service useful, whereas applications designed to segment functionality into different URLs by role will get the most value out of it.

Service Layer Authorization
The basic tutorial example for security annotations in classes and methods:3

public interface BankService {
    public Account readAccount(Long id);
    public Account[] findAccounts();

    @PreAuthorize(
            "hasRole('supervisor') or " +
            "hasRole('teller') and (#account.balance + #amount >= -#account.overdraft)" )
    public Account post(Account account, double amount);
}    

The basic example demonstrates annotating a method with a @PreAuthorize Spring EL expression. This provides a powerful framework for building complex security rules around both methods and classes and for ensuring your service operations are secure.
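Annotations can also be applied at the class or interface level, where every method inherits the rule. A minimal sketch — the service name here is hypothetical, and it assumes annotation support has been switched on in the security context via the global-method-security element with pre-post-annotations="enabled":

```
@PreAuthorize("hasRole('supervisor')")
public interface AdminBankService {
    // inherits the interface-level rule unless a method declares its own
    void closeAccount(Long id);
}
```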

Purpose
The primary purpose of Service Layer Authorization using annotations or interceptors is to safeguard access to services or operations that should only be accessed by certain roles. It allows you to ensure that only administrators can access administrative functions and that read-only users cannot access write operations, and it mitigates the chance that coding mistakes will provide accidental access to services and operations that a role should not have. It is best used as a safeguard to prevent unintentional access to sensitive services.

Limitations
Due to the nature of the class and method annotations, Service Layer Authorization does not provide a useful interface into the visibility of the services it protects. It provides reactive security that negates attempts to access a service; it does nothing to provide proactive information about which roles can access the service. Common questions about Service Layer Authorization often ask how to catch the security exceptions that occur or how to use the annotations to make control-flow decisions6,7. The answer to those questions is complicated, but more importantly it should be irrelevant. This layer is not intended to provide information for those decisions, and if the application is built well it should never be visible to the user. It is best used only as a safeguard against the consequences of mistakes made in the HTTP Request Authorization and Component Authorization layers.

Component Authorization
An example of JSP Taglib security:8

<security:authorize ifAnyGranted="ROLE_ADMIN">
    <tr>
        <td colspan="2">
            <input type="submit" value="<spring:message code="label.add"/>"/>
        </td>
    </tr>
</security:authorize>
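As a side note, ifAnyGranted is the older attribute-based form of the tag. With expression support enabled (use-expressions="true" on the <http> element, as in the earlier configuration), the same check can be written with the access attribute:

```
<security:authorize access="hasRole('ROLE_ADMIN')">
    <!-- content visible only to ROLE_ADMIN -->
</security:authorize>
```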

An example of inline security:9

Authentication auth = SecurityContextHolder.getContext().getAuthentication();
if (auth != null) {
   if (auth.getPrincipal() instanceof UserDetails) {
      report.setUser(new User(((UserDetails) auth.getPrincipal()).getUsername()));
   } else {
      report.setUser(new User(auth.getPrincipal().toString()));
   }
}

These examples demonstrate two quick methods of applying Component Authorization: through the Spring Security JSP Taglib, and through the Spring Security Java API.

Purpose
This layer provides component-level security and allows you to make control-flow decisions based on role. It is the connecting layer between the page-based HTTP Request Authorization layer and the method- and class-level Service Layer Authorization, and it is vital for any application that provides heavyweight or multi-function URLs. This is the developer’s security layer: it allows you to turn components on and off, or make decisions at any point in your code to provide access to specific functions, links, or workflows.

Limitations
Using Component Authorization is repetitive and requires an intimate understanding of which roles have access to which operations and when. It is not optimal for providing page-based security and basic authentication, because that is better handled by the HTTP Request Authorization layer, which is easier, universal, and more reliable. It is not optimal for providing class- and method-level security, because that is better handled by Service Layer Authorization, which can annotate interfaces, abstract classes, and interceptors and does not require as much repetition or context-related knowledge to be applied effectively.

Final Thoughts
Spring Security is a useful and powerful tool, but it is best used when each type of security layer it provides is used effectively and for the purpose that it was designed. A carefully considered multi-prong approach to securing your application will provide a simpler, more elegant, and more secure solution.

References

Tuesday, June 11, 2013

On Problem Solving

Earlier this year I had the opportunity to attend an excellent workshop on Problem Solving given by Peter de Jager. Coming from a Mathematics and Computer Science background, I have sat through many lectures on problem-solving techniques: how to deconstruct a problem into its component parts, how to formalize statements using first-order logic, and so on. Although I enjoy those subjects, I was pleasantly surprised that Peter took a very different approach in his seminar. He took a hands-on approach, giving us dozens of physical and mental problems throughout the day to challenge us while illustrating specific points about the process of problem solving. In the end, what we took away from the session wasn’t a set of specific tools for solving particular types of problems; it was a deeper understanding of problems in general, why they occur, and new perspectives on how to solve problems and pre-emptively ward them off before they can start.


The key to being a more effective problem solver in everyday life lies in understanding ourselves and human nature in general. The problem is literally one of perspective.

This is certainly not a new idea, and our culture is littered with references attempting to express this fact: “Thinking outside the box”, “Can’t see the forest for the trees”, “Too close to the problem - what’s the bigger picture?”. Even the Socratic Method outlined in Plato’s dialogues is all about perspective: the key to understanding a problem is to change your perspective, question your assumptions, and modify your view of it.

One of the key tenets is understanding that you bring a lifetime of preconceptions with you when you look at a problem; questioning those preconceptions and modifying them is a fundamental skill.

Over the course of the day Peter led us through examples and problems of perspective, with tidbits of wisdom, real-world examples, common misconceptions, and traps to avoid.

A tool to help with problem solving is understanding labels. Labels are an essential tool for communication: when I refer to “a nail” it immediately conjures up all the things that a nail is and what it is used for. However, when we need to solve a problem, labels can be a hindrance because they come loaded with preconceptions. Sometimes we are so locked into that perspective that we cannot think of other ways a nail can be seen or used. If I called a nail an “awl”, or an “icepick”, or a “model train bridge support strut”, or even a “sharp metal cylinder with a lip”, it suddenly becomes an entirely different object. Labels are powerful, and sometimes we need to understand all the properties and attributes of an object independent of its label.

My favourite example of this power was the story of a group of Christian people attempting, and failing, to hang a small crucifix on a wall with a nail. During this process they tried every conceivable method and object at their disposal to get the nail into the wall, except for the single object they had that was the same size, weight, and approximate shape as a hammer: the cross. To this group of people the crucifix had such importance, such value, that it was impossible for them to conceive of it as a hammer on their own.

The last topic I will bring up was arguably the most important and useful in everyday life of everything that was covered. Consider the statement “People are resistant to change”: do you agree or don’t you? It’s something we hear often: people don’t like change, change is frightening; there’s even a whole management discipline called “change management”. But is it true?

Make yourself a list of checkmarks:
  • 1 for each time you’ve gone to a new school
  • 1 for each time you’ve taken a trip more than 1 hour away
  • 1 for each time you’ve gotten a new pet
  • 1 for each time you’ve moved
  • 1 more for each time you’ve moved to a new city
  • 5 for a new country
  • 1 for each time you’ve started a relationship
  • 1 for each time you’ve broken up a relationship
  • 2 for each time you’ve gotten married
  • 10 if you (intentionally) decided to have children
  • 5 more for any subsequent children
  • Add more for anything else: bought a car, gotten a new job, been promoted, changed positions in your job, switched banks, invested, flown on a plane, done an extreme or potentially dangerous sport?

Do you think you really resist change? Do you really think everyone else is much different than you?

These are all really big changes that are taken on willingly and in many cases enthusiastically. So why is there the misconception that people resist change?

The key is that people don’t resist change, we embrace change when we choose it. We change happily and often when we decide that the change is right for us. This is the essence of change management: answering the question “Why?”.

If you want to induce change in others, you want them to choose the change in order for it to be successful. Change by dictatorship rarely works well and is never easy (or well received). If you want to convince someone to embrace a change, there are 7 key questions that they will have (that they might not even know they have); answering them effectively will go a long way towards making your change successful.
  1. Why?
  2. What’s in it for me?
  3. Monday – what am I going to start/need to do differently on Monday?
  4. What might go wrong?
  5. What won’t change?
  6. What will go wrong or be difficult?
  7. Signposts - How will you measure progress towards the change?  


Succinct answers to these questions can prevent many of the problems that lend credence to the statement “People resist change”. Answering them won’t guarantee success, or even approval, but without those answers change is far more likely to fail.

Tuesday, April 02, 2013

Report Conceptualization Training Strategy – Part 2, Structured Report Planning


For beginning and advanced report developers alike, the biggest challenge I have witnessed is the challenge of Report Conceptualization. This is the process of translating often-incomplete business requirements into a structured plan that will produce the result the business needs. For operational reports based on known data structures and calculations this process can be simple; operational reports are often structured very simply, and the job of the report developer is to put the right fields in the right place. The process becomes much more difficult for strategic reports, which attempt to help the business define what it should be doing. Strategic reports are often poorly understood by the business: their needs are uncertain and difficult to communicate, and the vision for the end product is amorphous. Turning vague, conceptual requirements into a concrete view of business data is a challenge for developers and analysts alike.

Recently I have been mentoring two Cognos Report Developers in their process. These developers are experienced at building operations-level reports in Cognos and have a good understanding of the data and the business, but they are challenged at providing insightful analysis to the business when the business users themselves only have vague ideas of what they want to see, or in some cases have too many ideas that all jumble together.

The first step in this conceptualization process was to focus on a particular element, a decision, or a comparison that the business needs to make and determine an appropriate way to visualize the data that is needed. This was covered in Part 1, Report Visualization.
Once a vision has been established, the second step is to plan the underlying structure of the report so that the end result is a simple data structure that can be directly mapped to the visualization. This is the process I use for breaking down a difficult report into simple steps, and have trained numerous developers who use it successfully.

The Process
We will start with the user story:
“As a Product Manager I need to know year-over-year sales trends of my products to determine which products I should assign to my premium display space for each retailer.”

Based on the story we have done further analysis and come up with a detailed business requirement. The business user needs a line graph that shows monthly sales of the product lines and products they select for the current year and the previous year. The graph also needs to show the average sales of the selected products. The graph should only include sales from selected retailers.

A mock-up of the chart is given below.

Identify Your Measures
Measures are all numerical fields that can be aggregated or calculated and that are connected to dimensions. These are generally (in Cognos) located in the Fact table, but for the purposes of this process they also include deduced values such as counts or distinct counts of measures. I.e. count(distinct product_id) is a measure that can be connected to other dimensions through an appropriate fact table; this produces the deduced value “count of products for each value of dimension X where sales exist.”

In our above example our only measure is “Sales $”.

Identify Your Summarizations/Calculations
List all unique summarizations of measures alongside your measure list. These will be treated as separate measures that may or may not be groupable with the measures identified so far. Remember that your measures will already be summarized according to their default (usually total) aggregation; you need to identify any alternate summarization, such as “average”, “max”, “min”, “count”, etc., in this list.

Calculations are any formula that combines measures (or applies to a single identified measure). Add these alongside your measure list.

In this example we also have “Average $” as a summarization of all products.

Identify Your Aggregations/Groups
Aggregations and groups are any dimension, attribute, column, or field that the defined measures are grouped by (sums, counts, averages, etc). I call these groups descriptors, because the column or attribute describes an aggregation, and any particular value (e.g. “Product A”) describes the values associated with it.

In our example the aggregations are: “Product”, “Calendar Month”, “Calendar Year”.

Identify Your Filters/Restrictions
Filters and restrictions are any condition applied to the data that reduces the data set. These are usually another set of descriptors (dimensions, attributes, columns, or fields) that are limited or selected in some fashion, and usually overlap in some fashion with the already defined aggregations/groups. There may be additional filters/restrictions that are not represented in the list of descriptors so far, and there may be value filters applied that are based on aggregations of the defined measures.

In our example the filters are: “Product Line – by selection”, “Product – by selection”, “Calendar Year = Current Year”, “Calendar Year = Previous Year”, "Retailer - by selection".

Build a Mapping Table
Create a table and label each column with your Measures, Summarizations, and Calculations, and label each row with your Descriptors, Filters, and Restrictions. Place a mark in each cell where the Descriptor will be used to group or filter the Measure.

Our example should produce a table like this:

                                          Sales $   Average $
Product                                      X
Calendar Month                               X          X
Calendar Year                                X          X
Filter - Product Line                        X          X
Filter - Product                             X          X
Filter - Calendar Year = Current Year        X          X
Filter - Calendar Year = Previous Year       X          X
Filter - Retailer                            X          X

Note that Average $ is not grouped by Product but is impacted by Filter - Product; this is because the Average $ calculation is performed across all products in the filtered list.
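This distinction can be sketched in plain Java, outside of Cognos, with invented product names and dollar values: the product filter reduces the list the average is taken over, but the result is a single value rather than one value per product.

```java
import java.util.*;

// Illustrative sketch only: "Average $" respects Filter - Product but is not
// grouped by Product, so it yields one value across the filtered product list.
public class AverageScope {

    // salesByProduct: Sales $ already summarized per product
    // selectedProducts: the user's Filter - Product selection
    static double averageAcrossProducts(Map<String, Double> salesByProduct,
                                        Set<String> selectedProducts) {
        double total = 0.0;
        int count = 0;
        for (Map.Entry<String, Double> entry : salesByProduct.entrySet()) {
            if (selectedProducts.contains(entry.getKey())) { // filter applies
                total += entry.getValue();
                count++;
            }
        }
        return count == 0 ? 0.0 : total / count; // one value, not one per product
    }

    public static void main(String[] args) {
        Map<String, Double> sales = new HashMap<>();
        sales.put("Product A", 100.0);
        sales.put("Product B", 200.0);
        sales.put("Product C", 900.0);

        Set<String> selected = new HashSet<>(Arrays.asList("Product A", "Product B"));

        // Product C is excluded by the filter, so the average is (100 + 200) / 2.
        System.out.println(averageAcrossProducts(sales, selected)); // 150.0
    }
}
```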

Choose Your Pivot Descriptor
The Pivot is a critical component of the analysis: it is the descriptor at the core of the visualization. Generally you can find it at the centre of the business requirement; it is the thing that is being reported on and that decisions are being made about. All other pieces of data in the analysis connect to it and revolve around it.

This pivot will generally be used as the join key in any inner/outer joins created in this report. It may not be the only item used in a join, but it will be the common thread used in (nearly) every join.

On simpler reports it may be unclear which descriptor is actually serving as the pivot, and this discussion often centres around the calendar: is the date the pivot, or is something else (like product or location)? The easiest way to decide is simply to ask why you are using the report. What are you reporting on? Are you reporting on Sales by Month (but shown by product)? Or are you really concerned about Product Sales (but shown by month/year)?

Another quick way to decide which descriptor is acting as the pivot (though not always reliable) is to check whether it is being shown on the report in two or more different ways (or distinct sets). If it is, then it’s probably not your pivot. You can also look at filters: which descriptors are being filtered by the business user’s choice? These are good candidates for the Pivot.

In the example above your Pivot is probably “Product”. The other option would be “Month”, but there are a few reasons why this is probably not the case. 1) Product is part of a user-driven filter: the business user is choosing which product lines/products to see and is thus reporting on the sales of particular products, not the sales of particular months. 2) Calendar is being included in two distinct sets, Previous Year and Current Year; this is not a hard-and-fast rule, but it can imply that Calendar is not the central piece of the view, and it reinforces reason #1. 3) Although Average $ is not grouped by Product, it performs a summary of Products, and in the defined report the Average is displayed in the same manner as a Product.

Group Descriptors into Hierarchies
Identify descriptors that are related to one another in a natural or structured hierarchy; these can easily be collected as “summary” and “detail” components of a single query.

In the example the natural hierarchies are “Month, Year, Filter - Current Year” and “Month, Year, Filter - Previous Year”. Since Month is included in two separate hierarchies, it is the natural joining point to connect data from the “Current Year” hierarchy and the “Previous Year” hierarchy.

If we wanted to also display the Product Line on our report we would add it as a Descriptor and it would form a third hierarchy "Product Line, Product" but for simplicity we will leave it out.


Identify Independent or Disjoint Sets
Identify filter or restriction combinations that are independent of one another. These are filter combinations that are mutually exclusive or partially overlapping, and that, if both were included in the same query, would produce no results or fewer rows than intended.

Our example has the mutually exclusive "Filter - Current Year" and "Filter - Previous Year". If both filters were included in the same query, this would result in no data because a row cannot be both in the Current Year AND the Previous Year.

You may note that in this situation both could be combined into the same query using an OR condition, and due to the simplicity of this example that is true. This process is not intended to produce the most optimal or shortest path to a result, but is intended to produce a consistent, repeatable, and understandable path to a result. The counter-argument to using an OR clause is that if we decided to replace "Current Year" and "Previous Year" with arbitrary (potentially overlapping) date ranges, then using an OR clause would no longer be possible and at least 2 new queries would need to be created to perform the enhancement.

Group Measures/Calculations into Queries based on Descriptor Map
First let’s review our updated mapping table.

                                             Sales $   Average $
Product                                         X
1  Calendar Month                               X          X
   Calendar Year                                X          X
   Filter - Calendar Year = Current Year        X          X
2  Calendar Month                               X          X
   Calendar Year                                X          X
   Filter - Calendar Year = Previous Year       X          X
Filter - Product Line                           X          X
Filter - Product                                X          X
Filter - Retailer                               X          X

1 & 2 - Independent sets

First rule of grouping into queries: If two measures have exactly the same mapping (and, in Cognos, they must come from the same Fact table), then they can be combined into the same query.

Second rule: Split independent sets into different queries, and add all unrelated descriptors into both queries.

Third rule: Make each remaining combination of measures (with different mappings) and split descriptors a separate query.

In our example we will produce 4 queries:

  1. Sales $ [Product, Month/Year/Filter - Current Year, Filter - Product Line, Filter - Product, Filter - Retailer]
  2. Sales $ [Product, Month/Year/Filter - Previous Year, Filter - Product Line, Filter - Product, Filter - Retailer]
  3. Average $ [Month/Year/Filter - Current Year, Filter - Product Line, Filter - Product, Filter - Retailer]
  4. Average $ [Month/Year/Filter - Previous Year, Filter - Product Line, Filter - Product, Filter - Retailer]

Combine Queries by Joining on Pivot
With each query defined, the only step left is to join them together in a fashion that produces our report. A couple of factors go into deciding how to join queries together. A key point to remember is that there isn’t a wrong way to join them; the resulting report will still be technically valid, but the data may not represent (or show) the values that you intend. Because we have each of the queries we need, the raw data is available to produce our intended result.

Guidelines:

  1. Start with the most-inclusive and most-detailed query and add to it. Additional filters can be added later to reduce the set of results being displayed. Start with the largest set of data and summarize/reduce the working set as you add more information.
  2. Joins (outer/inner) are used to directly compare measures using a calculation (such as percent change or difference). They are not used to indirectly compare measures (such as grouping and plotting together on a graph). Unions are used to make indirect comparisons.
  3. Outer joins are used to add additional reference information to your pivot (ie. adding cost of goods info to products that happen to have COG values).
  4. Inner joins are used to intersect and add conditional information to your pivot (ie. only showing products that have both Sales and COG values).

Our example is quite simple in terms of joins: we are only going to use unions, creating a new column in our Average $ queries to replace the missing Product column and using it to label the rows as "AVG" and "AVG-PY".
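The union step can be sketched in plain Java (invented data, hypothetical method names): the Average $ query has no Product column, so a literal label such as "AVG" is substituted for it, letting the averaged rows be unioned with the per-product rows and plotted as just another series.

```java
import java.util.*;

// Illustrative sketch only: union per-product rows with average rows by
// fabricating a replacement Product column containing a label like "AVG".
public class UnionWithLabel {

    // Each product row is [product, month, sales]; avgByMonth maps month -> Average $.
    static List<String[]> unionWithAverage(List<String[]> productRows,
                                           Map<String, Double> avgByMonth,
                                           String label) {
        List<String[]> out = new ArrayList<>(productRows);
        for (Map.Entry<String, Double> e : avgByMonth.entrySet()) {
            // The label stands in for the missing Product column.
            out.add(new String[] { label, e.getKey(), String.valueOf(e.getValue()) });
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] { "Product A", "Jan", "100.0" });
        rows.add(new String[] { "Product B", "Jan", "200.0" });

        Map<String, Double> avg = new LinkedHashMap<>();
        avg.put("Jan", 150.0);

        List<String[]> unioned = unionWithAverage(rows, avg, "AVG");
        System.out.println(unioned.size());    // 3
        System.out.println(unioned.get(2)[0]); // AVG
    }
}
```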

If we were going to calculate year-over-year differentials between products and averages, we would have to change our joins as follows:

  1. JOIN Sales $ (Previous Year) to Sales $ (Current Year) using an outer join on Product and Month
  2. JOIN Average $ (Previous Year) to Average $ (Current Year) using an outer join on Month
  3. UNION Sales $ (Year over year) to Average $ (Year over year) as normal
This would allow us to calculate the change in Sales $ from the Previous Year to the Current Year, calculate the change in Average $ in the same way, and display both values on our chart.
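Step 1 above can be sketched in plain Java (invented data, hypothetical method names): an outer join keyed on Product and Month keeps rows that exist in either year and pairs them up so a year-over-year change can be calculated per row.

```java
import java.util.*;

// Illustrative sketch only: outer join of Current Year and Previous Year
// Sales $ on a "Product|Month" key, producing [current, previous, change].
public class YearOverYearJoin {

    static Map<String, double[]> outerJoin(Map<String, Double> currentYear,
                                           Map<String, Double> previousYear) {
        Set<String> keys = new TreeSet<>(currentYear.keySet());
        keys.addAll(previousYear.keySet()); // outer join: keys from both sides survive
        Map<String, double[]> joined = new TreeMap<>();
        for (String key : keys) {
            double cy = currentYear.containsKey(key) ? currentYear.get(key) : 0.0;
            double py = previousYear.containsKey(key) ? previousYear.get(key) : 0.0;
            joined.put(key, new double[] { cy, py, cy - py }); // change in Sales $
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, Double> cy = new HashMap<>();
        cy.put("Product A|Jan", 120.0);
        cy.put("Product B|Jan", 80.0); // no Previous Year row for Product B

        Map<String, Double> py = new HashMap<>();
        py.put("Product A|Jan", 100.0);

        double[] productA = outerJoin(cy, py).get("Product A|Jan");
        System.out.println(productA[2]); // 20.0 year-over-year change
    }
}
```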

Conclusion
This is an outline of my process when building complex reports in Cognos. As described above there are a few gaps in it, and I have only touched on some of the finer points of report development (joins, multiple pivots, and aggregations, for example), but I have used it effectively and, through a series of workshops, have taught it to both novice and experienced Cognos report developers so that they could apply it on their own and produce solid, maintainable reports that are currently being used in production within their own organizations.

I hope this can provide some small help to Cognos developers out there, and if you have any questions or need me to clarify or expand on any of my points in this article please feel free to contact me.

Monday, November 12, 2012

Report Conceptualization Training Strategy – Part 1, Report Visualization


For beginning and advanced report developers alike, the biggest challenge I have witnessed is the challenge of Report Conceptualization. This is the process of translating often-incomplete business requirements into a structured plan that will produce the result the business needs. For operational reports based on known data structures and calculations this process can be simple; operational reports are often structured very simply, and the job of the report developer is to put the right fields in the right place. The process becomes much more difficult for strategic reports, which attempt to help the business define what it should be doing. Strategic reports are often poorly understood by the business: their needs are uncertain and difficult to communicate, and the vision for the end product is amorphous. Turning vague, conceptual requirements into a concrete view of business data is a challenge for developers and analysts alike.

The first step is understanding which visualization options best meet the business requirements. Often a table of numbers is what a business unit understands and asks for, because they are used to dealing with operational reports where they need numbers. But sometimes a well-designed visualization of the data makes it easier to understand and gives the business the interpretation they need in order to make a decision without having to spend time crunching numbers.

Visualization of a data set needs to be chosen carefully to properly communicate the information that needs to be understood. A poorly chosen visualization, especially if it is poorly documented, can provide confusing, useless, and sometimes misleading information.

A couple of my favourite visualizations that do an excellent job of communicating information are:

1)  Florence Nightingale’s Diagram of the causes of mortality in the army in the East

The purpose of Nightingale’s chart was to illustrate that the primary causes of death amongst soldiers in the Crimean war were preventable diseases. The polar area diagram does a fantastic job of showing this, and by how much. As an aside, Nightingale could have exaggerated her diagram by choosing a polar radius diagram, where the radial measurement is linear with the value, instead of scaling the area of each wedge. Since the eye naturally compares areas, that would have implied that the scale of deaths due to preventable causes was that much larger; to her credit, Nightingale strove for precise accuracy in her representations.

2)  Charles Minard’s Flow map of Napoleon’s March

Minard’s graph combines a variety of pieces of information in a novel way: it represents the course of Napoleon’s march geographically, shows the number of soldiers on both the initial march and the retreat from Moscow along with the successive losses incurred, and also plots the temperature on the return march, showing the impact of the weather on casualties.

That being said, a visualization needs to be developed in cooperation with the business people who are going to be using it. Even if a 100 Percent Stacked Bar Chart is a perfect representation of the information that needs to be communicated, it is worthless if the business users do not understand what it represents and how to use it, or if they read more into the visualization than it contains because of their preconceptions about what a bar chart is.

One example is the confusion between two types of bar charts: the Stacked Bar Chart and the 100 Percent Stacked Bar Chart. The Stacked Bar Chart can be mistaken for an Area Chart when the stacked bars are misread as “overlapping”, on the assumption that the view is a projection of a standard Bar Chart seen end-on. Likewise, the 100 Percent Stacked Bar Chart is mistaken for the Stacked Bar Chart when the viewer assumes that height represents magnitude rather than share; this confusion arises because the business user does not see the 100 Percent Stacked Bar Chart as analogous to a series of Pie Charts.
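A minimal sketch of the difference, in plain Python with invented quarterly figures: a Stacked Bar Chart plots the raw segment values, so total bar height conveys magnitude, while a 100 Percent Stacked Bar Chart plots each segment’s share, so every bar tops out at 100% and only the split is conveyed, just like a series of pie charts.

```python
# Hypothetical revenue by product (A, B, C) for two quarters.
data = {
    "Q1": {"A": 40, "B": 25, "C": 35},
    "Q2": {"A": 60, "B": 20, "C": 40},
}

def to_shares(bar):
    """Convert absolute segment values (Stacked Bar) into shares (100% Stacked Bar)."""
    total = sum(bar.values())
    return {name: value / total for name, value in bar.items()}

for quarter, bar in data.items():
    # Stacked Bar: total height differs (100 vs 120), so height = magnitude.
    # 100 Percent Stacked Bar: every bar sums to 1.0, so only the split is shown.
    print(quarter, sum(bar.values()), to_shares(bar))
```

The Q2 bar is taller in a Stacked Bar Chart, but in the 100 Percent version both bars are the same height; a viewer who expects the first behaviour will misread the second.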

Remember when designing charts for business use that creative interpretation can hinder clarity. A grid of Pie charts may feel clumsy, but may do a better job of communicating effectively with your business user.

Deciding
How do you decide when to use what kind of chart?

The first step in choosing an appropriate visualization is to understand what you are trying to communicate. The purpose of a chart is to convey information about some kind of relationship between pieces of data. There are always at least two pieces of information in a chart (otherwise it is a very boring chart), and the type of relationship between those pieces helps determine the most effective way to display them.

What kind of relationship do you want to illustrate?

  • Do you want to draw a comparative relationship?
    Ex. How do the values of A compare to B, compare to C, over time?
  • Do you want to illustrate the existence of a relationship?
    Ex. As the values of A increase, what happens to B or C?
  • Do you want to understand a composition, how pieces make up a whole?
    Ex. How are the values of A broken down into groups B and C?
  • Do you want to show how values in a relationship are distributed?
    Ex. How often does A occur by value?
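The four questions above can be captured as a small lookup. This is a sketch in plain Python; the groupings are my own rough summary of common practice, not an authoritative taxonomy.

```python
# Rough mapping from relationship type to commonly used chart families.
CHARTS_BY_RELATIONSHIP = {
    "comparison":   ["bar chart", "line chart (for comparison over time)"],
    "relationship": ["scatter plot", "bubble chart"],
    "composition":  ["stacked bar chart", "100 percent stacked bar chart", "pie chart"],
    "distribution": ["histogram", "scatter plot (two variables)"],
}

def suggest_charts(relationship):
    """Return candidate chart types, falling back to a plain table."""
    return CHARTS_BY_RELATIONSHIP.get(relationship.lower(), ["table"])

print(suggest_charts("Composition"))
print(suggest_charts("something else"))
```

The fallback to a plain table is deliberate: when no relationship is being illustrated, a table of numbers is often the honest choice.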
One of the better examples I have seen on how to start understanding this process is by Extreme Presentation Method and their Chart Chooser (http://www.extremepresentation.com/design/charts/). This chooser presents a nice compact decision tree to understand how certain charts are best used to explain specific relationships. It is neither perfect, nor complete, but it is an excellent starting point.

Another good resource is the Periodic Table of Visualization Methods by Visual Literacy (http://www.visual-literacy.org/periodic_table/periodic_table.html#). It presents a wide spectrum of visualization options and groups them by what is being visualized (Data, Information, Concept, Strategy, Metaphor, Compound), whether the visualization shows Process or Structure, whether it operates at the Overview or Detail level (or both), and whether it is geared towards convergent or divergent thinking. It does not do a good job of explaining when or how to use any particular visualization, but it nicely complements the Extreme Presentation decision tree by providing some context around the purpose of each visualization.

Tuesday, October 02, 2012

Performance Testing a Web Framework Application

Performance testing a web application built on a web framework can pose significant challenges, whether the framework is Java Servlets, Struts, Rails, or another. These challenges exist regardless of the performance testing tool; the examples here come from Rational Performance Tester, but the same issues are equally evident in JMeter and other load generation tools.
We will define a “normal” web application as one that does not rely on any web framework, uses JSP or ASP pages, contains a limited amount of session information, and is generally stateless. Other attributes of a normal web application include:
  • Each page has a unique URL and serves a specific purpose or function within the web application.
  • Changing the page request attributes will modify the dataset, operations, or view of the requested URL, but will not change the underlying purpose or function of the page.
  • Stateful operations are performed by chaining a series of multiple page URLs together, each serving an appropriate purpose. Out of workflow requests (requesting a different URL) are not blocked and simply terminate the current workflow.
The operation of a normal web application differs from how web framework applications are constructed. Web framework applications generally contain a significant amount of session information and have stateful workflows that limit the options for performing out-of-workflow requests. Other differing attributes include:
  • A single URL encapsulates many different (usually related) purposes and functions.
  • Stateful operations consist of a series of requests that are chained together using a combination of request attributes and stateful information stored in the session.
  • Monolithic applications often appear under a single URL.
  • Changing or breaking out of predetermined workflows may be strictly controlled and prevented by disallowing any unexpected state changes.
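The stateful, workflow-locked behaviour described above can be illustrated with a toy simulation (plain Python; the states and actions are invented for illustration). A single “endpoint” interprets each request according to server-side workflow state and refuses anything out of sequence, which is exactly what trips up a naively recorded test script.

```python
class FrameworkApp:
    """Toy framework-style app: one endpoint, behaviour driven by session state."""
    TRANSITIONS = {
        ("start", "open"):     "form",
        ("form", "submit"):    "confirm",
        ("confirm", "accept"): "start",   # workflow completes, back to start
    }

    def __init__(self):
        self.state = "start"

    def request(self, action):
        next_state = self.TRANSITIONS.get((self.state, action))
        if next_state is None:
            # Unlike a "normal" app, an unexpected request does not simply start
            # a new workflow; it is rejected and the session state is unchanged.
            return "error: out-of-workflow request"
        self.state = next_state
        return "ok: " + next_state

app = FrameworkApp()
print(app.request("open"))    # ok: form
print(app.request("accept"))  # error -- "accept" is invalid while in "form"
print(app.request("submit"))  # ok: confirm -- state was preserved despite the error
```

A recorded test that replays requests in the wrong order, or after a timeout has reset the server-side state, will hit that error branch on every subsequent request: the error-chaining problem described below.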
As a result of these differences there are some key considerations when attempting to performance test a web framework application.
  1. Workflows within a web framework may be inflexible. Simple errors may cause cascading problems that invalidate the remainder of a test, because the test may be unable to escape from a workflow if the correct sequence is breached by a page timeout or by a validation error caused by an incorrect datapool.
  2. Out of flow, looping, and branching tests can be difficult or impossible to record because the request to initiate a loop or branch may change. There may not be a consistent “start of workflow” page request that can be made to initiate or restart a workflow.
  3. Error conditions must be planned for and carefully handled since an error state may negate all other requests until the error is handled correctly.
  4. Minor changes to the application may force entire test suites to be rewritten, due to the inability to record, insert, and chain new parameters or pages into an existing test.
  5. Similar workflows with different data may require individual test scripts due to changes in screen components.
In order to make your performance testing of a web framework application as successful as possible, here are some strategies I have adopted to reduce the amount of time I spend rewriting test scripts for each new code deploy, and to reduce the number of page errors I get during the course of a test run.
  1. PLAN your tests before recording! Understand which functions you want to hit with each test and ensure you have scripts that are focused with as few extraneous requests as possible.
  2. Create many short, focused tests instead of long all-encompassing tests to reduce the potential for errors and for error chaining.
  3. Separate read-only tests from data-entry tests and pre-populate data whenever possible to avoid long “setup” sequences.
  4. Avoid test dependency, where later functions or tests heavily depend on earlier parts to configure the environment or data. Either preconfigure your environment and datasets, or rely on the test itself to perform only the critical data entry prior to testing its assigned function.
  5. Chain multiple, small, independent tests in a schedule to cover desired functionality. Branching and looping should be done at the schedule level whenever possible to avoid needing to record branches and loops within individual tests.
  6. If you do need to include datapools that cause variation in functionality, record the largest, most inclusive workflow and dataset, and then populate your datapool with values that result in restricted subsets of the recorded workflow.
  7. Establish easily accessible, non-blockable base points in your test. These base points or pages must be able to short-circuit functionality that has become error locked and should be returned to as often as possible in your recording to allow “reset” to occur and reduce the impact of error chaining. Using base points allows you to more easily include other constructs such as loops, conditionals, or branches in your recording.
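The base-point idea (strategy 7) combined with short, independent tests (strategies 2 and 5) can be sketched as a tiny schedule runner. This is illustrative plain Python, not any particular tool’s API: each “test” is a function standing in for a recorded page sequence, and the runner returns to a base point before every test so that one failure cannot error-lock the rest of the run.

```python
def go_to_base_point(session):
    """Hypothetical base-point request: abandon any half-finished workflow."""
    session["workflow"] = None
    session["page"] = "home"

def run_schedule(session, tests):
    """Run short, independent tests, resetting to the base point before each."""
    results = []
    for test in tests:
        go_to_base_point(session)
        try:
            test(session)
            results.append(True)
        except Exception:
            results.append(False)   # record the failure, but do not let it chain
    return results

# Two hypothetical short tests: the second fails, the third still succeeds.
def search_report(session): session["page"] = "search"
def broken_entry(session): raise RuntimeError("validation error from a bad datapool row")

print(run_schedule({}, [search_report, broken_entry, search_report]))  # [True, False, True]
```

Without the base-point reset, the failed middle test would leave the session stuck mid-workflow, and in a framework application every subsequent request would then be rejected as out-of-workflow.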