Friday, November 23, 2012

Gotcha - Teradata Views

Encountered another interesting "gotcha", again involving Teradata v13.1, this time in how it handles metadata for views. We ran into the issue in Cognos Framework Manager v10.1.1 when attempting to use a view created in Teradata as a query subject.

The exact Cognos error that we received was:
RQP-DEF-0177 An error occurred while performing operation 'sqlScrollBulkFetch' status='-9'.
UDQ-SQL-0107 A general exception has occurred during the operation "fetch".
[Teradata][ODBC Teradata Driver][Teradata Database] Internal error: Please do not resubmit the last request. SubCode, CrashCode:
After running a UDA trace and a Teradata ODBC driver trace and reviewing the log files, we discovered the statement that was causing the error message:
HELP COLUMN "DB_NAME"."VIEW_NAME"."PK_ID_FIELD_NAME"
Running this query manually on the database gave a more detailed, but still obscure error message:
HELP Failed. 3610: Internal error: Please do not resubmit the last request. SubCode, CrashCode:
The view itself that we were debugging was extremely complex, but after some experimentation I was able to produce the following simple view definition that still caused the error.

CREATE VIEW DB_NAME.VIEW_NAME AS
SELECT
T1.FIELD1,T2.PK_ID_FIELD_NAME
FROM
DB_NAME.PARENT_TABLE T1,
DB_NAME.CHILD_TABLE T2
WHERE T1.FK_ID_FIELD_NAME = T2.PK_ID_FIELD_NAME
;

Simple, right? Gotcha #2 is that this error only appeared on 2 of our 3 environments: Development and UAT showed the issue, but our SystemTest environment worked without a problem.

We were able to devise a temporary workaround. Because the HELP query specifically identified a problem with PK_ID_FIELD_NAME on CHILD_TABLE, we replaced it in the view with FK_ID_FIELD_NAME on PARENT_TABLE, which made the error go away. However, this was not a real solution, because retrieving the primary key of a joined table in a view should NOT cause a problem.

The Solution

The exact reason this problem occurred on 2 of our 3 systems is still unknown; we suspect that corrupt or missing column metadata was causing the inconsistency. Nevertheless, we did find a solution to the problem.

The problem was resolved by explicitly naming the view's columns in the view definition. For whatever reason, this bypassed the metadata error and allowed the view to be used in both Cognos and Teradata SQL Assistant. Below is the fixed view definition; the only change from the original is the explicit column list (FIELD1, PK_ID_FIELD_NAME) after the view name:
CREATE VIEW DB_NAME.VIEW_NAME (FIELD1, PK_ID_FIELD_NAME) AS
SELECT
T1.FIELD1,T2.PK_ID_FIELD_NAME
FROM
DB_NAME.PARENT_TABLE T1,
DB_NAME.CHILD_TABLE T2
WHERE T1.FK_ID_FIELD_NAME = T2.PK_ID_FIELD_NAME
;
This allowed the HELP COLUMN metadata to be generated correctly for the view and fixed this issue without having to restructure the view query itself.
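
As a rough sketch of how the fix could be applied and checked in one pass, the statements below redefine the view in place and then re-run the metadata request that originally failed. The use of REPLACE VIEW instead of dropping and recreating the view is an assumption made here for convenience, not something we did at the time; DB_NAME, VIEW_NAME and the column names are the same placeholders used above.

```sql
-- Sketch only: redefine the existing view with an explicit column list.
-- REPLACE VIEW is an assumption here; CREATE VIEW after a DROP VIEW would also do.
REPLACE VIEW DB_NAME.VIEW_NAME (FIELD1, PK_ID_FIELD_NAME) AS
SELECT
T1.FIELD1,T2.PK_ID_FIELD_NAME
FROM
DB_NAME.PARENT_TABLE T1,
DB_NAME.CHILD_TABLE T2
WHERE T1.FK_ID_FIELD_NAME = T2.PK_ID_FIELD_NAME
;

-- Re-run the statement found in the ODBC trace; it should now return
-- column metadata instead of failing with error 3610.
HELP COLUMN "DB_NAME"."VIEW_NAME"."PK_ID_FIELD_NAME";
```

Running the HELP COLUMN statement directly in Teradata SQL Assistant is a quicker check than going back through Cognos each time.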

Monday, November 12, 2012

Report Conceptualization Training Strategy – Part 1, Report Visualization


For beginning and advanced report developers alike, the biggest challenge I have witnessed is Report Conceptualization. This is the process of translating often incomplete business requirements into a structured plan that will produce the result that the business needs. For operational reports based on known data structures and calculations this process can be simple: operational reports are often structured very simply, and the job of the report developer is to put the right fields in the right place. The process becomes much more difficult for strategic reports, which attempt to help the business define what they should be doing. Strategic reports are often poorly understood by the business: their needs are uncertain and difficult to communicate, and the vision for the end product is amorphous. Turning vague, conceptual requirements into a concrete view of business data is a challenge for developers and analysts alike.

The first step is understanding which visualization options best meet the business requirements. Often a table of numbers is what a business unit understands and asks for, because they are used to dealing with operational reports where they need numbers. But sometimes a well-designed visualization makes the data easier to understand and gives the business the interpretation they need to make a decision without having to spend time crunching numbers.

Visualization of a data set needs to be chosen carefully to properly communicate the information that needs to be understood. A poorly chosen visualization, especially if it is poorly documented, can provide confusing, useless, and sometimes misleading information.

A couple of my favourite visualizations that do an excellent job of communicating information are:

1)  Florence Nightingale’s Diagram of the causes of mortality in the army in the East

The purpose of Nightingale’s chart was to illustrate that the primary causes of death amongst soldiers in the Crimean War were preventable diseases. The polar area diagram does a fantastic job of showing this, and by how much. As an aside, Nightingale could have exaggerated her diagram by making the radius of each wedge, rather than its area, proportional to the value. Since the eye naturally compares areas, that would have implied that deaths from preventable causes were far larger still, but to her credit Nightingale strove for accuracy in her representations.
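
To make the area-versus-radius point concrete, here is a short worked derivation. It assumes every wedge spans the same angle; the symbols v (the value being plotted), r (the wedge radius), θ (the wedge angle) and the constants k and c are introduced here purely for illustration.

```latex
% Area of a circular wedge with angle \theta and radius r
A = \tfrac{1}{2}\,\theta\, r^{2}

% Polar area diagram: the area is proportional to the value (A = k v)
r = \sqrt{\frac{2 k v}{\theta}} \quad\Rightarrow\quad r \propto \sqrt{v}

% Alternative: the radius is proportional to the value (r = c v)
A = \tfrac{1}{2}\,\theta\, c^{2} v^{2} \quad\Rightarrow\quad A \propto v^{2}
```

Under the radius-proportional scaling, a cause with three times as many deaths would be drawn with nine times the area, which is the kind of exaggeration Nightingale avoided.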

2)  Charles Minard’s Flow map of Napoleon’s March

Minard’s graph combines several pieces of information in a novel way. It traces the course of Napoleon’s march geographically, shows the number of soldiers on both the advance and the retreat from Moscow along with the successive losses incurred, and plots the temperature on the return march to show the impact of the weather on casualties.

That being said, visualization is something that needs to be developed with cooperation from the business people who are going to be using it. Even if a 100% Stacked Bar Chart is a perfect representation of the information that needs to be communicated, it is worthless if the business users do not understand what it represents and how to use it, or are trying to interpret more information from the visualization than exists because of their preconceptions about what a bar chart is.

One example is the use of two types of bar charts: the Stacked Bar Chart and the 100 Percent Stacked Bar Chart. The Stacked Bar Chart is often confused with an Area Chart: the stacked bars are misread as “overlapping”, as if the view were a projection of a standard Bar Chart seen from the end. Likewise, the 100 Percent Stacked Bar Chart is confused with the Stacked Bar Chart when users assume that the height represents magnitude rather than share. This confusion arises because the business user does not see the 100 Percent Stacked Bar Chart as analogous to a series of Pie Charts.

Remember when designing charts for business use that creative interpretation can hinder clarity. A grid of Pie charts may feel clumsy, but may do a better job of communicating effectively with your business user.

Deciding
How do you decide when to use what kind of chart?

The first step in choosing an appropriate visualization is to understand what you are trying to communicate. The purpose of a chart is to convey information about some kind of relationship between pieces of data. There are always at least two pieces of information in a chart (otherwise it is a very boring chart), and the type of relationship between those pieces of information helps determine the most effective way to display it.

What kind of relationship do you want to illustrate?

  • Do you want to draw a comparative relationship?
    Ex. How do the values of A compare to B, compare to C, over time?
  • Do you want to illustrate the existence of a relationship?
    Ex. As the values of A increase, what happens to B or C?
  • Do you want to understand a composition, how pieces make up a whole?
    Ex. How are the values of A broken down into groups B and C?
  • Do you want to show how values in a relationship are distributed?
    Ex. How often does A occur by value?

One of the better examples I have seen of how to start understanding this process is the Chart Chooser from Extreme Presentation Method (http://www.extremepresentation.com/design/charts/). This chooser presents a nice compact decision tree showing which charts are best used to explain specific relationships. It is neither perfect nor complete, but it is an excellent starting point.

Another good resource is the Periodic Table of Visualization Methods by Visual Literacy (http://www.visual-literacy.org/periodic_table/periodic_table.html#). This resource gives a wide spectrum of visualization options and groups them by What is being visualized (Data, Information, Concept, Strategy, Metaphor, Compound), whether the visualization is Process or Structure, whether the visualization is on the Overview or Detail level or both, and whether a visualization is geared towards Convergent or Divergent thinking. It does not do a good job of explaining when or how to use any particular visualization, but it is a nice complement to the Extreme Presentation decision tree by providing a bit of context around the purpose of certain visualizations.

Tuesday, October 16, 2012

Gotcha: Teradata Union Queries

Encountered an interesting "gotcha" today involving Teradata v13.1 and how it handles data types in UNION queries. We discovered it in Cognos Report Studio v10.1.1, but it turned out to be a general issue with the generated SQL, not anything specific to Cognos itself.

The problem involves hard-coded string values within a UNION statement. If the string value is in the first component of a UNION, then Teradata will truncate any column combined with that string via the UNION to the length of the string.

Here is my example situation:

Table foo:
foo_id INTEGER Primary Key
foo_name VARCHAR(15)

Table bar:
bar_id INTEGER Primary Key
name VARCHAR(100)

SELECT 'a-string' as name, foo_id FROM foo
UNION
SELECT name, bar_id FROM bar

Teradata will auto-detect the data type of 'a-string' as a VARCHAR(8) and will then cast bar.name as a VARCHAR(8) which will cause it to truncate anything beyond the first 8 characters. So a value of 'Hello World!' within the column bar.name will display as 'Hello Wo' in the above query.

This truncation does not take place if we replace 'a-string' with the column foo_name which is a VARCHAR(15). If we use foo_name instead, Teradata will correctly detect the largest data type from both sides of the query (VARCHAR(15) and VARCHAR(100)) and will cast all values as a VARCHAR(100) to prevent truncation.

To work around this issue we must explicitly cast the hard-coded string 'a-string' to a data type long enough to contain any combined data. In this case CAST('a-string' AS VARCHAR(100)) is sufficient to resolve the truncation issue.
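
Applied to the example query above, the workaround might look like the following sketch. It uses the same example tables foo and bar; VARCHAR(100) is chosen only because it matches the declared length of bar.name, so a different target length would be needed for other columns.

```sql
-- Sketch only: cast the literal in the first SELECT so the result column
-- is typed VARCHAR(100) rather than the literal's inferred VARCHAR(8).
SELECT CAST('a-string' AS VARCHAR(100)) AS name, foo_id FROM foo
UNION
-- bar.name is VARCHAR(100), so its values now fit without truncation.
SELECT name, bar_id FROM bar;
```

With the cast in place, a bar.name value such as 'Hello World!' should come back in full rather than as 'Hello Wo'.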

Tuesday, October 02, 2012

Performance Testing a Web Framework Application

Performance testing a web application that is built on a web framework can pose significant challenges, whether it uses Java Servlets, Struts, Rails, or others. These challenges are present regardless of the performance testing tool in use. I was using Rational Performance Tester in this case, but they are equally evident in JMeter and other load generation tools.
We will define a “normal” web application as one that does not rely on any web framework, uses JSP or ASP pages, contains a limited amount of session information, and is generally stateless. Other attributes of a normal web application include:
  • Each page has a unique URL and serves a specific purpose or function within the web application.
  • Changing the page request attributes will modify the dataset, operations, or view of the requested URL, but will not change the underlying purpose or function of the page.
  • Stateful operations are performed by chaining a series of multiple page URLs together, each serving an appropriate purpose. Out of workflow requests (requesting a different URL) are not blocked and simply terminate the current workflow.
The operation of a normal web application differs from how web framework applications are constructed. Web framework applications generally contain a significant amount of session information and have a stateful workflow that limits the options for performing out-of-workflow requests. Other differing attributes include:
  • A single URL encapsulates many different (usually related) purposes and functions.
  • Stateful operations consist of a series of requests that are chained together using a combination of request attributes and stateful information stored in the session.
  • Monolithic applications often appear under a single URL.
  • Changing or breaking out of predetermined workflows may be strictly controlled and prevented by disallowing any unexpected state changes.
As a result of these differences there are some key considerations when attempting to performance test a web framework application.
  1. Workflows within a web framework may be inflexible. Simple errors may cause cascading problems that invalidate the remainder of a test, because the test may be unable to escape from a workflow once the correct sequence is broken by a page timeout or by a validation error caused by an incorrect datapool.
  2. Out of flow, looping, and branching tests can be difficult or impossible to record because the request to initiate a loop or branch may change. There may not be a consistent “start of workflow” page request that can be made to initiate or restart a workflow.
  3. Error conditions must be planned for and carefully handled since an error state may negate all other requests until the error is handled correctly.
  4. Minor changes to the application may force entire test suites to be rewritten due to inability to record, insert, and chain new parameters or pages into an existing test.
  5. Similar workflows with different data may require individual test scripts due to changes in screen components.
In order to make your performance testing of a web framework application as successful as possible, here are some strategies I have adopted to reduce the amount of time I spend rewriting test scripts for each new code deploy, and to reduce the number of page errors I get during the course of a test run.
  1. PLAN your tests before recording! Understand which functions you want to hit with each test and ensure you have scripts that are focused with as few extraneous requests as possible.
  2. Create many short, focused tests instead of long all-encompassing tests to reduce the potential for errors and for error chaining.
  3. Separate read-only tests from data-entry tests and pre-populate data whenever possible to avoid long “setup” sequences.
  4. Avoid test dependency where later functions or tests heavily depend on earlier parts to configure the environment or data. Either preconfigure your environment and datasets, or rely on the test itself to perform only the critical data entry prior to testing its assigned function.
  5. Chain multiple, small, independent tests in a schedule to cover desired functionality. Branching and looping should be done at the schedule level whenever possible to avoid needing to record branches and loops within individual tests.
  6. If you do need to include datapools that cause variation in functionality, record the largest, most inclusive workflow and dataset, and then populate your datapool with values that result in restricted subsets of the recorded workflow.
  7. Establish easily accessible, non-blockable base points in your test. These base points or pages must be able to short-circuit functionality that has become error locked and should be returned to as often as possible in your recording to allow “reset” to occur and reduce the impact of error chaining. Using base points allows you to more easily include other constructs such as loops, conditionals, or branches in your recording.