I am posting a link to a podcast: Gamers With Jobs - Conference Call Episode 381 (released today). I received a couple of shout-outs on this episode from my friend Bill Harris who was on to talk about his just-released-on-steam game Gridiron Solitaire.
I've been helping Bill for the better part of 3 years, teaching him how to code from scratch, helping him with problems, and answering his questions - so getting to see him actually release his brainchild to the public is a very proud moment.
More information can be found at Bill and Eli Productions and on the Steam Gridiron Solitaire Product Page.
Wednesday, January 29, 2014
Wednesday, January 15, 2014
Gotcha #4 - WPF - Restore Events and Media Elements
Today's article comes courtesy of helping Bill Harris (Dubious Quality) with his work on GridIron Solitaire. The original problem was a bug in which sound effects stopped playing after the system returned from sleep mode if the game had been left open.
Some basic details on the application: GridIron Solitaire uses Windows Presentation Foundation (WPF) for its UI framework, was developed in Visual Basic 2010, and plays sound effects using the System.Windows.Controls.MediaElement class.
The problem in this case was that sound effects that had been initialized before the system entered sleep mode did not continue playing after the system returned to an active state. Sounds initialized after the system returned to an active state continued to play fine. When first presented with this problem it did not seem to be a terribly difficult one to solve, nor an uncommon one to encounter. Many games released by major studios have difficulty handling system sleep and restore, and even minimization or alt-tab window switching. My most recent experience with these problems is Civilization V, in which I have encountered alt-tab artifacts (the game remains visible in the background), as well as problems on restore from minimization (texture corruption) and restore from sleep (fatal crash).
The first line of investigation that we followed was the possibility that the MediaElement was failing to play after a restore, so we constructed a plan to catch the MediaElement.MediaFailed event and reset the MediaElement in that instance. On the positive side this approach solved an unrelated application crash problem caused by a missing media file, but it was soon determined that the MediaFailed event was not being fired when the sounds failed to play after a system restore.
PowerModeChanged Event Solution #1
Further research revealed that a MediaElement's Source property becomes invalidated during the sleep/restore cycle, and that resetting the Source property and restarting the MediaElement should restore playback. One of the suggestions was to catch the PowerModeChanged Event and check for the PowerModes.Suspend and PowerModes.Resume states; any suspend/resume code needed by your application can be performed in this block. The resulting code catches the PowerModeChanged Event, and on Resume resets every MediaElement Source property to its correct value.
Private Sub SystemEvents_PowerModeChanged(ByVal sender As Object, ByVal e As PowerModeChangedEventArgs)
    Select Case e.Mode
        Case PowerModes.Resume
            Reinitialize_Sounds()
        Case PowerModes.StatusChange
        Case PowerModes.Suspend
            BackgroundEffectALoopPlayer.Pause()
            BackgroundEffectBPlayer.Pause()
    End Select
End Sub
Private Sub Reinitialize_Sounds()
    SoundEffectA.Source = Nothing
    SoundEffectA.Source = New Uri("Resources/SoundEffectA.mp3", UriKind.Relative)
    '...etc...
End Sub
The effect of this code seemed to be satisfactory at first. Although it slowed down the time for the application to restore from a suspended state, it wasn't a particular problem as additional processing time required to restore from a suspended state is normally accepted and expected. A minor issue was that ongoing sounds would not restart until the next time the code required them to be played, which was easily resolved by programmatically restarting ongoing sounds within the restore code.
However, we later observed - generally on lower-end systems - that in some cases a few sound effects were not being restarted after restore. An additional confounding factor was that the sound effects that failed to return were not consistent: an apparently random selection of 2 or 3 would not resume, and the effect was not reproducible in debug mode.
Application_Activated Event Solution #2
Considering the possibility that the PowerModeChanged Event was not firing correctly, or that some other conflict involving the MediaElement objects was occurring after the PowerModeChanged Event, the proposed alternative was to move the restore code into an Application_Activated Event handler. The Application_Activated Event is fired under a different but related and less specific set of conditions than the PowerModeChanged Event. The theory was that the systems where the continuing sound failure was being observed were tablet and netbook systems, which might affect the events being fired.
Once the restore code was transferred and redeployed, the same issues were observed occurring in precisely the same circumstances as the PowerModeChanged Event solution.
Brute-force Solution #3
In an attempt to find any way past the problem, Bill tried a brute-force approach: forcing every sound to reload its Source property every time it was activated. The performance of this approach was going to be unacceptable, because the lag from loading a sound every time it is played would be noticeable to the player, but as a debugging approach it was reasonable.
The result: slow, but successful. Despite every sound effect suffering from noticeable lag, after a suspend/restore cycle every sound effect returned and played correctly on every test system configuration.
Problem Identification
It was at this point that the cause of the problem was identified. The process that restores an application from a suspended state causes MediaElement sources to be invalidated. The problem with using either the PowerModeChanged Event or the Application_Activated Event to set the Source property is that the event handlers run on a different thread than the application restore process, and the two threads enter into a race condition.
In the first two solutions, on slower systems a few of the lighter-weight sound effects were being loaded into their MediaElement objects before the application restore process that invalidates the MediaElement sources was completed. As a result a few of the MediaElement sources were being invalidated after they had already been corrected - causing a few of the sound effects to fail in an inconsistent and unpredictable manner.
On-demand Resource Loading Solution #4
The final solution is to enforce on-demand resource loading and caching whenever a MediaElement is required to play a sound effect, moving the responsibility for checking the existence of a media source from the pre-loader to the code that plays the MediaElement itself.
As a result the PowerModeChanged Event handler is being used to nullify (set to Nothing) all MediaElement sources when a suspend event occurs. Then every time a MediaElement is about to be played the code first rechecks its source to ensure that it is not null, and reloads the source if it does not exist before playing it.
Private Sub SystemEvents_PowerModeChanged(ByVal sender As Object, ByVal e As PowerModeChangedEventArgs)
    Select Case e.Mode
        Case PowerModes.Resume
        Case PowerModes.StatusChange
        Case PowerModes.Suspend
            'Clear every source so the play code reloads it on demand
            SoundEffectA.Source = Nothing
            SoundEffectB.Source = Nothing
            '...etc...
    End Select
End Sub
'In sound effect Play code:
'Use Is Nothing for the reference check, then reload only when needed
If SoundEffectA.Source Is Nothing Then
    SoundEffectA.Source = New Uri("Resources/SoundEffectA.mp3", UriKind.Relative)
End If
SoundEffectA.Play()
The final effect is that sound effect failure has been eliminated. A slight lag occurs the first time each sound effect is played after a system restore, but performance returns to normal once each sound effect has been re-cached.
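To make the ordering problem concrete, here is a small language-neutral model written in Java. It is only a sketch under stated assumptions: the SoundBank class and its methods are hypothetical, a map entry stands in for MediaElement.Source, and the race is replayed as a fixed sequence (resume handler first, invalidation second) rather than with real threads.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the sound cache described above.
 * This is not WPF code; a map entry stands in for MediaElement.Source.
 */
final class SoundBank {
    private final Map<String, String> sources = new HashMap<>();

    /** Eager approach (solutions #1 and #2): reload everything in the resume handler. */
    void reloadAll(String... names) {
        for (String name : names) {
            sources.put(name, "Resources/" + name + ".mp3");
        }
    }

    /** Stands in for the restore process invalidating a source. */
    void invalidate(String name) {
        sources.remove(name);
    }

    /** On-demand approach (solution #4): check the source at play time and reload if missing. */
    boolean play(String name, boolean loadOnDemand) {
        if (loadOnDemand && !sources.containsKey(name)) {
            sources.put(name, "Resources/" + name + ".mp3");
        }
        return sources.containsKey(name); // true means the sound would play
    }

    public static void main(String[] args) {
        SoundBank bank = new SoundBank();
        bank.reloadAll("SoundEffectA", "SoundEffectB"); // resume handler ran early
        bank.invalidate("SoundEffectA");                // restore process finished later
        System.out.println(bank.play("SoundEffectA", false)); // eager only: false
        System.out.println(bank.play("SoundEffectA", true));  // on-demand: true
    }
}
```

The on-demand check succeeds regardless of when the invalidation happens, because it runs at play time rather than at resume time.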
Monday, November 25, 2013
Counterexample #1 - Performance Engineering of Healthcare.gov
For the past few months we have all had ringside seats to the spectacular failure of planning and communication that is Healthcare.gov - the personalized health insurance marketplace run by the United States Federal Government.
We now know that the project team and its managers were aware of problems with the application as early as March, and that insufficient testing, evolving requirements, and poor performance all contributed to the limitations seen at launch, when the system could only handle around 1,100 users before performance degraded. Considering that initial estimates anticipated 50 to 60 thousand simultaneous users, and that the site has in reality been seeing upwards of 250,000 simultaneous users, this is a remarkable example of the impact of failing to engineer for performance.
On September 27th, four days before go-live, David Nelson, the Acting Director of the CMS Office of Enterprise Management, wrote the following illuminating statement: "We cannot proactively find or replicate actual production capacity problems without an appropriately sized operational performance testing environment." On September 30th, the day before go-live, another email reported: "performance degradation started when there were around 1,100 to 1,200 users".
The Catalogue of Catastrophe, a list of failed or troubled projects around the world, has this to say about the project: "Healthcare.gov joins the list of projects that underestimated the volume of transactions they would be facing (see "Failure to address performance requirements" for further examples)."
If we examine the list of Classic Mistakes as to why projects fail, we can see that Healthcare.gov committed no fewer than 7 out of the top 10. Clay Shirky has written a fabulous article titled Healthcare.gov and the Gulf Between Planning and Reality that explains the scope and magnitude of the failure of communication that occurred on this project, as well as the inherent flaw in the statement "Failure is not an option."
Thursday, September 05, 2013
Application Security – Authorization Layers in Spring Security
Formerly Acegi Security System for Spring, Spring Security
is a powerful, flexible, and widely used framework for authentication and
authorization in your Java applications. If you are just starting with Spring
Security then the Spring Source1 getting started documentation and
tutorials are a great way to get your feet wet.
Once you understand the basics of how to implement a basic
security framework and the wealth of options at your fingertips, the questions
usually arise: “Which parts of this framework do I need to use?”, “What are
they for?”, and “When do I need to use them?”.
For many applications there are 3 layers of authorization that we typically need to be concerned about when implementing Spring Security.
- HTTP Request Authorization – verifying that a user is authenticated (if necessary) and authorized to access a specific URL.
- Service Layer Authorization – verifying that a user is authorized to access a specific method, class, or service.
- Component Authorization – verifying that a user is authorized to see or use a specific component, operation, logic, or data.
HTTP Request Authorization
The basic tutorial example for security-app-context.xml4,5
<beans:beans xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.springframework.org/schema/security"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/security
        http://www.springframework.org/schema/security/spring-security-3.1.xsd">
    <http use-expressions="true">
        <intercept-url access="permitAll" pattern="/index.jsp" />
        <intercept-url access="hasRole('supervisor')" pattern="/secure/extreme/**" />
        <intercept-url access="isAuthenticated()" pattern="/secure/**" />
        <intercept-url access="isAuthenticated()" pattern="/listAccounts.html" />
        <intercept-url access="isAuthenticated()" pattern="/post.html" />
        <intercept-url access="denyAll" pattern="/**" />
        <form-login />
        <logout />
    </http>
    <authentication-manager>
        <authentication-provider>
            <user-service>
                <user authorities="supervisor, teller, user" name="rod" password="koala" />
                <user authorities="teller, user" name="dianne" password="emu" />
                <user authorities="user" name="scott" password="wombat" />
                <user authorities="user" name="peter" password="opal" />
            </user-service>
        </authentication-provider>
    </authentication-manager>
</beans:beans>
The basic example provides a simple template for setting up user accounts, roles, and permissions based on URL patterns in your application. Although most real-world implementations will replace the authentication-provider due to the limitations of the example, the intercept-url example is reasonable to use with almost any framework that provides different views based on the provided URL.
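One detail worth spelling out is that the intercept-url rules are checked in the order they are listed and the first matching pattern wins, which is why /secure/extreme/** appears before /secure/** and the denyAll catch-all comes last. The sketch below models that ordering in plain Java. The InterceptUrlModel class is hypothetical and its matcher is deliberately simplified (exact paths and a trailing /** only); it is not Spring's own pattern matcher.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified model of ordered intercept-url rules. */
final class InterceptUrlModel {
    // LinkedHashMap keeps insertion order, mirroring top-to-bottom order in the XML.
    private final Map<String, String> rules = new LinkedHashMap<>();

    InterceptUrlModel rule(String pattern, String access) {
        rules.put(pattern, access);
        return this;
    }

    /** Simplified matcher: exact paths and a trailing "/**" wildcard only. */
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/**")) {
            String prefix = pattern.substring(0, pattern.length() - 3);
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        return pattern.equals(path);
    }

    /** Returns the access expression of the first matching rule, or null if none match. */
    String accessFor(String path) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (matches(rule.getKey(), path)) {
                return rule.getValue();
            }
        }
        return null;
    }

    /** The same rules, in the same order, as the tutorial configuration above. */
    static InterceptUrlModel tutorialRules() {
        return new InterceptUrlModel()
                .rule("/index.jsp", "permitAll")
                .rule("/secure/extreme/**", "hasRole('supervisor')")
                .rule("/secure/**", "isAuthenticated()")
                .rule("/listAccounts.html", "isAuthenticated()")
                .rule("/post.html", "isAuthenticated()")
                .rule("/**", "denyAll");
    }
}
```

If the /secure/** rule were listed first, a request under /secure/extreme/ would only need to be authenticated, so the more specific patterns belong at the top.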
Purpose
The primary focus of the HTTP Request Authorization layer is
to provide catch-all security for your application, preventing unauthorized
users from linking directly to, and accessing, functions that they are not
allowed to access. This removes the necessity of adding custom authentication
code to every page of your application (depending on your framework and architecture)
and gives you a universal way to limit the severity of access/authentication
defects caused by forgetting to include, or making mistakes in, your
authentication code.
Limitations
The usefulness of this layer drops dramatically as
application complexity increases and each distinct URL provides a wealth of
functions to the user. Monolithic application frameworks that are built
entirely around a single URL may only find the basic authentication service
useful, whereas applications designed to segment functionality into different
URLs by role will get the most value out of it.
Service Layer Authorization
The basic tutorial example for security annotations in
classes and methods:3
public interface BankService {
    public Account readAccount(Long id);
    public Account[] findAccounts();
    @PreAuthorize(
        "hasRole('supervisor') or " +
        "hasRole('teller') and (#account.balance + #amount >= -#account.overdraft)" )
    public Account post(Account account, double amount);
}
The basic example demonstrates annotating a method with a @PreAuthorize Spring EL expression. This is a powerful way to apply complex security rules to both methods and classes and to ensure your service operations are secure.
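The expression on post() is easier to reason about when written out as ordinary Java. The sketch below is an illustration only: PostRule and mayPost are hypothetical names, roles are passed in as plain strings, and no Spring classes are involved. It relies on "and" binding more tightly than "or" in the expression, so a supervisor may always post, while a teller may post only within the overdraft limit.

```java
import java.util.Set;

/** Hypothetical plain-Java restatement of the @PreAuthorize rule on post(). */
final class PostRule {
    static boolean mayPost(Set<String> roles, double balance, double overdraft, double amount) {
        // Mirrors: #account.balance + #amount >= -#account.overdraft
        boolean withinOverdraft = balance + amount >= -overdraft;
        // "and" is evaluated before "or", so the teller check is grouped with the limit check.
        return roles.contains("supervisor")
                || (roles.contains("teller") && withinOverdraft);
    }

    public static void main(String[] args) {
        // A teller withdrawing 120 from a balance of 100 with a 50 overdraft stays within the limit.
        System.out.println(mayPost(Set.of("teller", "user"), 100.0, 50.0, -120.0)); // true
        // Withdrawing 200 would exceed the overdraft, so the teller is refused.
        System.out.println(mayPost(Set.of("teller", "user"), 100.0, 50.0, -200.0)); // false
    }
}
```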
Purpose
The primary purpose of Service Layer Authorization using
annotations or interceptors is to safeguard access to services or operations
that should only be accessed by certain roles. This allows you to ensure that
only administrators can access administrative functions, read-only users cannot
access write operations, and to mitigate the chance that coding mistakes may
provide accidental access to services and operations that a role should not
have access to. It is best used as a safeguard to prevent unintentional access
to sensitive services.
Limitations
Due to the nature of the class and method annotations,
Service Layer Authorization does not provide a useful interface into the
visibility of the services it protects. It provides reactive security to negate
attempts to access a service, but it does nothing to provide proactive information
about which roles can access the service. Common questions about Service Layer
Authorization ask how to catch the security exceptions that occur
or how to use the annotations to make control-flow decisions6,7. The answer
to those questions is complicated, but more importantly it should be
irrelevant. This layer is not intended to provide information to make those
decisions, and if the application is built well it should never be visible to
the user. It is best used only as a safeguard to avoid the consequences of
mistakes made in the HTTP Request Authorization and Component
Authorization layers.
Component Authorization
An example of JSP Taglib security:8
<security:authorize ifAnyGranted="ROLE_ADMIN">
<tr>
<td colspan="2">
<input type="submit" value="<spring:message code="label.add"/>"/>
</td>
</tr>
</security:authorize>
An example of inline security:9
Authentication auth = SecurityContextHolder.getContext().getAuthentication();
if (auth != null) {
    if (auth.getPrincipal() instanceof UserDetails) {
        report.setUser(new User(((UserDetails) auth.getPrincipal()).getUsername()));
    } else {
        report.setUser(new User(auth.getPrincipal().toString()));
    }
}
These examples demonstrate two quick ways of applying Component Authorization: the Spring Security JSP Taglib and the Spring Security Java API.
Purpose
This layer provides component-level security and allows you
to make control flow decisions based on role. It is the connecting layer
between the page-based HTTP Request Authorization layer and the method and
class level of the Service Layer Authorization, and it is vital for any
application that provides heavyweight or multi-function URLs. This is the
developer’s security layer that allows you to turn components on and off or
make decisions at any point in your code to provide access to specific
functions, links, or workflows.
Limitations
Using Component Authorization is repetitive and requires an
intimate understanding of which roles have access to which operations and when.
It is not the best way to provide page-based security and basic
authentication, because that is better handled by the HTTP Request
Authorization layer, which is easier, universal, and more reliable. It is not
the best way to provide class and method level security, because that is better
handled by Service Layer Authorization, which can annotate interfaces,
abstract classes, and interceptors and does not require as much repetition or
context-related knowledge to be applied effectively.
Final Thoughts
Spring Security is a useful and powerful tool, but it is
best used when each type of security layer it provides is used effectively and
for the purpose that it was designed. A carefully considered multi-prong
approach to securing your application will provide a simpler, more elegant, and
more secure solution.
References
Wednesday, August 14, 2013
Agile Performance Testing
Performance Testing is an often neglected component of application development and system replacement projects. It is often ignored in favor of functional development, or relegated to the end of the project for a period of “performance tuning activities”. And when a project starts to go off the tracks, targets get pushed, and then QA and the time allocated for tuning get cut.
Why does this legacy of waterfall planning continue to exist within an Agile world?
Waiting until a system is functionally complete to start performance testing does not save money, and it increases risk that could be mitigated by building performance testing into your cycle of sprints.
Failure to test the performance of your system can mean risking an underbuilt environment, leading to delays, downtime, and unhappy users. Even when the system is up and running, it may be slow or unresponsive, and extended queues can infuriate users.
Performance testing at the end of the project during stabilization means you may have time to build out your environment to meet initial demand, but specific performance problems due to poor code, or inefficient architecture will not have time to be resolved unless go-live is delayed. Performance testing too late in the project risks sluggish performance and intolerable wait times for some operations and can mean dissatisfied users and cycles of emergency patches to improve performance.
Performance testing can be accomplished in an agile project by incorporating it into the agile process and being willing to prioritize it appropriately.
Effective performance testing is something that is planned for and included starting from inception of the project and is part of a continuous cycle of QA. Every story must include as part of its QA acceptance a set of performance metrics that it must meet before it can be marked as complete.
The standard story card “As an xxxx I want yyyy so that zzzz” defines the typical requirements of a story, which are generally further defined by acceptance criteria (“x is working”, “x cannot do y without doing z”, “x is stored in y for use elsewhere”).
Acceptance criteria have the following benefits:
- They get the team to think through how a feature or piece of functionality will work from the user’s perspective
- They remove ambiguity from requirements
- They form the tests that will confirm that a feature or piece of functionality is working and complete.
But acceptance criteria generally only define functional acceptance. You will rarely see acceptance criteria such as the following: “x must return results in less than y seconds when the server is under z load 19 times out of 20 and less than u seconds when the server is under w load 18 times out of 20.”
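A criterion of that shape can be checked mechanically once response times have been collected. The following is a minimal sketch, assuming the timings for one load level are already available as an array of seconds; the LatencyCriterion class and its meets method are hypothetical names, and gathering the samples (with a load testing tool, for instance) is outside its scope.

```java
/** Hypothetical check for an "N times out of M under T seconds" acceptance criterion. */
final class LatencyCriterion {
    /**
     * Returns true when the proportion of responses faster than limitSeconds
     * is at least requiredPasses / perSamples (for example 19 / 20).
     */
    static boolean meets(double[] secondsPerRequest, double limitSeconds,
                         int requiredPasses, int perSamples) {
        if (secondsPerRequest.length == 0) {
            return false; // no evidence means the criterion is not met
        }
        long fast = 0;
        for (double seconds : secondsPerRequest) {
            if (seconds < limitSeconds) {
                fast++;
            }
        }
        // Cross-multiply to compare fast/total with requiredPasses/perSamples without rounding.
        return fast * perSamples >= (long) requiredPasses * secondsPerRequest.length;
    }
}
```

A story's QA acceptance could then call meets once per load level, for example with 19 and 20 for the first clause of the criterion and 18 and 20 for the second.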
A good performance testing plan will define:
- The performance criteria that the system is required to meet
- An explanation of how the performance criteria will be measured and how they map to business objectives
- Remediation steps to explain how failures will be prioritized, handled, and resolved.
A team that has a solid grasp of the importance of system performance can incorporate performance testing tasks and remediation into an agile project by defining performance objectives, systematically evaluating the system, and defining failures as remediation stories that get fed into the backlog and prioritized.
A common issue with many development teams is a shortage of resources with the depth of experience to conduct effective and efficient performance testing. One option to consider is hiring a team with the knowledge and specialization to analyze, manage, educate and implement a performance testing plan in a cost-effective manner.
The MNP Performance Testing team has the tools and experience to conduct thorough performance analysis while integrating seamlessly within the Agile process on projects both large and small.