Thursday, November 17, 2011

Error Handling in SOA - Part 1

Error handling is an important part of any software development effort, but most projects don't give it enough attention when designing applications or services. Teams spend their time on functionality and happy paths and miss critical error handling scenarios.

In monolithic applications, error handling is relatively easy: all the functionality resides within one application, and it can be changed at a later stage if required. In SOA applications, by contrast, error handling is a significant undertaking, because SOA integrates heterogeneous applications across the organization, vendors and partners. It is difficult to capture all possible error scenarios up front, and changing these services later is complex and requires a lot of effort. Hence a focus on error handling is essential in SOA-based applications.

Focusing on error handling from the early stages (requirements, design and development) ensures that appropriate error handling standards and best practices are put in place for processes, services and components on all platforms. When designing error handling, one should look beyond plain exception handling and also determine the stakeholders for different kinds of errors, monitoring, notification, classification of errors, the governance process and more.

In this blog, I am going to talk about what information is required, and when it should be captured, to design appropriate error handling in SOA. It is crucial to capture error handling scenarios in the Functional Design (Service Identification) and Technical Design (Service Specification) phases. Proper error handling in SOA reduces maintenance effort considerably, helps FAM & TAM, and helps services meet their SLAs.

Classification of errors:
1. Business errors
These are business rule exceptions and service provider functional errors, such as "customer is invalid", "customer credit check failed" or "stock not available".
2. Technical or runtime errors
These are runtime exceptions such as mapping exceptions, invalid characters and null pointer exceptions.
3. System unavailability
These are end point unavailable exceptions, e.g. the requested service is unavailable due to maintenance or a breakdown.

Errors can be further classified into recoverable and non-recoverable errors.

Recoverable errors: errors from which the service can recover by taking an alternative path, such as a failed business rule or an unavailable end point. Business errors and system unavailability errors are recoverable.

Non-recoverable errors: errors from which the service cannot recover, typically runtime errors. Technical errors are mostly non-recoverable.
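
To make the classification concrete, below is a minimal sketch of these error classes as a Java exception hierarchy. The class and method names are illustrative, not from any particular framework.

    // Base type for all service faults; recoverability drives handling.
    public abstract class ServiceFault extends Exception {
        private final String errorCode;

        protected ServiceFault(String errorCode, String message) {
            super(message);
            this.errorCode = errorCode;
        }

        public String getErrorCode() { return errorCode; }

        // Recoverable faults can be handled via alternative paths or retry.
        public abstract boolean isRecoverable();
    }

    // Business errors, e.g. "customer credit check failed" (recoverable).
    class BusinessFault extends ServiceFault {
        BusinessFault(String code, String msg) { super(code, msg); }
        public boolean isRecoverable() { return true; }
    }

    // Technical/runtime errors, e.g. a mapping failure (mostly non-recoverable).
    class TechnicalFault extends ServiceFault {
        TechnicalFault(String code, String msg) { super(code, msg); }
        public boolean isRecoverable() { return false; }
    }

    // End point unavailable, e.g. provider down for maintenance (recoverable, retry later).
    class SystemUnavailableFault extends ServiceFault {
        SystemUnavailableFault(String code, String msg) { super(code, msg); }
        public boolean isRecoverable() { return true; }
    }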
----------------------------------------------------------------------------------------------------------------------
Functional Design or Service Identification Phase:
In this phase, a functional analyst defines service requirements, goals, capabilities, use cases, business processes, etc. The analyst should also try to answer questions like those below to capture requirements around business exceptions.
  1. What kinds of errors could occur?
  2. How should those errors be handled?
  3. Who are the stakeholders that should receive error notifications?
  4. What kind of monitoring capabilities are required?
  5. Do these errors require rollback scenarios?
  6. What kinds of errors are eligible for retry?
  7. Is it allowed to update the message data before a retry?
Business error scenarios: identify all possible business exceptions that flag the business operation as invalid. Defining the information below helps service designers consider possible business exception scenarios and provide solutions to handle them; a sample catalog follows the template.

Error Scenario: describe the error scenario
Error Code: define an error code for each business exception
Error Text: a meaningful text which describes the error
Severity Level: define the severity of the error
Notification: if a notification is required, who should receive it and where it should be sent
Monitoring: how the error should be displayed on the monitoring GUI
Retry: is it retryable? If yes, who should be able to retry (the consumer or the middleware)?
Message Editable: is it allowed to update the message data from the monitoring GUI before a retry?
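
As an illustration, the template could be filled in as a small error catalog during service design. A hypothetical Java version is sketched below; the codes, texts and severities are made-up examples, not standard values.

    // Hypothetical business error catalog built from the template above.
    public enum BusinessErrorCatalog {

        CUSTOMER_INVALID   ("BUS-001", "Customer is invalid",           Severity.HIGH,   false),
        CREDIT_CHECK_FAILED("BUS-002", "Customer credit check failed",  Severity.MEDIUM, true),
        STOCK_NOT_AVAILABLE("BUS-003", "Requested stock not available", Severity.LOW,    true);

        public enum Severity { LOW, MEDIUM, HIGH }

        private final String errorCode;
        private final String errorText;
        private final Severity severityLevel;
        private final boolean retryable;   // may the consumer/middleware retry?

        BusinessErrorCatalog(String errorCode, String errorText,
                             Severity severityLevel, boolean retryable) {
            this.errorCode = errorCode;
            this.errorText = errorText;
            this.severityLevel = severityLevel;
            this.retryable = retryable;
        }

        public String getErrorCode()       { return errorCode; }
        public String getErrorText()       { return errorText; }
        public Severity getSeverityLevel() { return severityLevel; }
        public boolean isRetryable()       { return retryable; }
    }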
----------------------------------------------------------------------------------------------------------------------
Technical Design or Service Specification Phase:
In this phase, a technical architect defines the message pattern, invocation style, service composition, schemas and the technical design around the service implementation. The architect should consider the business exception requirements defined above while designing the service, and should answer the same kinds of questions to define the technical exceptions.

Technical/runtime error scenarios: identify all possible places where the logic could throw an exception; where possible, catch the specific exception, and use a default catch block for all remaining runtime exceptions. Defining the information below helps service implementers consider all possible cases; a sketch follows the template.

Error Scenario: describe the error scenario
Error Code: since it is difficult to define all possible runtime exceptions, define codes for the critical errors that could occur
Error Text: a meaningful text which describes the error
Severity Level: define the severity of the error
Notification: if a notification is required, who should receive it and where it should be sent
Monitoring: how the error should be displayed on the monitoring GUI
Retry: is it retryable? If yes, who should be able to retry (the consumer or the middleware)?
Message Editable: is it allowed to update the message data from the monitoring GUI before resubmission?
Rollback Scenarios: if applicable, define rollback scenarios to compensate the transaction
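
Below is a minimal Java sketch of the "catch the specific exception, then a default catch" guidance above. The FaultMessage type and the TEC-* error codes are illustrative, not from any standard.

    // Illustrative fault holder for the example below.
    class FaultMessage {
        final String errorCode;
        final String errorText;
        final Throwable cause;
        FaultMessage(String errorCode, String errorText, Throwable cause) {
            this.errorCode = errorCode;
            this.errorText = errorText;
            this.cause = cause;
        }
    }

    class TechnicalFaultHandler {
        // Runs the service logic and maps runtime errors to fault messages.
        FaultMessage execute(Runnable serviceLogic) {
            try {
                serviceLogic.run();
                return null;   // no fault
            } catch (NullPointerException e) {
                // Specific, known critical error gets its own code
                return new FaultMessage("TEC-001", "Null value encountered", e);
            } catch (NumberFormatException e) {
                // e.g. invalid characters where a number was expected
                return new FaultMessage("TEC-002", "Invalid characters in message", e);
            } catch (RuntimeException e) {
                // Default catch for all remaining runtime exceptions
                return new FaultMessage("TEC-999", "Unexpected technical error", e);
            }
        }
    }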

----------------------------------------------------------------------------------------------------------------------
Fault message data model:
Identify metadata and a common schema to describe errors consistently across the organization. The metadata and schema can be reused across services and are very useful for service monitoring and notifications; a sketch follows the attribute list.

Sample metadata attributes:
  • Message Code
  • Message Text
  • Message Status: Warning/Error/Success
  • Severity
  • Source
  • Message Text Language
  • Time-stamp
  • Message Classification: Business/runtime/system unavailability
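
As a sketch, the metadata could be mirrored in a reusable Java payload class like the one below. In practice this would be a shared XSD; the field names here are illustrative.

    import java.time.Instant;

    // Common fault payload mirroring the sample metadata attributes.
    public class Fault {
        public enum Status { WARNING, ERROR, SUCCESS }
        public enum Classification { BUSINESS, RUNTIME, SYSTEM_UNAVAILABLE }

        public String messageCode;           // e.g. "BUS-002"
        public String messageText;           // human-readable description
        public Status messageStatus;         // warning/error/success
        public String severity;              // e.g. LOW/MEDIUM/HIGH
        public String source;                // originating service or component
        public String messageTextLanguage;   // e.g. "en"
        public Instant timestamp;            // when the fault occurred
        public Classification classification;
    }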
----------------------------------------------------------------------------------------------------------------------
Fault propagation:
Fault propagation can be accomplished in different ways, depending on the service's message pattern: request-reply, request-response or one-way.

Request-Reply:
In this pattern, the consumer waits for the response, so the fault should be propagated directly to the consumer immediately.

Request-response:
In this pattern, the fault is eventually propagated to the consumer, but the service can also look for alternative paths before doing so. There are different ways to handle faults in this pattern, depending on the service requirements; the sketch below shows one alternative-path approach.
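
Below is a sketch of that alternative-path handling, reusing the SystemUnavailableFault class sketched earlier. The Provider interface and the primary/backup end points are hypothetical.

    interface Provider {
        Response call(Request request) throws SystemUnavailableFault;
    }

    class Request  { /* request payload */ }
    class Response { /* response payload */ }

    class RequestResponseService {
        private final Provider primary;
        private final Provider backup;

        RequestResponseService(Provider primary, Provider backup) {
            this.primary = primary;
            this.backup = backup;
        }

        // Try the primary end point; on unavailability take the alternative
        // path; propagate the fault only when both paths have failed.
        Response invoke(Request request) throws SystemUnavailableFault {
            try {
                return primary.call(request);
            } catch (SystemUnavailableFault first) {
                try {
                    return backup.call(request);
                } catch (SystemUnavailableFault second) {
                    System.err.println("Both end points unavailable: " + second.getMessage());
                    throw second;   // propagate to the consumer
                }
            }
        }
    }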

One way:
In this pattern, the fault does not need to be propagated to the consumer, but it should still be logged and handled. The design and implementation depend on the service requirements.

Irrespective of the pattern, there are two popular ways to propagate faults to consumers or monitoring applications.
  1. SOAP fault
  2. Custom fault payload, either as part of the response message or in the message header
Both have their pros and cons; the choice is mostly driven by existing application capabilities and platforms. As a best practice, non-recoverable faults are better returned as SOAP faults, while recoverable business faults are better returned as a custom fault payload, either in the response body or in the response message header. The custom payload, however, requires additional effort from the consumer, who must parse the response to detect the fault.
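
As a minimal sketch of both options on a JAX-WS stack: a non-recoverable fault is thrown as a standard SOAP fault, while a recoverable business fault travels inside the normal response (the Fault type is the payload sketched above; the OrderResponse wrapper is hypothetical).

    import javax.xml.namespace.QName;
    import javax.xml.soap.SOAPFactory;
    import javax.xml.soap.SOAPFault;
    import javax.xml.ws.soap.SOAPFaultException;

    public class FaultPropagation {

        // Option 1: non-recoverable technical fault as a standard SOAP fault.
        static void throwAsSoapFault(String faultString) throws Exception {
            SOAPFault fault = SOAPFactory.newInstance().createFault();
            fault.setFaultCode(new QName(
                    "http://schemas.xmlsoap.org/soap/envelope/", "Server", "soap"));
            fault.setFaultString(faultString);
            throw new SOAPFaultException(fault);
        }

        // Option 2: recoverable business fault carried in the response payload,
        // which the consumer must inspect to detect the fault.
        static class OrderResponse {
            Object result;        // populated on success
            Fault businessFault;  // populated when a business rule fails
        }
    }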
----------------------------------------------------------------------------------------------------------------------
In my next blog, I'll write about error handling during the service implementation (realization) phase.
