Wednesday, February 22, 2012

Using Spring Batch for Large Volume SOA Integration Projects

Introduction

Over the last several years, I've worked on a number of Oracle Fusion Middleware projects that used Oracle SOA Suite for batch data integration.  SOA Suite has some great technology adapters that make data integration easy, but definitely has challenges with processing large volumes of data.  Common issues that I've run into include:
  • Out-of-memory errors when reading the results of a database query that returns many rows
  • Having to keep the entire XML output payload in memory when writing a file with header or footer elements
  • Very slow performance on large data sets
  • Lack of support in the DBAdapter and other technology adapters for processing rows in chunks
  • Inability to suspend a process and restart it without reprocessing all the records
  • No built-in support for the concept of batch jobs or a job execution/monitoring environment

Recently, I've started using the Spring Batch framework, developed by the same folks who created the very popular Spring Framework.  Spring Batch offers some very compelling features and integrates nicely with SOA Suite 11g.

High-Level View of Spring Batch Framework

The following diagram appears in Section 3.0 of the Spring Batch Reference Documentation and serves to illustrate the major components of the framework.

[Diagram: major components of the Spring Batch framework]
The blue components are part of the job configuration and administration side of Spring Batch.  The yellow components are used to read, process, and write data to a variety of sources and destinations.  In typical SOA Suite terminology, the yellow components are technology adapters.

Item Readers and Writers support a wide set of technologies, including:
  • Databases - with support for JDBC, JPA, Hibernate, and iBATIS
  • Files - with fixed-length or delimited formats
  • XML - including streaming of very large XML documents
  • Messaging - JMS
  • Custom-developed Java objects
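To make the read/process/write division concrete, here is a minimal plain-Java sketch of a chunk-oriented step. The interface shapes mirror Spring Batch's ItemReader, ItemProcessor, and ItemWriter contracts, but the names and the `runStep` driver below are simplified stand-ins, not the real framework API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the Spring Batch read-process-write contracts.
public class ChunkSketch {

    interface ItemReader<T> { T read(); }          // returns null at end of input
    interface ItemProcessor<I, O> { O process(I item); }
    interface ItemWriter<T> { void write(List<? extends T> items); }

    // A chunk-oriented step: read items one at a time, process each,
    // and hand them to the writer in chunks of commitInterval.
    static <I, O> void runStep(ItemReader<I> reader,
                               ItemProcessor<I, O> processor,
                               ItemWriter<O> writer,
                               int commitInterval) {
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.process(item));
            if (chunk.size() == commitInterval) {
                writer.write(chunk);   // one transaction per chunk in the real framework
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(chunk);       // flush the final partial chunk
        }
    }

    public static void main(String[] args) {
        Iterator<String> source = Arrays.asList("a", "b", "c", "d", "e").iterator();
        runStep(
            () -> source.hasNext() ? source.next() : null,
            String::toUpperCase,
            items -> System.out.println("wrote chunk: " + items),
            2);
    }
}
```

Because the writer sees items in fixed-size chunks, memory use stays bounded no matter how large the input is - which is exactly the behavior the SOA Suite adapters lack.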

Spring Batch also supports partitioning/chunking as shown by the following diagram taken from the Spring Batch web site.

[Diagram: Spring Batch partitioning model]
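As a rough illustration of the partitioning idea (the names and structure here are my own, not the Spring Batch Partitioner API), a master step can split a key range into sub-ranges and hand each one to a worker:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch of range partitioning: split a key range into
// sub-ranges and let worker threads process each partition independently.
public class PartitionSketch {

    static final class Range {
        final long min, max;                 // inclusive bounds
        Range(long min, long max) { this.min = min; this.max = max; }
    }

    // Split [min, max] into gridSize roughly equal sub-ranges.
    static List<Range> partition(long min, long max, int gridSize) {
        List<Range> ranges = new ArrayList<>();
        long size = (max - min + 1 + gridSize - 1) / gridSize;  // ceiling division
        for (long start = min; start <= max; start += size) {
            ranges.add(new Range(start, Math.min(start + size - 1, max)));
        }
        return ranges;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Long>> results = new ArrayList<>();
        for (Range r : partition(1, 1000, 4)) {
            // Each worker would run the same step against its own key range,
            // e.g. WHERE id BETWEEN :min AND :max. Here we just count the rows.
            results.add(pool.submit(() -> r.max - r.min + 1));
        }
        long total = 0;
        for (Future<Long> f : results) total += f.get();
        pool.shutdown();
        System.out.println("rows processed: " + total);  // rows processed: 1000
    }
}
```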
ItemReader/ItemWriter Comparison with Oracle SOA Suite Technology Adapters

Some features of the Spring Batch Item Readers that I find really useful are:

Database Reader
  • Supports streaming of data using cursors or pages so the entire result set does not have to be stored in memory
  • Queries can be easily changed at run time
  • Fine-grained control of the fetch size, maximum rows, and query timeouts
  • Support for cursors with stored procedures or functions
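The features above can be seen in a configuration sketch for the cursor-based reader. The setter names are the real JdbcCursorItemReader API, but this assumes Spring Batch on the classpath, and the query, the `dataSource` reference, and the `Customer` class are my own illustrative examples:

```java
// Configuration sketch only - not runnable without Spring Batch and a DataSource.
JdbcCursorItemReader<Customer> reader = new JdbcCursorItemReader<>();
reader.setDataSource(dataSource);                 // any JDBC DataSource
reader.setSql("SELECT id, name FROM customers");  // easy to change at run time
reader.setFetchSize(500);                         // rows fetched per round trip
reader.setMaxRows(0);                             // 0 = no limit
reader.setQueryTimeout(60);                       // seconds
reader.setRowMapper((rs, rowNum) ->
        new Customer(rs.getLong("id"), rs.getString("name")));
```

Because the reader streams rows through a cursor, memory use is governed by the fetch size rather than the total result set size.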
File-based Reader
  • Much easier to specify the file format than using the Oracle Native Format Wizard
  • Support for handling comments in the input file
  • Ability to specify the number of lines to skip 
  • Ability to handle different record layouts within the same file
  • Remembers the last record processed in case you need to stop or restart
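The skip-lines and comment-handling behavior can be approximated in a few lines of plain Java. This is only an illustration of the idea - the real FlatFileItemReader adds restartability and pluggable line tokenizers on top:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of two FlatFileItemReader conveniences:
// skipping a fixed number of header lines and ignoring comment lines.
public class FlatFileSketch {

    static List<String[]> readDelimited(BufferedReader in, int linesToSkip,
                                        String commentPrefix) throws Exception {
        List<String[]> records = new ArrayList<>();
        String line;
        int lineNumber = 0;
        while ((line = in.readLine()) != null) {
            lineNumber++;
            if (lineNumber <= linesToSkip) continue;          // skip header lines
            if (line.startsWith(commentPrefix)) continue;     // skip comment lines
            records.add(line.split(","));                     // delimited tokenizing
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        String file = "id,name\n# generated 2012-02-22\n1,Smith\n2,Jones\n";
        List<String[]> records =
                readDelimited(new BufferedReader(new StringReader(file)), 1, "#");
        System.out.println(records.size() + " records");  // 2 records
    }
}
```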
XML Reader
  • Can support huge XML payloads by using the streaming StAX parser
  • Easy to split huge XML payloads into smaller chunks
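The streaming approach can be sketched with the JDK's own StAX API. This stand-alone example (my own simplification, not the actual StaxEventItemReader) pulls one record fragment at a time, so the full document never sits in memory:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Walk a large document with a pull parser and emit one fragment per
// record element, instead of loading the whole payload into a DOM tree.
public class StaxSplitSketch {

    // Collect the text content of each <record> element as its own chunk.
    static List<String> splitRecords(String xml) throws Exception {
        List<String> chunks = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "record".equals(reader.getLocalName())) {
                chunks.add(reader.getElementText());  // advances past </record>
            }
        }
        reader.close();
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<batch><record>one</record><record>two</record></batch>";
        System.out.println(splitRecords(xml));  // [one, two]
    }
}
```

In a real job each fragment would be unmarshalled into an object and handed to the chunk-processing loop rather than collected into a list.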

The biggest downside the average SOA Suite developer will find with Spring Batch is the lack of wizard-supported configuration and the need to write Java code and Spring configuration by hand.

I will provide some additional examples of using Spring Batch and Oracle SOA Suite 11g together in future blog posts.

1 comment:

  1. Very good article, Bruce. I feel Oracle's DB Adapter is a bit of a pain when you have data-centric integration. I have a scenario where I have to update 50 tables after performing some transformations, but creating 50 adapters for that seems very unreasonable. I am looking forward to your next post with additional examples of using Spring Batch and Oracle SOA Suite 11g.
