Understanding ItemProcessor in Spring Batch with Practical, Real-World Examples

ItemProcessor is one of the most powerful, and most often misunderstood, components in Spring Batch. It acts as a **bridge** between reading and writing data, letting you **validate**, **transform**, **filter**, or **enrich** each item before it moves on through the batch pipeline.

Processing Flow:
ItemReader → ItemProcessor → ItemWriter

For each item it receives, ItemProcessor decides whether to pass it on unchanged, transform or enrich it, or discard it.

🔍 What Is ItemProcessor?

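ItemProcessor is a functional interface with a single method. It is generic in an input type I and an output type O, so a processor may return a different type than it receives:

O process(I item) throws Exception;

What happens next depends on the outcome of that call:

  • **Returns a processed/transformed item** → the item continues to the writer
  • **Returns null** → the item is filtered out and never reaches the writer
  • **Throws an exception** → the step fails unless the step is fault tolerant and the exception is configured as skippable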
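The three outcomes of process can be sketched in plain Java. This is a simplified model, not Spring Batch's implementation: the ItemProcessor interface below is a local stand-in with the same shape as the real one so the example runs without the library, and PARSE_AGE and run are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>,
// declared only so this sketch compiles without the library.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

class ProcessContractDemo {

    // Hypothetical processor: String in, Integer out.
    static final ItemProcessor<String, Integer> PARSE_AGE = raw -> {
        if (raw.isBlank()) {
            return null;                      // outcome 2: filtered
        }
        return Integer.parseInt(raw.trim());  // outcome 1, or outcome 3 on bad input
    };

    // Simplified stand-in for what a step does with each processed item.
    static List<Integer> run(List<String> input) {
        List<Integer> written = new ArrayList<>();
        for (String raw : input) {
            Integer out;
            try {
                out = PARSE_AGE.process(raw);
            } catch (Exception e) {
                // Without skip configuration, an exception fails the step.
                throw new IllegalStateException("step failed on item '" + raw + "'", e);
            }
            if (out != null) {
                written.add(out);             // only non-null results reach the writer
            }
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("42", " ", "7")));  // prints [42, 7]
    }
}
```

Passing an input such as "abc" to run makes the parse throw, which this model turns into a failed run.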

🧱 When to Use an ItemProcessor?

  • Validate fields (email, phone, numbers)
  • Transform names, formats, dates, units
  • Filter invalid or incomplete rows
  • Enrich employee data (e.g., fetch department)
  • Mask sensitive information
  • Apply business logic before persistence

💡 Rule of thumb: Use ItemProcessor for *per-row operations*. For work that spans the whole step or job, use listeners instead.
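Masking is one of the per-row operations listed above that has no example elsewhere in this guide, so here is a minimal sketch. The Employee record, the EmailMaskingProcessor class, and the masking rule (keep the first character and the domain) are all assumptions made for illustration, and the ItemProcessor interface is a local stand-in so the code runs without Spring Batch.

```java
// Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical item type for this sketch.
record Employee(String name, String email) {}

class EmailMaskingProcessor implements ItemProcessor<Employee, Employee> {

    @Override
    public Employee process(Employee emp) {
        // Return a masked copy; the original item is left untouched.
        return new Employee(emp.name(), mask(emp.email()));
    }

    // Keeps the first character and the domain, hides the rest of the local part.
    static String mask(String email) {
        if (email == null) {
            return null;
        }
        int at = email.indexOf('@');
        if (at <= 0) {
            return "***";   // no usable local part: hide everything
        }
        return email.charAt(0) + "***" + email.substring(at);
    }

    public static void main(String[] args) {
        System.out.println(mask("john.doe@example.com"));  // prints j***@example.com
    }
}
```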

🧪 Practical Example: Email Validation Processor

📌 EmailValidationProcessor.java

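import java.util.regex.Pattern;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component
public class EmailValidationProcessor implements ItemProcessor<Employee, Employee> {

    // Deliberately loose format check; compiled once and reused for every item
    private static final Pattern EMAIL = Pattern.compile("^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+$");

    @Override
    public Employee process(Employee employee) {

        String email = employee.getEmail();

        if (email != null && EMAIL.matcher(email).matches()) {
            return employee;  // valid → move to writer
        }

        return null; // invalid → filter this item out
    }
}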

Returning null tells Spring Batch to filter the record: it is dropped before the writer and counted as filtered, not as skipped.

Spring Batch treats a null return as "filtered item", not as an error, so the step carries on normally.
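To make the "filtered, not failed" distinction concrete, here is a small model that tallies what happens to each item. It is a sketch under assumptions: Tally, REQUIRE_EMAIL, and run are hypothetical names, and the ItemProcessor interface is a local stand-in. In Spring Batch itself, comparable counters are exposed on StepExecution (for example getReadCount(), getFilterCount(), and getWriteCount()).

```java
import java.util.List;

// Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical item type for this sketch.
record Employee(String name, String email) {}

// Simplified tally of what happened to the items.
record Tally(int read, int filtered, int written) {}

class FilterCountDemo {

    static final ItemProcessor<Employee, Employee> REQUIRE_EMAIL =
            emp -> (emp.email() != null && emp.email().contains("@")) ? emp : null;

    static Tally run(List<Employee> items) {
        int read = 0, filtered = 0, written = 0;
        for (Employee item : items) {
            read++;
            Employee out;
            try {
                out = REQUIRE_EMAIL.process(item);
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            if (out == null) {
                filtered++;   // a normal outcome, not a failure
            } else {
                written++;
            }
        }
        return new Tally(read, filtered, written);
    }

    public static void main(String[] args) {
        List<Employee> input = List.of(
                new Employee("Ann", "ann@example.com"),
                new Employee("Bob", "not-an-email"),
                new Employee("Cy", null));
        System.out.println(run(input));  // prints Tally[read=3, filtered=2, written=1]
    }
}
```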

⚙️ Updating Step Configuration to Use Processor

@Bean
public Step csvStep() {
    return new StepBuilder("csv-step", jobRepository)
            // The type witness fixes the input and output item types for the step
            .<Employee, Employee>chunk(10, transactionManager)
            .reader(csvReader())
            .processor(emailValidationProcessor)
            .writer(jpaItemWriter())
            .build();
}

Only one line changes: .processor(emailValidationProcessor)


📌 Advanced ItemProcessor Examples

1️⃣ Transforming Names (Uppercase)

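public class NameFormatterProcessor implements ItemProcessor<Employee, Employee> {

    @Override
    public Employee process(Employee emp) {
        // Guard against a missing name to avoid a NullPointerException
        if (emp.getName() != null) {
            emp.setName(emp.getName().toUpperCase());
        }
        return emp;
    }
}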
---

2️⃣ Filtering Salaries Below Threshold

public class SalaryFilterProcessor implements ItemProcessor<Employee, Employee> {

    @Override
    public Employee process(Employee emp) {
        // Returning null filters out employees below the threshold
        return emp.getSalary() < 20000 ? null : emp;
    }
}
---

3️⃣ Enriching Data with External API

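public class DepartmentEnrichmentProcessor implements ItemProcessor<Employee, Employee> {

    private final DeptService deptService;

    // Constructor injection keeps the dependency explicit and easy to mock in tests
    public DepartmentEnrichmentProcessor(DeptService deptService) {
        this.deptService = deptService;
    }

    @Override
    public Employee process(Employee emp) {
        // One lookup per item; on large inputs this call can dominate run time
        String dept = deptService.getDepartment(emp.getEmail());
        emp.setDepartment(dept);
        return emp;
    }
}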
---

4️⃣ Multiple Processors Using CompositeItemProcessor

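@Bean
public CompositeItemProcessor<Employee, Employee> compositeProcessor() {

    // Delegates run in list order; each receives the previous one's output
    List<ItemProcessor<Employee, Employee>> processors = List.of(
        new EmailValidationProcessor(),
        new NameFormatterProcessor(),
        new SalaryFilterProcessor()
    );

    CompositeItemProcessor<Employee, Employee> cip = new CompositeItemProcessor<>();
    cip.setDelegates(processors);
    return cip;
}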
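
💡 CompositeItemProcessor = chain of processors. Each delegate receives the previous delegate's output, and if any delegate returns null the item is filtered and the remaining delegates are not called. Great for complex pipelines.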
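The chaining behavior can be modeled in a few lines of plain Java. This is a simplified sketch, not the library's source: chain, run, and the calls list are hypothetical helpers, and the ItemProcessor interface is a local stand-in so the example runs without Spring Batch. It records which delegates were invoked to show that a null result stops the chain.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

class ChainDemo {

    // Records which delegates ran for the most recent item.
    static final List<String> calls = new ArrayList<>();

    // Simplified model of a composite: run delegates in order, stop at the first null.
    static <T> ItemProcessor<T, T> chain(List<ItemProcessor<T, T>> delegates) {
        return item -> {
            T current = item;
            for (ItemProcessor<T, T> delegate : delegates) {
                current = delegate.process(current);
                if (current == null) {
                    return null;   // filtered: remaining delegates are not called
                }
            }
            return current;
        };
    }

    static String run(String email) {
        calls.clear();
        ItemProcessor<String, String> validate = s -> {
            calls.add("validate");
            return s.contains("@") ? s : null;
        };
        ItemProcessor<String, String> lowercase = s -> {
            calls.add("lowercase");
            return s.toLowerCase();
        };
        try {
            return chain(List.of(validate, lowercase)).process(email);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("A@B.COM") + " " + calls);  // prints a@b.com [validate, lowercase]
        System.out.println(run("bad") + " " + calls);      // prints null [validate]
    }
}
```

Because of this short-circuit, ordering matters: putting cheap validators first avoids spending transformation or enrichment work on items that will be filtered anyway.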

🧠 Internal Flow of ItemProcessor

FlatFileItemReader → Employee  
  ↓  
EmailValidationProcessor → null (if invalid)  
  ↓  
NameFormatterProcessor → "JOHN"  
  ↓  
JpaItemWriter → Persist to DB

🚨 Common Mistakes to Avoid

  • Throwing exceptions for records that should simply be filtered (return null instead)
  • Doing multi-row operations inside a processor (use listeners instead)
  • Making expensive calls (remote APIs, database lookups) for every item without caching
  • Not logging filtered records
  • Processing sensitive data without masking it
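For the caching point, one possible approach is to wrap the slow lookup so that each distinct key triggers at most one remote call. The sketch below is an assumption-laden illustration: CachedLookup, slowLookup, and the department codes are hypothetical, and the cache is unbounded and lives as long as the object does, which may not suit very large key sets.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Wraps a slow lookup so each distinct key triggers at most one remote call.
class CachedLookup {
    private final Function<String, String> remote;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachedLookup(Function<String, String> remote) {
        this.remote = remote;
    }

    String get(String key) {
        // computeIfAbsent invokes 'remote' only when the key is not cached yet
        return cache.computeIfAbsent(key, remote);
    }
}

class CachingDemo {
    static int remoteCalls = 0;

    // Hypothetical stand-in for a slow call such as deptService.getDepartment(...)
    static String slowLookup(String departmentCode) {
        remoteCalls++;
        return "Department " + departmentCode;
    }

    public static void main(String[] args) {
        CachedLookup lookup = new CachedLookup(CachingDemo::slowLookup);
        for (String code : new String[] {"D1", "D2", "D1", "D1"}) {
            lookup.get(code);
        }
        System.out.println(remoteCalls);  // prints 2
    }
}
```

A processor would call lookup.get(...) from inside process. Caching only pays off when keys repeat across items, so a key that is unique per row gains nothing from it.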

🧩 Logging Skipped Records (Best Practice)

@Override
public Employee process(Employee employee) {

    if (!isValid(employee)) {
        log.warn("Skipping invalid record: {}", employee);
        return null;
    }

    return employee;
}
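The snippet above relies on a log field and an isValid method that are not shown. A self-contained variant might look like the sketch below, which uses java.util.logging and reports why each record was rejected. The Employee record, the validation rules, and the filteredCount field are assumptions for illustration, and the ItemProcessor interface is a local stand-in so the code runs without Spring Batch.

```java
import java.util.logging.Logger;

// Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical item type for this sketch.
record Employee(String name, String email) {}

class LoggingValidationProcessor implements ItemProcessor<Employee, Employee> {

    private static final Logger log =
            Logger.getLogger(LoggingValidationProcessor.class.getName());

    // Plain counter; not thread-safe, so this sketch assumes a single-threaded step.
    private int filteredCount;

    @Override
    public Employee process(Employee employee) {
        String reason = rejectionReason(employee);
        if (reason != null) {
            filteredCount++;
            // Log the reason and the name only, not the whole record
            log.warning("Filtering record for '" + employee.name() + "': " + reason);
            return null;
        }
        return employee;
    }

    // Returns null when the record is valid, otherwise a short reason.
    static String rejectionReason(Employee e) {
        if (e.name() == null || e.name().isBlank()) {
            return "missing name";
        }
        if (e.email() == null || !e.email().contains("@")) {
            return "invalid email";
        }
        return null;
    }

    int filteredCount() {
        return filteredCount;
    }
}
```

Logging a reason rather than the full record keeps the log useful for follow-up while limiting how much personal data ends up in it.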

❓ FAQ

1. Is ItemProcessor mandatory?

No. If you don't need filtering or transformation, omit it and items go straight from the reader to the writer.

2. Can ItemProcessor skip bad records?

Yes. Return null and the record is filtered out before it reaches the writer.

3. What if a processor throws an exception?

The step fails unless it is configured as fault tolerant (faultTolerant() on the step builder) with a skip limit or a SkipPolicy that covers the exception.
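The effect of a skip limit can be illustrated with a deliberately simplified model. This is not how Spring Batch implements skipping internally (the real step also deals with transactions and chunk rollback). It only shows the counting rule: skippable exceptions are tolerated until the limit is exceeded, and any other exception fails the run at once. SkipLimitDemo, PARSE, and run are hypothetical names, and the ItemProcessor interface is a local stand-in.

```java
import java.util.List;

// Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

class SkipLimitDemo {

    // Hypothetical processor that throws NumberFormatException on bad rows.
    static final ItemProcessor<String, Integer> PARSE = Integer::parseInt;

    // Returns how many items were processed successfully.
    static int run(List<String> rows, int skipLimit) {
        int processed = 0;
        int skipped = 0;
        for (String row : rows) {
            try {
                PARSE.process(row);
                processed++;
            } catch (NumberFormatException e) {   // treated as the skippable type
                skipped++;
                if (skipped > skipLimit) {
                    throw new IllegalStateException(
                            "skip limit of " + skipLimit + " exceeded", e);
                }
            } catch (Exception e) {               // anything else fails immediately
                throw new IllegalStateException("non-skippable failure", e);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("1", "x", "2"), 1));  // prints 2
    }
}
```

In a real step the equivalent settings would be declared on the builder after faultTolerant(), using skip(...) for the exception type and skipLimit(...) for the threshold.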

4. Can I use multiple processors?

Yes, by chaining them with CompositeItemProcessor.

5. Is ItemProcessor executed per chunk or per row?

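Per row. Within each chunk, process is called once for every item that was read, and the writer then receives the surviving items together.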
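A rough model of the chunk loop shows the difference between per-item processing and per-chunk writing. It is a sketch only: ChunkLoopDemo and its counters are hypothetical, a UnaryOperator stands in for the processor, and transactions and filtering are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

class ChunkLoopDemo {
    static int processCalls = 0;
    static int writeCalls = 0;

    // Returns the list of chunks handed to the (simulated) writer.
    static List<List<String>> run(List<String> source, int chunkSize,
                                  UnaryOperator<String> processor) {
        processCalls = 0;
        writeCalls = 0;
        List<List<String>> written = new ArrayList<>();
        Iterator<String> reader = source.iterator();
        while (reader.hasNext()) {
            // 1. Read up to chunkSize items
            List<String> chunk = new ArrayList<>();
            while (chunk.size() < chunkSize && reader.hasNext()) {
                chunk.add(reader.next());
            }
            // 2. Process the items one at a time
            List<String> outputs = new ArrayList<>();
            for (String item : chunk) {
                processCalls++;
                outputs.add(processor.apply(item));
            }
            // 3. Write once per chunk
            writeCalls++;
            written.add(outputs);
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c", "d", "e"), 2, String::toUpperCase));
        // prints [[A, B], [C, D], [E]]
        System.out.println(processCalls + " process calls, " + writeCalls + " write calls");
        // prints 5 process calls, 3 write calls
    }
}
```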


📝 Summary

  • ItemProcessor is ideal for validation, transformation, filtering, and enrichment.
  • Returning null filters the record out without failing the step.
  • CompositeItemProcessor allows chaining multiple processors.
  • Use processors for business logic, not step-level logic.
  • Always log skipped records in production pipelines.

Mastering ItemProcessor makes your Spring Batch jobs significantly more robust, reusable, and production-ready.
