Spring Batch — Import CSV to Database (Complete Guide)
Importing CSV files into a database is one of the most common batch tasks. This guide shows a practical, production-ready approach using Spring Batch: how to read CSVs, validate records, skip faulty rows, write to DB efficiently, and design for restarts and observability.
In short: FlatFileItemReader for reading, an ItemProcessor for validation, and JdbcBatchItemWriter or JpaItemWriter for writes. Add faultTolerant() with skip rules or a SkipPolicy, tune the chunk size, and make sure the DB connection pool is at least as large as the number of concurrent writers.
Why this guide is different
- Focus on real-world pitfalls (validation, duplicates, partial failures)
- Production advice: retries, skip listeners, idempotency
- Configuration and tuning: chunk size, DB pool, throughput
- Complete runnable snippets (drop into Spring Boot)
Maven dependencies
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
application.properties (example)
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.username=sa
spring.datasource.password=
spring.h2.console.enabled=true

# Spring Batch: disable auto-run at startup so jobs are launched on demand (e.g. via REST)
spring.batch.job.enabled=false

# Hikari tuning (example)
spring.datasource.hikari.maximum-pool-size=30
Sample CSV (employees.csv)
id,name,email,department,salary
1,John Doe,john@example.com,Engineering,72000
2,Jane Smith,jane@example.com,HR,50000
3,Broken,invalid-email,Sales,45000
4,Sam Blue,sam@example.com,Engineering,80000
Entity (POJO)
import java.math.BigDecimal;

public class Employee {
private Long id;
private String name;
private String email;
private String department;
private BigDecimal salary;
// getters & setters
}
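The writer below assumes an employee table whose columns match the CSV header. A minimal DDL sketch (the column types are assumptions; adjust for your database):

CREATE TABLE employee (
    id BIGINT PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    email VARCHAR(200) NOT NULL,
    department VARCHAR(100),
    salary DECIMAL(12, 2)
);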
FlatFileItemReader — robust setup
Use FlatFileItemReader with a LineMapper (tokenizer + field set mapper). Add @StepScope to inject job parameters like file path.
@Bean
@StepScope
public FlatFileItemReader<Employee> reader(@Value("#{jobParameters['file']}") String file) {
return new FlatFileItemReaderBuilder<Employee>()
.name("employeeReader")
.resource(new FileSystemResource(file))
.linesToSkip(1)
.delimited()
.names("id","name","email","department","salary")
.fieldSetMapper(fieldSet -> {
Employee e = new Employee();
e.setId(fieldSet.readLong("id"));
e.setName(fieldSet.readString("name"));
e.setEmail(fieldSet.readString("email"));
e.setDepartment(fieldSet.readString("department"));
e.setSalary(fieldSet.readBigDecimal("salary"));
return e;
})
.build();
}
Validation in ItemProcessor
Validate business rules in the processor. Throw an exception for invalid data so the skip configuration (or a custom SkipPolicy) can handle it; returning null instead silently filters the item without counting toward the skip limit.
public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {
@Override
public Employee process(Employee emp) throws Exception {
if (emp.getEmail() == null || !emp.getEmail().contains("@")) {
throw new IllegalArgumentException("Invalid email: " + emp.getEmail());
}
if (emp.getSalary() == null || emp.getSalary().doubleValue() < 0) {
throw new IllegalArgumentException("Invalid salary");
}
// trim / normalize
emp.setName(emp.getName().trim());
return emp;
}
}
JdbcBatchItemWriter — efficient batch writes
JdbcBatchItemWriter is fast for bulk inserts. Use parameterized SQL and map fields from the item.
@Bean
public JdbcBatchItemWriter<Employee> writer(DataSource ds) {
return new JdbcBatchItemWriterBuilder<Employee>()
.dataSource(ds)
.sql("INSERT INTO employee (id, name, email, department, salary) VALUES (:id, :name, :email, :department, :salary)")
.beanMapped()
.build();
}
Job configuration — chunk + fault tolerance
Use chunk-oriented step, tune chunk size (commit interval), and enable fault tolerance with skips. Also attach a SkipListener to log & audit skipped rows.
@Bean
public Job importJob(JobRepository jobRepo, Step importStep) {
return new JobBuilder("importJob", jobRepo)
.start(importStep)
.build();
}
@Bean
public Step importStep(JobRepository jobRepo, PlatformTransactionManager txManager,
FlatFileItemReader<Employee> reader,
EmployeeProcessor processor,
JdbcBatchItemWriter<Employee> writer) {
return new StepBuilder("importStep", jobRepo)
.<Employee, Employee>chunk(100, txManager)
.reader(reader)
.processor(processor)
.writer(writer)
.faultTolerant()
.skip(IllegalArgumentException.class)
.skipLimit(50)
.listener(new LoggingSkipListener())
.build();
}
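For transient failures (deadlocks, connection resets) the same fault-tolerant builder also supports retries. A sketch of the extra calls, assuming Spring's TransientDataAccessException is the right trigger for your database:

// inside the step builder chain, alongside the skip configuration
.faultTolerant()
.retry(TransientDataAccessException.class)  // retry transient DB errors...
.retryLimit(3)                              // ...up to 3 attempts per item
.skip(IllegalArgumentException.class)
.skipLimit(50)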
SkipListener — audit skipped records
public class LoggingSkipListener implements SkipListener<Employee, Employee> {
@Override
public void onSkipInRead(Throwable t) { /* log read errors */ }
@Override
public void onSkipInWrite(Employee item, Throwable t) { /* log write errors */ }
@Override
public void onSkipInProcess(Employee item, Throwable t) {
// persist to error table or write to failed CSV for reprocessing
System.err.println("Skipped: " + item + " because " + t.getMessage());
}
}
Transactional behavior & restartability
- Each chunk is committed in its own transaction; if a write fails, the whole chunk rolls back.
- To support job restarts, rely on the Spring Batch metadata (JobRepository / execution context) rather than external bookkeeping.
- Avoid in-processor state that cannot be reconstructed on restart; put anything you must keep into the step's ExecutionContext (see the sketch below).
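A minimal sketch of restart-safe state, assuming a hypothetical rowsSeen counter that must survive a restart:

public class CountingProcessor implements ItemProcessor<Employee, Employee> {
    private ExecutionContext stepContext;
    private long rowsSeen;

    @BeforeStep
    public void beforeStep(StepExecution stepExecution) {
        this.stepContext = stepExecution.getExecutionContext();
        this.rowsSeen = stepContext.getLong("rowsSeen", 0L); // 0 on first run, last committed value on restart
    }

    @Override
    public Employee process(Employee emp) {
        rowsSeen++;
        stepContext.putLong("rowsSeen", rowsSeen); // persisted with each chunk commit
        return emp;
    }
}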
Idempotency: make writes safe
If a write could be attempted multiple times (retries, restarts), ensure idempotency. Use DB upserts or a unique constraint + insert-if-not-exists logic.
-- Example Postgres upsert (for JdbcBatchItemWriter)
INSERT INTO employee (id, name, email, department, salary)
VALUES (:id, :name, :email, :department, :salary)
ON CONFLICT (id) DO UPDATE SET
    name = EXCLUDED.name,
    email = EXCLUDED.email,
    department = EXCLUDED.department,
    salary = EXCLUDED.salary;
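To use the upsert, swap the SQL into the writer bean shown earlier; everything else stays the same. A sketch, assuming PostgreSQL:

@Bean
public JdbcBatchItemWriter<Employee> upsertWriter(DataSource ds) {
    return new JdbcBatchItemWriterBuilder<Employee>()
            .dataSource(ds)
            .sql("INSERT INTO employee (id, name, email, department, salary) "
                    + "VALUES (:id, :name, :email, :department, :salary) "
                    + "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, email = EXCLUDED.email, "
                    + "department = EXCLUDED.department, salary = EXCLUDED.salary")
            .beanMapped()
            .build();
}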
Performance tuning checklist
- Chunk size: start with 100 and test 50/200/500; larger chunks mean fewer commits but a costlier rollback when something fails (see the sketch after this list for making the size configurable)
- JDBC batch size: ensure your writer batches SQL statements (JdbcBatchItemWriter does by default)
- DB pool: maxPoolSize >= concurrent writer threads
- Monitoring: measure job duration, DB latency, connection usage, GC pauses
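To make the commit interval tunable without a rebuild, externalize it to a property. A sketch, assuming a hypothetical import.chunk-size property (fault-tolerance calls omitted for brevity):

@Bean
public Step importStep(JobRepository jobRepo, PlatformTransactionManager txManager,
                       FlatFileItemReader<Employee> reader,
                       EmployeeProcessor processor,
                       JdbcBatchItemWriter<Employee> writer,
                       @Value("${import.chunk-size:100}") int chunkSize) { // defaults to 100
    return new StepBuilder("importStep", jobRepo)
            .<Employee, Employee>chunk(chunkSize, txManager)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}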
Observability: instrumenting with Micrometer
public class TimedEmployeeProcessor implements ItemProcessor<Employee, Employee> {
    private final Timer processTimer;

    public TimedEmployeeProcessor(MeterRegistry registry) { // constructor injection instead of a field @Autowired
        this.processTimer = registry.timer("batch.employee.process.time");
    }

    @Override
    public Employee process(Employee e) {
        return processTimer.record(() -> {
            // validation and enrichment
            return e;
        });
    }
}
Testing & local debugging
# Trigger job via REST (example)
curl -s "http://localhost:8080/jobs/import-csv?file=/tmp/employees.csv"
# Use JobLauncherTestUtils for integration tests
JobExecution exec = jobLauncherTestUtils.launchJob(new JobParametersBuilder()
.addString("file", "/tmp/test.csv")
.toJobParameters());
assertEquals(BatchStatus.COMPLETED, exec.getStatus());
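JobLauncherTestUtils comes from spring-batch-test, and @SpringBatchTest registers it as a bean. A minimal test-class sketch around the assertion above:

@SpringBatchTest
@SpringBootTest
class ImportJobIntegrationTest {
    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Test
    void importJobCompletes() throws Exception {
        JobExecution exec = jobLauncherTestUtils.launchJob(new JobParametersBuilder()
                .addString("file", "/tmp/test.csv")
                .toJobParameters());
        assertEquals(BatchStatus.COMPLETED, exec.getStatus());
    }
}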
Troubleshooting common issues
| Symptom | Possible cause | Fix |
|---|---|---|
| Many skipped rows | Validation rules too strict or bad CSV | Log failed rows, relax/adjust validation, provide sample to users |
| Job slow / DB connections exhausted | Too many concurrent writers vs pool size | Increase Hikari maxPoolSize or reduce parallelism |
| Duplicates after restart | Writes not idempotent | Use upsert or unique constraints + dedupe logic |
Real-world pattern: Failed-items dead-letter table
Persist failed rows into an error table with failure reason and job metadata. This allows analysts to inspect and reprocess only problematic items.
CREATE TABLE employee_failed (
    id BIGINT,
    payload CLOB,
    error_message VARCHAR(1000),
    job_name VARCHAR(200),
    failed_at TIMESTAMP
);
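A sketch of wiring the dead-letter insert into a SkipListener with a plain JdbcTemplate; the toString() payload is a stand-in for whatever serialization you prefer:

public class DeadLetterSkipListener implements SkipListener<Employee, Employee> {
    private final JdbcTemplate jdbcTemplate;

    public DeadLetterSkipListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void onSkipInProcess(Employee item, Throwable t) {
        jdbcTemplate.update(
                "INSERT INTO employee_failed (id, payload, error_message, job_name, failed_at) VALUES (?, ?, ?, ?, ?)",
                item.getId(), item.toString(), t.getMessage(), "importJob",
                new java.sql.Timestamp(System.currentTimeMillis()));
    }
}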
Sample REST controller to trigger job
@RestController
@RequestMapping("/jobs")
public class JobLauncherController {
@Autowired JobLauncher jobLauncher;
@Autowired Job importJob;
@GetMapping("/import-csv")
public String importCsv(@RequestParam String file) throws Exception {
JobParameters params = new JobParametersBuilder()
.addString("file", file)
.addLong("time", System.currentTimeMillis())
.toJobParameters();
jobLauncher.run(importJob, params);
return "Job launched";
}
}
FAQ
- Is FlatFileItemReader thread-safe? No — use StepScope or partitioning to avoid sharing a reader across threads.
- How many records per second? Depends on DB, network, and chunk size — benchmark! Use Micrometer counters.
- Should I use JdbcBatchItemWriter or JpaItemWriter? JdbcBatchItemWriter is usually faster for bulk inserts; JPA can be used if you need entity lifecycle features.
Conclusion
This pattern (reader → processor → writer) with robust validation, skip-and-audit handling, idempotent writes, and sensible tuning will make CSV import jobs reliable and maintainable in production.