Description
Jimmi Changa opened BATCH-1586 and commented
OBSERVED BEHAVIOR
The primary keys for the the spring_batch tables can are not always being saved in sequence. For example, we would see an rows created in the following order: 1, 2, 3, 4, 20, 21, 22, 40, 41, 23, 24, 42.
Having gaps between primary keys is common due to Oracle sequences not reverting during a rollback. What is not common, especially in a non-clustered environment, is the sequences having the appearance of decreasing. In this example, the sequence went from 41 to 23. This is also normally not an issue if the developer does not rely on using the primary key for ordering or querying for order. However, in this case, the third party Spring Batch library does indeed query on the primary key for ordering which is a design flaw leading to Spring Batch thinking our job has already run ("JobInstance already exists and is not restartable").
ROOT CAUSE
In a clustered DB environment, whenever someone opens a database connection, there is no guarantee which cluster node the user will connect to if the sequences are setup per the SQL files included with Spring Batch. In Oracle, the default cache size for a sequence is 20 when created as such in the resources file business-schema-oracle10g.sql:
CREATE SEQUENCE CUSTOMER_SEQ START WITH 5;
CREATE SEQUENCE BATCH_STAGING_SEQ START WITH 0 MINVALUE 0;
CREATE SEQUENCE TRADE_SEQ START WITH 0 MINVALUE 0;
This eventually leads to possibilities where the primary key can be written out of sequence specially when new DB connections are established. This is not a problem because most DBs only guarantee a unique primary key -- not a primary key in order. The true problem seems to be that Spring Batch is relying on the primary key being in order which I've always been taught should not be done. Developers shouldn't read the primary key for ordering.
PROPOSED SOLUTIONS
There are several potential solutions.
(1) Specify 'nocache' for each of the three sequences
(e.g., CREATE SEQUENCE CUSTOMER_SEQ START WITH 5 NOCACHE;)
(2) Specify a shared pooling for each of the three sequences
(3) Design the Spring Batch tables/logic not to rely on the primary key increasing in order. A new column could be added to the table, for example, for tracking ordering.
I lean towards solution (3) because it would solve the problem regardless of which DB utilized and is considered good software design. This primary key sequencing issue may also be present in other DBs besides Oracle so solution (3) would be database agnostic. Solution (1) could be a cheap and easy fix until (3) is completed.
Affects: 2.0.4
2 votes, 6 watchers