Spring Batch参考文档中文版

6.13.1 自定义 ItemReader 示例

为了实现这个目的,我们实现一个简单的 ItemReader, 从给定的list中读取数据。我们将实现最基本的 ItemReader 功能, read:

public class CustomItemReader<T> implements ItemReader<T>{

    List<T> items;

    public CustomItemReader(List<T> items) {
        this.items = items;
    }

    public T read() throws Exception, UnexpectedInputException,
       NoWorkFoundException, ParseException {

        if (!items.isEmpty()) {
            return items.remove(0);
        }
        return null;
    }
}

这是一个简单的类, 传入一个 items list, 每次读取时删除其中的一条并返回。如果list里面没有内容,则将返回null, 从而满足 ItemReader 的基本要求, 测试代码如下所示:

List<String> items = new ArrayList<String>();
items.add("1");
items.add("2");
items.add("3");

ItemReader itemReader = new CustomItemReader<String>(items);
assertEquals("1", itemReader.read());
assertEquals("2", itemReader.read());
assertEquals("3", itemReader.read());
assertNull(itemReader.read());

使 ItemReader 支持重启

现在剩下的问题就是让 ItemReader 变为可重启的。到目前这一步,如果发生掉电之类的情况,那么必须重新启动 ItemReader,而且是从头开始。在很多时候这是允许的,但有时侯更好的处理办法是让批处理作业在上次中断的地方重新开始。判断的关键是根据 reader 是有状态的还是无状态的。无状态的 reader 不需要考虑重启的情况, 但有状态的则需要根据其最后一个已知的状态来重新启动。出于这些原因, 官方建议尽可能地让 reader 成为无状态的,使开发者不需要考虑重新启动的情况。

如果需要保存状态信息,那应该使用 ItemStream 接口:

public class CustomItemReader<T> implements ItemReader<T>, ItemStream {

    List<T> items;
    int currentIndex = 0;
    private static final String CURRENT_INDEX = "current.index";

    public CustomItemReader(List<T> items) {
        this.items = items;
    }

    public T read() throws Exception, UnexpectedInputException,
        ParseException {

        if (currentIndex < items.size()) {
            return items.get(currentIndex++);
        }

        return null;
    }

    public void open(ExecutionContext executionContext) throws ItemStreamException {
        if(executionContext.containsKey(CURRENT_INDEX)){
            currentIndex = new Long(executionContext.getLong(CURRENT_INDEX)).intValue();
        }
        else{
            currentIndex = 0;
        }
    }

    public void update(ExecutionContext executionContext) throws ItemStreamException {
        executionContext.putLong(CURRENT_INDEX, new Long(currentIndex).longValue());
    }

    public void close() throws ItemStreamException {}
}

每次调用 ItemStream 的 update 方法时, ItemReader 的当前 index 都会被保存到给定的 ExecutionContext 中,key 为 'current.index'。当调用 ItemStream 的open 方法时, ExecutionContext会检查是否包含该 key 对应的条目。如果找到key, 那么当前索引 index 就好移动到该位置。这是一个相当简单的例子,但它仍然符合通用原则:

ExecutionContext executionContext = new ExecutionContext();
((ItemStream)itemReader).open(executionContext);
assertEquals("1", itemReader.read());
((ItemStream)itemReader).update(executionContext);

List<String> items = new ArrayList<String>();
items.add("1");
items.add("2");
items.add("3");
itemReader = new CustomItemReader<String>(items);

((ItemStream)itemReader).open(executionContext);
assertEquals("2", itemReader.read());

大多数ItemReaders具有更加复杂的重启逻辑。例如 JdbcCursorItemReader , 存储了游标(Cursor)中最后所处理的行的 row id。

还值得注意的是 ExecutionContext 中使用的 key 不应该过于简单。这是因为 ExecutionContext 被一个 Step中的所有 ItemStreams 共用。在大多数情况下,使用类名加上 key 的方式应该就足以保证唯一性。然而,在极端情况下, 同一个类的多个 ItemStream 被用在同一个Step中时( 如需要输出两个文件的情况),就需要更加具备唯一性的name标识。出于这个原因,Spring Batch 的许多ItemReader 和 ItemWriter 实现都有一个 setName() 方法, 允许覆盖默认的 key name。