问题 如何使用JpaRepository进行批量(多行)插入?


打电话的时候 saveAll 我的方法 JpaRepository 很久了 List<Entity> 从服务层,Hibernate的跟踪日志记录显示每个实体发出的单个SQL语句。

我可以强制它进行批量插入(即多行),而无需手动拨动 EntityManger,事务等甚至原始的SQL语句字符串?

对于多行插入,我的意思不仅仅是从以下过渡:

start transaction
INSERT INTO table VALUES (1, 2)
end transaction
start transaction
INSERT INTO table VALUES (3, 4)
end transaction
start transaction
INSERT INTO table VALUES (5, 6)
end transaction

至:

start transaction
INSERT INTO table VALUES (1, 2)
INSERT INTO table VALUES (3, 4)
INSERT INTO table VALUES (5, 6)
end transaction

而是:

start transaction
INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)
end transaction

在PROD中,我使用的是CockroachDB,性能差异很大。

下面是一个重现问题的简单示例(H2为简单起见)。


./src/main/kotlin/ThingService.kt

package things

import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.RestController
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.data.jpa.repository.JpaRepository
import javax.persistence.Entity
import javax.persistence.Id
import javax.persistence.GeneratedValue

interface ThingRepository : JpaRepository<Thing, Long> {
}

@RestController
class ThingController(private val repository: ThingRepository) {
    @GetMapping("/test_trigger")
    fun trigger() {
        val things: MutableList<Thing> = mutableListOf()
        for (i in 3000..3013) {
            things.add(Thing(i))
        }
        repository.saveAll(things)
    }
}

@Entity
data class Thing (
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
)

@SpringBootApplication
class Application {
}

fun main(args: Array<String>) {
    runApplication<Application>(*args)
}

./src/main/resources/application.properties

jdbc.driverClassName = org.h2.Driver
jdbc.url = jdbc:h2:mem:db
jdbc.username = sa
jdbc.password = sa

hibernate.dialect=org.hibernate.dialect.H2Dialect
hibernate.hbm2ddl.auto=create

spring.jpa.generate-ddl = true
spring.jpa.show-sql = true

spring.jpa.properties.hibernate.jdbc.batch_size = 10
spring.jpa.properties.hibernate.order_inserts = true
spring.jpa.properties.hibernate.order_updates = true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data = true

./build.gradle.kts

import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

plugins {
    val kotlinVersion = "1.2.30"
    id("org.springframework.boot") version "2.0.2.RELEASE"
    id("org.jetbrains.kotlin.jvm") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.spring") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.jpa") version kotlinVersion
    id("io.spring.dependency-management") version "1.0.5.RELEASE"
}

version = "1.0.0-SNAPSHOT"

tasks.withType<KotlinCompile> {
    kotlinOptions {
        jvmTarget = "1.8"
        freeCompilerArgs = listOf("-Xjsr305=strict")
    }
}

repositories {
    mavenCentral()
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile("org.springframework.boot:spring-boot-starter-data-jpa")
    compile("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    compile("org.jetbrains.kotlin:kotlin-reflect")
    compile("org.hibernate:hibernate-core")
    compile("com.h2database:h2")
}

跑:

./gradlew bootRun

触发DB INSERT:

curl http://localhost:8080/test_trigger

日志输出:

Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)

5942
2018-06-09 08:11


起源

请检查我的答案,希望它会有所帮助: stackoverflow.com/a/50694902/5380322 - Cepr0
@ Cepr0谢谢,但我已经这样做了(累积在列表中并调用 saveAll。我刚刚添加了一个最小的代码示例来重现问题。 - Tobias Hermann
你有没有订 hibernate.jdbc.batch_size 属性? - Cepr0
@ Cepr0是的。 (往上看) - Tobias Hermann
这是不正确的,必须采用以下形式: spring.jpa.properties.hibernate.jdbc.batch_size - Cepr0


答案:


要使用Sring Boot和Spring Data JPA获得批量插入,您只需要两件事:

  1. 设置选项 spring.jpa.properties.hibernate.jdbc.batch_size 达到你需要的适当值(例如:20)。

  2. 使用 saveAll() 您的回购方法与准备插入的实体列表。

工作实例是 这里

关于将insert语句转换为如下所示:

INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)

这在PostgreSQL中可用:您可以设置选项 reWriteBatchedInserts 在jdbc连接字符串中为true:

jdbc:postgresql://localhost:5432/db?reWriteBatchedInserts=true

然后jdbc驱动程序会这样做 这种转变

有关批量的其他信息,您可以找到 这里

更新

Kotlin的演示项目: SB-科特林分批嵌入演示


7
2018-06-15 21:36



谢谢。我想让你的Kotlin演示运行,但还没有成功。我做 git clone https://github.com/Cepr0/sb-kotlin-batch-insert-demo, cd sb-kotlin-batch-insert-demo 和 mvn package 但最终会出现以下错误: gist.github.com/Dobiasd/7f1163110b52876f171d43e17af0853c - Tobias Hermann
@TobiasHermann请再次检查 - 我修好了...... - Cepr0


您可以将Hibernate配置为批量DML。看一下 Spring Data JPA - 并发批量插入/更新。我认为答案的第2部分可以解决您的问题:

启用DML语句的批处理启用批处理支持   这样可以减少到数据库的往返次数   插入/更新相同数量的记录。

从批处理INSERT和UPDATE语句引用:

hibernate.jdbc.batch_size = 50

hibernate.order_inserts = true

hibernate.order_updates = true

hibernate.jdbc.batch_versioned_data = true

UPDATE:你必须以不同的方式设置hibernate属性 application.properties 文件。它们位于命名空间下: spring.jpa.properties.*。示例可能如下所示:

spring.jpa.properties.hibernate.jdbc.batch_size = 50
spring.jpa.properties.hibernate.order_inserts = true
....

3
2018-06-09 08:15



谢谢你的建议。我试了一下,但没办法。我已经在我的问题中添加了一个最小的代码示例来重现问题,即使您的设置也是如此。 - Tobias Hermann
谢谢,我调整了我的配置(并相应地更新了我的问题),但仍然没有运气。 - Tobias Hermann
你有没有尝试过不同的数据库,或者你的H2需要吗? @TobiasHermann我建议接下来尝试使用MySQL数据库。并非所有数据库驱动程序都正确实现JDBC批处理插入/更新 - rieckpil
我尝试使用CockroachDB 2.0.2。它支持多行插入,当我手动创建所需的插件时,速度提高了大约10倍 java.sql.PreparedStatement 在我的应用程序中,使用raw发送出去 java.sql.Connection 的 javax.sql.DataSource。 - Tobias Hermann


底层问题是SimpleJpaRepository中的以下代码:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

除批量大小属性设置外,还必须确保SimpleJpaRepository类调用是持久的而不是合并。有几种方法可以解决这个问题:使用 @Id 没有查询序列的生成器,比如

@Id
@GeneratedValue(generator = "uuid2")
@GenericGenerator(name = "uuid2", strategy = "uuid2")
var id: Long

或者强制持久性通过让您的实体实现Persistable并覆盖它来将记录视为新记录 isNew() 呼叫

@Entity
class Thing implements Pesistable<Long> {
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
    @Transient
    private boolean isNew = true;
    @PostPersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }
    @Override
    boolean isNew() {
        return isNew;
    }
}

或者覆盖 save(List) 并使用实体管理器进行调用 persist()

@Repository
public class ThingRepository extends SimpleJpaRepository<Thing, Long> {
    private EntityManager entityManager;
    public ThingRepository(EntityManager entityManager) {
        super(Thing.class, entityManager);
        this.entityManager=entityManager;
    }

    @Transactional
    public List<Thing> save(List<Thing> things) {
        things.forEach(thing -> entityManager.persist(thing));
        return things;
    }
}

以上代码基于以下链接:


3
2018-06-16 18:06



感谢Jean分享有用的链接。但仍有一个问题,坚持 @Generated  @Id 使用的值 Persistable 方法。只有在我手动设置时才会执行批处理 id 我自己的逻辑领域。如果我依靠 @Generated 为了我的 Long  id 属性,那么语句不会批量运行。您共享的所有链接均不使用 @Generated 类型策略 Persistable 方法。我甚至检查了第二个链接中提供的Github代码链接,但它也分配了 id 属性手动。 - iamharish15