Question: Apache Spark EOFException


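I get an EOFException when running a simple job that reads a text file and collects the results. It runs fine on my development machine but fails when I execute it in standalone mode (single machine, master + worker). My setup is Apache Spark 0.9.1, prebuilt for Hadoop 2.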

I am using the sbt-assembly plugin to package my code and produce an executable jar file.

Relevant stack trace:

14/05/27 08:22:03 WARN scheduler.TaskSetManager: Loss was due to java.io.EOFException
java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2742)
    at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1014)
    at org.apache.hadoop.io.WritableUtils.readCompressedByteArray(WritableUtils.java:39)
    at org.apache.hadoop.io.WritableUtils.readCompressedString(WritableUtils.java:87)
    at org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:185)
    at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2378)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1001)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1892)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
    at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1001)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1892)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1001)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1892)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1001)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1892)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1001)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1892)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1001)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1892)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1001)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1892)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1914)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1797)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1836)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)

[EDIT]

Please note that I have changed the serializer: I am now using Kryo (I only tried it to see whether that was the problem).

My Spark context:

//Load Spark config file
lazy val conf = ConfigFactory.load

//Set Spark config object
val sparkConf = new SparkConf()
      .setMaster(conf.getString("spark.prod.master"))  //Something like spark://host:port
      .setAppName(conf.getString("app.name"))
      .set("spark.executor.memory", conf.getString("spark.prod.config.executorMemory"))
      .set("spark.cores.max", conf.getString("spark.prod.config.coresMax"))
      .set("spark.serializer", conf.getString("spark.prod.config.serializer"))
      .set("spark.kryo.registrator", conf.getString("spark.prod.config.kryoRegistrator"))
      .set("spark.kryoserializer.buffer.mb", conf.getString("spark.prod.config.kryoSerializerBufferSize"))
      .set("spark.logConf", conf.getString("spark.prod.config.logConf"))

Any hints?


2018-05-27 08:38



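How did you add the jar file? Did you use SparkContext.addJar? - visakh
@visakh I set the master configuration property to the master's Spark URL (spark://whatever) and then launch the executable jar from the master. Everything appears to work, but after some tasks are submitted I start getting these errors. - jarandaf
@visakh The jar file contains a main class whose main method runs all the Spark-related code. - jarandaf
I'm not sure this can be answered without seeing the code used to read the file. - Daniel Darabos
@DanielDarabos Nothing special: I just use the sc.textFile method to read a local text file (I am not using HDFS) and apply some map/filter/etc. transformations. - jarandaf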


Answer:


After a few days of struggling, I finally came up with a solution. I had to add the corresponding hadoop-client dependency to avoid this strange exception.
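As a rough sketch of what that dependency might look like in an sbt build: the fragment below assumes a build.sbt file and a Hadoop version of 2.2.0, neither of which is stated in the original post. The hadoop-client version should match the Hadoop version your Spark distribution was built against.

```scala
// build.sbt (fragment). The versions below are assumptions for illustration.
libraryDependencies ++= Seq(
  // Spark version taken from the question.
  "org.apache.spark" %% "spark-core" % "0.9.1",
  // Assumed Hadoop version: replace with the one your Spark build targets.
  "org.apache.hadoop" % "hadoop-client" % "2.2.0"
)
```

If the assembled jar bundles a Hadoop client that differs from the one on the cluster, serialized Hadoop Configuration objects may fail to deserialize on the executors, which would be consistent with the trace in the question.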

After that, some other errors showed up. The solution to the connection-refused problems was:

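  1. Change sbin/start-master.sh and/or sbin/start-slaves.sh to set $SPARK_MASTER_IP to the output of hostname -f instead of hostname. It seems Akka only works with fully qualified names, not with short hostnames or IP addresses.
  2. Also set $SPARK_MASTER_IP in conf/spark-env.sh to the output of hostname -f so that the cluster workers can reach the master.
  3. Make sure conf/slaves also uses fully qualified domain names rather than short hostnames or IP addresses.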
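To check which name a machine resolves for itself before editing those files, a small diagnostic along these lines may help. It is only a sketch: MasterUrlCheck and masterUrl are hypothetical names, 7077 is assumed as the standalone master port, and getCanonicalHostName is a best-effort lookup that may not always agree with hostname -f.

```scala
import java.net.InetAddress

object MasterUrlCheck {
  // Builds a standalone master URL from a host name and port.
  // 7077 is assumed here as the master port; adjust if yours differs.
  def masterUrl(host: String, port: Int = 7077): String =
    s"spark://$host:$port"

  def main(args: Array[String]): Unit = {
    val local = InetAddress.getLocalHost
    // getHostName may return a short name; getCanonicalHostName
    // attempts to return a fully qualified domain name.
    println(s"host name:      ${local.getHostName}")
    println(s"canonical name: ${local.getCanonicalHostName}")
    println(s"master URL:     ${masterUrl(local.getCanonicalHostName)}")
  }
}
```

The printed master URL is the kind of value one would expect to pass to setMaster in the SparkConf from the question, so that the driver and the workers refer to the master by the same fully qualified name.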

After these changes, everything worked.

Hope it helps someone!


2018-05-29 09:13



How did you figure out the hadoop-client solution? - Landon Kuhn