I have a Hadoop map-reduce job that uses the Kite SDK DatasetKeyInputFormat. It is configured to read a Parquet file.
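For context, the configuration looks roughly like this (a minimal sketch, not my exact code; the dataset URI and the record type are placeholders):

    import org.apache.avro.generic.GenericData;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.kitesdk.data.mapreduce.DatasetKeyInputFormat;

    public class ParquetReadJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "parquet-read");
            job.setJarByClass(ParquetReadJob.class);

            // Point the Kite input format at the Parquet-backed dataset.
            // The dataset URI is a placeholder for the real one.
            DatasetKeyInputFormat.configure(job)
                .readFrom("dataset:hdfs:/datasets/events")
                .withType(GenericData.Record.class);

            // ... mapper, reducer, and output configuration elided ...
        }
    }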
Every time I run the job I get the following exception:
Error: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at parquet.hadoop.ParquetInputSplit.readArray(ParquetInputSplit.java:304)
    at parquet.hadoop.ParquetInputSplit.readFields(ParquetInputSplit.java:263)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:372)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:754)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
The same file can be read by the map-reduce jobs that Hive creates, i.e. it can be queried through Hive successfully.
To isolate the possible issue, I created a map-reduce job based on the Kite SDK MapReduce example, and I still get the same exception.
Note: the Avro and CSV formats work fine.