
hdfswriter throws an IO exception when writing #374

Closed
nmgliangwei opened this issue Sep 28, 2021 · 20 comments

@nmgliangwei

An IO exception occurs when reading from HDFS and writing back to HDFS. Judging from the error, the exception is raised by the code shown below.
[screenshot of the failing HdfsHelper code omitted]

2021-09-28 18:28:56.070 [0-0-1-writer] ERROR WriterRunner - Writer Runner Received Exceptions:
com.wgzhao.addax.common.exception.AddaxException: Code:[HdfsWriter-04], Description:[您配置的文件在写入时出现IO异常.]. - java.lang.NullPointerException
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
	at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
	at java.lang.Thread.run(Thread.java:748)
 - java.lang.NullPointerException
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
	at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
	at java.lang.Thread.run(Thread.java:748)

	at com.wgzhao.addax.common.exception.AddaxException.asAddaxException(AddaxException.java:66)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:635)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
	at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
	... 3 common frames omitted
Exception in thread "taskGroup-0" com.wgzhao.addax.common.exception.AddaxException: Code:[HdfsWriter-04], Description:[您配置的文件在写入时出现IO异常.]. - java.lang.NullPointerException
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
	at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
	at java.lang.Thread.run(Thread.java:748)
 - java.lang.NullPointerException
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
	at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
	at java.lang.Thread.run(Thread.java:748)

	at com.wgzhao.addax.common.exception.AddaxException.asAddaxException(AddaxException.java:66)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:635)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
	at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
	at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
	... 3 more
@wgzhao
Owner

wgzhao commented Sep 28, 2021

Please post the full output log.

@nmgliangwei
Author

nmgliangwei commented Sep 28, 2021

2021-09-28 19:12:45.629 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2021-09-28 19:12:45.630 [main] INFO  JobContainer - Addax jobContainer starts job.
2021-09-28 19:12:45.632 [main] INFO  JobContainer - Set jobId = 0
2021-09-28 19:12:45.647 [job-0] INFO  HdfsReader$Job - init() begin...
2021-09-28 19:12:45.760 [job-0] INFO  DFSUtil - hadoopConfig details:{"finalParameters":[]}
2021-09-28 19:12:45.760 [job-0] INFO  HdfsReader$Job - init() ok and end...
[INFO] 2021-09-28 19:12:47.242  - [taskAppId=TASK-61-778-1771]:[138] -  -> 2021-09-28 19:12:46.359 [job-0] INFO  JobContainer - Addax Reader.Job [hdfsreader] do prepare work .
2021-09-28 19:12:46.359 [job-0] INFO  HdfsReader$Job - prepare(), start to getAllFiles...
2021-09-28 19:12:46.359 [job-0] INFO  DFSUtil - get HDFS all files in path = [/user/hive/warehouse/xxx.db/xxx/dt=202101]
[INFO] 2021-09-28 19:12:49.243  - [taskAppId=TASK-61-778-1771]:[138] -  -> 2021-09-28 19:12:48.492 [job-0] INFO  DFSUtil - [hdfs://xxx:8020/user/hive/warehouse/xxx.db/xxx/dt=202101/69408a43ae2c63c7-7d2dba9200000000_86403196_data.0.parq]是[PARQUET]类型的文件, 将该文件加入source files列表
2021-09-28 19:12:49.187 [job-0] INFO  DFSUtil - [hdfs://xxx:8020/user/hive/warehouse/xxx.db/xxx/dt=202101/69408a43ae2c63c7-7d2dba9200000001_784827062_data.0.parq]是[PARQUET]类型的文件, 将该文件加入source files列表
[INFO] 2021-09-28 19:12:50.244  - [taskAppId=TASK-61-778-1771]:[138] -  -> 2021-09-28 19:12:49.987 [job-0] INFO  DFSUtil - [hdfs://xxx:8020/user/hive/warehouse/xxx.db/xxx/dt=202101/69408a43ae2c63c7-7d2dba9200000002_2050807539_data.0.parq]是[PARQUET]类型的文件, 将该文件加入source files列表
2021-09-28 19:12:49.987 [job-0] INFO  HdfsReader$Job - 您即将读取的文件数为: [3], 列表为: [[hdfs://xxx:8020/user/hive/warehouse/xxx.db/xxx/dt=202101/69408a43ae2c63c7-7d2dba9200000000_86403196_data.0.parq, hdfs://xxx:8020/user/hive/warehouse/xxx.db/xxx/dt=202101/69408a43ae2c63c7-7d2dba9200000001_784827062_data.0.parq, hdfs://xxx:8020/user/hive/warehouse/xxx.db/xxx/dt=202101/69408a43ae2c63c7-7d2dba9200000002_2050807539_data.0.parq]]
2021-09-28 19:12:49.988 [job-0] INFO  JobContainer - Addax Writer.Job [hdfswriter] do prepare work .
2021-09-28 19:12:50.021 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2021-09-28 19:12:50.021 [job-0] INFO  HdfsReader$Job - split() begin...
2021-09-28 19:12:50.024 [job-0] INFO  JobContainer - Addax Reader.Job [hdfsreader] splits to [3] tasks.
2021-09-28 19:12:50.025 [job-0] INFO  HdfsWriter$Job - begin splitting ...
2021-09-28 19:12:50.030 [job-0] INFO  HdfsWriter$Job - split wrote file name:[hdfs://hdfs-namenode:8020/user/hive/warehouse/xxx_bak.db/xxx/dt=202101/.1dc61d76_9279_472e_8e62_d5d5b1437077/xxx__a78f529d_b489_4455_816d_53cf1e4bd44f]
2021-09-28 19:12:50.031 [job-0] INFO  HdfsWriter$Job - split wrote file name:[hdfs://hdfs-namenode:8020/user/hive/warehouse/xxx_bak.db/xxx/dt=202101/.1dc61d76_9279_472e_8e62_d5d5b1437077/xxx__6b7ab01d_edc5_47ae_9f3f_7e674e3e3096]
2021-09-28 19:12:50.032 [job-0] INFO  HdfsWriter$Job - split wrote file name:[hdfs://hdfs-namenode:8020/user/hive/warehouse/xxx_bak.db/xxx/dt=202101/.1dc61d76_9279_472e_8e62_d5d5b1437077/xxx__f5b77c27_2ad3_475e_9d6e_ca315b4a741a]
2021-09-28 19:12:50.032 [job-0] INFO  HdfsWriter$Job - end splitting.
2021-09-28 19:12:50.032 [job-0] INFO  JobContainer - Addax Writer.Job [hdfswriter] splits to [3] tasks.
2021-09-28 19:12:50.050 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2021-09-28 19:12:50.061 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [3] tasks.
2021-09-28 19:12:50.068 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2021-09-28 19:12:50.068 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2021-09-28 19:12:50.095 [0-0-0-reader] INFO  DFSUtil - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2021-09-28 19:12:50.095 [0-0-0-reader] INFO  HdfsReader$Task - read start
2021-09-28 19:12:50.095 [0-0-0-reader] INFO  HdfsReader$Task - reading file : [hdfs://xxx:8020/user/hive/warehouse/xxx.db/xxx/dt=202101/69408a43ae2c63c7-7d2dba9200000000_86403196_data.0.parq]
2021-09-28 19:12:50.095 [0-0-0-writer] INFO  HdfsWriter$Task - write to file : [hdfs://hdfs-namenode:8020/user/hive/warehouse/xxx_bak.db/xxx/dt=202101/.1dc61d76_9279_472e_8e62_d5d5b1437077/xxx__a78f529d_b489_4455_816d_53cf1e4bd44f]
2021-09-28 19:12:50.095 [0-0-0-reader] INFO  DFSUtil - Start Read orcfile [hdfs://xxx:8020/user/hive/warehouse/xxx.db/xxx/dt=202101/69408a43ae2c63c7-7d2dba9200000000_86403196_data.0.parq].
2021-09-28 19:12:50.215 [0-0-0-writer] INFO  HdfsHelper - write parquet file hdfs://hdfs-namenode:8020/user/hive/warehouse/xxx_bak.db/xxx/dt=202101/.1dc61d76_9279_472e_8e62_d5d5b1437077/xxx__a78f529d_b489_4455_816d_53cf1e4bd44f
[INFO] 2021-09-28 19:12:52.245  - [taskAppId=TASK-61-778-1771]:[138] -  -> 2021-09-28 19:12:51.562 [0-0-0-writer] ERROR HdfsHelper - 写文件文件[hdfs://hdfs-namenode:8020/user/hive/warehouse/xxx_bak.db/xxx/dt=202101/.1dc61d76_9279_472e_8e62_d5d5b1437077/xxx__a78f529d_b489_4455_816d_53cf1e4bd44f]时发生IO异常,请检查您的网络是否正常!
2021-09-28 19:12:51.563 [0-0-0-writer] INFO  HdfsHelper - start delete tmp dir [hdfs://hdfs-namenode:8020/user/hive/warehouse/xxx_bak.db/xxx/dt=202101/.1dc61d76_9279_472e_8e62_d5d5b1437077] .
2021-09-28 19:12:51.568 [0-0-0-writer] INFO  HdfsHelper - finish delete tmp dir [hdfs://hdfs-namenode:8020/user/hive/warehouse/xxx_bak.db/xxx/dt=202101/.1dc61d76_9279_472e_8e62_d5d5b1437077] .
2021-09-28 19:12:51.571 [0-0-0-writer] ERROR WriterRunner - Writer Runner Received Exceptions:
	com.wgzhao.addax.common.exception.AddaxException: Code:[HdfsWriter-04], Description:[您配置的文件在写入时出现IO异常.]. - java.lang.NullPointerException
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
		at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
		at java.lang.Thread.run(Thread.java:748)
	 - java.lang.NullPointerException
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
		at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
		at java.lang.Thread.run(Thread.java:748)
	
		at com.wgzhao.addax.common.exception.AddaxException.asAddaxException(AddaxException.java:66)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:635)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
		at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
		at java.lang.Thread.run(Thread.java:748)
	Caused by: java.lang.NullPointerException: null
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
		... 3 common frames omitted
	Exception in thread "taskGroup-0" com.wgzhao.addax.common.exception.AddaxException: Code:[HdfsWriter-04], Description:[您配置的文件在写入时出现IO异常.]. - java.lang.NullPointerException
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
		at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
		at java.lang.Thread.run(Thread.java:748)
	 - java.lang.NullPointerException
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
		at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
		at java.lang.Thread.run(Thread.java:748)
	
		at com.wgzhao.addax.common.exception.AddaxException.asAddaxException(AddaxException.java:66)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:635)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsWriter$Task.startWrite(HdfsWriter.java:481)
		at com.wgzhao.addax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:81)
		at java.lang.Thread.run(Thread.java:748)
	Caused by: java.lang.NullPointerException
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.transportParRecord(HdfsHelper.java:218)
		at com.wgzhao.addax.plugin.writer.hdfswriter.HdfsHelper.parquetFileStartWrite(HdfsHelper.java:628)
		... 3 more

@nmgliangwei
Author

I think I've found the problem: an int field contains null values.

@nmgliangwei
Author

Shouldn't this code already handle fields whose value is null?

if (column.getRawData() == null) {
    builder.set(colname, null);
}
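
For context, builder.set(colname, null) looks like Avro's GenericRecordBuilder API; whether the hdfswriter plugin actually builds records this way is an assumption here. If it does, a null value is only accepted when the field's Avro schema is a nullable union such as ["null", "string"]; a plain required field rejects it. A small standalone Java illustration, independent of Addax:

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecordBuilder;

public class NullFieldDemo {
    public static void main(String[] args) {
        // One nullable ("optional") field and one required field.
        Schema schema = SchemaBuilder.record("row").fields()
                .optionalString("nullable_col")
                .requiredString("required_col")
                .endRecord();

        GenericRecordBuilder builder = new GenericRecordBuilder(schema);

        // Accepted: the field schema is the union ["null", "string"].
        builder.set("nullable_col", null);

        // Rejected: a plain "string" field does not accept null, so this call
        // throws an AvroRuntimeException instead of writing a null value.
        builder.set("required_col", null);
    }
}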

@nmgliangwei
Author

@wgzhao I changed all the target-side columns to string and still get this error...

@wgzhao
Owner

wgzhao commented Sep 28, 2021

Which version are you using? Could you paste the json file?

@nmgliangwei
Author

nmgliangwei commented Sep 28, 2021

addax: 4.0.3
The json is below; I've masked the field names with xxx.

{
		"content":[
			{
				"reader":{
					"parameter":{
						"path":"/user/hive/warehouse/xxx.db/xxx/dt=202101",
						"column":[
							{
								"index":"0",
								"type":"STRING"
							},
							{
								"index":"1",
								"type":"STRING"
							},
							{
								"index":"2",
								"type":"STRING"
							},
							{
								"index":"3",
								"type":"STRING"
							},
							{
								"index":"4",
								"type":"STRING"
							},
							{
								"index":"5",
								"type":"STRING"
							},
							{
								"index":"6",
								"type":"STRING"
							},
							{
								"index":"7",
								"type":"STRING"
							},
							{
								"index":"8",
								"type":"STRING"
							},
							{
								"index":"9",
								"type":"STRING"
							},
							{
								"index":"10",
								"type":"STRING"
							},
							{
								"index":"11",
								"type":"STRING"
							},
							{
								"index":"12",
								"type":"STRING"
							},
							{
								"index":"13",
								"type":"STRING"
							},
							{
								"index":"14",
								"type":"STRING"
							},
							{
								"index":"15",
								"type":"STRING"
							},
							{
								"index":"16",
								"type":"STRING"
							},
							{
								"index":"17",
								"type":"STRING"
							},
							{
								"index":"18",
								"type":"STRING"
							},
							{
								"index":"19",
								"type":"STRING"
							},
							{
								"index":"20",
								"type":"STRING"
							},
							{
								"index":"21",
								"type":"STRING"
							},
							{
								"index":"22",
								"type":"STRING"
							},
							{
								"index":"23",
								"type":"STRING"
							},
							{
								"index":"24",
								"type":"STRING"
							},
							{
								"index":"25",
								"type":"STRING"
							},
							{
								"index":"26",
								"type":"STRING"
							},
							{
								"index":"27",
								"type":"STRING"
							},
							{
								"index":"28",
								"type":"STRING"
							},
							{
								"index":"29",
								"type":"STRING"
							},
							{
								"index":"30",
								"type":"STRING"
							},
							{
								"index":"31",
								"type":"STRING"
							},
							{
								"index":"32",
								"type":"STRING"
							},
							{
								"index":"33",
								"type":"STRING"
							},
							{
								"index":"34",
								"type":"STRING"
							},
							{
								"index":"35",
								"type":"STRING"
							},
							{
								"index":"36",
								"type":"STRING"
							},
							{
								"index":"37",
								"type":"STRING"
							},
							{
								"index":"38",
								"type":"STRING"
							},
							{
								"index":"39",
								"type":"STRING"
							}
						],
						"defaultFS":"hdfs://xxx:8020",
						"encoding":"UTF-8",
						"fileType":"parquet"
					},
					"name":"hdfsreader"
				},
				"writer":{
					"parameter":{
						"path":"/user/hive/warehouse/xxx.db/xxx/dt=202101",
						"fileName":"xxx",
						"column":[
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							},
							{
								"name":"xxx",
								"type":"STRING"
							}
						],
						"defaultFS":"hdfs://hdfs-namenode:8020",
						"writeMode":"overwrite",
						"fieldDelimiter":"\u0001",
						"fileType":"parquet"
					},
					"name":"hdfswriter"
				}
			}
		],
		"setting":{
			"speed":{
				"channel":1
			}
		}
	}

@nmgliangwei
Author

@wgzhao I changed both the source and target field types to STRING and ran the extraction again. It still fails.

@wgzhao
Owner

wgzhao commented Sep 28, 2021

It is not just a field-type problem; my code has a test case that sets an int-typed value to null, and it does not fail.
If convenient, could you privately send me a small sample of the parquet file being read ([email protected])? I'll see whether I can reproduce the problem locally.

@wgzhao
Owner

wgzhao commented Sep 28, 2021

I think I know the cause now. I'll upload a compiled jar for you shortly.

@wgzhao
Owner

wgzhao commented Sep 28, 2021

Please download this archive, extract it, replace the corresponding jar under the plugin/writer/hdfswriter/ folder, and then test again.

hdfswriter-4.0.3.jar.zip

@nmgliangwei
Author

@wgzhao Nice, it works now.
So the column value was null, right?

@wgzhao
Owner

wgzhao commented Sep 29, 2021

@nmgliangwei Yes. I added a commit: when a value is null, it no longer passes null directly to the column; instead it sets column = new StringColumn().
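
A minimal sketch of the change described above, using stand-in types because Addax's real Column / StringColumn classes are not shown in this thread; the names below are illustrative assumptions, not the actual commit:

// Single-file sketch with stand-in types; Addax's real column API is assumed, not shown.
interface Column {
    Object getRawData();
}

class StringColumn implements Column {
    private final String value;
    StringColumn() { this(null); }               // an "empty" column instead of a bare null reference
    StringColumn(String value) { this.value = value; }
    public Object getRawData() { return value; }
}

class NullSafeColumns {
    // Before the fix: a null column reference reached transportParRecord() and caused
    // the NullPointerException shown in the stack traces above.
    // After the fix: a null value is wrapped in an empty StringColumn instead.
    static Column orEmpty(Column column) {
        return (column == null || column.getRawData() == null) ? new StringColumn() : column;
    }

    public static void main(String[] args) {
        Column fromReader = null;                // simulate a NULL field coming from the source
        Column safe = orEmpty(fromReader);
        System.out.println(safe.getRawData());   // prints "null"; no NPE is thrown downstream
    }
}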

@wgzhao wgzhao closed this as completed Sep 29, 2021
@nmgliangwei
Author

@wgzhao I found another problem: the write succeeds now, but many fields are written as "java.nio.HeapByteBuffer[pos=0 lim=22 cap=22]", with different lim and cap values for different fields.

@wgzhao wgzhao reopened this Sep 29, 2021
@wgzhao
Owner

wgzhao commented Sep 29, 2021

I'll run a test and see whether I can reproduce this problem.

@wgzhao
Owner

wgzhao commented Sep 29, 2021

I could not reproduce this locally. What types are those fields? Does it happen when the values of those fields are null?

@nmgliangwei
Author

There are both bigint and string fields, and some of them contain NULLs. It looks like fields that have a value in the source are written to the target as java.nio.HeapByteBuffer, while fields that are NULL in the source stay NULL in the target.
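
The value java.nio.HeapByteBuffer[pos=0 lim=22 cap=22] is exactly what ByteBuffer.toString() returns, which suggests the string bytes read from parquet are being stringified with toString() instead of decoded; whether that is what the reader does internally is an assumption. A small Java illustration of the difference:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferStringDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("hello parquet".getBytes(StandardCharsets.UTF_8));

        // What the corrupted rows look like: toString() only describes the buffer,
        // e.g. "java.nio.HeapByteBuffer[pos=0 lim=13 cap=13]".
        System.out.println(buf.toString());

        // What a reader has to do instead: decode the bytes as UTF-8 text.
        System.out.println(StandardCharsets.UTF_8.decode(buf.duplicate()).toString());
    }
}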

@nmgliangwei
Author

Source Hadoop version: Hadoop 2.6.0
Target Hadoop version: Hadoop 3.0.0

@wgzhao
Owner

wgzhao commented Sep 29, 2021

If convenient, could you send me one of the source files?
The parquet format is complex and the reader cannot yet handle all of its features,
so I'd like to check whether the problem already occurs at read time.

@nmgliangwei
Author

@wgzhao I've emailed you some sample data; please take a look. I suspect it is a read-side problem: when I switched the reader to impala while still writing to hdfs in parquet format, the data came out correct.
