Reading an EBCDIC file with multiple structures #663

Open
MJames1030 opened this issue Mar 26, 2024 · 1 comment
Labels
question Further information is requested

@MJames1030

Hi,

I'm using the Cobrix library on Databricks to read an EBCDIC file. I now have a copybook with multiple record structures. When I read the EBCDIC file, all of the data is parsed using the first structure.
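Conceptually, a fixed-length file like this holds one 100-byte record after another, and each record has to be matched to its own layout, typically through a type field such as the 2-byte RECTYPE field that starts every structure in the copybook. The Python sketch below illustrates that per-record dispatch outside Spark. It is not Cobrix code: the record-type codes "01"/"02"/"03" and the LAYOUTS mapping are hypothetical, and it decodes with cp037 because Python's standard library has no cp1047 codec (the two code pages encode digits and upper-case letters identically).

```python
# Sketch: cut a fixed-length EBCDIC buffer into 100-byte records and pick a
# layout for each record from its leading 2-byte record-type field.
# Hypothetical: the type codes "01"/"02"/"03" and the LAYOUTS mapping.
# cp037 stands in for cp1047 (no cp1047 codec in Python's stdlib); both code
# pages encode digits and upper-case letters identically.

RECORD_LENGTH = 100

LAYOUTS = {"01": "REC-TYPE1", "02": "REC-TYPE2", "03": "REC-TYPE3"}


def split_records(data: bytes, record_length: int = RECORD_LENGTH) -> list[bytes]:
    """Split the buffer into fixed-length records; refuse a trailing partial record."""
    if len(data) % record_length != 0:
        raise ValueError(
            f"size {len(data)} is not a multiple of record length {record_length}"
        )
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def layout_for(record: bytes) -> str:
    """Decode the 2-byte record-type prefix and look up the matching structure."""
    return LAYOUTS.get(record[:2].decode("cp037"), "UNKNOWN")


def classify(data: bytes) -> list[str]:
    """Name the structure that should be applied to each record in the buffer."""
    return [layout_for(record) for record in split_records(data)]


if __name__ == "__main__":
    # Three fake EBCDIC records: a type code followed by 98 spaces (0x40).
    sample = b"".join(
        (code + " " * (RECORD_LENGTH - 2)).encode("cp037") for code in ("01", "03", "02")
    )
    print(classify(sample))  # ['REC-TYPE1', 'REC-TYPE3', 'REC-TYPE2']
```

A record that maps to "UNKNOWN" is itself a hint that the record boundaries or the code page are off.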

Here are my options:
spark.read.format("cobol").option("record_length", "100").option("pedantic", "true").option("ebcdic_code_page", "cp1047").option("drop_value_fillers", "false").option("drop_group_fillers", "false").

I have also tried the option is_record_sequence = true.

Here is an extract of the copybook:

      * RECORD STRUCTURE OF COM-SET 03
       01  REC-TYPE1.
           03  T1-RECTYPE      PIC X(02).
           03  T1-MODYEAR      PIC X(04).
           03  T1-CODEMOD      PIC X(06).
           03  T1-LANGUE       PIC X(02).
           03  T1-SEQNUMB      PIC X(03).
           03  FILLER          PIC X(13).
           03  T1-CODEMODDESC  PIC X(40).
           03  T1-FILLER       PIC X(02).
           03  T1-MARQUE       PIC X(01).
           03  T1-PROCCOD      PIC X(01).
           03  T1-STARTDAT     PIC X(07).
           03  T1-EXPIRDAT     PIC X(07).
           03  T1-MODDATE      PIC X(07).
           03  T1-MODTIME      PIC X(04).
           03  FILLER          PIC X(01).

       01  REC-TYPE2.
           03  T2-RECTYPE      PIC X(02).
           03  T2-MODYEAR      PIC X(04).
           03  T2-CLASS        PIC X(02).
           03  T2-LANGUE       PIC X(02).
           03  T2-PRNUMB       PIC X(03).
           03  T2-SEQNUMB      PIC X(03).
           03  FILLER          PIC X(14).
           03  T2-PRNRDESC     PIC X(40).
           03  FILLER          PIC X(02).
           03  T2-MARQUE       PIC X(01).
           03  T2-PROCCOD      PIC X(01).
           03  T2-STARTDAT     PIC X(07).
           03  T2-EXPIRDAT     PIC X(07).
           03  T2-MODDATE      PIC X(07).
           03  T2-MODTIME      PIC X(04).
           03  FILLER          PIC X(01).

       01  REC-TYPE3.
           03  T3-RECTYPE      PIC X(02).
           03  T3-MODYEAR      PIC X(04).
           03  T3-CLASS        PIC X(02).
           03  T3-LANGUE       PIC X(02).
           03  T3-PACKAGE      PIC X(03).
           03  T3-SEQNUMB      PIC X(03).
           03  FILLER          PIC X(14).
           03  T3-PACKDESC     PIC X(40).
           03  FILLER          PIC X(02).
           03  T3-BRAND        PIC X(01).
           03  T3-PROCCOD      PIC X(01).
           03  T3-STARTDAT     PIC X(07).
           03  T3-EXPIRDAT     PIC X(07).
           03  T3-MODDATE      PIC X(07).
           03  T3-MODTIME      PIC X(04).
           03  FILLER          PIC X(01).

There are 15 structures like these in total.
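Because record_length is set to 100, a quick check is whether every 01-level structure really adds up to 100 bytes. For REC-TYPE1 the field sizes are 2 + 4 + 6 + 2 + 3 + 13 + 40 + 2 + 1 + 1 + 7 + 7 + 7 + 4 + 1 = 100, and REC-TYPE2 and REC-TYPE3 also come to 100, so the three structures shown are consistent with that setting. The Python sketch below automates the sum for the remaining structures. It is a hypothetical helper, not a copybook parser: it only understands PIC X(n) fields and ignores COMP, COMP-3, OCCURS and REDEFINES.

```python
import re

# Level-01 record header, e.g. "01  REC-TYPE1."
LEVEL01 = re.compile(r"^\s*01\s+([A-Z0-9-]+)\s*\.")
# Alphanumeric field, e.g. "03  T1-RECTYPE  PIC X(02)."; captures the length.
PIC_X = re.compile(r"PIC\s+X\((\d+)\)")


def record_sizes(copybook: str) -> dict[str, int]:
    """Sum the PIC X(n) lengths under each level-01 record (PIC X fields only)."""
    sizes: dict[str, int] = {}
    current = None
    for line in copybook.splitlines():
        if line.strip().startswith("*"):
            continue  # comment line
        header = LEVEL01.match(line)
        if header:
            current = header.group(1)
            sizes[current] = 0
            continue
        field = PIC_X.search(line)
        if field and current is not None:
            sizes[current] += int(field.group(1))
    return sizes


def size_mismatches(sizes: dict[str, int], record_length: int = 100) -> dict[str, int]:
    """Return the records whose computed size differs from record_length."""
    return {name: size for name, size in sizes.items() if size != record_length}


COPYBOOK = """
       01  REC-TYPE1.
           03  T1-RECTYPE      PIC X(02).
           03  T1-MODYEAR      PIC X(04).
           03  T1-CODEMOD      PIC X(06).
           03  T1-LANGUE       PIC X(02).
           03  T1-SEQNUMB      PIC X(03).
           03  FILLER          PIC X(13).
           03  T1-CODEMODDESC  PIC X(40).
           03  T1-FILLER       PIC X(02).
           03  T1-MARQUE       PIC X(01).
           03  T1-PROCCOD      PIC X(01).
           03  T1-STARTDAT     PIC X(07).
           03  T1-EXPIRDAT     PIC X(07).
           03  T1-MODDATE      PIC X(07).
           03  T1-MODTIME      PIC X(04).
           03  FILLER          PIC X(01).
"""

if __name__ == "__main__":
    sizes = record_sizes(COPYBOOK)
    print(sizes)                   # {'REC-TYPE1': 100}
    print(size_mismatches(sizes))  # {}
```

Any structure reported by size_mismatches would shift every record after it, which matches the symptom of only the first record looking right.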

Do you know how I can solve this?

Thank you in advance,
Jamal

MJames1030 added the question (Further information is requested) label on Mar 26, 2024
@yruslan
Collaborator

yruslan commented Mar 28, 2024

If the first record looks good but the rest do not, the most likely cause is that the copybook is not aligned with the record size.

Please try the latest version, 2.6.11. It fixes a bug that caused Cobrix to ignore .option("record_length", "100") in certain circumstances.

If that doesn't help, you can use .option("debug", "true") to debug the layout and determine the correct record size.
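In the same spirit as the debug option, a candidate record size can also be tested outside Spark by stepping through the raw bytes at that size and checking that every record boundary starts with a known record-type code; if the boundaries drift, later records stop starting with a valid code. The Python sketch below shows that idea. It is not what Cobrix does internally; the type codes "01"/"02"/"03" are hypothetical, the data is assumed to be read as bytes from a local copy of the file, and cp037 stands in for cp1047.

```python
VALID_TYPES = {"01", "02", "03"}  # hypothetical record-type codes


def misaligned_offsets(data: bytes, record_length: int,
                       valid_types: set[str] = VALID_TYPES) -> list[int]:
    """Byte offsets of records whose first two bytes are not a known type code."""
    bad = []
    for offset in range(0, len(data) - record_length + 1, record_length):
        # cp037 maps all 256 byte values, so decoding never raises here.
        code = data[offset:offset + 2].decode("cp037")
        if code not in valid_types:
            bad.append(offset)
    return bad


def plausible_lengths(data: bytes, candidates,
                      valid_types: set[str] = VALID_TYPES) -> list[int]:
    """Candidate lengths that divide the file evenly and align every type field."""
    return [n for n in candidates
            if len(data) % n == 0 and not misaligned_offsets(data, n, valid_types)]


if __name__ == "__main__":
    # Three fake records that are really 102 bytes long (type code + 100 spaces).
    sample = b"".join((code + " " * 100).encode("cp037") for code in ("01", "02", "03"))
    print(misaligned_offsets(sample, 100))             # [100, 200]
    print(plausible_lengths(sample, range(90, 111)))   # [102]
```

In this toy example a record_length of 100 parses the first record correctly and then drifts by 2 bytes per record, which is the same pattern described above.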
