Example coolblue #428

Merged
merged 5 commits into from
Mar 31, 2023


bosd
Collaborator

@bosd bosd commented Oct 23, 2022

These invoices contain two different types of lines.

The problem is the line
Incl. Thuiskopieheffing: Thuiskopie €3.50 1 21% € 4,24
where the content of the Prijs per stuk (unit price) column has moved to the front, into the product description. (Thuiskopieheffing is the Dutch private-copying levy.)

In the past I made a template that could parse both the regular lines and the special line mentioned above.

lines:
 - start: 'Artikel'
   end: 'Exclusief BTW'
   first_line: ^(?P<product>(\w+(?:\s\S+)*))\s+(?P<qty>[-]?\d+)\s+[€]?\s+(?P<price_unit>(\d{,3}[.]?\d{,3}[,]\d{2}))\s+(?P<tax_percent>\d+)[%]?\s+[€]?\s(?P<price_subtotal>([-]?\d{,3}[.]?\d+[,]\d{2}))
   last_line: ^\s+(?P<product>(Serienummer[:]\s\w+(?:\s\S+)*))
   line: ^\s+(?P<product>(\w+(?:\s\S+)*))\s+(?P<qty>[-]?\d+)\s*$
 - start: 'Artikel'
   end: 'Exclusief BTW'
   line: ^\s+(?P<product>(\w+(?:[:|.]|\s\S+)*))\s+(?P<qty>[-]?\d+)\s+(?P<tax_percent>\d+)[%]?\s+[€]?\s(?P<price_subtotal>([-]?\d{0,3}?[.]?\d+[,]\d{2}))
   types:
     qty: float
     price_unit: float
     price_subtotal: float
     tax_percent: float

However, the current code does not support this syntax.
I converted it to the current syntax, but had to comment out the second line rule.

fields:
  lines:
     parser: lines
     start: 'Artikel'
     end: 'Exclusief BTW'
     first_line: ^(?P<product>(\S+(?:\s\S+)*))\s+(?P<qty>[-]?\d+)\s+[€]?\s+(?P<price_unit>(\d{0,3}[.]?\d{0,3}[,]\d{2}))\s+(?P<tax_percent>\d+)[%]?\s+[€]?\s(?P<price_subtotal>([-]?\d{0,3}[.]?\d+[,]\d{2}))
     last_line: ^\s+(?P<product>(Serienummer[:]\s\w+(?:\s\S+)*)) # \s*$
     line:
       - ^\s+(?P<product>(\w+(?:\s\S+)*))\s+(?P<qty>[-]?\d+)\s*$
       # next line is for parsing thuiskopie heffing
       # - ^\s+(?P<product>(\w+(?:[:|.]|\s\S+)*))\s+(?P<qty>[-]?\d+)\s+(?P<tax_percent>\d+)[%]?\s+[€]?\s(?P<price_subtotal>([-]?\d{0,3}?[.]?\d+[,]\d{2}))
     types:
       qty: float
       price_unit: float
       price_subtotal: float
       tax_percent: float

If the second line rule is not commented out, the following error occurs:

```
Traceback (most recent call last):
  File "/home/bosd/.local/bin/invoice2data", line 10, in <module>
    sys.exit(main())
  File "/home/bosd/.local/lib/python3.7/site-packages/invoice2data/main.py", line 221, in main
    res = extract_data(f.name, templates=templates, input_module=input_module)
  File "/home/bosd/.local/lib/python3.7/site-packages/invoice2data/main.py", line 108, in extract_data
    return t.extract(optimized_str)
  File "/home/bosd/.local/lib/python3.7/site-packages/invoice2data/extract/invoice_template.py", line 182, in extract
    value = parser.parse(self, k, v, optimized_str)
  File "/home/bosd/.local/lib/python3.7/site-packages/invoice2data/extract/parsers/lines.py", line 129, in parse
    row[name] = template.coerce_type(row[name], types[name])
  File "/home/bosd/.local/lib/python3.7/site-packages/invoice2data/extract/invoice_template.py", line 149, in coerce_type
    return float(self.parse_number(value))
  File "/home/bosd/.local/lib/python3.7/site-packages/invoice2data/extract/invoice_template.py", line 123, in parse_number
    ), "Decimal separator cannot be present several times"
AssertionError: Decimal separator cannot be present several times
```

@rmilecki Do you have any idea how to parse both line layouts?
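A minimal Python sketch of one plausible cause. It assumes (this is not confirmed in the thread) that the lines parser appends a repeated capture of the same field with a newline, and that number parsing strips whitespace before converting. That assumption is consistent with the `"qty": 11.0` and the `\n` inside `product` in the coolblue2 output further down. `append_match` and this `parse_number` are simplified, hypothetical models, not invoice2data's code, and the amounts are illustrative.

```python
def append_match(row, field, value):
    """Hypothetical model: a repeated capture is appended to the field with a newline."""
    row[field] = row[field] + "\n" + value if field in row else value


def parse_number(value, decimal_separator=","):
    """Simplified model of locale-aware number parsing (not invoice2data's implementation)."""
    assert value.count(decimal_separator) < 2, "Decimal separator cannot be present several times"
    thousands_separator = "." if decimal_separator == "," else ","
    cleaned = "".join(value.split())  # drops spaces and newlines, so "1\n1" becomes "11"
    cleaned = cleaned.replace(thousands_separator, "").replace(decimal_separator, ".")
    return float(cleaned)


row = {}
append_match(row, "qty", "1")  # captured by first_line
append_match(row, "qty", "1")  # captured again by a following line rule
print(parse_number(row["qty"]))  # 11.0: two quantities glued together

append_match(row, "price_subtotal", "2.321,00")  # illustrative product amount
append_match(row, "price_subtotal", "4,24")      # the Thuiskopie amount
try:
    parse_number(row["price_subtotal"])  # "2.321,00\n4,24" contains two commas
except AssertionError as exc:
    print("AssertionError:", exc)
```

Under these assumptions, the second `line` rule does not fail on its own; the failure comes from its amount being merged into the same row as an earlier amount.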

@rmilecki
Collaborator

What JSON output would you expect for this invoice? Basic lines should probably look like:

{
	"product": "Apple iPad Air Wifi 16 GB Zilver",
	"qty": 1,
	"price_unit": 399.0,
	"tax_percent": 21,
	"price_subtotal": 399.0
},
{
	"product": "Decoded Leather Slim Cover Apple iPad Air 2 Zwart",
	"qty": 1,
	"price_unit": 69.99,
	"tax_percent": 21,
	"price_subtotal": 69.99
},
{
	"product": "Nintendo 3DS XL Wit + Blauw",
	"qty": 1,
	"price_unit": 189.0,
	"tax_percent": 21,
	"price_subtotal": 189.0
},
{
	"product": "Nintendo AC-adapter",
	"qty": 1,
	"price_unit": 14.99,
	"tax_percent": 21,
	"price_subtotal": 14.99
},
{
	"product": "Mario Kart 7 3DS",
	"qty": 1,
	"price_unit": 14.99,
	"tax_percent": 21,
	"price_subtotal": 14.99
}

However, I don't know what to do with those extra lines.

Should we have something like

{
	"product": "Apple iPad Air Wifi 16 GB Zilver",
	"qty": 1,
	"price_unit": 399.0,
	"tax_percent": 21,
	"price_subtotal": 399.0,
	"incl": "Thuiskopieheffing: Thuiskopie €3.50",
	"incl_qty": 1,
	"incl_tax_percent": 21,
	"incl_price_subtotal": 4.24,
	"sn": "SDMPP373MPP15"
}

?
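One way to get the shape proposed above is to keep the parser output flat and fold the extra rows into the preceding product afterwards. A minimal Python sketch, assuming rows arrive in document order as in the coolblue3 output posted below. `fold_extra_lines` and the `sn`/`incl_*` keys follow the proposal above and are hypothetical, not an existing invoice2data feature.

```python
import re

SERIAL = re.compile(r"Serienummer:\s*(?P<sn>.+)")


def fold_extra_lines(lines):
    """Fold 'Serienummer:' and 'Incl. ...' rows into the product row before them.

    Hypothetical post-processing helper; not part of invoice2data.
    """
    merged = []
    for row in lines:
        product = row.get("product", "")
        serial = SERIAL.match(product)
        if serial and merged:
            merged[-1]["sn"] = serial.group("sn")
        elif product.startswith("Incl. ") and merged:
            parent = merged[-1]
            parent["incl"] = product[len("Incl. "):]
            # Copy the levy's own values under incl_* keys.
            for key in ("qty", "tax_percent", "price_subtotal"):
                if key in row:
                    parent["incl_" + key] = row[key]
        else:
            merged.append(dict(row))  # copy so the caller's rows are not modified
    return merged


# First three rows of the coolblue3 output.
coolblue3_rows = [
    {"product": "Apple iPad Air Wifi 16 GB Zilver", "qty": 1.0, "price_unit": 399.0,
     "tax_percent": 21.0, "price_subtotal": 399.0},
    {"product": "Serienummer: SDMPP373MPP15"},
    {"product": "Incl. Thuiskopieheffing: Thuiskopie €3.50", "qty": 1.0,
     "tax_percent": 21.0, "price_subtotal": 4.24},
]
print(fold_extra_lines(coolblue3_rows))  # one dict carrying sn and incl_* keys
```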

@bosd
Collaborator Author

bosd commented Oct 24, 2022

Here's the result from my previous template / code.

I like your version better, as the serial number could be captured in a separate field.
Actually I think that is the way forward.

The challenge is the thuiskopie line.
The upstream library re-assembles the lines, so the amounts in the rightmost column, Prijs incl. BTW (price incl. VAT), should add up to the total invoice amount. Therefore the levy should be passed as its own line, even though it is technically part of the product.
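To illustrate the point, a small Python sketch that sums `price_subtotal` the way a consumer re-assembling the invoice might. `lines_total` is a hypothetical helper, not an invoice2data function. If the levy is folded into the product as `incl_price_subtotal`, a naive sum misses it; as its own line item it is counted. `Decimal` avoids float rounding on money amounts.

```python
from decimal import Decimal


def lines_total(lines):
    """Sum price_subtotal over all line items; a row without one counts as zero."""
    return sum((Decimal(str(line.get("price_subtotal", 0))) for line in lines), Decimal("0"))


# Values from the nested proposal above: the levy lives in incl_price_subtotal.
nested = [{"product": "Apple iPad Air Wifi 16 GB Zilver", "price_subtotal": 399.0,
           "incl_price_subtotal": 4.24}]
# The same data with the levy emitted as its own line item.
flat = [{"product": "Apple iPad Air Wifi 16 GB Zilver", "price_subtotal": 399.0},
        {"product": "Incl. Thuiskopieheffing: Thuiskopie €3.50", "price_subtotal": 4.24}]

print(lines_total(nested))  # 399.0
print(lines_total(flat))    # 403.24
```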

coolblue2 output


[
    {
        "issuer": "Coolblue B.V.",
        "amount": 4904.94,
        "amount_untaxed": 4053.67,
        "partner_coc": "24330087",
        "narration": "Ordernummer: 12508334",
        "note": [
            "Klantnummer: 6669263",
            "Ordernummer: 12508334",
            "Orderdatum: 29 maart 2014",
            "Alles voor een glimlach."
        ],
        "partner_website": [],
        "partner_name": "Coolblue B.V.",
        "country_code": "nl",
        "partner_zip": "3012 CN",
        "partner_city": "Rotterdam",
        "partner_street": "Gildeweg 8",
        "bic": "INGBNL2A",
        "iban": "NL50INGB0683251309",
        "date": "2014-03-29",
        "invoice_number": "992288600",
        "vat": "NL810433941B01",
        "payment_method": "iDEAL",
        "currency": "EUR",
        "lines": [
            {
                "product": "Decoded Leather Sleeve 15,4'' Vintage Bruin",
                "qty": 2.0,
                "price_unit": 99.99,
                "tax_percent": 21.0,
                "price_subtotal": 199.98
            },
            {
                "product": "Apple MacBook Pro Retina 13,3'' + Apple Magic Mouse\nApple MacBook Pro Retina 15,4'' 256 GB",
                "qty": 11.0,
                "price_unit": 2321.0,
                "tax_percent": 21.0,
                "price_subtotal": 2321.0
            },
            {
                "product": "Incl. Thuiskopieheffing: Thuiskopie €3.50",
                "qty": 1.0,
                "tax_percent": 21.0,
                "price_subtotal": 4.24
            },
            {
                "product": "Apple Magic Mouse",
                "qty": 1.0
            },
            {
                "product": "Microsoft Office Mac Home and Student 2011 NL PKC",
                "qty": 1.0,
                "price_unit": 124.99,
                "tax_percent": 21.0,
                "price_subtotal": 124.99
            },
            {
                "product": "HP USB 3.0 Port Replicator 3005pr (H1L08ET)",
                "qty": 1.0,
                "price_unit": 159.99,
                "tax_percent": 21.0,
                "price_subtotal": 159.99
            },
            {
                "product": "MSI GS60 2QE-226NL Ghost Pro\nSerienummer: GS60 2QE-226NLK141900003666",
                "qty": 1.0,
                "price_unit": 1999.0,
                "tax_percent": 21.0,
                "price_subtotal": 1999.0
            },
            {
                "product": "Hex Outpost Origin Rugzak 15'' Grijs",
                "qty": 1.0,
                "price_unit": 79.99,
                "tax_percent": 21.0,
                "price_subtotal": 79.99
            }
        ],
        "desc": "Invoice from Coolblue B.V."
    }
]


coolblue3 output


[
    {
        "issuer": "Coolblue B.V.",
        "amount": 717.97,
        "amount_untaxed": 593.36,
        "partner_coc": "24330087",
        "narration": "Ordernummer: 12572103",
        "note": [
            "Klantnummer: 6669263",
            "Ordernummer: 12572103",
            "Orderdatum: 18 april 2014",
            "Alles voor een glimlach."
        ],
        "partner_website": [],
        "partner_name": "Coolblue B.V.",
        "country_code": "nl",
        "partner_zip": "3012 CN",
        "partner_city": "Rotterdam",
        "partner_street": "Gildeweg 8",
        "bic": "INGBNL2A",
        "iban": "NL50INGB0683251309",
        "date": "2014-04-19",
        "invoice_number": "993548900",
        "vat": "NL810433941B01",
        "payment_method": "iDEAL",
        "currency": "EUR",
        "lines": [
            {
                "product": "Apple iPad Air Wifi 16 GB Zilver",
                "qty": 1.0,
                "price_unit": 399.0,
                "tax_percent": 21.0,
                "price_subtotal": 399.0
            },
            {
                "product": "Serienummer: SDMPP373MPP15"
            },
            {
                "product": "Incl. Thuiskopieheffing: Thuiskopie €3.50",
                "qty": 1.0,
                "tax_percent": 21.0,
                "price_subtotal": 4.24
            },
            {
                "product": "Decoded Leather Slim Cover Apple iPad Air 2 Zwart",
                "qty": 1.0,
                "price_unit": 69.99,
                "tax_percent": 21.0,
                "price_subtotal": 69.99
            },
            {
                "product": "Nintendo 3DS XL Wit + Blauw",
                "qty": 1.0,
                "price_unit": 189.0,
                "tax_percent": 21.0,
                "price_subtotal": 189.0
            },
            {
                "product": "Nintendo AC-adapter",
                "qty": 1.0,
                "price_unit": 14.99,
                "tax_percent": 21.0,
                "price_subtotal": 14.99
            }
        ],
        "desc": "Invoice from Coolblue B.V."
    }
]


@rmilecki
Collaborator

rmilecki commented Nov 5, 2022

> The challenge is, the thuiskopie line.

I see, this one is a real headache. It seems we need to make Incl. Thuiskopieheffing: Thuiskopie €3.50 an independent line. That means Apple Magic Mouse can't be assigned to the Apple MacBook Pro Retina 13,3'' + Apple Magic Mouse line (as we already finished processing that one).

This is the best solution I could come up with so far:

# -*- coding: utf-8 -*-
issuer: Coolblue B.V.
keywords:
  - Coolblue B.V.
fields:
  amount: Totaal\s+€\s+([\d\.]+,[\d][\d])
  date: Factuurdatum:\s+(\d{2}\s+\w+\s+\d{4})
  invoice_number: Factuurnummer:\s+(\d+)
  lines:
    parser: lines
    start: Artikel\s+Aantal.*
    end: Exclusief BTW
    first_line:
      - (?P<name>.*)\s+(?P<qty>\d+)\s+€\s+(?P<price_unit>\d+,\d\d)\s+(?P<tax_percent>\d+)%\s+€\s+(?P<price_subtotal>\d+,\d\d)
      - (?P<name>.*)\s+(?P<qty>\d+)\s+(?P<tax_percent>\d+)%\s+€\s+(?P<price_subtotal>\d+,\d\d)
      - (?P<name>.*)\s+(?P<qty>\d+)$
    line:
      - 'Serienummer: (?P<serial>.*)'
    types:
      qty: int
      price_unit: float
      tax_percent: int
      price_subtotal: float
options:
  decimal_separator: ','

Lines output:

[
    {
        "name":"Decoded Leather Sleeve 15,4'' Vintage Bruin",
        "qty":2,
        "price_unit":99.99,
        "tax_percent":21,
        "price_subtotal":199.98
    },
    {
        "name":"Apple MacBook Pro Retina 15,4'' 256 GB",
        "qty":1
    },
    {
        "name":"Incl. Thuiskopieheffing: Thuiskopie €3.50",
        "qty":1,
        "tax_percent":21,
        "price_subtotal":4.24
    },
    {
        "name":"Apple Magic Mouse",
        "qty":1
    },
    {
        "name":"Microsoft Office Mac Home and Student 2011 NL PKC",
        "qty":1,
        "price_unit":124.99,
        "tax_percent":21,
        "price_subtotal":124.99
    },
    {
        "name":"HP USB 3.0 Port Replicator 3005pr (H1L08ET)",
        "qty":1,
        "price_unit":159.99,
        "tax_percent":21,
        "price_subtotal":159.99,
        "serial":"GS60 2QE-226NLK141900003666"
    },
    {
        "name":"Hex Outpost Origin Rugzak 15'' Grijs",
        "qty":1,
        "price_unit":79.99,
        "tax_percent":21,
        "price_subtotal":79.99
    },
    {
        "name":"Case-Mate Barely There Case Sony Xperia Z3 Transparant",
        "qty":1,
        "price_unit":19.99,
        "tax_percent":21,
        "price_subtotal":19.99
    }
]
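The three `first_line` patterns appear to be ordered from most to least specific: a full product row matches the first, the levy row (no unit price) falls through to the second, and a bare "name + quantity" row to the third. A minimal Python sketch of that ordering, assuming the parser takes the first pattern that matches via `re.search` (an assumption; this is not invoice2data's code). The regexes are copied from the template; the iPad and Magic Mouse text rows are hypothetical, shaped to fit it.

```python
import re

# The three first_line patterns from the template above, in the same order.
FIRST_LINE_PATTERNS = [
    r"(?P<name>.*)\s+(?P<qty>\d+)\s+€\s+(?P<price_unit>\d+,\d\d)\s+(?P<tax_percent>\d+)%\s+€\s+(?P<price_subtotal>\d+,\d\d)",
    r"(?P<name>.*)\s+(?P<qty>\d+)\s+(?P<tax_percent>\d+)%\s+€\s+(?P<price_subtotal>\d+,\d\d)",
    r"(?P<name>.*)\s+(?P<qty>\d+)$",
]


def match_first_line(text):
    """Return (pattern_index, fields) for the first pattern that matches, or None."""
    for index, pattern in enumerate(FIRST_LINE_PATTERNS):
        match = re.search(pattern, text)
        if match:
            # Keep only groups that took part in the match, without surrounding spaces.
            fields = {k: v.strip() for k, v in match.groupdict().items() if v is not None}
            return index, fields
    return None


for text in [
    "Apple iPad Air Wifi 16 GB Zilver 1 € 399,00 21% € 399,00",  # hypothetical regular row
    "Incl. Thuiskopieheffing: Thuiskopie €3.50 1 21% € 4,24",    # row quoted in this PR
    "Apple Magic Mouse 1",                                        # hypothetical bare row
]:
    print(match_first_line(text))
```

Because the levy row has no "€ <unit price>" pair after its quantity, the first pattern cannot match it, so no extra rule is needed to tell the two layouts apart.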

@rmilecki
Collaborator

rmilecki commented Feb 3, 2023

@bosd: what is your opinion on the templates and outputs I provided in my two comments above? Are they sufficient, or do you have any better ideas?

@bosd
Collaborator Author

bosd commented Feb 28, 2023

The comparison is failing; I don't know yet what is causing the difference between my local system and the repo.

@bosd bosd force-pushed the example-coolblue branch 2 times, most recently from 4feccea to abfea06 on March 13, 2023 07:41
@bosd
Collaborator Author

bosd commented Mar 13, 2023

Tests keep failing.
The error seems to be:

E                       AssertionError: False is not true : Failed to verify parsing result for /home/runner/work/invoice2data/invoice2data/tests/compare/coolblue2.json

tests/test_cli.py:78: AssertionError
------------------------------ Captured log call -------------------------------
ERROR    invoice2data.extract.invoice_template:invoice_template.py:226 Failed to parse field partner_website with parser regex
=============================== warnings summary ===============================


For this file, the partner_website field is expected to fail to parse, but that should not make the tests fail.

Is this one failing because of this logger line?
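Judging by the output above, the test fails on the assertion at tests/test_cli.py:78 that compares the parsing result with the compare file, not on the logged ERROR line itself. When that comparison fails, a field-by-field diff shows which key is responsible. A minimal Python sketch; `diff_fields` is a hypothetical helper, not part of invoice2data's test suite, and the two dicts are illustrative stand-ins for one invoice from the parser result and from a compare file such as coolblue2.json (which could be loaded with `json.load`).

```python
_MISSING = object()  # sentinel for keys present in only one of the two dicts


def diff_fields(expected, actual):
    """Return the sorted keys whose values differ between two parsed-invoice dicts."""
    keys = set(expected) | set(actual)
    return sorted(k for k in keys if expected.get(k, _MISSING) != actual.get(k, _MISSING))


expected = {"issuer": "Coolblue B.V.", "partner_website": []}
actual = {"issuer": "Coolblue B.V."}
print(diff_fields(expected, actual))  # ['partner_website']
```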

@bosd bosd force-pushed the example-coolblue branch 2 times, most recently from 49b4f36 to 09a3458 on March 13, 2023 20:29
@bosd
Collaborator Author

bosd commented Mar 14, 2023

My previous statement was wrong.
The problem was indeed in the parsing of the website field: I should have updated my local instance to the latest code, so that the compare file would be generated correctly.

@bosd bosd marked this pull request as ready for review March 31, 2023 21:45
@bosd bosd merged commit 3770a1e into invoice-x:master Mar 31, 2023
@bosd bosd deleted the example-coolblue branch March 31, 2023 21:46
@RiekertKBW

RiekertKBW commented Aug 29, 2023

You can also pass an array of line patterns.

Simply enclose your strings in square brackets and separate them with commas:
['x', 'y']

For example:

(...)
first_line: (...)
line: ['Three digits:\s*(?P<threeD>\d{3})', 'Four digits:\s*(?P<fourD>\d{4})']
last_line: (...)
(...)

This puts the values matched by both patterns into a single line item, as follows:

(...)
"lines":
[{
"threeD": "123",
"fourD": "4321"
},
{
"threeD": "789",
"fourD": "9876"
}]
(...)

My problem was that I had to extract data from line items that spanned five physical lines. I searched for a while and debugged invoice2data until I found this.
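A minimal Python sketch of the effect described in this comment: every text line inside one item is tried against each pattern in the list, and the named groups are merged into a single dict. `merge_line_matches` is hypothetical and only models the behaviour. It ignores how invoice2data detects `first_line`/`last_line` boundaries and what happens when the same field matches twice. The sample text lines are made up from the pattern prefixes and the values in the output above.

```python
import re

# The two line patterns from the example above, with their group names.
LINE_PATTERNS = [r"Three digits:\s*(?P<threeD>\d{3})", r"Four digits:\s*(?P<fourD>\d{4})"]


def merge_line_matches(block, patterns=LINE_PATTERNS):
    """Merge the named groups of every matching text line into one line-item dict."""
    item = {}
    for text in block:
        for pattern in patterns:
            match = re.search(pattern, text)
            if match:
                item.update({k: v for k, v in match.groupdict().items() if v is not None})
                break  # assumption: the first matching pattern wins for a given text line
    return item


print(merge_line_matches(["Three digits: 123", "Four digits: 4321"]))
# {'threeD': '123', 'fourD': '4321'}
```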
