Issues with other templates on [Windows] #566

Open
abhigyanasatpathy opened this issue Aug 26, 2024 · 19 comments

Comments

@abhigyanasatpathy

Steps to add new template

To add a new template, we recommend this workflow:

1. Copy existing template to new file

Find a template that is roughly similar to what you need and copy it to
a new file. It's good practice to use reverse domain notation. E.g.
country.company.division.language.yml or
fr.mobile.enterprise.french.yml. Language is not always needed.
Template folders are searched recursively for files ending in .yml.
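
To check which files such a recursive search would pick up, a minimal sketch using only the Python standard library is shown here. The function name `find_template_files` is hypothetical, and this is not invoice2data's own loader; it only mirrors the "recursively, ending in .yml" rule described above.

```python
from pathlib import Path


def find_template_files(folder):
    """Return every .yml file under `folder`, searched recursively."""
    # rglob descends into all sub-folders, e.g. templates/in/in.demo.yml
    return sorted(Path(folder).rglob("*.yml"))
```

Calling `find_template_files("tpl")` on your own template folder and printing the result shows at a glance whether the new file sits where you expect it.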

2. Change invoice issuer

Just used in the output. Best to use the company name.

3. Set keyword

Look at the invoice and find the best identifying string. Tax number +
company name are good options. Remember, all keywords need to be found
for the template to be used.

Keywords are compared before processing the extracted text.
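
The "all keywords need to be found" rule can be illustrated with a short sketch. It assumes keywords are treated as regular expressions searched in the extracted text; the function name `template_matches` and the sample invoice text are hypothetical, and invoice2data's real matching code may differ in detail.

```python
import re


def template_matches(keywords, text):
    """True only if every keyword pattern occurs somewhere in the text."""
    return all(re.search(keyword, text) for keyword in keywords)


# Hypothetical extracted text and keywords, for illustration only.
sample_text = "Demo Traders Pvt Ltd\nGSTIN 12ABCDE3456F7Z8\nTotal 1,234.50"
print(template_matches(["Demo Traders", r"GSTIN\s+12ABCDE"], sample_text))
print(template_matches(["Demo Traders"], ""))
```

The second call shows why an empty text extraction can never select a template: with no text, no keyword can be found.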

4. First test run

Now we're ready to see how far we are off. Run invoice2data with the
following debug command to see if your keywords match and how much work
is needed for dates, etc.

invoice2data --template-folder tpl --debug invoice-XXX.pdf

This test run shows you how the program will "see" the text in the
invoice. Parsing PDFs is sometimes a bit unpredictable. Also make sure
your template is used. You should already receive some data from static
fields or currencies.

5. Add regular expressions

Now you can use the debugging text to add regex fields for the
information you need. It's a good idea to copy parts of the text
directly from the debug output and then replace the dynamic parts with
regex. Keep in mind that some characters need escaping. To test, re-run
the above command.

  • date field: First capture the date. Then see if dateparser
    handles it correctly. If not, add your format or language under
    options.
  • amount: Capture the number without currency code. If you expect
    high amounts, replace the thousand separator. Currently we don't
    parse numbers via locales (TODO)
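
As an illustration of why the thousand separator matters, here is a standalone sketch that turns a captured amount string into a number. The helper `parse_amount` and its parameters are hypothetical and are not part of invoice2data; the library does its own conversion.

```python
from decimal import Decimal


def parse_amount(raw, thousands_sep=",", decimal_sep="."):
    """Convert a captured amount string such as '1,234.50' to a Decimal."""
    # Drop the thousand separator first, then normalise the decimal mark.
    cleaned = raw.strip().replace(thousands_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)
```

For a European-style amount such as "1.234,50", the call would be `parse_amount("1.234,50", thousands_sep=".", decimal_sep=",")`.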

6. Done

Now you're ready to commit and push your template, so others get a
chance to use and improve it.

My Question:
I have added a new template in yml with the appropriate regex, but when I parse the invoice PDF it is not parsed and an error is shown.

Error message:
(invoice2data-env) D:\invoice2data-master\src\invoice2data>invoice2data --output-format csv --output-name output/invoices.csv input/demoinvoice.pdf
INFO:invoice2data.extract.loader: Loaded 189 templates from D:\invoice2data-master\invoice2data-env\Lib\site-packages\invoice2data\extract\templates
INFO:pikepdf._core: pikepdf C++ to Python logger bridge initialized
Scanning contents ---------------------------------------- 100% 1/1 0:00:00
WARNING:ocrmypdf._pipeline: This PDF is marked as a Tagged PDF. This often indicates that the PDF was generated from an office document and does not need OCR. PDF pages processed by OCRmyPDF may not be tagged correctly.
OCR ---------------------------------------- 0% 0/1 -:--:--
WARNING:ocrmypdf._pipeline: Weighted average image DPI is 152.1, max DPI is 247.7. The discrepancy may indicate a high detail region on this page, but could also indicate a problem with the input PDF file. Page image will be rendered at 400.0 DPI.
OCR ---------------------------------------- 100% 1/1 0:00:00
Linearizing ---------------------------------------- 100% 100/100 0:00:00
INFO:invoice2data.input.ocrmypdf: Text extraction made with ocrmypdf
ERROR:root: No template for input/demoinvoice.pdf

@bosd
Collaborator

bosd commented Aug 27, 2024

Hi,
Your steps for adding a template are correct.

Did you verify that your installation of invoice2data is running properly by testing it on one of the example files?

@abhigyanasatpathy
Author

Yes, it is running properly.
Thank you for your help.
By the way, can you please explain the process again?
I have created templates/myinvoice, and inside it in.myinvoice.yml with regex according to my PDF.
Is that enough to convert my PDF to CSV output?
Or is there any other step or code I need to add? Please explain it simply.
I have already run your existing template, and it works fine.

@bosd
Collaborator

bosd commented Aug 27, 2024

Your command looks OK.

Some debugging steps:
[x] Verify your installation by parsing a sample file.
[ ] Run with the --debug flag to check the text extracted from the invoice-xx.pdf file.
This is likely the problem: invoice2data tries to fall back on ocrmypdf, probably because it cannot detect any characters with pdftotext.

Is your PDF a text-based file, or does it need OCR?
[ ] Try your PDF with a different input parser: pass --input-reader= with either pdftotext or ocrmypdf.
[ ] Check your template for syntax errors.

@abhigyanasatpathy
Author

abhigyanasatpathy commented Aug 27, 2024

My PDF file is a text-based file.
I have only created one file, in.invoicedemo.yml (path: D:\invoice2data-master\src\invoice2data\extract\templates\in\in.invoicedemo.yml), as step 1.
Should I proceed with step 1 only, or are there other steps I should follow?
Are there any other steps where I need to write code or do something else?

In the in.invoicedemo.yml file I have worked on the regular expressions and keywords according to my PDF.

@bosd
Collaborator

bosd commented Aug 28, 2024

When you run invoice2data on the pdf file with the --debug flag, do you see the contents of the file in your logger/terminal?

@abhigyanasatpathy
Author

No, I cannot see the contents of the file.
I can only see the PDF-to-text data in the logger (using the --debug flag),
but I cannot see the data in the csv file.
I get this error in the logger:
DEBUG:root: END pdftotext result =============================
DEBUG:invoice2data.extract.invoice_template: Template: au.com.opal.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: au.com.telstra.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.accor.invest.ibis.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.accor.invest.novotel.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.boucherie.pochet.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.cebeo.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.eg_retail.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.lampiris.facture-dacompte.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.lampiris.factuur.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.lampiris.regularisation.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.melchior-vins.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.proximus.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.scarlet.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: be.securex.social.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: ch.pcengines.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invo
.
.
.
DEBUG:invoice2data.extract.invoice_template: Template: pl.bmw-fs.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.insert.subiekt-gt.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.insert.subiekt-nexo.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.orlen.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.p4.yml | Failed to match all keywords.
DEBUG:invoice2data.extract.invoice_template: Template: pl.paypro.yml | Failed to match all keywords.
INFO:pikepdf._core: pikepdf C++ to Python logger bridge initialized
DEBUG:root: Text extraction failed, falling back to ocrmypdf
DEBUG:root: Text extraction failed, falling back to ocrmypdf
DEBUG:invoice2data.input.ocrmypdf: input_reader_config received from main are, *{}*
DEBUG:invoice2data.input.ocrmypdf: ocrmypdf config settings are: *{'redo_ocr': True, 'optimize': 0, 'output_type': 'pdf', 'fast_web_view': 0}*

WARNING:ocrmypdf._pipeline: This PDF is marked as a Tagged PDF. This often indicates that the PDF was generated from an office document and does not need OCR. PDF pages processed by OCRmyPDF may not be tagged correctly.
OCR ---------------------------------------- 0% 0/1 -:--:--
WARNING:ocrmypdf._pipeline: Weighted average image DPI is 152.1, max DPI is 247.7. The discrepancy may indicate a high detail region on this page, but could also indicate a problem with the input PDF file. Page image will be rendered at 400.0 DPI.
OCR ---------------------------------------- 100% 1/1 0:00:00
Linearizing ---------------------------------------- 100% 100/100 0:00:00
INFO:invoice2data.input.ocrmypdf: Text extraction made with ocrmypdf
DEBUG:

@bosd
Collaborator

bosd commented Aug 28, 2024

The result from pdftotext is empty.

So you're likely running into dependency issues with pdftotext / poppler-utils on Windows.
Windows is currently not well supported or tested.

There is an open PR to improve Windows support, but its tests are failing:
#565

I'm a Linux user, so I cannot give you much support on Windows.
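
One way to test the pdftotext dependency on its own, outside invoice2data, is sketched below. It assumes the poppler `pdftotext` executable is expected on PATH; the function names are hypothetical, and this is a diagnostic aid rather than how invoice2data calls the tool internally.

```python
import shutil
import subprocess


def pdftotext_path():
    """Return the full path of the pdftotext executable, or None if not on PATH."""
    return shutil.which("pdftotext")


def extract_text(pdf_file):
    """Run pdftotext on pdf_file and return the extracted text ('' if nothing found)."""
    exe = pdftotext_path()
    if exe is None:
        raise FileNotFoundError("pdftotext not found on PATH; is poppler installed?")
    # "-layout" keeps the page layout; "-" writes the text to stdout.
    result = subprocess.run(
        [exe, "-layout", pdf_file, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

If `pdftotext_path()` returns None, or `extract_text("input/demoinvoice.pdf")` returns an empty string for a text-based PDF, the problem lies in the poppler installation rather than in the template.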

@bosd changed the title from "Issues with other templates" to "Issues with other templates on [Windows]" on Aug 28, 2024
@abhigyanasatpathy
Author

abhigyanasatpathy commented Aug 28, 2024

But the existing templates are working fine.
I am not able to extract the data from my PDF.

There is one file:
path: D:\invoice2data-master\invoice2data-env\Lib\site-packages\invoice2data-0.4.5.dist-info\RECORD
Do I need to do anything with this file for new templates, or do I just need to create the templates?

@bosd
Collaborator

bosd commented Aug 28, 2024

Just creating the templates should be fine.

Let's check if the template you have created has been loaded.

Do you see your template in the list of loaded templates?

@abhigyanasatpathy
Author

What do you mean by loaded templates? -- D:\invoice2data-master\src\invoice2data\extract\templates\in\in.demovoice.yml -- I can see this one.

But I cannot see it here:
DEBUG:←[0mroot: END pdftotext result =============================←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: au.com.opal.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: au.com.telstra.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.accor.invest.ibis.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.accor.invest.novotel.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.boucherie.pochet.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.cebeo.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.eg_retail.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.lampiris.facture-dacompte.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.lampiris.factuur.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.lampiris.regularisation.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.melchior-vins.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.proximus.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.scarlet.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: be.securex.social.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: ch.pcengines.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.AzureInterior.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.amazon.aws.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.apple.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.apps4rent.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.binarylife.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.bloomberg.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.cloudns.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.datadoghq.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.digitalocean.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.envato.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.expressvpn.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.expressvpn_prio6.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.ftserussell.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.github.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.globalsign.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.google.adwords.hk.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.hobohost.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.jamiepro.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.linode.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.microsoftonline.hk-v2017.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.microsoftonline.hk.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.mongodb.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.namecheap.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.namesilo.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.newrelic.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.nl.lenovo.digitalriver.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.nmmn.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.nodisto.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.nyse.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.oyo.invoice.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.packtpub.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.pixartprinting.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.sammymaystone.yml | Keywords matched. No exclude keywords found.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.scaleway.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.textmaster.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.tmx.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.travis-ci.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.twitter.de.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.twitter.uk.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.twitter.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.upwork.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.usersnap.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: de.amazon.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: de.bettina-kast.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: de.digikey.com.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: de.hosteurope.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: de.notebooksbilligerBillPay.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: de.ovh.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: de.qualityhosting.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: de.united-domains.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.pepephone.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: es.supplies24.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: co.mooncard.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.adobe.ie.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.akretion.fr.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.amazon.aws.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.ateliercopieservice.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.chauffeur-prive.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.coriolis.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.easyjet.fr.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.eaudugrandlyon.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.godaddy.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.google.ie.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.hootsuite.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.jeanbesson.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.ldlc.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.linkedin.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.mention.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.microsoft.ie.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.myflyingbox.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.officetimeline.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.orange-business.mobile.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.ovh.fr.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.rs-online.fr.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.saur.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.soyoustart.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: com.vinci-autoroutes.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: dolibarr.generique.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: eu.trainline.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.actn.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.airfrance.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.also.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.amazon.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.assurance-epargne-pension.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.bouyguestelecom.adsl-fiber.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.bouyguestelecom.mobile.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.butagaz.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.chronopost.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.dirafi.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.domaine-achat.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.easytrip.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.edf.entreprises.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.edf.pme.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.finagaz.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.fountain.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.free.adsl-fiber.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.free.mobile.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.free.mobile2.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.futur.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.ge-iroise.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.greffe-tc-lyon.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.hiscox.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.internetsatellite.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.jpg.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.kubii.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.laposte.boutique.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.laposte.coliposte.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.lecab.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.leroymerlin.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.maaf.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.mediapart.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.moneo-resto.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.mouser.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.mycelium-roulement.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.napsis.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.nexity.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.orange.fibre.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.orange.fixedline.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.prestaclic.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.publicationannoncelegale.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.sfr.adsl-fiber.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.sfr.mobile.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.sosh.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.teledec.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: fr.topoffice.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: net.online.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: net.scaleway.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.action.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.albron.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.anwb.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.be.coolblue.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.begra.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.blokker.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.bouwmans.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.bp.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.bunq.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.cpe.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.esso_eg_services.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.esso_eg_services_v2.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.farnell.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.ferbox.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.gamma.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.goos.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.gulf.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.ipparking.paleiskwartier.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.karwei.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.kav.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.koffiehenk.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.momentsenmore.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.ns.invoice.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.ok.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.parkmobile.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.praxis.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.reclameland.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.saeco.philips.eluscious.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.shell_nederland.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.shell_schellenkens.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.simpel.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.total_express.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.total_ototol.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.transip.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.tuynder.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.vistaprint.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.vodafone.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.wasco.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.weid.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.yezzer.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: nl.zinkunie.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: pl.bmw-fs.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: pl.insert.subiekt-gt.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: pl.insert.subiekt-nexo.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: pl.orlen.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: pl.p4.yml | Failed to match all keywords.←[0m
DEBUG:←[0minvoice2data.extract.invoice_template: Template: pl.paypro.yml | Failed to match all keywords.←[0m

Why?
As I said, I only created the yml file with my regex inside the template folder.

So is there anything else I need to do?

@bosd
Collaborator

bosd commented Aug 29, 2024

Why?

Because you need to check whether the template you created is loaded properly.

Check that you're pointing to the correct folder.
(You can disable the built-in templates with the following flag to reduce the noise: --exclude-built-in-templates)

You should see your template in that list.
If your template is correct, it should say that the keywords have matched,
followed by a "using template <your template file>" message.
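
The flags mentioned in this thread can be combined into one test command. The sketch below only assembles the argument list; `debug_command` is a hypothetical helper, and the paths passed to it are examples you would replace with your own.

```python
def debug_command(pdf_file, template_folder):
    """Build the invoice2data command line for testing one custom template folder."""
    return [
        "invoice2data",
        "--template-folder", template_folder,  # folder holding your own .yml file
        "--exclude-built-in-templates",        # hide the bundled templates from the log
        "--debug",                             # print the extracted text and keyword checks
        pdf_file,
    ]
```

The resulting list can be passed to `subprocess.run(...)`, or joined with spaces and typed into the terminal. With the built-in templates excluded, the debug log should list only your own template.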

@abhigyanasatpathy
Author

Even after I deleted my templates, it is still parsing the existing PDF.
How is that possible?
I even deactivated it and activated it again.

@bosd
Collaborator

bosd commented Aug 29, 2024

You have to verify if your template is being loaded.

  1. Are you pointing to the correct folder?
  2. Is your custom template loaded? Or does the debugger show that there is an error in your template?
  3. Is your template selected? Do the keywords match?

@abhigyanasatpathy
Author

Are you pointing to the correct folder? -- Yes.
Is your custom template loaded? Or does the debugger show that there is an error in your template? -- Yes, it shows an error.
Is your template selected? Do the keywords match? -- Yes, I am checking.

But I don't understand: when I deleted the existing templates as a test, it still works. How is that possible?
Where is it matching keywords from? It should report that the yml file is not available, but the template still shows up after I deleted it (as a test).

@bosd
Collaborator

bosd commented Aug 29, 2024

> But I don't understand: when I deleted the existing templates as a test, it still works. How is that possible?

That sounds like a folder issue.

Maybe it is installed in several versions or locations.

What path is shown when you run
'pip show invoice2data'?

Is that the same location as where you were deleting the files?
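
The same check can also be made from inside the active Python environment. This is a small sketch using the standard library's `importlib`; the helper name `package_dir` is hypothetical.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory a package is imported from, or None if not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin points at its __init__.py file.
    return Path(spec.origin).parent
```

If `package_dir("invoice2data")` returns a directory, the built-in templates are in its `extract` / `templates` sub-folder, which matches the "Loaded 189 templates from ..." line in the log above. If that directory is not the one where files were edited or deleted, two copies of the package are in play.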

@abhigyanasatpathy
Author

Screenshot 2024-08-30 003538

@abhigyanasatpathy
Author

My template location path is :
D:\invoice2data-master\src\invoice2data\extract\templates
Is it okay?

@bosd
Collaborator

bosd commented Aug 29, 2024

No, because your standard templates are loaded from the directory in the screenshot.

For easy testing, go to that location and delete the standard templates there, or add your own custom ones there.

10 participants
@bosd @abhigyanasatpathy and others