Shopping data

First, watch the video animation Kanta-asiakkuuden jäljet!

How to get my shopping data?

S-group: Fill in a form at a customer service desk.

K-group: Fill in a paper form and mail it to X (sorry, I forgot the details and am still trying to find the link).

Both S and K will send you your data on paper by mail. This makes processing the data much harder, but not impossible. In the future the data will hopefully be provided in a convenient machine-readable format.

Some news about the data here and here

How to analyze my shopping data?

In short, you need to

  • Scan your data
  • Use an optical character recognition (OCR) tool to convert the scanned data into a machine-readable format
  • Process and analyse the converted data

Here's an example workflow that worked for me

  • Scan your data into a PDF
  • Use Tesseract for OCR
  • Use R for processing and analysing the data
    • R script with a lot of different processing stages
  • More details at the end of this page!
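
How the last step looks depends heavily on the layout of your printout. The original workflow used R; the Python sketch below is only an illustration. It assumes a hypothetical line layout of `DD.MM.YYYY product price` with a decimal comma, which is not necessarily what S-group or K-group actually print. The regular expression, the function names, and the sample lines are all made up for the example.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Hypothetical layout: "12.03.2013 MAITO 1L 0,95" (date, product, price).
LINE_RE = re.compile(r"^(\d{2})\.(\d{2})\.(\d{4})\s+(.+?)\s+(\d+),(\d{2})$")


def parse_line(line):
    """Return (year, month, product, price) or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # OCR noise, headers, and footers end up here
    _day, month, year, product, euros, cents = match.groups()
    return int(year), int(month), product, Decimal(f"{euros}.{cents}")


def monthly_totals(lines):
    """Sum prices per (year, month), skipping lines that fail to parse."""
    totals = defaultdict(Decimal)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            year, month, _product, price = record
            totals[(year, month)] += price
    return dict(totals)


# Made-up sample lines, including one that OCR has garbled.
sample = [
    "12.03.2013 MAITO 1L 0,95",
    "12.03.2013 RUISLEIPA 2,49",
    "~~ garbled OCR line ~~",
    "02.04.2013 KAHVI 500G 4,15",
]
print(monthly_totals(sample))
```

Lines that do not match are skipped rather than guessed at, so it is worth counting them to see how much of the scan the OCR step actually recovered.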

See the video animation Kanta-asiakkuuden jäljet!

Here are also some visualizations of the data:

fig1

fig2

More details on the tools used

Some tips and details on installing and using the tools on OS X 10.8.5.

OCR with Tesseract

Installation

Running OCR
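
A minimal sketch of driving the Tesseract command-line tool from a script, shown in Python for illustration. It assumes the scanned PDF has already been split into one image per page (Tesseract reads image files, not PDFs), that the `tesseract` binary is on the PATH, and that the Finnish language data (`fin`) is installed. The `scans` directory, the file names, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, language="fin"):
    """Build the command: tesseract <image> <output base> -l <language>."""
    image = Path(image_path)
    output_base = image.with_suffix("")  # Tesseract appends ".txt" itself
    return ["tesseract", str(image), str(output_base), "-l", language]


def run_ocr(image_path, language="fin"):
    """Run Tesseract on one page image and return the recognised text."""
    command = tesseract_command(image_path, language)
    subprocess.run(command, check=True)  # raises if Tesseract fails
    text_file = Path(command[2] + ".txt")
    return text_file.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical page images produced from the scanned PDF.
    for page in sorted(Path("scans").glob("page-*.png")):
        print(run_ocr(page)[:200])
```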

  • If the data is printed as a table with borders, OCR will struggle. Tesseract may have an option to handle this, but I did not find one. I ended up removing the horizontal lines in R, which was not trivial because the lines were slightly tilted rather than exactly horizontal
  • It would also have been useful to add custom vocabulary such as "supermarket", but I did not get this to work with Tesseract (some hints here)
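
The borders were removed in R in the original workflow. As a rough illustration of one possible approach (not necessarily the method used there), the Python sketch below treats the page as a grid of 0/1 pixels and erases every horizontal run of dark pixels longer than a threshold. A slightly tilted border shows up as a staircase of long runs on neighbouring rows, so each segment is still caught, while character strokes are too short to be affected. The threshold and the function name are assumptions.

```python
def remove_long_runs(pixels, min_run=8):
    """Return a copy of a 0/1 pixel grid with long horizontal dark runs erased.

    pixels: list of rows, each a list with 1 for dark and 0 for light.
    min_run: shortest run length treated as part of a table border.
    """
    cleaned = [row[:] for row in pixels]
    for row in cleaned:
        start = None
        # The appended 0 is a sentinel that closes a run reaching the edge.
        for x, value in enumerate(row + [0]):
            if value and start is None:
                start = x
            elif not value and start is not None:
                if x - start >= min_run:
                    row[start:x] = [0] * (x - start)
                start = None
    return cleaned


# Tiny synthetic page: a short stroke plus a border that drops one row midway.
page = [[0] * 20 for _ in range(4)]
page[0][2:5] = [1, 1, 1]      # short stroke, e.g. part of a letter
page[1][0:10] = [1] * 10      # left half of a tilted border
page[2][10:20] = [1] * 10     # right half, one pixel lower
cleaned = remove_long_runs(page)
for row in cleaned:
    print("".join("#" if v else "." for v in row))
```

Text that touches a border and vertical borders would need extra handling that this sketch does not attempt.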

Animation

Data sonification

  • Used R package playitbyr
  • Requires Csound; installation instructions here
  • Note! playitbyr does not work with Csound 6, so install version 5 instead!
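
Sonification maps data values to sound parameters such as pitch. The Python sketch below is not the playitbyr API, only an illustration of the underlying idea. It assumes a linear mapping from values (for example daily spending) to MIDI note numbers, followed by the standard equal-temperament conversion to frequency. The note range and the sample values are made up.

```python
def value_to_midi(value, low, high, note_low=48, note_high=84):
    """Linearly map value in [low, high] to a MIDI note in [note_low, note_high]."""
    if high == low:
        return note_low
    fraction = (value - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range values
    return round(note_low + fraction * (note_high - note_low))


def midi_to_hz(note):
    """Equal temperament with A4 (MIDI note 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


spending = [5.0, 20.0, 12.5, 50.0]  # made-up daily totals in euros
notes = [value_to_midi(v, min(spending), max(spending)) for v in spending]
print(notes, [round(midi_to_hz(n), 1) for n in notes])
```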
