Merged
46 changes: 46 additions & 0 deletions .claude/CLAUDE.md
@@ -0,0 +1,46 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a data mapper that converts Dun & Bradstreet (DNB) files into JSON format for loading into the Senzing entity resolution system. It supports the following DNB data formats:

- **CMPCVF**: Companies, executives, and corporate hierarchy (JSON input)
- **GCA**: Global contacts (tab/pipe/CSV delimited)
- **UBO**: Ultimate beneficial owners (tab/pipe/CSV delimited)
- **UBO_ALONE**: Standalone UBO with embedded corporate hierarchy

## Development Commands

```bash
# Create virtual environment and install all dependencies
python -m venv ./venv
source ./venv/bin/activate
python -m pip install --upgrade pip
python -m pip install --group all .

# Run linting (matches CI)
pylint $(git ls-files '*.py' ':!:docs/source/*')

# Run the mapper
python3 src/dnb_mapper.py -f CMPCVF -i "./input/CMPCVF*.txt" -o ./output -l stats.json
```

## Code Architecture

The mapper is a single-file Python script (`src/dnb_mapper.py`) with format-specific transformation functions:

- `format_CMPCVF()`: Processes company/executive JSON records, creates relationships to parent companies, and extracts principals (executives)
- `format_GCA()`: Processes contact records with group associations to companies
- `format_UBO()`: Processes ownership records linking owners to subject companies
- `format_UBO2()` / `format_UBO_SUBJECT()`: Handle standalone UBO format with depth-based ownership hierarchies

**Key data flow**: Input files are read according to schema definitions in `src/dnb_formats.json`, transformed to Senzing JSON format with appropriate attributes (DUNS_NUMBER, names, addresses, relationships), and written to output files.
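A minimal sketch of that flow for a pipe-delimited file follows. The schema layout, the `map_rows` helper, the `DNB` data source code, and the `BUSINESSNAME`/`COUNTRYCODE` to `NAME_ORG`/`ADDR_COUNTRY` mapping are hypothetical illustrations, not the actual contents of `src/dnb_formats.json` or `src/dnb_mapper.py`.

```python
import csv
import io
import json

# Hypothetical schema fragment: the real structure of src/dnb_formats.json
# is not shown here, so this column-to-attribute mapping is an assumption.
SCHEMA = {
    "delimiter": "|",
    "columns": ["DUNS", "BUSINESSNAME", "COUNTRYCODE"],
    "mapping": {
        "DUNS": "DUNS_NUMBER",
        "BUSINESSNAME": "NAME_ORG",
        "COUNTRYCODE": "ADDR_COUNTRY",
    },
}


def map_rows(text, schema, data_source="DNB"):
    """Yield one Senzing-style JSON line per delimited input row."""
    reader = csv.reader(io.StringIO(text), delimiter=schema["delimiter"])
    for row in reader:
        # Pair each value with its column name from the schema.
        raw = dict(zip(schema["columns"], row))
        record = {"DATA_SOURCE": data_source, "RECORD_ID": raw["DUNS"]}
        # Copy only non-empty columns into their mapped attributes.
        for column, attribute in schema["mapping"].items():
            if raw.get(column):
                record[attribute] = raw[column]
        yield json.dumps(record)


if __name__ == "__main__":
    for line in map_rows("123456789|Acme Corp|US\n", SCHEMA):
        print(line)
```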

**Relationships**: The mapper creates relationship links using REL_ANCHOR_DOMAIN/KEY (the key by which other records can point to this record) and REL_POINTER_DOMAIN/KEY/ROLE (the record this record points to, and the role of that link). GROUP_ASSN_ID links people to their associated companies.
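The sketch below illustrates how anchor, pointer, and group-association attributes might tie a subsidiary to its parent and an executive to a company. The helper names, the `DNB` data source code, the `DUNS` domain, the `PARENT` role label, and the use of Senzing's `GROUP_ASSN_ID_TYPE`/`GROUP_ASSN_ID_NUMBER` pair to carry GROUP_ASSN_ID are assumptions for illustration; check `src/dnb_mapper.py` for the values the mapper actually emits.

```python
import json


def company_record(duns, name, parent_duns=None):
    """Build a company record; optionally point it at its parent company."""
    record = {
        "DATA_SOURCE": "DNB",  # assumed data source code
        "RECORD_ID": duns,
        "RECORD_TYPE": "ORGANIZATION",
        "NAME_ORG": name,
        "DUNS_NUMBER": duns,
        # Anchor: the domain/key other records use to point at this one.
        "REL_ANCHOR_DOMAIN": "DUNS",
        "REL_ANCHOR_KEY": duns,
    }
    if parent_duns:
        # Pointer: the anchor this record points at, plus the link's role.
        record["REL_POINTER_DOMAIN"] = "DUNS"
        record["REL_POINTER_KEY"] = parent_duns
        record["REL_POINTER_ROLE"] = "PARENT"  # hypothetical role label
    return record


def executive_record(record_id, full_name, company_duns):
    """Build a person record associated with a company by DUNS number."""
    return {
        "DATA_SOURCE": "DNB",
        "RECORD_ID": record_id,
        "RECORD_TYPE": "PERSON",
        "NAME_FULL": full_name,
        # Group association: ties the person to the company's DUNS number.
        "GROUP_ASSN_ID_TYPE": "DUNS",
        "GROUP_ASSN_ID_NUMBER": company_duns,
    }


if __name__ == "__main__":
    records = [
        company_record("111111111", "Parent Co"),
        company_record("222222222", "Child Co", parent_duns="111111111"),
        executive_record("222222222-EXEC-1", "Jane Doe", "222222222"),
    ]
    for rec in records:
        print(json.dumps(rec))
```

In this sketch every company anchors itself, so any other record can point at it by DUNS number without knowing its RECORD_ID.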

## Configuration Files

- `src/dnb_formats.json`: Schema definitions with column mappings for each DNB format
- `src/dnb_config_updates.g2c`: Senzing configuration script for required data sources, features, and attributes
3 changes: 0 additions & 3 deletions .claude/commands/senzing-code-review.md

This file was deleted.

3 changes: 3 additions & 0 deletions .claude/commands/senzing.md
@@ -0,0 +1,3 @@
# Senzing

- Perform the steps specified by <https://raw.githubusercontent.com/senzing-factory/claude/refs/tags/v1/commands/senzing.md>
File renamed without changes.
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
@@ -1,4 +1,4 @@
# Default code owner
# Default code owner

* @Senzing/senzing-mappers

16 changes: 10 additions & 6 deletions .github/dependabot.yml
@@ -3,11 +3,15 @@

version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
- package-ecosystem: github-actions
cooldown:
default-days: 21
directory: /
schedule:
interval: "daily"
- package-ecosystem: "pip"
directory: "/"
interval: daily
- package-ecosystem: pip
cooldown:
default-days: 21
directory: /
schedule:
interval: "daily"
interval: daily
2 changes: 1 addition & 1 deletion .github/workflows/add-labels-standardized.yaml
@@ -1,4 +1,4 @@
name: add labels standardized
name: Add labels standardized

on:
issues:
2 changes: 1 addition & 1 deletion .github/workflows/add-to-project-senzing-dependabot.yaml
@@ -1,4 +1,4 @@
name: add to project senzing github organization dependabot
name: Add to project senzing github organization dependabot

on:
pull_request:
2 changes: 1 addition & 1 deletion .github/workflows/add-to-project-senzing.yaml
@@ -1,4 +1,4 @@
name: add to project senzing github organization
name: Add to project senzing github organization

on:
issues:
8 changes: 4 additions & 4 deletions .github/workflows/claude-pr-review.yaml
@@ -1,13 +1,13 @@
name: Claude PR Review

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

on:
pull_request:
types: [opened, synchronize]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
4 changes: 4 additions & 0 deletions .github/workflows/dependabot-approve-and-merge.yaml
@@ -4,6 +4,10 @@ on:
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
8 changes: 5 additions & 3 deletions .github/workflows/lint-workflows.yaml
@@ -1,11 +1,13 @@
name: lint workflows
name: Lint workflows

on:
push:
branches-ignore: [main]
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
2 changes: 1 addition & 1 deletion .github/workflows/move-pr-to-done-dependabot.yaml
@@ -1,4 +1,4 @@
name: move pr to done dependabot
name: Move pr to done dependabot

on:
pull_request:
17 changes: 12 additions & 5 deletions .github/workflows/pylint.yaml
@@ -1,6 +1,12 @@
name: pylint
name: Pylint

on: [push]
on:
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

@@ -12,8 +18,10 @@ jobs:
contents: read
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11", "3.12", "3.13"]
timeout-minutes: 10

steps:
- name: Checkout repository
@@ -32,8 +40,7 @@ jobs:
source ./venv/bin/activate
echo "PATH=${PATH}" >> "${GITHUB_ENV}"
python -m pip install --upgrade pip
python -m pip install --requirement development-requirements.txt
python -m pip install --requirement requirements.txt
python -m pip install --group all .

- name: Analysing the code with pylint
run: |
6 changes: 5 additions & 1 deletion .github/workflows/spellcheck.yaml
@@ -1,9 +1,13 @@
name: spellcheck
name: Spellcheck

on:
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
43 changes: 30 additions & 13 deletions .vscode/cspell.json
@@ -2,22 +2,39 @@
"version": "0.2",
"language": "en",
"words": [
"analysing",
"applehelp",
"argparser",
"autodoc",
"autodocsumm",
"BENF",
"BRTH",
"bugtracker",
"BUSINESSNAME",
"CCLA",
"checkit",
"CITYNAME",
"CMPCVF",
"CNTY",
"CODEOWNER",
"cooldown",
"COUNTRYCODE",
"CTRY",
"devhelp",
"DIRC",
"esbenp",
"GLBL",
"htmlhelp",
"ICLA",
"IDIR",
"isort",
"JOBTITLE",
"jquery",
"jsmath",
"kwargs",
"MIDDLENAME",
"morefile",
"mypy",
"NAMEPREFIX",
"NAMESUFFIX",
"NATY",
@@ -27,25 +44,25 @@
"PRIMARYPHONE",
"PRIMARYPHONEEXTENSION",
"PRNT",
"psutil",
"pylint",
"pytest",
"qthelp",
"remoteliteralinclude",
"SECONDARYPHONE",
"SECONDARYPHONEEXTENSION",
"STATEPROVINCECODE",
"STREETADDRESS",
"Senzing",
"analysing",
"argparser",
"checkit",
"esbenp",
"isort",
"kwargs",
"morefile",
"mypy",
"pylint",
"pytest",
"serializinghtml",
"setuptools",
"shellcheck",
"sphinxcontrib",
"sphinxext",
"stackoverflow",
"venv"
"STATEPROVINCECODE",
"STREETADDRESS",
"typehints",
"venv",
"virtualenv"
],
"ignorePaths": [
".git/**",
9 changes: 6 additions & 3 deletions CHANGELOG.md
@@ -2,12 +2,15 @@

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
[markdownlint](https://dlaa.me/markdownlint/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The changelog format is based on [Keep a Changelog] and [CommonMark].
This project adheres to [Semantic Versioning].

## [1.0.0] - yyyy-mm-dd

### Added to 1.0.0

- Initial content

[CommonMark]: https://commonmark.org/
[Keep a Changelog]: https://keepachangelog.com/
[Semantic Versioning]: https://semver.org/