
AMT-8000 Data Transformation Tool

Overview

The AMT-8000 is a retro sci-fi themed command-line interface for transforming and validating data files. It provides an engaging, interactive experience while maintaining professional functionality.

Quick Start

# Process files
python src/main.py uploads/

# View transmission history
python src/main.py --history

# Delete transmissions matching pattern
python src/main.py --delete "2024-01"

# Disable color output
python src/main.py uploads/ --no-color

Features

Retro Sci-Fi Interface

  • Stylized boot sequence with system checks
  • Mission-oriented terminology
  • Progress indicators and status messages
  • Color-coded output (with --no-color option)

Core Functionality

  • Multi-format input support (Excel/CSV)
  • Smart column mapping with fuzzy matching (sketched after this list)
  • Data validation and transformation
  • Progress tracking for long operations
  • Transmission history management
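
A minimal sketch of the fuzzy header matching, using Python's standard difflib; the schema field names and similarity cutoff here are illustrative assumptions, not the tool's actual configuration:

# Sketch only: SCHEMA_FIELDS and the cutoff are assumed values.
import difflib

SCHEMA_FIELDS = ["user_id", "username", "email", "first_name", "last_name"]

def suggest_mapping(source_headers, cutoff=0.6):
    """Propose a schema field for each source column header."""
    mapping = {}
    for header in source_headers:
        normalized = header.strip().lower().replace(" ", "_")
        matches = difflib.get_close_matches(normalized, SCHEMA_FIELDS, n=1, cutoff=cutoff)
        mapping[header] = matches[0] if matches else None
    return mapping

print(suggest_mapping(["User ID", "E-mail", "First Name"]))
# {'User ID': 'user_id', 'E-mail': 'email', 'First Name': 'first_name'}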

Command Options

python src/main.py [-h] [--no-color] [--page-size PAGE_SIZE] [--history] [--delete PATTERN] [input_path]

Arguments:
  input_path             Input file or directory path
  --no-color             Disable color output
  --page-size PAGE_SIZE  Number of items per page (default: 5)
  --history              View transmission history
  --delete PATTERN       Delete transmissions matching pattern
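
The options above map naturally onto argparse. This sketch mirrors the documented flags; anything not shown in the usage line (such as the program name) is an assumption:

# Sketch of an argparse setup matching the documented options.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="AMT-8000")
    parser.add_argument("input_path", nargs="?", help="Input file or directory path")
    parser.add_argument("--no-color", action="store_true", help="Disable color output")
    parser.add_argument("--page-size", type=int, default=5, help="Number of items per page (default: 5)")
    parser.add_argument("--history", action="store_true", help="View transmission history")
    parser.add_argument("--delete", metavar="PATTERN", help="Delete transmissions matching pattern")
    return parser

args = build_parser().parse_args(["uploads/", "--page-size", "10"])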

Operation Phases

  1. Boot Sequence

    • System initialization
    • Component checks
    • Mission parameters loading
  2. Scan Phase

    • Directory scanning
    • File detection
    • Sample data preview
  3. Alignment Phase

    • Column mapping
    • Data validation
    • Configuration review
  4. Transmission Phase

    • Data transformation
    • Progress tracking
    • Output generation
    • Log entry creation

Examples

Processing Files

# Process single file
python src/main.py uploads/data.xlsx

# Process directory
python src/main.py uploads/

Managing Transmissions

# View transmission history
python src/main.py --history

# Delete transmissions from January 2024
python src/main.py --delete "2024-01"

# Delete transmissions by filename pattern
python src/main.py --delete "user_data"
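
Pattern matching for --delete might work as below. The history file name and record layout (a JSON list with "timestamp" and "filename" keys) are assumptions for illustration:

# Sketch of pattern-based history cleanup; the file name and record
# fields are assumed, not taken from the tool's source.
import json
from pathlib import Path

HISTORY_FILE = Path("transmission_history.json")  # assumed location

def delete_matching(pattern):
    records = json.loads(HISTORY_FILE.read_text())
    kept = [r for r in records
            if pattern not in r["timestamp"] and pattern not in r["filename"]]
    HISTORY_FILE.write_text(json.dumps(kept, indent=2))
    return len(records) - len(kept)  # number of transmissions removed

# delete_matching("2024-01") would drop every January 2024 entry.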

Display Options

# Disable color output
python src/main.py uploads/ --no-color

# Adjust page size for large datasets
python src/main.py uploads/ --page-size 10

Error Handling

The AMT-8000 includes comprehensive error handling (a sketch follows the list):

  • Input validation
  • File access checks
  • Data transformation validation
  • Progress monitoring
  • User interruption handling
  • Detailed error messages
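
A minimal sketch of the top-level handling, including a graceful response to Ctrl-C; the function names are placeholders, not the tool's real API:

# Sketch only: process_files stands in for the real pipeline.
def process_files(path):
    """Placeholder for the real scan/align/transmit pipeline."""

def run_mission(input_path):
    try:
        process_files(input_path)
    except KeyboardInterrupt:
        print("\nTRANSMISSION ABORTED BY OPERATOR")
        return 130  # conventional exit status after SIGINT
    except (FileNotFoundError, PermissionError) as exc:
        print(f"INPUT ERROR: {exc}")
        return 1
    return 0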

Best Practices

  1. File Preparation

    • Place input files in the uploads/ directory
    • Use consistent column naming
    • Ensure file permissions are correct
  2. Operation

    • Review mapping suggestions carefully
    • Monitor progress indicators
    • Check transmission logs regularly
    • Use pattern-based cleanup for maintenance
  3. Troubleshooting

    • Check error messages in transmission log
    • Use --no-color for compatibility issues
    • Adjust page size for different dataset sizes

System Requirements

  • Python 3.8+
  • zsh shell (for automated setup)

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/data-transformation-tool.git
cd data-transformation-tool
  2. Place your input file(s) in the uploads directory

  3. Run the initialization script:

chmod +x init.sh
./init.sh

The script will:

  • Create a virtual environment
  • Install required dependencies
  • Process files from the uploads directory
  • Generate output files in the converts directory

Manual Installation (Alternative)

If you prefer manual setup:

  1. Create a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  2. Install dependencies:
pip install -r requirements.txt
  3. Run manually:
python src/main.py <input_file_or_directory>

Project Structure

├── init.sh                   # Automated setup and execution script
├── src/
│   ├── main.py               # Entry point
│   ├── reader.py             # Input file reader
│   ├── header_mapper.py      # Column mapping logic
│   ├── data_transformer.py   # Data transformation
│   ├── validator.py          # Data validation
│   ├── output_generator.py   # Output file generation
│   ├── schema.json           # Data schema definition
│   └── header_mappings.yaml  # Saved header mappings
├── uploads/                  # Input directory
├── converts/                 # Output directory
├── tests/                    # Test files
├── requirements.txt          # Dependencies
└── README.md                 # This file

Data Flow

flowchart TD
    A[Input File] --> B[Reader]
    B --> C[Header Mapper]
    C --> D[Data Transformer]
    D --> E[Validator]
    E --> F{Valid?}
    F -->|Yes| G[Valid Output]
    F -->|No| H[Invalid Output]
    
    subgraph "Header Mapping Process"
    C1[Load Schema] --> C2[Fuzzy Match]
    C2 --> C3[User Review]
    C3 --> C4[Save Mappings]
    end
    
    subgraph "Transformation Process"
    D1[Map Fields] --> D2[Format Dates]
    D2 --> D3[Convert Booleans]
    D3 --> D4[Handle Relationships]
    end
    
    subgraph "Validation Process"
    E1[Check Required Fields] --> E2[Validate Formats]
    E2 --> E3[Verify Relationships]
    end
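
In code, the flow above reduces to roughly this orchestration. The module names come from the project structure; the function names and signatures are assumptions, with trivial stubs so the sketch runs:

def read_input(path): return []                # reader.py
def map_headers(rows): return {}               # header_mapper.py
def transform(rows, mapping): return rows      # data_transformer.py
def validate(records): return records, []      # validator.py
def write_output(valid, invalid, src): pass    # output_generator.py

def run_pipeline(input_file):
    """Wire the stages in the order shown in the flowchart."""
    rows = read_input(input_file)
    mapping = map_headers(rows)
    records = transform(rows, mapping)
    valid, invalid = validate(records)
    write_output(valid, invalid, input_file)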

Schema Rules

Entity Data

  • Users (validation sketched after this list):

    • Required: at least one of user_id, username, or email
    • Required for mapping: first_name + last_name OR full_name
      • All 3 fields (first_name, last_name, full_name) must be populated in output
    • is_active: "Yes" or "No"
    • Dates in ISO 8601 format
  • Groups:

    • Required: group_id or group_name
  • Roles:

    • Required: role_id or role_name
  • Resources:

    • Required: resource_id or resource_name
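
A minimal sketch of the Users rules, using only the standard library; the date column name is an assumption, and the real validator.py may differ:

# Sketch only: "created_at" is an assumed date column name.
from datetime import date

def validate_user(row):
    errors = []
    if not any(row.get(k) for k in ("user_id", "username", "email")):
        errors.append("one of user_id/username/email is required")
    has_split_name = row.get("first_name") and row.get("last_name")
    if not (has_split_name or row.get("full_name")):
        errors.append("first_name + last_name OR full_name is required")
    if row.get("is_active") not in ("Yes", "No"):
        errors.append("is_active must be 'Yes' or 'No'")
    if row.get("created_at"):
        try:
            date.fromisoformat(row["created_at"])
        except ValueError:
            errors.append("created_at must be ISO 8601 (YYYY-MM-DD)")
    return errors

validate_user({"email": "ada@example.com", "full_name": "Ada B", "is_active": "Yes"})  # -> []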

Relationships

  • User Groups:

    • All fields required
    • user_id: Uses Users tab value, or email, or username (fallback order sketched after this list)
    • group_id: Uses Groups tab value or incremental number
  • User Roles:

    • All fields required
    • user_id: Same rules as User Groups
    • role_id: Uses Roles tab value or role_name
  • Group Roles:

    • All fields required
    • group_id: Uses Groups tab value or incremental number
    • role_id: Uses Roles tab value or role_name
  • User Resources:

    • All fields required
    • user_id: Same rules as User Groups
    • resource_id: Uses Resources tab value or resource_name
  • Role Resources:

    • All fields required
    • role_id: Uses Roles tab value or role_name
    • resource_id: Uses Resources tab value or resource_name
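
The shared fallback rule (prefer the Users tab value, then email, then username) can be sketched in a few lines; the function itself is illustrative:

# Sketch of the identifier fallback used across the relationship tabs.
def resolve_user_id(user_row):
    for key in ("user_id", "email", "username"):
        value = user_row.get(key)
        if value:
            return value
    return None

resolve_user_id({"email": "ada@example.com"})  # -> "ada@example.com"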

Output Files

The tool generates two output files (see the sketch after this list):

  1. converted_[filename].xlsx: Contains valid records
  2. invalid_records_[filename].xlsx: Contains invalid records with error details
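
With pandas (an assumed dependency, since requirements.txt is not shown here), the split might look like this; the file names follow the patterns above and the converts/ output directory from the project structure:

# Sketch of the valid/invalid output split; requires pandas plus an
# Excel writer such as openpyxl (both assumed dependencies).
from pathlib import Path
import pandas as pd

def write_outputs(valid_rows, invalid_rows, source, out_dir="converts"):
    stem = Path(source).stem
    Path(out_dir).mkdir(exist_ok=True)
    pd.DataFrame(valid_rows).to_excel(Path(out_dir) / f"converted_{stem}.xlsx", index=False)
    pd.DataFrame(invalid_rows).to_excel(Path(out_dir) / f"invalid_records_{stem}.xlsx", index=False)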

Logging

  • Log file: validation.log (setup sketched below)
  • Includes:
    • Transformation errors
    • Validation failures
    • Processing statistics
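
A standard-library setup writing to validation.log, as named above; the format and level are assumptions:

# Sketch only: format and level are assumed; the file name is from above.
import logging

logging.basicConfig(
    filename="validation.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.getLogger(__name__).info("processing started")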

Interactive Features

  • Smart header mapping with fuzzy matching
  • Preview of data samples during mapping
  • Ability to save and reuse mappings (sketched after this list)
  • Interactive confirmation of mappings
  • Option to skip optional fields
  • Progress indicators during processing
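
Saving and reloading confirmed mappings could use PyYAML (an assumed dependency); the file name comes from the project structure, while the document layout is an assumption:

# Sketch of persisting confirmed header mappings.
import yaml

MAPPINGS_FILE = "src/header_mappings.yaml"

def save_mappings(mapping):
    with open(MAPPINGS_FILE, "w") as f:
        yaml.safe_dump(mapping, f)

def load_mappings():
    try:
        with open(MAPPINGS_FILE) as f:
            return yaml.safe_load(f) or {}
    except FileNotFoundError:
        return {}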

Best Practices

  1. Input Files:

    • Place files in the uploads directory
    • Use either Excel (.xlsx) or CSV format
    • Ensure data consistency within columns
  2. Header Mapping:

    • Review automatic mappings carefully
    • Use saved mappings for consistency
    • Map all mandatory fields
  3. Data Validation:

    • Check invalid records output
    • Review validation.log for errors
    • Correct source data if needed

Troubleshooting

Common issues and solutions:

  1. Environment Setup:

    # If init.sh fails, try:
    chmod +x init.sh
    ./init.sh
  2. Input Files:

    • Ensure files are not open in other applications
    • Check file permissions
    • Verify file format (Excel/CSV)
  3. Processing Errors:

    • Check validation.log for details
    • Ensure all mandatory fields are mapped
    • Verify data formats match schema requirements

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guide
  • Add tests for new features
  • Update documentation
  • Maintain backward compatibility

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For support:

  1. Check the documentation
  2. Review closed issues
  3. Open a new issue with:
    • Description of the problem
    • Steps to reproduce
    • Expected vs actual behavior
    • Log files and screenshots

Acknowledgments

  • Thanks to all contributors
  • Built with Python and open-source libraries
  • Inspired by real-world data transformation needs

Changelog

v1.0.0 (Initial Release)

  • Multi-format input support (Excel/CSV)
  • Smart header mapping with fuzzy matching
  • Data validation and transformation
  • Standardized output generation
  • Interactive CLI interface
  • Comprehensive logging
