MechaDataCleaner User Manual

Version: 1.3.1 | Last Updated: February 5, 2026

Current build highlights

  • Grid-based cleaning UI: Configure data types and cleaning operations per column in an interactive grid. Check boxes directly in the grid to enable Proper Case, UPPER, lower, trimming, and more.
  • Bulk apply controls: Apply settings to all columns at once using the "Apply to All Columns" expander above the grid.
  • Advanced columns: Enable Deduplicate (Exact/Fuzzy), Remove Emoji, and Convert Word Numbers via the Advanced Columns multiselect.
  • Plan-based limits: Usage limits are now enforced based on your subscription tier (Free: 5 cleanings, 10 columns, 1,000 rows; Starter: 50 cleanings, 20 columns, 10,000 rows; Pro: 500 cleanings, 100 columns, 100,000 rows).
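
As a rough illustration of how these limits combine, the sketch below encodes the figures from the list above in a lookup table and checks a file's shape against them. The names PLAN_LIMITS, fits_plan, and smallest_plan are hypothetical and not part of MechaDataCleaner, and the check assumes the column and row figures apply per file.

```python
from typing import Optional

# Figures copied from the plan-based limits above. The names and the
# per-file interpretation are assumptions made for illustration.
PLAN_LIMITS = {
    "Free": {"cleanings": 5, "columns": 10, "rows": 1_000},
    "Starter": {"cleanings": 50, "columns": 20, "rows": 10_000},
    "Pro": {"cleanings": 500, "columns": 100, "rows": 100_000},
}


def fits_plan(plan: str, n_columns: int, n_rows: int) -> bool:
    """Return True if a file with this shape is within the plan's limits."""
    limits = PLAN_LIMITS[plan]
    return n_columns <= limits["columns"] and n_rows <= limits["rows"]


def smallest_plan(n_columns: int, n_rows: int) -> Optional[str]:
    """Return the first plan (Free, Starter, Pro) that fits, or None."""
    for plan in PLAN_LIMITS:
        if fits_plan(plan, n_columns, n_rows):
            return plan
    return None
```

For example, a file with 15 columns and 5,000 rows exceeds the Free limits but fits Starter, so smallest_plan(15, 5000) returns "Starter".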

Introduction

MechaDataCleaner is a web-based data preparation tool designed to help you clean and optimize your data effortlessly. With its intuitive interface, you can prepare your data for analysis, reporting, and business intelligence tools like Power BI.

Key Features

  • Automated Schema Detection - AI-powered column type inference
  • Data Cleaning - Standardization, validation, deduplication
  • Batch Processing - Clean multiple files at once
  • AI Assistance - Interactive help for data cleaning tasks
  • Dark Mode Support - Comfortable interface for extended use

Who Is This For?

  • Data Analysts preparing files for dashboards
  • Business Users needing clean, standardized datasets
  • Data Engineers building ETL pipelines

Getting Started

Accessing the App

  1. Visit the MechaDataCleaner website and log in to your account.
  2. Once logged in, you will be directed to the main dashboard.
  3. Choose between cleaning a single file or processing multiple files in batch mode.

Quick Start

  1. Click Browse files or drag and drop a CSV/Excel file.
  2. Review the auto-detected column types in the grid.
  3. Check boxes in the grid to enable cleaning operations per column.
  4. Adjust sidebar settings as needed.
  5. Click Clean Data to process your file.
  6. Download the cleaned file and schema.

Interface Overview

Main Tabs

Single File Tab

  • Upload and clean individual files.
  • Review column types and configure transformations.
  • Preview data before and after cleaning.
  • Download cleaned data, schemas, and audit logs.

Batch Upload Tab

  • Upload multiple files at once.
  • Process all files with the same settings.
  • Download all results as a ZIP file.
  • View quality metrics for each file.

Sidebar Sections

  1. Settings - Core cleaning configuration.
  2. Cleaning Options - Deduplication and validation.
  3. Dates & Formatting - Date handling preferences.
  4. Data Transformations - Pre/post-processing options.
  5. Custom Rules - Create custom transformation rules.
  6. Account Info - Usage limits and profile.

Core Features

File Upload

Supported Formats:

  • CSV (.csv)
  • Excel (.xlsx, .xls)

Features:

  • Automatic encoding detection.
  • Mojibake (text corruption) fixing.
  • Large file support with sampling.
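
To make the term concrete: mojibake typically appears when UTF-8 bytes are decoded with a single-byte encoding such as Latin-1, so "café" shows up as "cafÃ©". The sketch below shows one common repair for that specific case using only the Python standard library. It illustrates the general idea and is not the app's actual implementation; repair_mojibake is a hypothetical name.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Latin-1.

    Returns the input unchanged if it does not round-trip cleanly,
    which is the case for text that was never corrupted this way.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

Text that is already correct, such as "café", fails the UTF-8 decode step and is returned as-is, so the function is safe to apply to every cell.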

Column Type Detection

The app automatically detects these types:

  • Basic: str, int, float, bool, category
  • Dates: date, datetime
  • Validation: email, phone, url
  • Advanced: ipv4, ipv6, uuid, domain, and more

Inference Modes:

  • Strict - Conservative type detection (fewer false positives).
  • Relaxed - Aggressive type detection (catches more patterns).
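
One plausible way to picture the two modes is as a match-rate threshold: Strict accepts a type only when every value fits, while Relaxed tolerates some exceptions. The sketch below applies that idea to integer detection. The 100% and 80% thresholds and the function names are assumptions chosen for illustration, not documented behavior of the app.

```python
# Illustrative only: these thresholds are assumed, not documented.
THRESHOLDS = {"strict": 1.0, "relaxed": 0.8}


def looks_like_int(value: str) -> bool:
    """True if the string parses as an integer after trimming spaces."""
    try:
        int(value.strip())
        return True
    except ValueError:
        return False


def infer_int_column(values: list[str], mode: str = "strict") -> str:
    """Return "int" if enough non-empty values parse as integers, else "str"."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "str"
    match_rate = sum(looks_like_int(v) for v in non_empty) / len(non_empty)
    return "int" if match_rate >= THRESHOLDS[mode] else "str"
```

Under these assumptions, a column with nine numeric values and one "n/a" is typed as str in strict mode and int in relaxed mode, which is why Relaxed catches more patterns at the cost of more false positives.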

AI type preview: When AI-enhance schema is enabled, AI-inferred types appear in the type table before they are applied, so you can review and override suggestions first.

Data Cleaning

Automated Operations:

  • Remove exact duplicates.
  • Standardize text (trim, case normalization).
  • Trim suffixes (drop the last N characters) to remove trailing IDs or noise.
  • Validate emails, phones, URLs.
  • Handle missing values.
  • Detect and handle outliers with a quick IQR-based summary in the Data Quality Profile.
  • Normalize headers and column names.
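
The IQR rule behind the outlier summary is conventionally defined as flagging values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 - Q1. The sketch below implements that convention with the Python standard library. The 1.5 multiplier and the quartile method are common defaults assumed here; the thresholds the app actually uses may differ, and iqr_outliers is a hypothetical name.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

For the values 10, 12, 11, 13, 12, 11, 95, the quartiles are Q1 = 11 and Q3 = 12.5, so the accepted range is 8.75 to 14.75 and only 95 is flagged.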

Data Cleaning Workflow

Two Ways to Configure Cleaning

MechaDataCleaner offers two complementary methods to configure your data cleaning. You can use either one or both together:

Grid-Based Configuration

Configure cleaning operations per column directly in an interactive grid. Best for column-specific transformations like case changes, trimming, and character removal.

Sidebar Settings

Configure global options that apply to all columns or the entire file. Best for deduplication, date formatting, validation rules, and custom rules.

Tip: Both methods work together. Grid settings handle per-column operations while sidebar settings handle file-wide operations. When you click Clean Data, all settings from both are applied.

Step 1: Upload File

  1. Click "Browse files" or drag and drop a CSV or Excel file.
  2. The file loads with automatic encoding detection.
  3. Preview rows are displayed based on your plan (Free: 5, Starter: 50, Pro: 100).

Step 2: Configure Cleaning in the Grid (Per-Column Settings)

The grid appears in the main area after uploading a file. Each row represents a column from your data.

  1. Type: Select the data type for each column from the dropdown (str, int, float, date, email, phone, etc.).
  2. Text transformations: Check boxes to enable:
    • Proper Case - Capitalize first letter of each word
    • UPPER - Convert to uppercase
    • lower - Convert to lowercase
  3. Trimming options: Check boxes to enable:
    • Trim Lead - Remove leading spaces
    • Trim Trail - Remove trailing spaces
    • Rm Non-Print - Remove non-printable characters
  4. Character removal: Check boxes to enable:
    • Rm Spaces - Remove all spaces
    • Rm Letters - Remove all letters
    • Rm Numbers - Remove all numbers
    • Nullify 0s - Replace zeros with NULL
  5. Bulk apply: Use the "Apply to All Columns" expander above the grid to apply the same settings to every column at once.
  6. Advanced columns: Click the Advanced Columns multiselect to add extra operations:
    • Deduplicate (Exact) - Mark exact duplicate rows
    • Deduplicate (Fuzzy) - Mark similar rows using similarity matching
    • Remove Emoji - Remove emoji characters
    • Convert Word Numbers - Convert "thirty" to 30
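
To clarify what several of the grid checkboxes do to an individual cell, here is a rough standard-library approximation. The function names are hypothetical, and the app's handling of edge cases (for example, which characters count as non-printable, or which values count as zero) is not documented here and may differ.

```python
from typing import Optional


def proper_case(text: str) -> str:
    """Proper Case: capitalize the first letter of each space-separated word."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split(" "))


def trim_lead(text: str) -> str:
    """Trim Lead: remove leading spaces."""
    return text.lstrip(" ")


def trim_trail(text: str) -> str:
    """Trim Trail: remove trailing spaces."""
    return text.rstrip(" ")


def rm_non_print(text: str) -> str:
    """Rm Non-Print: drop characters that str.isprintable() rejects."""
    return "".join(ch for ch in text if ch.isprintable())


def rm_letters(text: str) -> str:
    """Rm Letters: remove all alphabetic characters."""
    return "".join(ch for ch in text if not ch.isalpha())


def rm_numbers(text: str) -> str:
    """Rm Numbers: remove all digit characters."""
    return "".join(ch for ch in text if not ch.isdigit())


def nullify_zero(text: str) -> Optional[str]:
    """Nullify 0s: return None (NULL) when the cell is a numeric zero."""
    try:
        return None if float(text) == 0 else text
    except ValueError:
        return text  # Non-numeric cells are left untouched.
```

Order matters when several boxes are checked for the same column. In this sketch, applying rm_letters before nullify_zero turns "0kg" into "0" and then into NULL, while the reverse order leaves "0" in place.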

Step 3: Configure Sidebar Settings (Global Options)

The sidebar on the left contains settings that apply to the entire file or multiple columns at once.

  1. Cleaning Options:
    • AI-enhance schema - Use AI for smarter type detection
    • Deduplication mode - None, Exact (all columns), Exact (selected), or Fuzzy match
    • Handle invalid rows - Keep as-is, Flag (add column), or Remove
  2. Dates and Formatting:
    • Date standardization - Normalize all date formats
    • Date input format - Specify MM/DD/YYYY, DD/MM/YYYY, or auto-detect
    • Date-only mode, date keys, and other date options
  3. Data Transformations:
    • Header normalization - Clean column names
    • Outlier detection - Flag values outside IQR thresholds
    • Schema validation settings
  4. Custom Rules: Add conditional transformations (Starter: up to 3, Pro: unlimited).
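
The date input format setting matters because a value such as 03/04/2025 is ambiguous: it is March 4 under MM/DD/YYYY and April 3 under DD/MM/YYYY. The sketch below shows how the same string standardizes to different dates depending on the declared format. The mapping and function name are hypothetical, and ISO 8601 (YYYY-MM-DD) output is assumed purely for illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical mapping from the setting labels to strptime patterns.
INPUT_FORMATS = {"MM/DD/YYYY": "%m/%d/%Y", "DD/MM/YYYY": "%d/%m/%Y"}


def standardize_date(raw: str, input_format: str) -> Optional[str]:
    """Parse raw with the declared format; return YYYY-MM-DD or None if invalid."""
    try:
        parsed = datetime.strptime(raw.strip(), INPUT_FORMATS[input_format])
    except ValueError:
        return None  # For example, "25/12/2025" read as MM/DD/YYYY.
    return parsed.date().isoformat()
```

Declaring the wrong format does not always raise an error: values where both parts are 12 or less parse successfully under either format, so it is worth checking a few known dates in the preview after cleaning.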

Step 4: Run Cleaning

When you run cleaning, both grid settings and sidebar settings are applied together.

Option A: Create Schema Only

Generates the schema JSON without cleaning the data. Useful for validating a file or for reusing the schema later.

Option B: Clean Data

Applies all grid settings and sidebar options, validates and cleans the data, and generates quality metrics.

Step 5: Review Results

  1. Check quality metrics (completeness, duplicates removed).
  2. Review before/after preview.
  3. Download: Cleaned CSV/Excel, Schema JSON, Audit log.

Batch Processing

When to Use Batch Mode

  • Multiple tables with the same structure.
  • Consistent transformations across files.
  • ETL pipelines requiring bulk processing.

Workflow

  1. Switch to the Batch Upload tab.
  2. Upload multiple CSV/Excel files.
  3. Configure sidebar settings (applied to all files).
  4. Preview individual files from the dropdown.
  5. Click "Process All Files".
  6. Download the ZIP with all results.

ZIP Contents:

  • cleaned_*.csv - Cleaned files.
  • schemas/*.json - Schema definitions.
  • audit_logs/*.json - Processing logs.
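
If you post-process batch results in a script, the layout above can be used to sort the archive's entries into groups. The following is a minimal sketch using Python's zipfile and fnmatch modules. The patterns come from the list above; the function name and the example file name results.zip are assumptions.

```python
import fnmatch
import zipfile

# Patterns taken from the ZIP Contents list above.
GROUPS = {
    "cleaned": "cleaned_*.csv",
    "schemas": "schemas/*.json",
    "audit_logs": "audit_logs/*.json",
}


def group_zip_entries(zip_source) -> dict[str, list[str]]:
    """Sort archive member names into the three documented groups."""
    grouped = {name: [] for name in GROUPS}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            for group, pattern in GROUPS.items():
                if fnmatch.fnmatchcase(member, pattern):
                    grouped[group].append(member)
    return grouped
```

Calling group_zip_entries("results.zip") on a downloaded archive returns a dictionary with one list of member names per group, which can then be passed to ZipFile.extract or ZipFile.open as needed.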

AI Features

AI-Enhanced Schema Detection

Requirements: AI assistance enabled in your account settings.

What It Does:

  • Detects semantic types (email, phone, currency).
  • Validates data patterns.
  • Suggests appropriate data types.
  • Adds validation rules to schema.

AI Chat Bot (BETA)

Features:

  • Ask about your dataset.
  • Get cleaning recommendations.
  • Troubleshoot data issues.

Usage Limits:

  • Free: 10 messages/month.
  • Starter: 100 messages/month.
  • Pro: 500 messages/month.

Troubleshooting

Common Issues

File Upload Fails

  • Check file size (large files may take longer to process).
  • Verify file format (CSV, XLSX, XLS only).
  • Try a smaller sample if the file is very large.

Type Detection Incorrect

  • Switch between Strict/Relaxed inference modes.
  • Manually override types in the type selection table.
  • Enable AI-enhance schema for better detection.

Cleaning Takes Too Long

  • Disable AI-enhance schema for faster processing.
  • Reduce sample size in Processing Options.
  • Disable expensive operations (fuzzy deduplication, outlier detection).
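
Fuzzy deduplication tends to be expensive because a straightforward implementation compares every row with every other row, so the work grows roughly with the square of the row count. The sketch below shows that pairwise pattern using difflib.SequenceMatcher from the Python standard library. The 0.9 threshold and the function name are assumptions, and the app's actual matching algorithm is not documented here.

```python
from difflib import SequenceMatcher
from itertools import combinations


def fuzzy_duplicate_pairs(
    rows: list[str], threshold: float = 0.9
) -> list[tuple[int, int]]:
    """Return index pairs whose similarity ratio meets the threshold."""
    pairs = []
    # combinations() yields n*(n-1)/2 pairs: 499,500 comparisons for 1,000 rows.
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            pairs.append((i, j))
    return pairs
```

Exact deduplication, by contrast, can be done in a single pass with a set of seen rows, which is why switching from Fuzzy to Exact is usually the quickest way to speed up a slow run.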

AI Features Not Working

  • Verify AI assistance is enabled in your account.
  • Ensure you have sufficient usage credits.

Error Messages

"You've reached your cleaning limit"

  • Upgrade to Starter or Pro plan.
  • Wait for monthly limit reset.
  • Contact support for assistance.

"Schema validation failed"

  • Review schema requirements.
  • Disable "Fail on schema errors" to see issues.
  • Check validation mode (try Loose instead of Strict).

Best Practices

Data Preparation

  • Start Small: Test with a sample before processing large files.
  • Review Types: Always verify auto-detected types are correct.
  • Incremental Changes: Apply transformations one at a time.
  • Save Schemas: Reuse schemas for consistent processing.

Performance Optimization

  • Disable Unused Features: Turn off AI enhancement if not needed.
  • Adjust Sample Size: Use smaller samples for faster iteration.
  • Batch Wisely: Group similar files for batch processing.
  • Use Strict Mode: Faster than Relaxed inference mode.

Data Quality

  • Flag Before Removing: Use Flag mode to review invalid data.
  • Check Metrics: Review quality scores before proceeding.
  • Audit Logs: Download and archive for traceability.
  • Validate Early: Use schema validation to catch issues.
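
The "Flag Before Removing" advice can be pictured with the three invalid-row modes (Keep as-is, Flag, Remove) applied to an email column. The sketch below works on rows represented as dictionaries. The email pattern is deliberately simple, and the flag column name email_is_valid and the function name are hypothetical; the app's real validation rules and flag column may differ.

```python
import re

# Deliberately simple pattern for illustration; real validation is stricter.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def handle_invalid_rows(
    rows: list[dict], column: str, mode: str = "flag"
) -> list[dict]:
    """Apply Keep as-is / Flag / Remove semantics to one email column."""
    if mode not in ("keep", "flag", "remove"):
        raise ValueError(f"unknown mode: {mode}")
    result = []
    for row in rows:
        valid = bool(EMAIL_PATTERN.match(row.get(column) or ""))
        if mode == "flag":
            # Keep every row and record the outcome in an extra column.
            result.append({**row, "email_is_valid": valid})
        elif mode == "keep" or valid:
            # "keep" retains everything; "remove" retains only valid rows.
            result.append(dict(row))
    return result
```

Running in flag mode first keeps every row, so you can inspect the flagged values and confirm they are truly invalid before switching to Remove.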

Need More Help?

For additional assistance or feature requests, contact us at support@mechadatacleaner.com
