
Data Cleaning Pipeline

50 devlogs
57h 13m 54s


Data Cleaning Pipeline, where users upload their CSV or Excel file and the app scans it, tells you what is wrong, and lets you fix it. You can let the AI handle everything automatically or go through each issue yourself and decide exactly which columns to touch!!!
Built it for data analysts, researchers, students (ME), and anyone who’s ever wasted hours cleaning a spreadsheet before they could even start their actual work.

This project uses AI

used AI for debugging Streamlit UI issues!
used AI for writing the prompts sent to Gemini for the AI Assistant, AI Overview and AI Guide

Demo Repository


aneezakiran07

sandboxed exec to block code injection in AI cleaner

I wasn't thinking much about security, but one kind and cool voter pointed out a major security issue in the system, so i got to work on it! Now i will work on more security features to make this pipeline secure to use!

Before

  • global execution environment allowed full access to the os and subprocess modules

  • no import blocking meant __import__('os').system(...) worked

  • dunder attributes like __class__ and __builtins__ were reachable from the user interface

  • manual edits to the generated code bypassed all internal security constraints

Now

  • static ast analysis rejects every import statement and forbidden name branch

  • regex pattern scanning catches shell style calls before the parser initializes

  • execution globals replace the __builtins__ default with a strict twenty five item whitelist

  • validation logic triggers on both the initial generation and the final apply action
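The static-analysis plus restricted-builtins combo above can be sketched roughly like this (a minimal illustration with made-up names, not the pipeline's actual code; the real whitelist and checks are stricter):

```python
import ast
import builtins

FORBIDDEN_NAMES = {"__import__", "eval", "exec", "open", "compile", "globals", "locals"}
# strict whitelist, analogous to the ~25-item list described above
ALLOWED_BUILTINS = {
    name: getattr(builtins, name)
    for name in ("abs", "len", "min", "max", "sum", "range", "round",
                 "str", "int", "float", "bool", "list", "dict", "set",
                 "tuple", "enumerate", "zip", "sorted", "print")
}

def validate_code(src: str) -> None:
    """Reject imports, forbidden names, and dunder attribute access statically."""
    tree = ast.parse(src)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed")
        if isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            raise ValueError(f"forbidden name: {node.id}")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError(f"dunder attribute access: {node.attr}")

def run_sandboxed(src: str, env: dict) -> dict:
    """Validate first, then exec with a stripped-down __builtins__."""
    validate_code(src)
    exec_globals = {"__builtins__": ALLOWED_BUILTINS, **env}
    exec(compile(src, "<ai_code>", "exec"), exec_globals)
    return exec_globals
```

With this shape, `__import__('os').system(...)` and `x.__class__` tricks are rejected before `exec` ever runs, and anything that slips past static analysis still hits the builtins whitelist.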

aneezakiran07

Shipped this project!

Hours: 33.34
Cookies: 🍪 1181
Multiplier: 29.51 cookies/hr

HI!!
so I built a data cleaning pipeline in Streamlit.
it has features like an AI assistant that guides you on how to clean your data, tells you what your data is about, and generates code for you. Then there are
cleaning functions, custom validators for phone, email and date, a PDF cleaning report, correlation heatmap, data type guesser, live quality score, distribution charts, persistent session history, undo/redo, multi-filter stacking with AND/OR modes, CSV and Excel export, and much more!
It works on small and large datasets, even 100K+ rows!
ALSO!!! please do not judge performance too harshly. Streamlit reruns the entire script on every single interaction, which makes optimization genuinely EVIL! I did everything I could: lazy tab rendering, button-gated expensive calls, df-key based caching, chunked CSV reads, memory downcasting, and sampled analysis on large files.
I am switching frameworks in the next ship!! I'm done with streamlit :”)
test every feature and drop your feedback. I really want to know what functions you would find useful next. Thanks!!!

NOTE

if u get any error during the pipeline (though i really tried to handle all edge cases), refresh it and choose resume your session; it will resume your session up to the last step you performed before getting the error
.
.
(IDK if i should say this but i really spent so much time building this, please vote fairly TT
and ik some of you might not use it rn, but believe me, when you get school work someday or have to train your own model, you will see how tedious this work is :”) )
.
.
VIDEO DEMO LINK: (if u can't make sense of this project, watch this)
https://drive.google.com/file/d/1oI9FJehpTdsFCoxygUPd3RHjXB5-FK1O/view?usp=sharing

aneezakiran07

AI Assistant tab

Before

  • AI Cleaner was inside the Clean tab and was easy for users to miss :)
  • AI Data Analysis and General Guide were both in Guide tab

Now

  • new ai_assistant.py replaces guide.py and a single file now owns all three AI and guide sections
  • tab renamed to “AI Assistant”
  • AI Cleaner moved from Clean tab expander to top of AI Assistant tab
  • AI Data Analysis is below it
  • General Guide moves to the bottom
  • clean.py has the AI Cleaner expander and its import fully removed
  • app.py updated with new tab variable, new import
    also tested other features thoroughly so i can ship next
aneezakiran07

filter and inspect v3 match mode(AND OR) + export

Before

  • multiple filters always applied as AND (AND means both conditions must be satisfied for a row to show up in the filtered data)
  • no way to export filtered results (database peeps will know it's sometimes useful to export stuff after applying SQL queries like SELECT ... WHERE value ==)
  • no tooltips explaining what filters do or how match mode works

Now

  • ALL (AND) vs ANY (OR) radio toggle added at top with tooltip explaining both modes clearly with examples
  • ALL mode narrows down by chaining filters on the result of the previous filter
  • ANY mode builds a union mask across all filters applied independently to the original df
  • export section added below the dataframe with CSV and Excel download buttons
  • export always uses the full filtered result not the 500-row display cap i made for performance
  • tooltips added to match mode radio, add filter, clear all, all three metrics, and both export buttons
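The ALL/ANY logic above boils down to chaining vs unioning boolean masks; a tiny sketch (function name illustrative, not the app's actual code):

```python
import pandas as pd

def combine_filters(df, masks, mode):
    """ALL intersects every mask (each filter narrows the result);
    ANY unions masks computed independently against the original df."""
    if not masks:
        return df
    combined = masks[0]
    for m in masks[1:]:
        combined = (combined & m) if mode == "ALL" else (combined | m)
    return df[combined]

df = pd.DataFrame({"age": [15, 30, 45], "city": ["NY", "LA", "NY"]})
masks = [df["age"] > 20, df["city"] == "NY"]
# ALL keeps only rows matching every filter; ANY keeps rows matching any filter
```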
aneezakiran07

filter preview v2: multi-filter stacking + UI polish

Before

  • only one filter could be active at a time
  • no way to narrow down by multiple cols simultaneously
  • bw operator was missing so range checks needed two separate filters

Now

  • filter rows stored as id list in st.session_state.fp_filters so any number can stack
  • each row gets a unique id-prefixed key so streamlit tracks them independently
  • x button removes individual rows cleanly without affecting others
  • add filter and clear all buttons added below the filter rows
  • bw operator added for numeric and datetime with two side-by-side inputs (Min/Max, From/To)
  • display capped at 500 rows with caption showing match count for performance
aneezakiran07

filter preview v1: single column filter

Before

  • no way to inspect rows matching a condition without running an operation
  • users had to apply a fix and undo just to verify the right rows were targeted
  • full dataframe display with no filtering made spotting dirty values tedious

Now

  • filter_preview.py added as a read-only module, never writes to current_df
  • single filter row with column selector and condition selector
  • operators auto-detected per dtype: text gets contains/equals/starts with/ends with/is empty, numeric gets equals/gt/lt/is empty, datetime gets equals/before/after/is empty
  • match rate metrics row shows matching rows, total rows, and %age
  • filtered dataframe renders below with use_container_width=True
  • made it as a separate tab after overview so user can go to it to check the data
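The per-dtype operator detection could look something like this (a sketch; operator labels mirror the list above):

```python
import pandas as pd
from pandas.api import types as ptypes

def operators_for(series):
    """Pick filter operators based on the column's dtype."""
    if ptypes.is_numeric_dtype(series):
        return ["equals", "greater than", "less than", "is empty"]
    if ptypes.is_datetime64_any_dtype(series):
        return ["equals", "before", "after", "is empty"]
    # everything else is treated as text
    return ["contains", "equals", "starts with", "ends with", "is empty"]
```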
aneezakiran07

col_popover selection reset bug

Before

  • The columns the user selected got reset when switching to another tab, so when they came back to the tab they had to deselect and reselect columns for operations :)
  • on_change callbacks were the only sync mechanism between checkbox keys and val_selected
  • after a run, val_selected.pop() cleared the list but _vc_section_col keys stayed True in session state
  • streamlit never fired on_change again because widget value had not changed from its perspective

Now

  • col_popover reads _vc_ keys directly at render time and rebuilds val_selected from scratch every render
  • _make_all_handler now resets all individual _vc_ keys when toggling all on or off so checkboxes visually match
  • _make_col_handler syncs the all checkbox when individual columns are toggled

The issues i got while doing this:

  • col_popover was unconditionally rebuilding current from vc keys on every render. In streamlit, all tab contents execute on every rerun even when the tab isn't visible (mess, i know).
  • so i had to fix this too, and i used this flow:
  • If val_selected[section] already exists -> it was set by an on_change handler which is the definitive user action. Trust it, sync vc keys to match it for display, done.
  • If val_selected[section] is missing -> means either first ever render, or clear_popover just wiped it. Only then fall back to reading vc keys.
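Modelled as plain Python (a dict standing in for st.session_state, names illustrative), that flow is roughly:

```python
def resolve_selection(val_selected, vc_keys, section):
    """If a handler already set val_selected, trust it and sync the
    checkbox keys for display; otherwise rebuild from the keys."""
    if section in val_selected:
        chosen = val_selected[section]
        for col in vc_keys:          # sync display state to the selection
            vc_keys[col] = col in chosen
    else:
        chosen = [c for c, on in vc_keys.items() if on]
        val_selected[section] = chosen
    return chosen
```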

I said this earlier and will say it again: i hate how streamlit works, worst framework ever. i might change to some other framework after this ship.

aneezakiran07

button and popover alignment fix

My buttons were kind of inconsistent in the Recommendations and Validate tabs. My focus before was never UI, so i didn't bother changing them. But now i think i should make them consistent, so here i am

Before

  • Fix button and select columns popover rendered inside the same st.columns block as the description text
  • st.write("") spacers before buttons were unreliable, and column heights varied per section, making the UI look inconsistent
  • recommendations and validate both had this problem.

Now

  • separate st.columns call handles only the popover and button on their own dedicated row
  • both widgets are always the same height so they always land on the same line
  • button click logic moved outside col blocks so st.rerun() fires cleanly after all columns close
aneezakiran07

Executable Validation JSON feature

Before

  • import_pipeline_json had no branches for any custom validation step
  • JSON preview only existed on the save side as a raw code dump
  • Uploaded JSON had zero preview, and there was also no toast on successful JSON application TT

Now

  • _parse_label_meta() moved to module level so both import_pipeline_json and build_pipeline_script share it
  • Each step checks col in tmp.columns before running and skips if column is missing
  • threshold, min, max are safely cast to float with fallback so bad values never crash
  • _friendly_label() added in history_export.py
  • _render_json_preview() added, shows numbered step list with easy to read labels instead of raw JSON so non tech also get it
  • uploaded pipeline auto-expands its preview so user sees steps before hitting Apply
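The safe cast is the classic try/except fallback; something like:

```python
def safe_float(value, fallback):
    """Cast threshold/min/max values from imported JSON to float,
    falling back instead of crashing on bad input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return fallback
```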
aneezakiran07

Executable Validation Python code Export

(this was to be done for the custom validators i made in prev devlog)

Before

  • Exporting custom validation steps was not implemented yet
  • Custom regex, country code, date format and thresholds were all not working
  • Type Override had a parts.strip() bug (calling .strip() on a list) :))

Now

  • _parse_label_meta() reads key=value pairs embedded in label strings
  • all 5 validation steps export real runnable code with exact user settings
  • each commit_history call embeds col + settings per column
  • bug fixed, stubs added for Split/Merge/Rename :”)
    next i will have to make the custom validators work for pipeline.json also
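Reading key=value pairs out of a label string can be done with one regex; a sketch of the idea (the real _parse_label_meta may differ):

```python
import re

def parse_label_meta(label):
    """Pull key=value pairs embedded in a history label, e.g.
    'Validate phone (col=phone, country_code=+1)' -> {'col': 'phone', ...}."""
    return dict(re.findall(r"(\w+)=([^,\s)]+)", label))
```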
aneezakiran07

Global Undo and Redo in the Sidebar

Before

  • Undo and Redo were only inside History and Export, so the user had to switch tabs and couldn't access them easily
  • render_sidebar() had no access to history or redo_stack
  • Sidebar bottom stayed unused in Simple mode

Now

  • render_undo_redo() added in state.py and called in both sidebar modes
  • Reads from st.session_state and returns early; the buttons show even before any undo or redo has been performed, so the user knows from the start that the sidebar has this feature :)
  • Click triggers toast + st.rerun() for full UI sync
  • st.caption() shows the next undo/redo actions, so the user knows what is happening and what will happen if they press UNDO or REDO!
aneezakiran07

Custom Inputs for Date, Email, and Phone Validators

Now users can provide specific formats, patterns or country codes to override the general patterns

Before

  • validate_date_col() in validate.py relied on a 14-format loop and C-level inference, which could be slow or misinterpret ambiguous dates (eg 01/02/03).

  • validate_email_col() used a hardcoded regex pattern that couldn't be overridden for specific corporate domains or stricter requirements.

  • validate_phone_col() was hardcoded to US standards, making it difficult to validate international numbers

Now

  • validate_date_col accepts a custom_input_format parameter. If no custom format is set, it falls back to the default US format

  • validate_email_col features a custom_pattern parameter. A ternary operator replaces the default regex if a custom one is provided. A new try/except block catches invalid regex syntax

  • validate_phone_col now accepts default_country_code. The logic strips leading + signs and uses dynamic boolean masks for country code and length.
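The custom-pattern path with the regex-error guard might look like this (a sketch; the project's default pattern and return shape may differ):

```python
import re
import pandas as pd

DEFAULT_EMAIL = r"^[\w.+-]+@[\w-]+\.[\w.-]+$"  # illustrative default

def validate_email_col(s, custom_pattern=""):
    """Return a mask of invalid emails. An invalid custom regex
    falls back to the default instead of crashing."""
    pattern = custom_pattern or DEFAULT_EMAIL
    try:
        re.compile(pattern)
    except re.error:
        pattern = DEFAULT_EMAIL
    return ~s.astype(str).str.match(pattern, na=False)
```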

UI

  • the tooltips for these functions explain what to write in the custom pattern and how, so users understand how to use this feature
aneezakiran07

Add AI Executive Summary to PDF Report

Now users will get the AI summary in the PDF report too
Technical details:

Before

  • build_report_pdf() in reporting.py accepted only original_df, cleaned_df, history, and filename, with no AI summary about the dataset
  • The report opened with all the technical details, and a non-technical person couldn't make sense of what the data was even about
  • also the AI summary string already existed in st.session_state under ai_insights_{file_id} after the user clicked Generate AI Summary or Analyse my data in the Guide tab
  • history_export.py called build_report_pdf() with four positional args and moved on

Now

  • build_report_pdf() accepts an optional ai_summary: str = "" parameter
  • when ai_summary is non-empty, a light blue tinted LIGHT_BLUE background panel is rendered immediately below the title block; when it is empty, the report prompts the user to run the AI summary first to get it in the PDF
  • history_export.py reads st.session_state.get(f"ai_insights_{file_id}", {}) before calling the builder, pulls .get(“summary”, “”), and passes it through
aneezakiran07

Fix: Vectorized Phone and Date Validators

Before

  • validate_phone_col and validate_date_col both used .apply() to run a Python function row by row, causing HUGE wait times
  • validate_date_col ran every row through a 14-format try/except loop even when pd.to_datetime would have parsed it fine in one go!

Now

  • validate_phone_col is fully vectorized: .str.replace(r"\D", "", regex=True) strips non-digits across the whole column in one C-level pass, then .str.len() and boolean masks build the +1, + prefixes with pd.Series.where
  • validate_date_col uses a two-pass strategy: the first pass runs pd.to_datetime(..., infer_datetime_format=True), which handles the vast majority of dates at C speed in one go; the second pass only touches the rows that failed and tries the explicit format list. So on clean data the slow path never runs at all
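A stripped-down sketch of both vectorized approaches (rules simplified and names illustrative; the real validators handle more cases):

```python
import pandas as pd

def validate_phone_col(s):
    """Vectorized US-style check: strip non-digits in one C-level pass,
    then accept 10 digits, or 11 digits starting with 1."""
    digits = s.astype(str).str.replace(r"\D", "", regex=True)
    ok10 = digits.str.len() == 10
    ok11 = (digits.str.len() == 11) & digits.str.startswith("1")
    return ok10 | ok11

def validate_date_col(s, formats=("%m/%d/%Y", "%Y-%m-%d")):
    """Two-pass: fast vectorized parse first, explicit formats only
    for the leftover rows that failed."""
    parsed = pd.to_datetime(s, errors="coerce")
    failed = parsed.isna() & s.notna()
    for fmt in formats:
        if not failed.any():
            break  # clean data: the slow path never runs
        retry = pd.to_datetime(s[failed], format=fmt, errors="coerce")
        parsed.loc[failed] = retry
        failed = parsed.isna() & s.notna()
    return parsed
```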

Also changed the toast message; now it shows this ✓ along with the message!

aneezakiran07

Fix: Recommendations Tab State Wipe, Duration Regex False Positives

Before

  • fixing any issue in the recommendations tab wiped the entire tab state and dropped the user back to the “Scan for Issues” screen, because scan_key was f"rec_scanned_{df_key}" and df_key is derived from make_df_key(current_df), which changes every time a fix mutates the dataframe, so the old key simply stopped existing in session state
  • the duration regex (h|hr|hour|min|minute|sec|second) had no word boundaries, so the bare h matched the letter h inside any word: “John”, “the”, any English text at all would fire it and flag name or description columns as duration columns (found this one while testing on a movies dataset)

Now

  • scan_key is now a stable hardcoded string "rec_scanned" with no dependency on df_key, so it survives reruns after fixes; state.py explicitly resets it to False on every new file load and fresh session so it never carries over across files
  • duration regex tightened to \b(hrs?|hours?|mins?|minutes?|secs?|seconds?)\b with word boundaries in both get_analysis_and_recommendations and get_type_suggestions in cache.py; bare h removed entirely since no real duration data is ever just the letter h standing alone
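The tightened pattern with word boundaries (case-insensitivity assumed here for illustration):

```python
import re

# \b stops substring matches inside ordinary words; bare "h" is gone entirely
DURATION_RE = re.compile(r"\b(hrs?|hours?|mins?|minutes?|secs?|seconds?)\b",
                         re.IGNORECASE)

assert DURATION_RE.search("2 hrs 30 mins")          # real duration data fires
assert not DURATION_RE.search("John watched the movie")  # prose no longer does
```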

also i didn't know we could use ‘’ to show the code things!!! now i am gonna use them more!

aneezakiran07

Fix: Session Resume Dialog Crash and Loading Spinner

Before

  • clicking outside the resume dialog closed it without setting _resume_choice, leaving init_state in a broken half-initialized state and crashing on the next render

  • the loading spinner and info message never actually displayed because init_state ran synchronously on the same render pass, so the spinner appeared and vanished :) (handling edge cases is really tough)

Now

  • dismissible=False added to the dialog, disabling backdrop click and removing the X button so the user is forced to pick an option (read this thing in human computer interaction course)

  • loading is now two-pass: first rerun sets _resume_loading and renders the spinner, second rerun clears the flag and runs init_state normally, making the spinner genuinely visible between reruns

Fix: Recommendations Feedback and Overview Slider

 

Before

  • clicking Fix or Auto-Fix All in the recommendations tab gave no feedback :)

  • the overview tab crashed on single-row files with “Slider min_value must be less than max_value” when both values resolved to 1

 

Now

  • every Fix button in recommendations now wraps its operation in a spinner and fires a toast on success, matching the exact same pattern used in the validate tab from the last devlogs

  • Auto-Fix All also shows a spinner and toasts on completion

  • the overview slider now guards against min equal to max by skipping the slider entirely on single-row files and always flooring the minimum at 1
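The guard reduces to a bounds check before rendering the slider; sketched as plain logic (function name illustrative):

```python
def slider_bounds(n_rows):
    """Return (min, max) for the row slider, or None to skip the slider
    entirely (single-row files, where min would equal max)."""
    lo, hi = 1, max(1, n_rows)  # always floor the minimum at 1
    if hi <= lo:
        return None
    return lo, hi
```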

Note:

 Also this took me 45 minutes to do, but for some reason my devlog didn't get uploaded and i only saw that just now :) so

aneezakiran07

Performance improvement: AI Insights and Page Load Fix

so the first thing that happened when a user opened my app was loading the AI analysis, AI overview and AI guide, meaning the user had to wait up to 10 seconds even for a small file for all the tabs to load :)
now all tabs load instantly because there are buttons the user has to press to get the AI overview and guide!

Before:

  • opening the app with a 50k row file meant waiting 10 to 20 seconds before any tab appeared
  • guide.py called get_ai_insights automatically on every render which blocked the entire page
  • the overview tab had a separate streaming approach using st.write_stream which returned None in most streamlit versions so the summary never showed at all
  • also there was no per-file cache, so Gemini had to be called again and again on every reload/reupload of the file

Now:

  • both tabs share the exact same cache key ai_insights_file_id so one api call populates both
  • neither tab calls the api automatically on page load, the user triggers it with a button when they want it
  • guide tab shows an Analyse my data button and overview tab shows a Generate AI Summary button
  • whichever button the user clicks first, both tabs show their content on the next rerun with zero extra api calls
  • st.write_stream and the entire streaming approach were removed since streamlit tabs buffer all output and flush at once, streaming never worked there
  • ai_insights.py went from 310 lines to 190 by removing the dead streaming code
  • page now loads instantly regardless of file size, gemini is never called until the user asks for it
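The shared-key idea in miniature (a plain dict stands in for st.session_state; names illustrative):

```python
session_state = {}  # stand-in for st.session_state

def get_ai_insights(file_id, generate):
    """Both buttons funnel through the same cache key, so whichever is
    clicked first populates both tabs with a single API call."""
    key = f"ai_insights_{file_id}"
    if key not in session_state:
        session_state[key] = generate()  # the one and only API call
    return session_state[key]

calls = []
def fake_gemini():
    calls.append(1)
    return {"summary": "clean-ish data"}

# "Analyse my data" then "Generate AI Summary": one call total
a = get_ai_insights("f1", fake_gemini)
b = get_ai_insights("f1", fake_gemini)
assert a is b and len(calls) == 1
```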
aneezakiran07

UI Feedback - Spinners, Toasts, and the Dialog Bug

so in this devlog, i focused on making the app feel responsive :)

Before:

  • when a user clicked any button, the app would run the operation, call st.rerun(), and then show a success banner on the next render
  • this was handled using _omsg in session state across clean.py and validate.py
  • the success message had no auto-dismiss, so it just stayed there until the next interaction
  • sometimes the message didn’t even show, depending on which tab rendered first
  • there was no loading feedback, so the app felt frozen during heavy operations
  • validate.py had zero spinners at all, meaning everything just paused WITHOUT even notifying the user
  • spinner messages (where present) were vague like “Processing…” with no real context
  • redo / history-related feedback existed but didn’t clearly communicate what was happening

Now:

  • replaced the entire _omsg pattern with a cleaner toast-based system
  • new flow: set st.session_state["_toast"] = (msg, icon) -> call st.rerun() -> pop it at render -> show st.toast()
  • toasts automatically disappear after ~3 seconds, so no more stuck messages
  • they appear in the corner and don’t mess with layout at all (way cleaner UX)
  • users now immediately see feedback like “please wait…” instead of thinking the app died
  • instead of “Stripping whitespace…”, it now says things like:
    “Scanning 50,000 rows and stripping whitespace…”
  • fixed the dialog bug where “Resume previous session?” wouldn’t close
  • issue was st.rerun() inside @st.dialog only rerunning the fragment, fixed with: st.rerun(scope="app")
  • also cleaned up history_export.py removed old show_msg() + last_success_msg usage
  • migrated everything to the toast system. Some of the functions still have poor feedback :)
    this stuff is really so annoying, esp streamlit, ughh!!! idk whether to improve user feedback or performance or add new functions.
aneezakiran07

HII!!

History tab UI improvements

So in this devlog, I improved the History tab by adding visual feedback for all operations and making the redo stack visible!

Before:

  • when a user clicked Undo, Redo, Reset Data, Clear History, Apply Pipeline, or Generate Report, the app froze for a moment and refreshed, giving little hint that the operation succeeded.

  • there was no visual feedback or spinner showing that the app was actually processing the request.

  • also the user could see their history of past actions but they had no idea what was inside the redo stack waiting to be redone.

  • some buttons didn’t even show a green success box after finishing!

Now:

  • I wrapped all the major operations inside history_export.py (like undo and redo) with with st.spinner("… please wait…"): blocks.

  • now when a user clicks any button in the history tab they get an immediate spinning message telling them exactly what is happening.

  • I also imported show_msg from state.py so every single operation assigns a message to st.session_state.last_success_msg before it calls st.rerun().

  • now the user always gets a green success confirmation box once the spinner finishes!

  • added a new list right below the main history list that loops through the redo_stack.

  • now if the user undoes an action it pops up in the Available to Redo list so they know exactly what will happen if they click the Redo button.

next, i don't know what needs improvement, but i might focus more on user feedback :)

aneezakiran07

HII!!

Redo operation

So in this devlog, I integrated the redo operation into the history system!
Before:

  • the undo history lived in st.session_state and got saved to the .dp_sessions folder, but if a user accidentally clicked undo, they lost that cleaned state forever
  • The app could only go backwards, not forwards :”)
  • also when a user clicked undo on a large dataset the app just paused for a second and refreshed. there was no visual feedback that the pipeline was working on reverting the data
Now:

  • I added a redo stack to the pipeline. when a user clicks undo, the current dataframe gets pushed onto a redo stack. if they click redo, it pops from that stack and goes back into the main history.
  • whenever the user makes a brand new cleaning change, the redo stack clears itself because the timeline has changed. I updated session_persist.py so the redo stack is saved to the disk and survives page reloads
  • UI changes: wrapped the undo and redo functions inside st.spinner blocks inside the history export tab
  • now when the user clicks either button they get an immediate spinning message saying Undoing last action please wait or Redoing action please wait
  • added a new list right below the main history list that loops through the redo_stack. now if the user undoes an action it pops up in the Available to Redo list so they know exactly what will happen if they click the Redo button
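The undo/redo mechanics described above, modelled as a tiny class (illustrative, not the pipeline's actual code):

```python
class History:
    """Undo pushes the current state onto redo; redo pops it back;
    a brand new change clears redo because the timeline diverged."""

    def __init__(self, state):
        self.state = state
        self.undo_stack = []
        self.redo_stack = []

    def commit(self, new_state):
        self.undo_stack.append(self.state)
        self.state = new_state
        self.redo_stack.clear()  # new change invalidates the redo timeline

    def undo(self):
        if self.undo_stack:
            self.redo_stack.append(self.state)
            self.state = self.undo_stack.pop()

    def redo(self):
        if self.redo_stack:
            self.undo_stack.append(self.state)
            self.state = self.redo_stack.pop()
```

The redo_stack is exactly what the "Available to Redo" list renders from.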

next i will work on making the user feedback for the history tab better! ik right now it sucks, but don't worry, i will improve it!

aneezakiran07

Valid tab: UI Improvements

  • Now each function gives clear feedback
  • eg the email validator tab will say X emails have been flagged or removed
  • same with the phone validator or standardizing the dates
  • i didn't have to fix the UI for the outlier cap and validate range functions as i had already done that before
aneezakiran07

Clean Tab: Performance and UI Improvements

voters said before that my app sucks at feedback, so now i'm trying to make it better
first of all, i made these changes in the clean tab

  • Every operation now calculates and reports the actual difference.

  • Strip whitespace counts changed cells per column.

  • Duplicate drop shows the percentage of data removed.

  • Smart cleaner lists the names of converted columns.

  • Missing value handler reports filled counts and dropped columns separately.

  • Type override warns when conversion failures create new nulls.

  • Find and replace reports the exact match count.

  • Outlier capping reports the exact number of values affected.

  • Type guesser adds a per-column progress bar.

  • Type guesser now runs via button instead of opening automatically for better performance
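The "report the actual difference" pattern, shown for whitespace stripping (a sketch with illustrative names, not the app's exact code):

```python
import pandas as pd

def strip_whitespace_with_report(df):
    """Strip whitespace on text columns and report changed cells per
    column, so the UI can show a concrete before/after count."""
    out = df.copy()
    changed = {}
    for col in out.select_dtypes(include="object"):
        stripped = out[col].str.strip()
        # NaN-safe comparison: NaN != NaN would otherwise overcount
        n = int(((stripped != out[col]) & out[col].notna()).sum())
        if n:
            changed[col] = n
        out[col] = stripped
    return out, changed
```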

Next i will update the validate tab

aneezakiran07

Upload Feedback Fix- Small fix

Before:

  • user picked a file and saw nothing. no spinner, no message, nothing.
  • the upload tab showed the success message only after everything was done
  • so on Excel files with large sheets there was a silent 5 to 10 second gap where the app looked frozen and the user had no idea if the upload worked. some users switched tabs thinking nothing happened.

Now:

  • upload.py checks loaded_file_id against the current uploaded file_id

  • if they do not match the file was just picked but not yet parsed so it immediately shows Reading filename please wait in blue

  • this renders on the very first rerun after file selection before app.py has done any work at all

  • once init_state finishes and sets loaded_file_id the next rerun flips the message to the green success confirmation

  • user always sees feedback within one rerun of picking the file

One user complained that they often had to upload the file twice; now they will get clear feedback and their file will upload quickly

aneezakiran07

Lazy Tab Rendering

Before:

  • all tabs blocked on page load for 30 to 40 seconds on large files.
  • get_analysis_and_recommendations ran unconditionally at the top of recommendations.py
  • the correlation heatmap ran inside a closed expander on every render pass
  • streamlit executes everything inside expanders even when they are collapsed, meaning expensive functions were being run even when the user didn't need them :(

The cache.py :”) :

  • spent 30 minutes adding sampling and large file guards to every analysis function in cache.py thinking that would fix the slowness. it helped memory and accuracy in large files but did not fix the tab loading delay at all
  • problem was never the speed of the functions. it was that they were being called in the first place on every render even when the user had not asked for them.
  • the real fix had nothing to do with cache.py; i wasted my time working on cache.py tbh!

Now:

  • recommendations are behind a Scan for Issues button, and tab opens instantly.
  • scan results are cached in session state keyed on df_key so they survive reruns
  • but auto invalidate when the dataframe changes after a cleaning step.
  • correlation heatmap is behind a Compute Correlation button inside its expander.
  • same df_key keying pattern so results persist until data actually changes.
  • clean, validate, and history tabs were already instant and needed no changes.
  • on a 1 million row file all tabs now open after some wait (better than before tho).

Also, only the history tab is still stuck with just 2 functions; there is a bug keeping it from showing the other functions instantly. i will work on that in the next one, but all the other tabs are fine. i know speed is an issue, but for 10k rows this is all i can do for now :(
Edit: AI also gave its summary after 2 min TT big wait time, but it's a given for 10k rows :)) maybe one day i might become a pro and improve its performance, but to me performance is really hard to solve rn

aneezakiran07

Large File Support

Before:

  • everything assumed small datasets. pd.read_csv loaded the entire file into memory at once.
  • KNN and MICE imputers were called on 500k row files, which would freeze or crash the browser; also, all analysis functions scanned every row every time.

Now:

  • CSV files are read in 50k row chunks and concatenated so peak memory stays low
  • int64 columns are downcast to int32 and float64 to float32 on files over 50k rows
  • this cuts memory roughly in half with a warning shown to the user about precision loss
  • KNN and MICE imputers now have a row limit; above 50k rows they fall back to mean imputation, which runs in linear time and is safe for 500k rows (found this info in a Medium article)
  • all expensive analysis functions like recommendations, correlation, quality score and type suggestions now run on a 20k row random sample on large files
  • histogram KDE is computed on a 10k sample while counts still use the full series
  • the overview tab now shows a warning banner when downcasting was applied
  • row count metrics now use comma formatting so 100000 renders as 100,000 (yay, kinda low-level detail TT)
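The chunked read plus downcasting can be sketched like this (thresholds from the list above; function name illustrative):

```python
import io
import numpy as np
import pandas as pd

def read_csv_chunked(buf, chunksize=50_000, downcast_over=50_000):
    """Read the CSV in chunks to keep peak memory low; downcast numeric
    dtypes on large files, roughly halving memory."""
    df = pd.concat(pd.read_csv(buf, chunksize=chunksize), ignore_index=True)
    if len(df) > downcast_over:
        for col in df.select_dtypes(include="int64"):
            df[col] = df[col].astype(np.int32)
        for col in df.select_dtypes(include="float64"):
            df[col] = df[col].astype(np.float32)
    return df

# tiny demo with a low threshold so the downcast path actually runs
csv = io.StringIO("x,y\n" + "\n".join(f"{i},{i/2}" for i in range(5)))
df = read_csv_chunked(csv, chunksize=2, downcast_over=3)
```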

Note:

  • In the uploaded video, the data gets loaded really quickly but the tabs show their stuff really late

  • it took 30 to 40 seconds, and that was because of the correlation heatmap and the expensive analysis functions. In the next update, i will try to make only the tabs with expensive functions slow down; other tabs like Validate and Clean should show up right away as they really don't have any stuff that needs to pre-run

  • Also i attached a screenshot, check it out, my app also loaded a

1 MILLION rows TT

yay, i had to write that in bold! also tho it's a bit slow, I can always work on this part <3

aneezakiran07

Persistent Session History

Before:

  • undo history lived only in st.session_state; one page refresh and everything was REMOVED
  • users lost all cleaning progress on reload, which was evil, ik

Now:

  • added session_persist.py as a standalone persistence module
  • history and both dataframes are pickled to home dir under .dp_sessions folder
  • sessions are keyed by md5 of filename plus filesize, NOT streamlit file_id
  • this was the core bug, file_id is a fresh UUID on every reload so the session was never found
  • sessions older than 2 days are auto cleaned on each init
  • commit_history and undo_last both autosave to disk after every mutation
  • on file load init_state checks for a saved session before doing anything
  • if a saved session exists a native st.dialog pops up showing time since save and step count
  • user picks continue to restore full history and current df or start fresh to wipe and begin clean
  • reset to original and clear history both delete the persisted file
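The stable session key is just a hash of name plus size; e.g.:

```python
import hashlib

def session_key(filename, filesize):
    """Stable session id from filename + size; unlike Streamlit's
    file_id (a fresh UUID on every reload), this survives refreshes."""
    return hashlib.md5(f"{filename}:{filesize}".encode()).hexdigest()

# same file -> same key across reloads; different size -> new session
assert session_key("sales.csv", 1024) == session_key("sales.csv", 1024)
assert session_key("sales.csv", 1024) != session_key("sales.csv", 2048)
```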

How tough it was:
YES I NEED TO TELL THIS!!!
3 issues no 4:

  • first file_id changing every reload so session was never found
  • second windows path separator making tmp dir fail silently
  • third st.stop halting before current_df was set so the page just crashed quietly
  • fourth pycache serving the old pipeline.py so autosave never fired at all
    Literally tried so many methods to get this to work TT i hate how session stuff works :(
0
aneezakiran07

Two updates and longer devlog cuz HACKATIME was down :((

AI Cleaner Improvements

Before:

Gemini blindly generated code without checking if the data was ready. dirty columns (currency symbols, % signs, units) caused runtime crashes with no USER FEEDBACK :”)

Now:

  • Gemini gets raw sample values per column
  • added a feasibility check, now gemini reasons before writing any code
  • if the app already has a feature for it, Gemini says go to the exact tab + feature, since that's a guaranteed method that works, but it also generates code in case the user wants to run it
  • if a column is dirty and the operation assumes clean data, a warning is shown to fix the column first with the right feature, but code is still generated if the user wants to run it anyway
  • gemini timeout bumped to 30s and it auto-retries once on timeout

Performance Improvements

Before:

  • st.cache_data was hashing the ENTIRE dataframe on every cached function call
  • with 7 cached functions each receiving the full df, that's 7 full df hashes per rerun.
  • also st.cache_data.clear() was nuking ALL cache on every new file upload.
  • profile tab was computing correlation, missing heatmap, and before/after on every tab switch.
  • result: 20 to 30s load times even on datasets under 1000 rows :”)

Now:

  • added make_df_key() in cache.py
    all 7 cached functions now take df_key as first arg instead of the full df
  • Now there is a 3-phase workflow
  • Phase 1 (~instant): all 8 tabs render immediately, the user sees the full app right away
  • Phase 2 (~2-10s): app.py calls Gemini after all tabs are rendered; before, Gemini was called before the app finished loading, which was causing the issue
  • Phase 3 (instant): st.rerun() fires, all tabs re-render with the cached insights already in session state
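the make_df_key() idea, sketched with assumed internals (the real cache.py may fingerprint the dataframe differently):

```python
import hashlib

import pandas as pd


def make_df_key(df: pd.DataFrame) -> str:
    # cheap fingerprint: shape + column names + dtypes + a small value sample,
    # instead of hashing every cell like st.cache_data does by default
    parts = [
        str(df.shape),
        ",".join(map(str, df.columns)),
        ",".join(map(str, df.dtypes)),
        df.head(100).to_json(),  # sampling keeps hashing cheap on big files
    ]
    return hashlib.md5("|".join(parts).encode()).hexdigest()
```

the cached functions then take this short string as their first argument and read the actual dataframe from session state, so @st.cache_data only ever hashes a tiny key instead of the full df.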
0
aneezakiran07

HI!
In this devlog i moved and upgraded the AI analysis feature

AI analysis moved to Guide tab

  • before this the AI fixes were showing inside the Recommendations tab
  • it felt cluttered there because recommendations already has its own rule-based list
  • moved the whole AI analysis into the Guide tab
  • Guide tab is where users go to understand what to do, so AI fits there ig

what the Guide tab AI section now includes

  • a plain English summary of what the dataset is about
  • a list of prioritised fix cards, most important first
  • each card shows the issue title, which column it affects, and why it matters
  • each card shows exactly which tab to go to and what to click to fix it
  • each card shows an AI Cleaner shortcut so the user can just type the fix instead
  • the shortcut tells them exactly what to type into the AI Cleaner in the Clean tab
  • the cards use real tab names and real button names from the actual UI
  • styled with a blue border for the how to fix section and green for the AI shortcut

what the Overview tab now includes

  • AI summary is also shown at the top of Overview as a duplicate
  • it uses the same cached result so it costs zero extra API calls
  • non-tech users see the summary immediately when they open Overview

how caching works

  • Gemini is called once per file load
  • result is stored in session state under ai insights file id
  • Guide tab and Overview tab both read from the same cache key
  • new file uploaded means a fresh call, same file means instant load

also

It works for multi-sheet excels too! but you have to select the sheet you want the AI guide for!!!

0
aneezakiran07

HI!
In this devlog i built

AI data summary and AI fix suggester

  • one Gemini API call does two jobs at the same time
  • first job is a plain English summary of what the dataset looks like
  • second job is a prioritised list of fixes the AI recommends
  • both come back in one JSON object so we only spend one request per file
  • the result is cached in session_state keyed to the file id
  • this means even if the user switches between Overview and Recommendations tabs the call only happens once
  • new file uploaded means new call, same file means cached result

how the summary works

  • shows up at the top of the Overview tab
  • 2 to 4 plain English sentences about what the data is and its quality
  • non-tech users immediately understand what they are looking at

how the fix suggester works

  • shows up at the top of the Recommendations tab above the existing rule-based ones
  • each fix has a priority number, a column name, a short issue title, and a one-sentence reason
  • sorted by priority so the most important thing is always first
  • styled as cards with a blue left border and priority number
Attachment
Attachment
0
aneezakiran07

HI!
In this devlog i built

AI natural language cleaner

  • user types a plain English instruction like drop rows where Age is empty
  • app sends the instruction plus column names, types, and 5 sample rows to Gemini
  • prompt asks Gemini to return a JSON object with two fields, code and explanation
  • so i used my gemini api key and built this ai natural language cleaner
  • using gemini-3.1-flash-lite-preview, fast and cheap and allows 500 requests per day
  • temperature set to 0.1 so output is consistent and not creative
  • maxOutputTokens set to 1024 to support multi-line code for complex queries
  • Gemini can write code of as many lines as needed
  • this means complex instructions like normalise then fill then rename all work in one go
  • code field is the pandas code to run
  • explanation field is a plain English sentence of what the code will do
  • app shows the explanation with “Verify that this matches what you want” so users know what will happen once they apply this code
  • app shows the code in an editable text area so tech users can change it if needed
  • user clicks Apply to run it, or Cancel to discard
  • code runs in an isolated exec() with only df and pd available, nothing else
  • if execution fails the error is shown and the dataframe is not changed
  • on success the result goes into current_df and gets logged to History
  • API key is loaded from a .env file using python-dotenv
  • if the key is missing a warning is shown and the section is hidden
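a rough sketch of what that isolated exec() could look like — my guess at the shape, with a basic AST check bolted on (see the sandboxing devlog above), not the exact app code:

```python
import ast

import pandas as pd


def run_generated_code(code: str, df: pd.DataFrame) -> pd.DataFrame:
    # static check: reject imports and dunder access before executing anything
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed in generated code")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError("dunder attributes are not allowed")
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise ValueError("dunder names are not allowed")
    # run with only df and pd visible, and no builtins at all;
    # work on a copy so a failed run never touches the real dataframe
    scope = {"df": df.copy(), "pd": pd, "__builtins__": {}}
    exec(code, scope)
    result = scope["df"]
    if not isinstance(result, pd.DataFrame):
        raise ValueError("generated code must leave a DataFrame in `df`")
    return result
```

on success the returned frame would go into current_df; on any exception the original df is untouched.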
1

Comments

aneezakiran07
aneezakiran07 17 days ago

i changed the caption TT before i was running ollama for myself as it only runs locally, then i switched to gemini for u guys so!

aneezakiran07

Guide ME feature
I literally tried three approaches before landing on this one.

  • the first attempt used a fixed-position HTML card with Streamlit buttons rendered below it, relying on CSS position:fixed !important to reposition the button row to visually overlap the card. this failed because Streamlit’s component tree does not guarantee DOM order or stable class names across rerenders, so the CSS selectors targeting the button row were unreliable and the buttons kept appearing at the bottom of whichever tab was open.
  • the second attempt used links styled as buttons inside the HTML card, with a handler at the top of render that read st.query_params and updated state. this seemed promising but Streamlit treats query param URL changes as full page navigations, not rerenders. clicking Next reloaded the entire app from scratch and cleared the uploaded file.
  • then finally, added a Guide tab to the data cleaning app
    it's a static checklist, it only exports one function: render(tab).
    it takes the Streamlit tab object and renders inside it.
    in app.py, add a Guide tab to the st.tabs call and pass it to guide.render().
    this was the easiest and is done

What i wanted was a cool guide feature that shows in the bottom right and guides users, like many software products do
but it was nearly impossible in streamlit for me tho :”) so i ended up making just a guide me tab :)
pushed both files on github tho so no one can say how can u spend 1 hr 30 min on just one tab that shows only static data TT

Attachment
0
aneezakiran07

Shipped this project!

Hours: 4.97
Cookies: 🍪 109
Multiplier: 21.97 cookies/hr

HI!!

so i built a data cleaning pipeline in streamlit.

also in this ship, i made modular architecture with separate folders for cleaning logic, tab rendering, and caching because before my architecture was such a mess :”)

now the cleaning package is pure python with zero streamlit dependency so every function is independently testable.

toughest part was the pipeline json replay. steps that need manual column selection like email validation cannot be automated, so the replayer had to be honest about what it skipped!!!

also built a pdf report, correlation heatmap, data type guesser, live quality score, and distribution charts. The correlation heatmap only shows for advanced users, as simple folks really can't understand what it's about!

I hope you test each feature thoroughly and give me your valuable feedback! Suggest more functions that you think you might need

Thanks a lot!!!

aneezakiran07

Hi! in this devlog i built
data quality score

  • new score from 0 to 100 shown at the top of the overview tab, computed across five dimensions each worth 20 points
  • completeness penalises missing cells proportionally, so a dataset with 10% missing values loses about 2 points on that dimension
  • uniqueness penalises duplicate rows the same way
  • type consistency scans every object column and checks what fraction of values are actually numeric strings stored as text. the more columns like this, the lower the score
  • outlier cleanliness uses a 3x IQR fence instead of the standard 1.5x so only extreme values count against the score, not natural spread (i hate this probability stuff but it's so important for data science and ML TT)
  • validity checks text columns for common placeholder strings like none, na, n/a, null, unknown, and empty strings
  • it's colour coded: green above 80, amber (yes we all think of that amber :”)) between 55 and 80, and red below 55
  • five breakdown cards sit below the gauge, one per dimension, each showing the raw score out of 20, the grade, and a plain english explanation of what was found
  • the whole thing runs through @st.cache_data so it recomputes automatically whenever the dataframe changes and the user sees the score tick up in real time as they clean
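as a taste, the completeness and uniqueness dimensions could be computed like this (an illustrative sketch, not the app's exact scoring code):

```python
import pandas as pd


def completeness_score(df: pd.DataFrame) -> float:
    # each dimension is worth 20 points; completeness penalises missing
    # cells proportionally, so 10% missing loses 2 points here
    total_cells = df.shape[0] * df.shape[1]
    if total_cells == 0:
        return 20.0
    missing_ratio = df.isna().sum().sum() / total_cells
    return round(20.0 * (1 - missing_ratio), 2)


def uniqueness_score(df: pd.DataFrame) -> float:
    # duplicate rows are penalised the same proportional way
    if len(df) == 0:
        return 20.0
    dup_ratio = df.duplicated().sum() / len(df)
    return round(20.0 * (1 - dup_ratio), 2)
```

the overall 0–100 score is then just the sum of the five 20-point dimensions.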
0
aneezakiran07

Hi,in this devlog, i did

cleaning report PDF export

  • added new reporting.py module that builds the entire PDF using reportlab
  • NOW report includes a summary table comparing rows, columns, missing cells, and duplicate rows before and after cleaning with a change column
  • full column profile for both the original and cleaned dataset showing type, null count, null percentage, and unique value count
  • missing value breakdown table that lists every column that had nulls before or after, and flags whether it was fully resolved
  • ordered list of every cleaning step performed with the dataframe shape after each one so you can trace exactly what changed when
  • first 10 rows of the cleaned dataset as a sample table at the end
  • PDF regenerates automatically only when the history changes, using the same history_len cache key pattern already used for the Excel download
  • reportlab added to requirements.txt. so if the import fails the UI shows a pip install hint instead of crashing the whole app
Attachment
0
aneezakiran07

HI!!!
In this devlog i did!

Data Type Guesser

  • new section at bottom of clean tab that scans every column and suggests the correct type based on what the values actually contain
  • detects nine patterns: email addresses, boolean values, currency amounts, percentages, measurement units, time durations, datetimes, plain numeric strings, and low cardinality categories
  • each suggestion shows the column name, current type, suggested type, a confidence percentage, the reason it was flagged, and three sample values so you can judge before applying
  • rendered as an editable table where you tick which suggestions to apply and press one button, so you never have to go column by column manually
  • also the confidence is computed from the proportion of values that matched the pattern, so a column where 95% of values look like currency scores higher than one where 60% do
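the confidence math is simple, e.g. for the currency pattern (the regex here is illustrative, not the app's exact one):

```python
import pandas as pd

# toy currency pattern: optional whitespace, a symbol, digits with
# optional thousands separators and decimals
CURRENCY_PATTERN = r"^\s*[$€£]\s*\d[\d,]*(?:\.\d+)?\s*$"


def currency_confidence(series: pd.Series) -> float:
    # confidence = fraction of non-null values matching the pattern,
    # so a 95% currency-looking column scores higher than a 60% one
    values = series.dropna().astype(str)
    if values.empty:
        return 0.0
    matches = values.str.match(CURRENCY_PATTERN).sum()
    return round(matches / len(values), 2)
```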
0
aneezakiran07

Pipeline JSON Save and Reload

  • cleaning history now exports to a small pipeline.json that stores step labels only, not data

  • upload that file on any new dataset and the app replays every automatable step in order

  • Steps that need manual column selection like email validation or outlier capping are skipped and reported back honestly

  • Every replayed step is pushed onto the undo stack so the full workflow stays consistent
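the replay loop could look roughly like this — the AUTOMATABLE registry and step labels are hypothetical, just to show the skip-and-report-honestly idea:

```python
import json

import pandas as pd

# hypothetical registry: step label -> automatable pandas transform
AUTOMATABLE = {
    "drop_duplicates": lambda df: df.drop_duplicates().reset_index(drop=True),
    "strip_whitespace": lambda df: df.apply(
        lambda s: s.str.strip() if s.dtype == object else s
    ),
}


def replay_pipeline(pipeline_json: str, df: pd.DataFrame):
    steps = json.loads(pipeline_json)["steps"]  # labels only, no data
    applied, skipped = [], []
    for label in steps:
        if label in AUTOMATABLE:
            df = AUTOMATABLE[label](df)
            applied.append(label)
        else:
            # steps needing manual column selection are reported, not faked
            skipped.append(label)
    return df, applied, skipped
```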

Correlation Heatmap

  • Added to the Profile tab between the missing value heatmap and the before/after comparison

  • Symmetric grid where each cell is colour encoded, blue for positive correlation, red for negative, white near zero

  • Value labels are drawn inside cells when there are 12 or fewer columns, hidden automatically on wide datasets so the chart stays readable

  • Supports Pearson, Spearman, and Kendall methods with a dropdown to switch between them

  • A summary line below the chart surfaces the three strongest off-diagonal pairs so you can spot redundant columns at a glance without reading the whole grid

  • All computation runs through @st.cache_data so switching methods is instant with no recompute
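pulling the strongest off-diagonal pairs out of a correlation matrix can be done like this (a sketch, not the app's exact code):

```python
import pandas as pd


def strongest_pairs(df: pd.DataFrame, method: str = "pearson", top: int = 3):
    # surface the strongest off-diagonal correlations so redundant
    # columns can be spotted without reading the whole grid
    corr = df.corr(numeric_only=True, method=method)
    pairs = (
        corr.stack()
        .reset_index()
        .rename(columns={"level_0": "col_a", "level_1": "col_b", 0: "r"})
    )
    # keep each unordered pair once and drop the diagonal
    pairs = pairs[pairs["col_a"] < pairs["col_b"]]
    pairs = pairs.reindex(pairs["r"].abs().sort_values(ascending=False).index)
    return list(pairs.head(top).itertuples(index=False, name=None))
```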

Also this is only available in ADVANCED mode, don't want to scare off the users who don't know much!!

0
aneezakiran07

Data Cleaning Pipeline v3.0.0

  • I figured out i should do devlogs like this from now on instead of doing long para as these are more readable

Modular Refactor

  • Broke an 1800-line single file into 19 focused files across a clean folder structure. For me, understanding the code before was easy because streamlit needs so little code, but for reviewers it might not be, so!
  • cleaning/ folder is now Python with zero Streamlit dependency so every function is independently testable
  • Each tab gets its own file with a single render() function, app.py is now just 106 lines

Distribution Charts

  • Added histogram with KDE curve overlay to the Profile tab for numeric columns
  • IQR fence lines drawn directly on the histogram so you can visually judge outliers before deciding to cap them
  • categorical columns get a horizontal frequency bar chart with percentage tooltips
  • missing value heatmap shows a sampled grid of the whole dataset, red cells are nulls, blue are present
  • All chart data goes through @st.cache_data so switching columns has no recompute cost

Column Transforms

  • Split column breaks one column into many on any delimiter, extra parts pad with NaN, source column optionally kept
  • merge columns concatenates any number of columns into one with a custom separator
  • rename columns gives an editable table so you can fix all names in one session without touching code
0
aneezakiran07

Shipped this project!

Hours: 18.07
Cookies: 🍪 184
Multiplier: 18.5 cookies/hr

HII!!!

So I finally shipped another update.

The app now includes a column profiler that shows per-column stats like min, max, mean, median, skewness, null percentage, and sample values. so users can quickly understand their data before cleaning it. I also added find & replace with optional regex support and a full undo/history system where every cleaning step is tracked and can be reverted anytime.

Another useful feature is before/after column comparison. You select a column and it shows original vs cleaned values side by side while highlighting exactly which rows changed. I also implemented column type override, allowing users to force columns into types like int, float, string, datetime, boolean, or category (booleans and integers were a bit tricky because of mapping and nullable types).

One of my favorite additions is pipeline export. The app reads the cleaning history and generates a downloadable Python script so the same cleaning steps can be reused on new datasets.

I also optimized performance using caching so expensive dataframe analysis doesn’t run on every interaction. Plus I reorganized the UI into tabs and added tooltips so even non-technical users can understand what each option does.

Test files are available on my GitHub if anyone wants to try it. Please test each feature and give me feedback :)
Thanks a lot!!!

aneezakiran07

Debugging and debugging :”)
Hi!!
In this devlog, i spent hours doing a full code review and fixing everything i found that was wrong!
In both smart_column_cleaner and the Recommendations tab conversion paths, converted series with a sparse dropna index were being assigned back to the full dataframe. This simply means it turns the originally null rows into NaN. The fix was applying .reindex(df_clean.index) in all conversion branches. Similar issues existed in validate_email, cap_outliers, and validate_range, where row removals left a non contiguous index and caused mismatches in before/after comparisons. These were fixed by adding .reset_index(drop=True).
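the reindex/reset_index pattern in a tiny example (the column name is made up):

```python
import pandas as pd

df_clean = pd.DataFrame({"price": ["$5", None, "$7.50"]})

# convert only the non-null values, then realign to the full index so
# the converted series lines up with the original row positions
converted = (
    df_clean["price"].dropna().str.replace(r"[$,]", "", regex=True).astype(float)
)
df_clean["price"] = converted.reindex(df_clean.index)

# after row removals (e.g. dropping invalid values), make the index
# contiguous again so before/after comparisons don't mismatch
df_clean = df_clean.dropna(subset=["price"]).reset_index(drop=True)
```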

moreover, KNNImputer could receive n_neighbors=0 on very small datasets, so i added a max(1, min(…, len(df_clean))) guard. IterativeImputer also crashes when only one numeric column exists, so it now automatically falls back to KNNImputer.

the most illegal one :”) history was being pushed even when operations failed, so it now uses snapshot() and commit_history() to record only successful operations.
find_and_replace previously cast entire columns to strings, converting NaN to “nan”. validate_date_col also used an incorrect missing value type. Also, all pandas inplace=True calls were replaced with reassignment for pandas 3.x compatibility, and Excel downloads are now cached to avoid unnecessary regeneration.

Note: I attached the video explaining each feature, tho the video is rushed cuz ft don't allow uploading large videos TT

0
aneezakiran07

tooltips, readme update, and testing
in this devlog, i added help tooltips across the Clean and Validate tabs. Every widget that needed context now has a ? icon; on hover it explains what the option does and when to use it. I made it so that even non-tech users can understand what's happening with their data!
Also updated the README and included descriptions of the new features in it, e.g. the tab layout, column profiler, before/after comparison, history and undo, pipeline export, and multi-sheet Excel support. Also added one more multi-sheet excel test file.
Moreover, i performed extensive testing and made sure that my pipeline works on both single-sheet and multi-sheet excel files.
If you guys have any issues with it then feedback is welcome!!!

0
aneezakiran07

Before/after comparison, column type override, and pipeline export
Hi!!!
in this devlog, three more features are added.
First was before/after column comparison, you pick a column and it shows original vs current values side by side with a third column that flags exactly which rows changed.
Second was column type override: a dropdown to force any column to int, float, string, datetime, boolean, or category.
it was kinda tricky TT cuz booleans needed a string-mapping step first, integers needed nullable Int64 so NaN values don't cause a cast error, and datetime needed errors='coerce'.
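a sketch of those conversions (the function name and string mapping are mine, not the app's exact code):

```python
import pandas as pd


def override_column_type(df: pd.DataFrame, col: str, target: str) -> pd.DataFrame:
    df = df.copy()
    if target == "int":
        # nullable Int64 so NaN values don't cause a cast error
        df[col] = pd.to_numeric(df[col], errors="coerce").astype("Int64")
    elif target == "boolean":
        # map common string spellings first, then use the nullable boolean dtype
        mapping = {"true": True, "yes": True, "1": True,
                   "false": False, "no": False, "0": False}
        df[col] = (
            df[col].astype(str).str.strip().str.lower().map(mapping).astype("boolean")
        )
    elif target == "datetime":
        # unparseable values become NaT instead of raising
        df[col] = pd.to_datetime(df[col], errors="coerce")
    return df
```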
Third was pipeline export, which I think is the most useful thing. It reads through the cleaning history and writes out a proper .py script you can download and rerun on any new file.

0
aneezakiran07

UI CHANGES
HI
This devlog is mostly a styling and restructuring UI rather than new features.
The first thing was removing the big heading at the top of the page, because it was taking a lot of space and wasn't necessary in the first place.
Then i made the tabs sit right at the top of the page like a website navbar. Streamlit's default tab styling is pretty small, so i had to override it with CSS.
Before, my UI was one long, long webpage, which made for a bad user experience.
Now i made separate tabs for separate purposes, so just by looking at the UI, users know where they will find which features!!!
Streamlit is easy (if we use its own styling) but if we try to write our own css in it :”) then it sucks :)

0
aneezakiran07

Implemented Column Profiler, find & replace and undo/history system
Hi!!
In this devlog, I added three features to the data cleaning pipeline that I felt were genuinely missing from the first version.
First was a column profiler, a table that shows you per-column stats like min, max, mean, median, std, skewness, null percentage, and sample values. Cached it with @st.cache_data so it only recomputes when the dataframe actually changes.
Second was find & replace: pick a column, type a search string, type a replacement, and optionally flip on regex mode for pattern-based replacements.
Third was a full undo/history system. The history panel shows every step and lets you undo one step at a time or wipe the whole history.

0
aneezakiran07

Resolved Performance Issues
HI!!
in this devlog, I optimized the app to stop re-running expensive operations on every single user interaction. The main issue was that analyze_data_issues, which runs regex loops across every column in the dataframe, was firing on every button click. I fixed this by wrapping it (along with the recommendations generator) into a single @st.cache_data function called get_analysis_and_recommendations. I did the same for the file reader and the stats calculator. Also deleted all verbose=True calls inside manual operations as they were triggering partial rerenders.
Some users said it's a bit slow, that's why i spent 3 hours just figuring out what the issue could be, and came up with these solutions
Also, I made two modes, one simple and one advanced in the settings sidebar, by clicking to advanced mode, user can select the thresholds and model they want to implement for missing values!
in the next devlog, i will implement more functions slowly and will also make it more user friendly.

0
aneezakiran07

Shipped this project!

Hours: 7.94
Cookies: 🍪 150
Multiplier: 18.89 cookies/hr

HII!!!
So, I finally shipped this after working on it for a while and I’m happy with how it turned out (took me months :”)).

The app now has a full Validation & Quality section with five new operations that can validate emails (flag or drop invalid ones), standardize messy phone numbers into one clean format, properly parse mixed-format dates, detect and cap outliers using IQR or Z-score with configurable thresholds, and validate value ranges to catch impossible values like age = -5 or score = 999.

The hardest part was definitely the date standardization because pandas was silently failing on most non-ISO formats, so I rewrote it from scratch to try 17 explicit formats per cell, which boosted accuracy from almost useless to actually reliable.
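the try-explicit-formats-per-cell idea looks roughly like this — the app uses 17 formats, this list is just an illustrative subset:

```python
from datetime import datetime

import pandas as pd

# a few of the explicit formats tried per cell, in priority order
DATE_FORMATS = [
    "%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d-%b-%Y",
    "%B %d, %Y", "%d.%m.%Y",
]


def parse_date(value):
    if pd.isna(value):
        return pd.NaT
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return pd.NaT  # honest failure instead of a silent wrong guess
```

because the format list is explicit and ordered, ambiguous strings like 05/01/2024 resolve deterministically (day-first here) instead of pandas silently guessing.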

I also made every operation follow the same clean interaction pattern, with popup dropdowns, apply-to-all, checkboxes, and action buttons disabled until valid input is given.
I built this because i love doing data cleaning tasks, so i figured, why not make a general tool for it :”)

Note: You can download test_data csv from my github to check this shipped project

aneezakiran07

HI!!!
In this devlog, I added a new Validation & Quality section to the data cleaning pipeline where I implemented five new validators to check emails, clean and standardize phone numbers, correctly parse mixed-format dates, detect and handle outliers, and validate value ranges. This update helped fix many hidden data issues, especially the date parser which now handles almost all common formats, and I also kept the UI consistent with the rest.

0
aneezakiran07

HII!!
So in this devlog,
I turned the app into an intelligent assistant that actually thinks about your data.
It now scans the dataset, finds problems like duplicates, currency symbols, wrong data types, percentages as text, and missing values, and shows them as simple fix cards.
Each issue comes with a one-click “Fix This” button, and I also added checkboxes in front of AI suggestions so users can select exactly which columns they want to apply fixes to.
Added an “Auto-Fix All Issues” button that runs the full pipeline in the best order and fixes everything at once.
Now beginners can clean data without technical knowledge, while advanced users still have full control.

0
aneezakiran07

HII!!
So in this devlog,
I upgraded the website with a super-smart missing value handler. It now detects all kinds of missing data like “NA”, “?”, -999, and more, then fills them intelligently.
Numeric columns get KNN or MICE imputation depending on dataset size, while categorical columns get mode or “Missing” automatically.
Smart threshold drops columns with too many missing values, and everything can be controlled in the sidebar.
Also added a one-click “Full Pipeline” button that runs all cleaning steps in the best order, with detailed feedback showing exactly what changed.

0
aneezakiran07

HII!!
So in this devlog,
I upgraded the app into a smart data transformation system by adding intelligent string cleaning and automatic type detection.
The system now cleans text, detects patterns like currency, percentages, units, durations, and numeric values, and converts them automatically.
smart threshold system prevents wrong conversions, and users can control sensitivity using a settings sidebar.
also added a one-click “Run Basic Pipeline” button and detailed feedback showing exactly what was converted.

0
aneezakiran07

HI!!!
SO in this devlog, I improved the data cleaning pipeline with a cleaner UI and a real-time statistics dashboard showing rows, columns, missing cells, duplicates, and data types.
I added a flexible preview slider (5–50 rows), a collapsible column info panel, export options (CSV & Excel), and a reset button to restore the original dataset instantly.

0
aneezakiran07

I’ve built a Streamlit UI that lets you upload a CSV and instantly clean it by dropping duplicate rows/columns and stripping extra spaces from text. Users choose which functions they want to run using the provided buttons. I already pushed the code for these three core functions and added a data preview so you can see the results immediately. For the next session, I’m moving on to removing special characters and fixing missing values.

Attachment
Attachment
0
aneezakiran07

I spent an hour setting up a Streamlit interface to handle the tedious parts of data cleaning. I wrote three core functions that take any uploaded CSV and automatically fix it: one to strip hidden whitespaces from text, one to drop duplicate rows, and a third to find and remove identical columns. The goal was to make something generic so I don’t have to manually clean files every time I start a new project. It’s simple, fast, and handles the “dirty” data work in one click.

Attachment
0
aneezakiran07

I’m working on my first project! This is so exciting. I can’t wait to share more updates as I build.

Attachment
0