Travis Tang

2K Followers

Published in

Towards Data Science

·Pinned

Polars: Pandas DataFrame but Much Faster

Perform multithreaded, optimized pandas operations — Let’s face it. Pandas is slow. When you have millions of rows in your dataframe, it becomes incredibly frustrating to wait for a minute for a single line of code to execute. You will end up spending more time waiting than doing actual analytics. Multiple libraries exist to solve this…

Machine Learning

11 min read

Published in

Towards Data Science

·Apr 24

Class Imbalance Strategies — A Visual Guide with Code

Understand Random Undersampling, Oversampling, SMOTE, ADASYN, and Tomek Links — Class imbalance occurs when one class in a classification problem significantly outweighs the other class. It’s common in many machine learning problems. Examples include fraud detection, anomaly detection, and medical diagnosis. The Curse of Class Imbalance: A model trained on an imbalanced dataset performs poorly on the minority class. At best, this can cause loss…

Machine Learning

13 min read

Published in

DataDrivenInvestor

·Apr 11

60 ChatGPT Prompts for Data Science (Tried, Tested, and Rated)

Automate data science tasks with ChatGPT — I rated 60 ChatGPT functions for data science. Use these prompts to ask ChatGPT to write, explain, and optimize data science code. It can also explain data science concepts, suggest ideas, and troubleshoot problems. How to Read This: The table of contents contains all 60 prompts. Click on the links in the table…

Machine Learning

27 min read

Published in

Towards AI

·Jan 11

Cleanlab: Correct your data labels automatically and quickly

Data-centric AI without manually relabeling your data — I used an open-source library, cleanlab, to remove low-quality labels from an image dataset. The model trained on the dataset without low-quality data gained 4 percentage points of accuracy compared to the baseline model (trained on all data). Improving data quality sounds easy enough. It’s essentially identifying and rectifying wrong…

Data Science

8 min read

Published in

Towards AI

·Dec 26, 2022

Lazypredict: Run All Sklearn Algorithms With a Line Of Code

How to (and why you shouldn’t) use it — Here are two pain points of data scientists. Pain Point 1: Limited time in the data science lifecycle. Data scientists have to prioritize. This may mean spending more time on understanding the business problem and identifying the most appropriate approach rather than focusing solely on developing machine learning algorithms. Pain Point 2: Machine learning modeling can be time-consuming

Machine Learning

10 min read

Published in

Towards Data Science

·Dec 19, 2022

Convert Jupyter Notebooks into Functions

Parameterize notebooks so you can programmatically run them — You’ve trained your machine learning model in a Jupyter notebook. Now, you want to run that model on data that comes in daily. Day in, day out, you create a new copy of the same notebook and run it. …

Machine Learning

7 min read

Published in

Towards Data Science

·Dec 13, 2022

4x Faster Pandas Operations with Minimal Code Change

Stop waiting on pandas operations. Parallelize them. — One of the major limitations of Pandas is that it can be slow when working with large datasets, particularly when running complex operations. This can frustrate data scientists and analysts who need to process and analyze large datasets in their work. There are a few ways to address this issue…

Data Science

6 min read

Published in

Towards Data Science

·Dec 9, 2022

Using an Out-of-Core Approach to Process Large Datasets

Faster big-data analysis workflows with an open-source library — If you’re a data scientist working with large datasets, you have probably run out of memory (OOM) when performing analytics or training machine learning models. That’s not surprising. Large datasets can easily exceed the memory available on a desktop or laptop computer…

Machine Learning

5 min read

Published in

Towards Data Science

·Oct 25, 2022

Unit Testing for Data Science with Python

Catch expensive mistakes early with nose2 and parameterized tests — You’ve deployed a new machine learning model at work. You can finally enjoy the weekend, you think to yourself. Little do you know that an imminent storm of errors is about to tear down your model and ruin your weekend. Why does that happen? Insufficient error checking. Data scientists are…

Machine Learning

6 min read

Published in

Analytics Vidhya

·Nov 10, 2021

Automate Your Machine Learning Training Process with TPOT

Stop rewriting the same code for model selection and hyper-parameter search — Let’s face it — model training is extremely time-consuming. What if you could automate it? Meet TPOT, your data science assistant. It saves you time and effort in looking for the most optimized machine learning pipeline. Give TPOT the data, and it will give you the code for the most…

Data Science

6 min read

Self-taught data scientist | 20k+ followers on LinkedIn linkedin.com/in/travistang
