Introducing the pd-helper Package for Pandas DataFrames

Justin Chae
Apr 8, 2021

A Python package to run auto optimization of a Pandas DataFrame.

Reduce DataFrame Memory Consumption by ~70% and configure dtypes automatically for best precision with pd-helper.
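
To give a feel for where savings like this can come from, here is a minimal sketch that uses only plain pandas calls. It is not pd-helper's implementation; the helper name downcast_sketch and the demo data are assumptions made up for illustration, and the savings you see depend entirely on your data.

```python
import numpy as np
import pandas as pd


def downcast_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative only: shrink dtypes with standard pandas calls."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # Pick the smallest integer type that still holds every value.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            # Try a smaller float type where pandas considers it safe.
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_string_dtype(out[col]) and out[col].nunique() < 0.5 * len(out):
            # Text columns with few distinct values compress well as categories.
            out[col] = out[col].astype("category")
    return out


# Hypothetical demo data, not taken from the package.
demo = pd.DataFrame({
    "count": np.arange(1_000, dtype="int64"),
    "ratio": np.linspace(0.0, 1.0, 1_000),
    "label": ["a", "b", "c", "d"] * 250,
})

before = demo.memory_usage(deep=True).sum()
after = downcast_sketch(demo).memory_usage(deep=True).sum()
print(f"before={before} bytes, after={after} bytes")
```

Comparing memory_usage(deep=True) before and after is also a quick way to check what any optimizer, pd-helper included, actually saves on your own DataFrame.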

I released a Python package in beta; this is the announcement.

The project is up on PyPI here: https://pypi.org/project/pd-helper/

Also on GitHub here: https://github.com/justinhchae/pd-helper

Edit 15 April 2021: Stable release 1.0.0 is now available.

Install:

pip install pd-helper

Basic Usage:

from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df
    df = optimize(df)

Better Usage With Multiprocessing:

from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df
    df = optimize(df, enable_mp=True)

Specify Special Mappings:

from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df
    special_mappings = {'string': ['col_1', 'col_2'],
                        'category': ['col_3', 'col_4']}

    # special mappings will be applied
    df = optimize(df,
                  enable_mp=True,
                  special_mappings=special_mappings)
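
For readers unfamiliar with the dtype names used as keys above, the sketch below shows what casting columns to 'string' and 'category' looks like in plain pandas. The function apply_mappings_sketch is a hypothetical stand-in written for this illustration; how pd-helper applies the mappings internally may differ.

```python
import pandas as pd


def apply_mappings_sketch(df: pd.DataFrame, mappings: dict) -> pd.DataFrame:
    """Hypothetical stand-in: cast each listed column to the dtype named by its key."""
    out = df.copy()
    for dtype_name, cols in mappings.items():
        for col in cols:
            out[col] = out[col].astype(dtype_name)
    return out


# Hypothetical demo data reusing the column names from the example above.
frame = pd.DataFrame({
    "col_1": ["x1", "x2", "x3"],
    "col_3": ["red", "blue", "red"],
})

mapped = apply_mappings_sketch(frame, {"string": ["col_1"], "category": ["col_3"]})
print(mapped.dtypes)
```

The same dictionary shape, dtype name to list of column names, is what the special_mappings argument receives in the usage above.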

Add pd-helper to your pipeline:

from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df
    special_mappings = {'string': ['col_1', 'col_2'],
                        'category': ['col_3', 'col_4']}
    exclude_cols = ['col_5']

    # some_other_function stands for any earlier step in your pipeline
    df = (df.pipe(some_other_function)
            .pipe(optimize,
                  special_mappings=special_mappings,
                  parse_col_names=True,
                  exclude_cols=exclude_cols,
                  enable_mp=True))
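
If the chaining above is new to you: DataFrame.pipe calls the given function with the DataFrame as its first argument and forwards any keyword arguments. The sketch below demonstrates that mechanism with two hypothetical stand-in functions (drop_empty_rows and optimize_stand_in), so it runs without pd-helper installed and makes no claim about what optimize does internally.

```python
import pandas as pd


def drop_empty_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical earlier pipeline step: remove rows that are entirely empty."""
    return df.dropna(how="all")


def optimize_stand_in(df: pd.DataFrame, exclude_cols=None) -> pd.DataFrame:
    """Hypothetical placeholder for optimize: cast non-excluded text columns to category."""
    exclude_cols = exclude_cols or []
    out = df.copy()
    for col in out.columns:
        if col not in exclude_cols and pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].astype("category")
    return out


# Hypothetical demo data.
raw = pd.DataFrame({
    "col_3": ["a", "b", None],
    "col_5": ["keep", "as", None],
})

# pipe passes the DataFrame first, then forwards exclude_cols as a keyword argument.
result = (raw.pipe(drop_empty_rows)
             .pipe(optimize_stand_in, exclude_cols=["col_5"]))
print(result.dtypes)
```

Because each step takes a DataFrame and returns one, steps can be reordered or swapped out without touching the rest of the chain.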

Justin Chae

Justin writes about technology, programming, and general interest topics.