Introducing the pd-helper Package for Pandas DataFrames

Justin Chae

Apr 8, 2021

A Python package to automatically optimize a Pandas DataFrame.

Reduce DataFrame Memory Consumption by ~70% and configure dtypes automatically for best precision with pd-helper.

I released a Python package in beta; this is the announcement.

The project is up on PyPI here: https://pypi.org/project/pd-helper/

Also on GitHub here: https://github.com/justinhchae/pd-helper

Edit 15 April 2021: Stable release 1.0.0 is now available.
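The post doesn't show how pd-helper picks dtypes. As a rough illustration of where memory savings like the ones claimed above can come from, here is a minimal sketch in plain pandas. It is not taken from pd-helper's source. It assumes two common tactics: downcasting integer columns to the smallest integer type that holds their values, and storing low-cardinality string columns as `category`. The function name `shrink_dtypes` and the `max_unique_ratio` threshold are hypothetical.

```python
import pandas as pd


def shrink_dtypes(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where the values allow it.

    Hypothetical helper for illustration; not pd-helper's implementation.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Pick the smallest signed integer type (int8, int16, ...) that fits every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_string_dtype(series):
            # Few distinct strings: store each value once and keep small integer codes per row.
            if series.nunique() <= max_unique_ratio * len(series):
                out[col] = series.astype("category")
    return out


df = pd.DataFrame({"n": [1, 2, 3] * 100, "city": ["a", "b", "c"] * 100})
small = shrink_dtypes(df)
print(small.dtypes)
print(df.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

Downcasting integers this way never changes values, because `to_numeric` only chooses a type wide enough for every value. Converting strings to `category` pays off only when values repeat, which is why the sketch checks the share of distinct values first.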

Install:

pip install pd-helper

Basic Usage:

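import pandas as pd
from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df (a small toy frame for illustration)
    df = pd.DataFrame({"id": [1, 2, 3], "city": ["a", "b", "a"]})
    df = optimize(df)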
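To check the savings on your own data, pandas can report memory use per column. This is a minimal sketch that assumes you keep a reference to the frame from before `optimize` runs; `memory_mb` and `percent_saved` are hypothetical helper names, and the `int8` cast below only stands in for whatever dtypes an optimizer would pick.

```python
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory in MB; deep=True also counts the contents of object/string columns."""
    return df.memory_usage(deep=True).sum() / 1024**2


def percent_saved(before: pd.DataFrame, after: pd.DataFrame) -> float:
    """Share of memory saved going from `before` to `after`, as a percentage."""
    return 100 * (1 - memory_mb(after) / memory_mb(before))


before = pd.DataFrame({"n": list(range(100)) * 10})
after = before.astype({"n": "int8"})  # stand-in for an optimized frame
print(f"{memory_mb(before):.4f} MB -> {memory_mb(after):.4f} MB "
      f"({percent_saved(before, after):.0f}% saved)")
```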

Better Usage With Multiprocessing:

from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df
    df = optimize(df, enable_mp=True)
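The post doesn't describe what `enable_mp=True` does internally. One common way to parallelize per-column work is to send each column to a worker and reassemble the results in their original order. The sketch below shows that pattern with the standard library's `concurrent.futures`; it is an assumption about the general technique, not pd-helper's code. `downcast_column` and `optimize_parallel` are hypothetical, and the executor class is a parameter so the same logic can run on threads for a quick check.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import pandas as pd


def downcast_column(series: pd.Series) -> pd.Series:
    """Downcast one integer column; return any other column unchanged."""
    if pd.api.types.is_integer_dtype(series):
        return pd.to_numeric(series, downcast="integer")
    return series


def optimize_parallel(df: pd.DataFrame, executor_cls=ProcessPoolExecutor,
                      max_workers=None) -> pd.DataFrame:
    """Process each column in a separate worker, then rebuild the frame."""
    columns = [series for _, series in df.items()]
    with executor_cls(max_workers=max_workers) as pool:
        # map() yields results in input order, so columns come back in the original order.
        results = list(pool.map(downcast_column, columns))
    return pd.concat(results, axis=1)


# Quick check on threads; the default ProcessPoolExecutor is the one meant for CPU-bound work.
demo = pd.DataFrame({"a": range(1000), "b": [0.5] * 1000})
print(optimize_parallel(demo, executor_cls=ThreadPoolExecutor).dtypes)
```

As in the examples in this post, call the process-based version under `if __name__ == "__main__":`, because platforms that spawn worker processes re-import the main module. Pickling each column to a worker has a cost, so this kind of parallelism tends to help only on large or wide frames.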

Specify Special Mappings:

from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df
    special_mappings = {'string': ['col_1', 'col_2'],
                        'category': ['col_3', 'col_4']}

    # special mappings will be applied
    df = optimize(df,
                  enable_mp=True,
                  special_mappings=special_mappings)
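Judging from the example, `special_mappings` maps a pandas dtype name to the columns that should take that dtype. In plain pandas, that amounts to inverting the dictionary and passing it to `DataFrame.astype`. This is a sketch of that equivalent, not pd-helper's implementation, and `apply_special_mappings` is a hypothetical name.

```python
import pandas as pd


def apply_special_mappings(df: pd.DataFrame, special_mappings: dict) -> pd.DataFrame:
    """Cast each listed column to the dtype named by its mapping key (hypothetical helper)."""
    # Invert {'string': ['col_1', 'col_2'], ...} into {'col_1': 'string', 'col_2': 'string', ...}.
    dtype_by_col = {col: dtype
                    for dtype, cols in special_mappings.items()
                    for col in cols}
    # astype accepts a {column: dtype} dict and leaves unlisted columns untouched.
    return df.astype(dtype_by_col)


df = pd.DataFrame({"col_1": ["x", "y"], "col_2": ["p", "q"],
                   "col_3": ["a", "a"], "col_4": ["b", "c"], "col_5": [1, 2]})
special_mappings = {'string': ['col_1', 'col_2'],
                    'category': ['col_3', 'col_4']}
print(apply_special_mappings(df, special_mappings).dtypes)
```

If a column named in the mapping is missing from the frame, `astype` raises a `KeyError`, which is a handy way to catch typos in column names.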

Add pd-helper to your pipeline:

from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df, and some_other_function (any step in your pipeline)
    special_mappings = {'string': ['col_1', 'col_2'],
                        'category': ['col_3', 'col_4']}
    exclude_cols = ['col_5']

    df = (df.pipe(some_other_function)
            .pipe(optimize,
                  special_mappings=special_mappings,
                  parse_col_names=True,
                  exclude_cols=exclude_cols,
                  enable_mp=True))
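`DataFrame.pipe(func, **kwargs)` calls `func(df, **kwargs)` and returns the result, which is why `optimize` can sit in a chain next to your own steps. The sketch below checks that equivalence with two hypothetical stand-ins: `add_total` in place of `some_other_function`, and `downcast_ints`, a simplified optimizer that honors an `exclude_cols` list. Neither is part of pd-helper.

```python
import pandas as pd


def add_total(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for some_other_function: add a row-wise total."""
    return df.assign(total=df.sum(axis=1))


def downcast_ints(df: pd.DataFrame, exclude_cols=()) -> pd.DataFrame:
    """Hypothetical optimizer step: downcast integer columns, skipping exclude_cols."""
    out = df.copy()
    for col in out.columns:
        if col not in exclude_cols and pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


df = pd.DataFrame({"col_5": [1, 2, 3], "col_6": [4, 5, 6]})

# Each .pipe(func, **kwargs) is equivalent to func(previous_result, **kwargs).
piped = df.pipe(add_total).pipe(downcast_ints, exclude_cols=["col_5"])
nested = downcast_ints(add_total(df), exclude_cols=["col_5"])
print(piped.dtypes)
```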