Introducing the pd-helper Package for Pandas DataFrames
Apr 8, 2021
A Python package that automatically optimizes a Pandas DataFrame.
Reduce DataFrame memory consumption by ~70% and automatically configure dtypes for the best precision with pd-helper.
I released a Python package in beta; this is the announcement.
The project is on PyPI: https://pypi.org/project/pd-helper/
It is also on GitHub: https://github.com/justinhchae/pd-helper
Edit 15 April 2021: Stable release 1.0.0 is now available.
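How much memory is saved depends on the data, but the kind of saving involved is easy to see with plain pandas. The sketch below is not pd-helper's implementation. It is a minimal illustration, built around a hypothetical `shrink_dtypes` function, of two common techniques: downcasting numeric columns with `pd.to_numeric(..., downcast=...)` and converting low-cardinality string columns to `category`. The `max_unique_ratio` threshold of 0.5 is an arbitrary assumption.

```python
import pandas as pd


def shrink_dtypes(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Hypothetical illustration of dtype optimization (not pd-helper's code)."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Pick the smallest signed integer type that holds every value
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # float64 -> float32 when pandas judges the values close enough;
            # some precision can still be lost
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_string_dtype(series) or pd.api.types.is_object_dtype(series):
            # Mostly repeated strings: store each unique value once as a category
            if series.nunique() / max(len(series), 1) <= max_unique_ratio:
                out[col] = series.astype("category")
    return out
```

Comparing `df.memory_usage(deep=True)` before and after the call shows where the savings come from, column by column.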
Install:
pip install pd-helper
Basic Usage:
from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df
    df = optimize(df)
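To check the savings on your own data, compare the deep memory usage of the DataFrame before and after the call. A minimal sketch using only pandas; `memory_reduction_pct` is a hypothetical helper name, and the `astype` call stands in for `optimize` so the snippet runs without pd-helper installed.

```python
import pandas as pd


def memory_reduction_pct(before: pd.DataFrame, after: pd.DataFrame) -> float:
    """Percent of memory saved going from `before` to `after` (hypothetical helper)."""
    # deep=True counts the real size of Python string objects in object columns
    old = before.memory_usage(deep=True).sum()
    new = after.memory_usage(deep=True).sum()
    return 100.0 * (old - new) / old


if __name__ == "__main__":
    df = pd.DataFrame({"a": range(1_000)})
    # Stand-in for optimize(df): store the int64 column as int16 instead
    smaller = df.astype({"a": "int16"})
    print(f"{memory_reduction_pct(df, smaller):.1f}% smaller")
```

In practice you would pass the original DataFrame and the result of `optimize(df)` instead of the stand-in.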
Better Usage With Multiprocessing:
from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df
    df = optimize(df, enable_mp=True)
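How `enable_mp` splits the work inside pd-helper is not described here. As a general illustration only, assuming column-wise parallelism (an assumption, not the package's documented design), the sketch below downcasts each column as a separate task in a `concurrent.futures.ProcessPoolExecutor` and reassembles the results. `downcast_column`, `parallel_downcast`, and its `use_threads` flag are hypothetical names. The `if __name__ == "__main__":` guard in the examples above matters for this kind of code: with the `spawn` start method (the default on Windows, and on macOS since Python 3.8), each worker re-imports the main module, so unguarded top-level code would run again in every worker.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import pandas as pd


def downcast_column(item):
    """Downcast one (name, Series) pair; runs inside a worker."""
    name, series = item
    if pd.api.types.is_integer_dtype(series):
        series = pd.to_numeric(series, downcast="integer")
    elif pd.api.types.is_float_dtype(series):
        series = pd.to_numeric(series, downcast="float")
    # Non-numeric columns are returned unchanged
    return name, series


def parallel_downcast(df, max_workers=None, use_threads=False):
    """Downcast every column of df, one column per task (hypothetical helper)."""
    # Threads are handy for debugging: no pickling and no process start-up cost
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    # With processes, each column is pickled to a worker and the result pickled back
    with executor_cls(max_workers=max_workers) as pool:
        results = dict(pool.map(downcast_column, df.items()))
    # Rebuild the frame in the original column order
    return pd.DataFrame({col: results[col] for col in df.columns}, index=df.index)
```

Call `parallel_downcast(df)` from inside an `if __name__ == "__main__":` block. For small frames the cost of pickling columns to and from workers can outweigh the gain, so process-based parallelism tends to pay off only on wide or large DataFrames.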
Specify Special Mappings:
from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df
    special_mappings = {'string': ['col_1', 'col_2'],
                        'category': ['col_3', 'col_4']}
    # special mappings will be applied
    df = optimize(df
                  , enable_mp=True
                  , special_mappings=special_mappings
                  )
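The `special_mappings` argument groups column names by target dtype. Whatever pd-helper does internally, a dictionary of the same shape can be turned into a single plain pandas `astype` call. A minimal sketch, assuming the mapping means "cast these columns to this dtype"; `apply_special_mappings` is a hypothetical helper name.

```python
import pandas as pd


def apply_special_mappings(df: pd.DataFrame, special_mappings: dict) -> pd.DataFrame:
    # Invert {dtype: [columns]} into {column: dtype}, the form astype() accepts
    per_column = {col: dtype
                  for dtype, cols in special_mappings.items()
                  for col in cols}
    return df.astype(per_column)


df = pd.DataFrame({'col_1': ['x', 'y'], 'col_3': ['a', 'a'], 'col_9': [1, 2]})
df = apply_special_mappings(df, {'string': ['col_1'], 'category': ['col_3']})
print(df.dtypes)
```

Columns not named in the mapping, like `col_9` here, keep their existing dtype; naming a column that does not exist makes `astype` raise a `KeyError`.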
Add pd-helper to your pipeline:
from pd_helper.helper import optimize

if __name__ == "__main__":
    # some DataFrame, df
    special_mappings = {'string': ['col_1', 'col_2'],
                        'category': ['col_3', 'col_4']}
    exclude_cols = ['col_5']
    # some_other_function: any other step in your pipeline
    df = (df.pipe(some_other_function)
            .pipe(optimize
                  , special_mappings=special_mappings
                  , parse_col_names=True
                  , exclude_cols=exclude_cols
                  , enable_mp=True))
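`DataFrame.pipe(func, **kwargs)` calls `func(df, **kwargs)`, so the keyword arguments after `optimize` in a chain like this are simply forwarded to it. The self-contained sketch below shows that mechanism with two hypothetical stand-ins, `some_other_function` and `fake_optimize`. `fake_optimize` only mimics an `exclude_cols`-style argument by skipping the listed columns; it is not pd-helper's function.

```python
import pandas as pd


def some_other_function(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical earlier pipeline step: lower-case the column names
    return df.rename(columns=str.lower)


def fake_optimize(df: pd.DataFrame, exclude_cols=()) -> pd.DataFrame:
    # Hypothetical stand-in for optimize(): downcast integer columns,
    # leaving any column listed in exclude_cols untouched
    out = df.copy()
    for col in out.columns:
        if col not in exclude_cols and pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


raw = pd.DataFrame({"COL_1": [1, 2, 3], "COL_5": [4, 5, 6]})
df = (raw.pipe(some_other_function)
         .pipe(fake_optimize, exclude_cols=['col_5']))
print(df.dtypes)
```

The chained form gives the same result as the nested call `fake_optimize(some_other_function(raw), exclude_cols=['col_5'])`, but reads top to bottom in the order the steps run.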