Motivation

Data sets can get large quickly.
You can quickly go from looking at:

a few hundred lines and a handful of columns to...
a million lines with hundreds of columns.

Python Pandas (with smart use of Categories) can reduce the size of your data in memory by up to 90%.

This repository contains a tutorial and supporting scripts that showcase the power of Python Pandas with categories.

The tutorial is located in the file:

slides.md

Web Tutorial

The tutorial is hosted here

Link to talk

Outline

In this tutorial, we will:

Learn how Python uses memory with Pandas
Learn how to reduce a Pandas dataframe's memory footprint
Learn what data types are
Speed up reading CSV files by using categories
Reduce the memory footprint by up to 90%
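As a rough preview of the CSV step in the outline, the sketch below declares a column as a category while the file is being read. The CSV text and the column names `day` and `sales` are made up for illustration; `io.StringIO` stands in for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content, standing in for a file on disk.
csv_text = "day,sales\nSunday,10\nMonday,12\nSunday,7\nMonday,9\n"

# Default read: the 'day' column is loaded as a plain string column.
df_default = pd.read_csv(io.StringIO(csv_text))

# Declaring the dtype up front stores 'day' as a category while parsing.
df_cat = pd.read_csv(io.StringIO(csv_text), dtype={"day": "category"})

# Compare the data types pandas chose for each column.
print(df_default.dtypes)
print(df_cat.dtypes)
```

Passing `dtype` at read time avoids first building the full string column and converting it afterwards.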

In a nutshell

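Instead of storing "Sunday", "Sunday", "Sunday", ... Pandas with categories records the mapping

"Sunday = 1" once, and the column becomes [1, 1, 1].

A one-byte integer code such as 1 (Pandas uses int8 for a small number of categories) takes a lot less memory than the string "Sunday".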
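The mapping can be inspected directly. This is a minimal sketch with a made-up series; the `.cat` accessor exposes the stored labels and the per-row integer codes.

```python
import pandas as pd

# Hypothetical column with repeated labels, stored as a category.
days = pd.Series(["Sunday", "Sunday", "Sunday", "Monday"], dtype="category")

# The distinct labels are stored only once.
print(list(days.cat.categories))  # ['Monday', 'Sunday']

# Each row holds only a small integer code pointing at a label.
print(days.cat.codes.tolist())    # [1, 1, 1, 0]
print(days.cat.codes.dtype)       # int8
```

Here the labels are sorted alphabetically when the categories are inferred, so "Monday" gets code 0 and "Sunday" gets code 1.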

To convert a column to a category, change its dtype with the following command. Note that astype returns a new column, so the result has to be assigned back.

df['column name'] = df['column name'].astype('category')
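To see the effect of the conversion, one option is to measure the column before and after with `memory_usage(deep=True)`. The data below is synthetic (seven labels repeated many times), so the exact saving on real data will differ.

```python
import pandas as pd

# Hypothetical data: 70,000 rows drawn from only seven distinct labels.
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
df = pd.DataFrame({"day": day_names * 10_000})

# deep=True counts the string contents, not just the references to them.
before = df["day"].memory_usage(deep=True)

# Convert the column and assign the result back to the dataframe.
df["day"] = df["day"].astype("category")
after = df["day"].memory_usage(deep=True)

print(f"before: {before:,} bytes, after: {after:,} bytes")
print(f"reduction: {1 - after / before:.0%}")
```

The saving is largest when a column has many rows but few distinct values; a column of mostly unique strings gains little from the conversion.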

Slide Deck

The tutorial can also be run locally as an HTML slide deck. To view it, first start a web server in the repository's main directory.

This can be done with Python (version 3):

python -m http.server

Then open a browser and go to http://localhost:8000
