For those who gets stacked to add a column on DataFrame
— Data Science
Make a list which has same length of the DataFrame
Summary
- Problem- Add a column on
DataFrame
without errors - Dataset
- Instruction — Columns must have same length of the
DataFrame
Problem
If you are confused of adding columns in DataFrame
, always think to Make a list.
The list needs to have same length as the DataFrame
, and if you want to refer to an existed column for a new column, be noticed what functions work with pd.Series.
Dataset
You can download the dataset from my repo
# create datasetimport pandas as pd
import numpy as np
np.random.seed(42)letters = [l for l in 'abcdefghijklmnopqrstuvwxyz']
numbers = [n for n in range(100)]practice = pd.DataFrame()
practice['num'] = [np.random.choice(numbers) for i in range(100)]
practice['let'] = [np.random.choice(letters) for i in range(100)]
Instruction
Make a list which has same length of the DataFrame
.
Challenge 1 (Success)
Now I want to make a column which starts with a
followed by let
column.
# Make a list which starts with `a` followed by `let`a = ['a'+l for l in practice['let']]
print(len(a))
a[:5]
This list has length of 100. You can add this to the DaraFrame
.
practice['a'] = a
practice.head()
Challenge 2 (Fail)
I want to make a column which stores high
where num
is over 50
# List which stores 'high' where num column is over 50high = ['high' for num in practice['num'] if num > 50]print(len(high))
high[:5]
You cannot add this list because the length is 55.
# This is error
practice['high'] = high
Challenge 2 (Success)
high = ['high' if num > 50 else 'low' for num in practice['num']]
practice['high'] = highprint(len(high))
high[:5]
This is success because I added values when num
is equal to or lower than 50 this time. This list has length of 100.
Challenge 3
I want to add a column which has capitalized let
.
practice['cap'] = practice['let'].upper()
This is failed because upper()
function does not work for pd.Series
.
practice['cap'] = [l.upper() for l in practice['let']]
So now I applied upper()
on each letter one by one in let
column using list comprehension.
Note
There is a smarter way to handle above challenge. However, when you get stacked, think back simply which means like a column is a list