Photo by Dan Gold on Unsplash

For those who get stuck adding a column to a DataFrame: Data Science

Nao Kawakami
3 min read · Feb 18, 2021


Make a list that has the same length as the DataFrame

Summary

  • Problem: Add a column to a DataFrame without errors
  • Dataset
  • Instruction: A new column must have the same length as the DataFrame

Problem

If adding columns to a DataFrame confuses you, always think: make a list.

The list needs to have the same length as the DataFrame. If you want to build the new column from an existing column, pay attention to which functions work on a pd.Series.

Dataset

You can download the dataset from my repo

# create dataset
import pandas as pd
import numpy as np
np.random.seed(42)
letters = [l for l in 'abcdefghijklmnopqrstuvwxyz']
numbers = [n for n in range(100)]
practice = pd.DataFrame()
practice['num'] = [np.random.choice(numbers) for i in range(100)]
practice['let'] = [np.random.choice(letters) for i in range(100)]
The dataset simply stores random integers and letters

Instruction

Make a list that has the same length as the DataFrame.

Challenge 1 (Success)

Now I want to make a column whose values start with `a`, followed by the value in the `let` column.

# Make a list which starts with `a` followed by `let`
a = ['a' + l for l in practice['let']]
print(len(a))
a[:5]
The length of the list is 100, so it can be added as a column

This list has a length of 100, the same as the DataFrame. You can add it to the DataFrame.

practice['a'] = a
practice.head()
Column `a` added successfully

Challenge 2 (Fail)

I want to make a column that stores `high` where `num` is over 50.

# List which stores 'high' only where the num column is over 50
high = ['high' for num in practice['num'] if num > 50]
print(len(high))
high[:5]
The list has a length of 55.

You cannot add this list because its length is 55, while the DataFrame has 100 rows.

# This raises a ValueError
practice['high'] = high
The list length is 55 but the DataFrame length is 100, which causes an error
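
To see the failure without stopping a script, the assignment can be wrapped in `try`/`except`. This is a minimal sketch on a small stand-in DataFrame; `df`, its values, and `error_message` are made up for illustration and are not the article's dataset. pandas raises a `ValueError` when the list length does not match the number of rows.

```python
import pandas as pd

# Hypothetical 5-row DataFrame standing in for `practice`
df = pd.DataFrame({'num': [10, 60, 75, 30, 90]})

# The trailing `if` keeps only matching rows, so this list has 3 items
high = ['high' for num in df['num'] if num > 50]

try:
    df['high'] = high  # 3 values for 5 rows
    error_message = None
except ValueError as e:
    # pandas reports the length mismatch here
    error_message = str(e)

print(len(high), len(df))
print(error_message)
```

Comparing `len(high)` with `len(df)` before assigning is a cheap way to catch this early.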

Challenge 2 (Success)

high = ['high' if num > 50 else 'low' for num in practice['num']]
practice['high'] = high
print(len(high))
high[:5]
Made it 100 long by adding `low` where `num` is 50 or under

This succeeds because this time I also added a value for rows where num is 50 or lower. The list now has a length of 100.

Successfully added `high` column
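
The difference between the failed and the successful attempt comes down to where the `if` sits in the list comprehension. A small sketch with made-up numbers (`nums`, `filtered`, and `mapped` are illustrative names, not part of the dataset):

```python
nums = [10, 60, 75, 30, 90]  # made-up values for illustration

# `if` at the end filters: items that fail the test are dropped
filtered = ['high' for n in nums if n > 50]

# `if ... else` at the front maps: every item produces a value
mapped = ['high' if n > 50 else 'low' for n in nums]

print(len(filtered))  # 3
print(len(mapped))    # 5
print(mapped)         # ['low', 'high', 'high', 'low', 'high']
```

Only the second form keeps one value per row, which is what a new column needs.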

Challenge 3

I want to add a column that holds the capitalized `let` values.

# This raises an AttributeError
practice['cap'] = practice['let'].upper()
Calling `upper()` directly on a `pd.Series` causes an error

This fails because upper() is a string method, and a pd.Series has no upper() method.

practice['cap'] = [l.upper() for l in practice['let']]

So I applied upper() to each letter in the let column one by one, using a list comprehension.

Successfully added capitalized `let` column

Note

There are smarter ways to handle the challenges above. However, when you get stuck, go back to the simple idea: a column is just a list.
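
The note does not say which smarter way is meant. One common option, offered here as an assumption rather than the author's method, is to use pandas' vectorized operations, which always return one value per row. A sketch on a small made-up DataFrame (`df` and its values are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical 5-row DataFrame with the same column names as `practice`
df = pd.DataFrame({'num': [10, 60, 75, 30, 90],
                   'let': ['q', 'b', 'x', 'm', 'a']})

# Challenge 1: string concatenation is applied to every element
df['a'] = 'a' + df['let']

# Challenge 2: np.where picks 'high' or 'low' for every row
df['high'] = np.where(df['num'] > 50, 'high', 'low')

# Challenge 3: the .str accessor exposes string methods on a Series
df['cap'] = df['let'].str.upper()

print(df)
```

Each result has the same length as the DataFrame by construction, so the length check from the list approach is no longer needed.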
