Contents
- 1 What is a Series and how is it different from a 1-D array, a list and a dictionary?
- 2 What is a DataFrame and how is it different from a 2-D array?
- 3 How are DataFrames related to Series?
- 4 What do you understand by the size of (i) a Series, (ii) a DataFrame?
- 5 Create the following Series and do the specified operations:
- 5.1 a) EngAlph, having 26 elements with the alphabets as values and default index values.
- 5.2 b) Vowels, having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ and all the five values set to zero. Check if it is an empty series.
- 5.3 c) Friends, from a dictionary having roll numbers of five of your friends as data and their first name as keys.
- 5.4 d) MT series, an empty Series. Check if it is an empty series.
- 5.5 e) MonthDays, from a numpy array having the number of days in the 12 months of a year. The labels should be the month numbers from 1 to 12.
- 6 Using the Series created in Question 5, write commands for the following:
- 6.1 a) Set all the values of Vowels to 10 and display the Series.
- 6.2 b) Divide all values of Vowels by 2 and display the Series.
- 6.3 c) Create another series Vowels1 having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ having values [2,5,6,3,8] respectively.
- 6.4 d) Add Vowels and Vowels1 and assign the result to Vowels3.
- 6.5 e) Subtract, Multiply and Divide Vowels by Vowels1.
- 6.6 f) Alter the labels of Vowels1 to [‘A’, ‘E’, ‘I’, ‘O’, ‘U’].
- 7 7. Using the Series created in Question 5, write commands for the following:
- 7.1 a) Find the dimensions, size and values of the Series EngAlph, Vowels, Friends, MTseries, and MonthDays
- 7.2 b) Rename the Series MTseries as SeriesEmpty.
- 7.3 c) Name the index of the Series MonthDays as monthno and that of Series Friends as Fname.
- 7.4 d) Display the 3rd and 2nd value of the Series Friends, in that order.
- 7.5 e) Display the alphabets ‘e’ to ‘p’ from the Series EngAlph.
- 7.6 f) Display the first 10 values in the Series EngAlph.
- 7.7 g) Display the last 10 values in the Series EngAlph.
- 7.8 h) Display the MTseries
- 8 8. Using the Series created in Question 5, write commands for the following:
- 9 9. Create the following DataFrame Sales containing year wise sales figures for five sales persons in INR. Use the years as column labels and sales person names as row labels.
- 10 10. Use the DataFrame created in question 9 above to do the following:
- 10.1 a) Display the row labels of Sales
- 10.2 b) Display the column labels of Sales
- 10.3 c) Display the data types of each column of Sales
- 10.4 d) Display the dimensions, shape, size and values of Sales
- 10.5 e) Display the last two rows of Sales
- 10.6 f) Display the first two columns of Sales
- 10.7 g) Create a dictionary using the following data. Use this dictionary to create a DataFrame Sales2.
- 11 11. Use the DataFrame created in Question 9 above to do the following:
- 11.1 a) Append the DataFrame Sales2 to the DataFrame Sales.
- 11.2 b) Change the DataFrame Sales such that it becomes its transpose
- 11.3 c) Display the sales made by all sales persons in the year 2017.
- 11.4 d) Display the sales made by Madhu and Ankit in the year 2017 and 2018.
- 11.5 e) Display the sales made by Shruti 2016.
- 11.6 f) Add data to Sales for salesman Sumeet where the sales made are [196.2, 37800, 52000, 78438, 38852] in the years [2014, 2015, 2016, 2017, 2018] respectively.
- 11.7 g) Delete the data for the year 2014 from the DataFrame Sales.
- 11.8 h) Delete the data for sales man Kinshuk from the DataFrame Sales.
- 11.9 i) Change the name of the salesperson Ankit to Vivaan and Madhu to Shailesh.
- 11.10 j) Update the sale made by Shailesh in 2018 to 100000.
- 11.11 k) Write the values of DataFrame Sales to a comma separated file SalesFigures.csv on the disk. Do not write the row labels and column labels.
- 11.12 l) Read the data in the file SalesFigures.csv into a DataFrame SalesRetrieved and Display it. Now update the row labels and column labels of SalesRetrieved to be the same as that of Sales.
What is a Series and how is it different from a 1-D array, a list and a dictionary?
- A Series is a one-dimensional data structure provided by the Python pandas library.
- It holds a sequence of values of a single data type, such as int, float or string.
- It is value mutable but size immutable.
- Every element of a Series is associated with a data label called its index.
- Index labels need not be integers; values of other data types, such as strings, can also be used as the index.
The following table shows how a Series differs from a 1-D array, a list and a dictionary.
Series | 1-D array | List | Dictionary |
Contains homogeneous data | Contains homogeneous data | Can contain heterogeneous data | Can contain heterogeneous data |
Default index starts at 0 | Default index starts at 0 | Default index starts at 0 | Each value is associated with a key defined by the user |
Values of other data types, such as strings, can be assigned as the index | Only integer positions can be used as the index | Only integer positions can be used as the index | The key acts as the index and can be of any hashable type |
Size immutable | Size immutable | Size mutable | Size mutable |
Mathematical operations can be applied directly to all elements | Mathematical operations can be applied directly to all elements | Mathematical operations cannot be applied directly to all elements | Mathematical operations cannot be applied directly to all elements |
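The rows of the table can be checked with a small experiment. The sketch below is only illustrative: the values and the labels "x", "y" and "z" are made up for the example.

```python
import numpy as np
import pandas as pd

values = [10, 20, 30]

as_list = list(values)
as_array = np.array(values)
as_dict = {"x": 10, "y": 20, "z": 30}
as_series = pd.Series(values, index=["x", "y", "z"])

# Arithmetic is element-wise on an array and on a Series
doubled_array = as_array * 2
doubled_series = as_series * 2

# On a list, * repeats the list instead of multiplying each element
repeated_list = as_list * 2

# A Series can be read by label (like a dictionary) or by position (like an array)
by_label = as_series["y"]
by_position = as_series.iloc[1]
by_key = as_dict["y"]

print(doubled_series)
print(repeated_list)
```

The Series keeps its labels after the arithmetic, which is the main practical difference from a plain 1-D array.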
What is a DataFrame and how is it different from a 2-D array?
- A DataFrame is a two-dimensional data structure provided by the Python pandas library.
- It can hold heterogeneous data in tabular form, like a spreadsheet or a table in MySQL.
- It is both value mutable and size mutable.
- It is a labelled data structure in which both rows and columns are indexed.
- Values of other data types, such as strings, can also be used as row and column labels.
The following table shows how a DataFrame differs from a 2-D array:
DataFrame | 2-D array |
A DataFrame has a default numerical index that can be replaced with labels of any other type | A 2-D array has a default numerical index that cannot be replaced with labels of another type |
A DataFrame can store heterogeneous data | A 2-D array stores homogeneous data |
A DataFrame can deal with dynamic data and mixed data types | A 2-D array is better suited to numerical data |
A DataFrame is size mutable | A 2-D array is size immutable |
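A short comparison makes the difference in data types visible. This is a minimal sketch; the names, marks and row labels are invented for the example.

```python
import numpy as np
import pandas as pd

# A DataFrame keeps a separate data type for each column
df = pd.DataFrame(
    {"name": ["Asha", "Ravi"], "marks": [91.5, 78.0]},
    index=["r1", "r2"],
)

# A NumPy array has one data type for all elements,
# so the numbers are converted to strings here
arr = np.array([["Asha", 91.5], ["Ravi", 78.0]])

print(df.dtypes)
print(arr.dtype)

# A new column can be added to an existing DataFrame
df["grade"] = ["A", "B"]
print(df.shape)
```

After the last assignment the DataFrame has three columns, while the array would have to be rebuilt to gain a column.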
How are DataFrames related to Series?
A DataFrame is related to a Series in the following ways:
- Both are data structures of the Python pandas library.
- A DataFrame can be created from one or more Series, and each column of a DataFrame is a Series.
- Both can be labelled with index values of different data types.
- Both can deal with dynamic data and mixed data types.
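The relationship can be seen by building a DataFrame from two Series and then reading a column and a row back out. The subject names and marks below are assumptions made for the illustration.

```python
import pandas as pd

maths = pd.Series([90, 75], index=["Asha", "Ravi"])
science = pd.Series([82, 88], index=["Asha", "Ravi"])

# Each Series becomes one column; the shared index becomes the row labels
marks = pd.DataFrame({"maths": maths, "science": science})

# Selecting one column or one row returns a Series again
column = marks["maths"]
row = marks.loc["Ravi"]

print(type(column).__name__)
print(row)
```

The row Series is indexed by the column labels of the DataFrame, so `row["science"]` reads Ravi's science marks.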
What do you understand by the size of (i) a Series, (ii) a DataFrame?
(i) The size of a Series is the total number of elements in it, including missing (NaN) values. Consider the following example:
import numpy as np
import pandas as pd
s = pd.Series([2, 5, 6, np.nan, 8])
print(s.size)
Output:
5
(ii) The size of a DataFrame is the total number of elements in it, that is, the number of rows multiplied by the number of columns. Consider the following example:
df = pd.DataFrame({'a': [4, np.nan, 7], 'b': [6, 2, np.nan]})
print(df.size)
Output:
6
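Since size includes missing values, it can help to compare it with count(), which skips them. The sketch below reuses the same example data; the variable names are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 5, 6, np.nan, 8])
df = pd.DataFrame({"a": [4, np.nan, 7], "b": [6, 2, np.nan]})

# size counts every element, including NaN
series_size = s.size
frame_size = df.size

# count() skips NaN; for a DataFrame it returns one count per column
series_non_missing = s.count()
frame_non_missing = df.count().sum()

# shape gives (rows, columns); their product equals size
rows, cols = df.shape

print(series_size, series_non_missing)
print(frame_size, frame_non_missing, rows * cols)
```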
Create the following Series and do the specified operations:
a) EngAlph, having 26 elements with the alphabets as values and default index values.
import pandas as pd
# chr(97) is 'a' and chr(122) is 'z'
EngAlph = pd.Series([chr(i) for i in range(97, 123)])
print(EngAlph)
b) Vowels, having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ and all the five values set to zero. Check if it is an empty series.
import pandas as pd
Vowels = pd.Series(0, index=['a', 'e', 'i', 'o', 'u'])
print(Vowels)
# empty is True only when the Series has no elements; zero values still count as elements
if Vowels.empty:
    print("Empty Series")
else:
    print("Series is not empty")
c) Friends, from a dictionary having roll numbers of five of your friends as data and their first name as keys.
import pandas as pd
Friends = pd.Series({'ram': 1, 'hari': 2, 'raheem': 3, 'kabir': 4, 'rasool': 5})
print(Friends)
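d) MTseries, an empty Series. Check if it is an empty series.
import pandas as pd
# An explicit dtype avoids the warning that some pandas versions raise for pd.Series()
MTseries = pd.Series(dtype='object')
if MTseries.empty:
    print("Empty Series")
else:
    print("Series is not empty")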
e) MonthDays, from a numpy array having the number of days in the 12 months of a year. The labels should be the month numbers from 1 to 12.
import pandas as pd
import numpy as np
MonthDays = pd.Series(np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]), index=range(1, 13))
print(MonthDays)
Using the Series created in Question 5, write commands for the following:
a) Set all the values of Vowels to 10 and display the Series.
Vowels[:] = 10
print(Vowels)
b) Divide all values of Vowels by 2 and display the Series.
Vowels = Vowels/2
print(Vowels)
c) Create another series Vowels1 having 5 elements with index labels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ having values [2,5,6,3,8] respectively.
import pandas as pd
Vowels1 = pd.Series([2, 5, 6, 3, 8], index=['a', 'e', 'i', 'o', 'u'])
print(Vowels1)
d) Add Vowels and Vowels1 and assign the result to Vowels3.
# Values are added label by label ('a' with 'a', 'e' with 'e', and so on)
Vowels3 = Vowels + Vowels1
print(Vowels3)
e) Subtract, Multiply and Divide Vowels by Vowels1.
print(Vowels - Vowels1)
print(Vowels * Vowels1)
print(Vowels / Vowels1)
f) Alter the labels of Vowels1 to [‘A’, ‘E’, ‘I’, ‘O’, ‘U’].
Vowels1.index = ['A', 'E', 'I', 'O', 'U']
print(Vowels1)
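Changing the labels also changes how later arithmetic behaves, because pandas matches elements by label and not by position. The following sketch assumes Vowels holds 5.0 for every label, as it would after parts a) and b); the names `mismatched` and `by_position` are introduced here only for illustration.

```python
import pandas as pd

Vowels = pd.Series(5.0, index=["a", "e", "i", "o", "u"])
Vowels1 = pd.Series([2, 5, 6, 3, 8], index=["A", "E", "I", "O", "U"])

# No label is common to both Series, so the result has ten labels and every value is NaN
mismatched = Vowels + Vowels1

# Converting one operand to a NumPy array drops its labels,
# so the addition is done position by position
by_position = Vowels + Vowels1.to_numpy()

print(mismatched)
print(by_position)
```

If label-based matching is wanted, both Series need the same index labels before the operation.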
7. Using the Series created in Question 5, write commands for the following:
a) Find the dimensions, size and values of the Series EngAlph, Vowels, Friends, MTseries, and MonthDays
To find the dimensions, size and values of a Series object, we can use the shape, size and values attributes respectively, as given below:
print("Dimension, size and values of EngAlph", EngAlph.shape, EngAlph.size, EngAlph.values)
print("Dimension, size and values of Vowels", Vowels.shape, Vowels.size, Vowels.values)
print("Dimension, size and values of Friends", Friends.shape, Friends.size, Friends.values)
print("Dimension, size and values of MTseries", MTseries.shape, MTseries.size, MTseries.values)
print("Dimension, size and values of MonthDays", MonthDays.shape, MonthDays.size, MonthDays.values)
b) Rename the Series MTseries as SeriesEmpty.
We can name the Series MTseries as SeriesEmpty using its name property, as given below:
MTseries.name = 'SeriesEmpty'
c) Name the index of the Series MonthDays as monthno and that of Series Friends as Fname.
To name the index of MonthDays as monthno, we can write:
MonthDays.index.name = "monthno"
And to name the index of Friends as Fname, we can write:
Friends.index.name = "Fname"
d) Display the 3rd and 2nd value of the Series Friends, in that order.
We can display the 3rd and 2nd values of the Series Friends, in that order, in two ways. Because the index of Friends holds names, iloc is used to select by position:
Using index positions:
print("3rd and 2nd value of the Series Friends are", Friends.iloc[2], " ", Friends.iloc[1])
Using a slice:
print("3rd and 2nd value of the Series Friends are", Friends.iloc[2:0:-1])
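The same two values can also be picked by their labels. The sketch below recreates Friends with the example names used earlier and compares position-based and label-based selection; `third_then_second` and `same_by_label` are names made up for this illustration.

```python
import pandas as pd

Friends = pd.Series({"ram": 1, "hari": 2, "raheem": 3, "kabir": 4, "rasool": 5})

# iloc selects by position: position 2 is the 3rd value, position 1 is the 2nd
third_then_second = Friends.iloc[[2, 1]]

# loc selects by label and returns the values in the order requested
same_by_label = Friends.loc[["raheem", "hari"]]

print(third_then_second)
print(same_by_label)
```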
e) Display the alphabets ‘e’ to ‘p’ from the Series EngAlph.
To display the alphabets ‘e’ to ‘p’ from the Series EngAlph, we can write:
print(EngAlph[4:16])
f) Display the first 10 values in the Series EngAlph.
We can display the first 10 values in the Series EngAlph in following ways:
print(EngAlph.head(10))
OR
print(EngAlph[:10])
g) Display the last 10 values in the Series EngAlph.
We can display the last 10 values in the Series EngAlph as:
print(EngAlph.tail(10))
h) Display the MTseries
print(MTseries)
8. Using the Series created in Question 5, write commands for the following:
a) Display the names of the months 3 through 7 from the Series MonthDays.
# loc slices by label and includes both ends, so this selects months 3 to 7
print(MonthDays.loc[3:7])
b) Display the Series MonthDays in reverse order.
print(MonthDays[::-1])
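Because the labels of MonthDays start at 1 while positions start at 0, label-based and position-based slices use different numbers for the same months. A minimal sketch of the difference, with `by_label` and `by_position` as illustrative names:

```python
import numpy as np
import pandas as pd

MonthDays = pd.Series(
    np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]),
    index=range(1, 13),
)

# loc uses the labels 3 to 7 and includes the end label
by_label = MonthDays.loc[3:7]

# iloc uses positions 2 to 6 and excludes the end position 7
by_position = MonthDays.iloc[2:7]

print(by_label)
print(by_position)
```

Both slices return the same five months, March to July.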
9. Create the following DataFrame Sales containing year wise sales figures for five sales persons in INR. Use the years as column labels and sales person names as row labels.
We can create the DataFrame Sales in various ways, as given below:
Using a dictionary of lists:
import pandas as pd
D = {2014: [100.5, 150.8, 200.9, 30000, 40000],
     2015: [12000, 18000, 22000, 30000, 45000],
     2016: [20000, 50000, 70000, 100000, 125000],
     2017: [50000, 60000, 70000, 80000, 90000]}
Sales = pd.DataFrame(D, index=['Madhu', 'Kusum', 'Kinshuk', 'Ankit', 'Shruti'])
Using a dictionary whose values are dictionaries (the same data as above, with the row labels as inner keys):
import pandas as pd
D = {2014: {'Madhu': 100.5, 'Kusum': 150.8, 'Kinshuk': 200.9, 'Ankit': 30000, 'Shruti': 40000},
     2015: {'Madhu': 12000, 'Kusum': 18000, 'Kinshuk': 22000, 'Ankit': 30000, 'Shruti': 45000},
     2016: {'Madhu': 20000, 'Kusum': 50000, 'Kinshuk': 70000, 'Ankit': 100000, 'Shruti': 125000},
     2017: {'Madhu': 50000, 'Kusum': 60000, 'Kinshuk': 70000, 'Ankit': 80000, 'Shruti': 90000}}
Sales = pd.DataFrame(D)
10. Use the DataFrame created in question 9 above to do the following:
a) Display the row labels of Sales
print(Sales.index)
b) Display the column labels of Sales
print(Sales.columns)
c) Display the data types of each column of Sales
print(Sales.dtypes)
d) Display the dimensions, shape, size and values of Sales
We can use the ndim, shape, size and values attributes of the DataFrame to display its dimensions, shape, size and values, as given below:
print("Dimensions, shape, size and values of Sales", Sales.ndim, Sales.shape, Sales.size, Sales.values)
e) Display the last two rows of Sales
print(Sales.tail(2))
f) Display the first two columns of Sales
print(Sales.iloc[:, :2])
g) Create a dictionary using the following data. Use this dictionary to create a DataFrame Sales2.
import pandas as pd
D = {2018: {'Madhu': 160000, 'Kusum': 110000, 'Kinshuk': 500000, 'Ankit': 340000, 'Shruti': 900000}}
Sales2 = pd.DataFrame(D)
OR
import pandas as pd
D = {2018: [160000, 110000, 500000, 340000, 900000]}
Sales2 = pd.DataFrame(D, index=['Madhu', 'Kusum', 'Kinshuk', 'Ankit', 'Shruti'])
h) Check if Sales2 is empty or it contains data
if Sales2.empty:
    print('Sales2 is empty')
else:
    print('Sales2 contains data')
11. Use the DataFrame created in Question 9 above to do the following:
a) Append the DataFrame Sales2 to the DataFrame Sales.
In earlier versions of pandas, the append() method was used to add the rows of one DataFrame to another, as given below:
Sales = Sales.append(Sales2)
However, append() was deprecated in pandas 1.4 and removed in pandas 2.0. The concat() function is now used to join two DataFrames, as given below:
Sales = pd.concat([Sales, Sales2], axis=0)
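The axis argument decides how the two DataFrames are joined, and the choice matters here because Sales2 holds a new year for the same salespersons. The sketch below uses a reduced two-row version of the data, purely as an assumption to keep the output short, and compares both directions.

```python
import pandas as pd

Sales = pd.DataFrame(
    {2016: [20000, 50000], 2017: [50000, 60000]},
    index=["Madhu", "Kusum"],
)
Sales2 = pd.DataFrame({2018: [160000, 110000]}, index=["Madhu", "Kusum"])

# axis=0 stacks the rows of Sales2 under Sales;
# cells with no data for a year are filled with NaN
stacked = pd.concat([Sales, Sales2], axis=0)

# axis=1 matches the row labels and adds 2018 as a new column
side_by_side = pd.concat([Sales, Sales2], axis=1)

print(stacked)
print(side_by_side)
```

With axis=0 each salesperson appears twice, whereas axis=1 keeps one row per salesperson with 2018 as an extra column, which may be the more convenient layout for the questions that refer to the year 2018.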
b) Change the DataFrame Sales such that it becomes its transpose
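# T returns the transpose; write Sales = Sales.T to keep it (the answers below assume years as columns)
print(Sales.T)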
c) Display the sales made by all sales persons in the year 2017.
print(Sales[2017])
OR
print(Sales.loc[:, 2017])
d) Display the sales made by Madhu and Ankit in the year 2017 and 2018.
print(Sales.loc[['Madhu', 'Ankit'], [2017, 2018]])
e) Display the sales made by Shruti in 2016.
print(Sales.at['Shruti', 2016])
OR
print(Sales[2016]['Shruti'])
OR
print(Sales.loc['Shruti', 2016])
f) Add data to Sales for salesman Sumeet where the sales made are [196.2, 37800, 52000, 78438, 38852] in the years [2014, 2015, 2016, 2017, 2018] respectively.
Sales.loc['Sumeet', :] = [196.2, 37800, 52000, 78438, 38852]
g) Delete the data for the year 2014 from the DataFrame Sales.
del Sales[2014]
OR
Sales = Sales.drop(2014, axis=1)
h) Delete the data for salesman Kinshuk from the DataFrame Sales.
Sales = Sales.drop('Kinshuk')
i) Change the name of the salesperson Ankit to Vivaan and Madhu to Shailesh.
Sales.rename(index={'Ankit': 'Vivaan', 'Madhu': 'Shailesh'}, inplace=True)
j) Update the sale made by Shailesh in 2018 to 100000.
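# Use a single at or loc call; chained indexing such as Sales[2018]['Shailesh'] may assign to a temporary copy
Sales.at['Shailesh', 2018] = 100000
OR
Sales.loc['Shailesh', 2018] = 100000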
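A single-cell update can be verified by reading the value back. The sketch below assumes a reduced Sales frame with the renamed row labels and only two years, and the second figure written with at is invented for the example.

```python
import pandas as pd

Sales = pd.DataFrame(
    {2017: [50000, 80000], 2018: [160000, 340000]},
    index=["Shailesh", "Vivaan"],
)

# loc takes the row label first and the column label second
Sales.loc["Shailesh", 2018] = 100000

# at does the same for exactly one cell
Sales.at["Vivaan", 2018] = 350000

print(Sales)
```

Both calls change the DataFrame itself, so no reassignment of Sales is needed.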
k) Write the values of DataFrame Sales to a comma separated file SalesFigures.csv on the disk. Do not write the row labels and column labels.
Sales.to_csv('e:\\programs\\python\\SalesFigures.csv', header=False, index=False)
l) Read the data in the file SalesFigures.csv into a DataFrame SalesRetrieved and Display it. Now update the row labels and column labels of SalesRetrieved to be the same as that of Sales.
SalesRetrieved = pd.read_csv('e:\\programs\\python\\SalesFigures.csv', header=None)  # the file has no header row
print(SalesRetrieved)
SalesRetrieved.index = Sales.index      # same row labels as Sales
SalesRetrieved.columns = Sales.columns  # same column labels as Sales
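The write and read steps can be tried without touching the disk by using an in-memory text buffer in place of the file. This is a sketch under that assumption, with a reduced two-row Sales frame; io.StringIO stands in for SalesFigures.csv.

```python
import io

import pandas as pd

Sales = pd.DataFrame(
    {2017: [50000, 60000], 2018: [160000, 110000]},
    index=["Madhu", "Kusum"],
)

# Write only the values: no column labels and no row labels
buffer = io.StringIO()
Sales.to_csv(buffer, header=False, index=False)
buffer.seek(0)

# header=None tells read_csv that the first line is data, not column labels
SalesRetrieved = pd.read_csv(buffer, header=None)

# Restore the labels from the original DataFrame
SalesRetrieved.index = Sales.index
SalesRetrieved.columns = Sales.columns

print(SalesRetrieved)
```

Copying the labels from Sales avoids retyping them and keeps SalesRetrieved consistent with whatever rows and columns Sales has at that point.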