Python Pandas Dataframe Notes | Class 12 IP Dataframe


In this article, we will go through handy Python Pandas DataFrame notes. Pandas is one of the most important Python libraries for managing large and complex data using data structures such as Series and DataFrame (the older Panel structure has been removed from recent pandas versions). This article, titled 'Python Pandas DataFrame Notes', explains how to declare and manage a pandas dataframe with examples.

What is dataframe?

  • It is a 2D (two-dimensional) data structure.
  • It is used to manage large and complex data in tabular format.
  • It contains both rows and columns, and hence has both row and column indexes.
  • It is the most commonly used pandas data structure and is similar to a spreadsheet.

Features of dataframe

  • It can store any type (heterogeneous) of data
  • It is size mutable
  • It is value mutable
  • Both indexes can be labelled
  • Index labels may be of any type, such as numbers, strings, characters or Boolean values
  • The indexes of a dataframe are also referred to as axes: axis = 0 refers to the row index and axis = 1 refers to the column index
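
The features listed above can be sketched in a few lines. This is a minimal illustration, assuming pandas is installed; the data and the names `df`, `col_totals` and `row_totals` are made up for this example.

```python
import pandas as pd

# Heterogeneous data: a string column and an integer column side by side
df = pd.DataFrame({'item': ['pen', 'book'], 'qty': [10, 5]}, index=['r1', 'r2'])

# Value mutable: change a single cell using its row and column labels
df.loc['r1', 'qty'] = 12

# Size mutable: add a new column to the existing dataframe
df['price'] = [5.0, 40.0]

# axis=0 works down the rows (one result per column),
# axis=1 works across the columns (one result per row)
col_totals = df[['qty', 'price']].sum(axis=0)
row_totals = df[['qty', 'price']].sum(axis=1)

print(df)
print(col_totals)
print(row_totals)
```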

Creating dataframe

  • While creating a dataframe, we should remember the following points:
  • We must import the pandas library in our program.
  • The DataFrame() constructor of the pandas library is used to create a dataframe.
  • A dataframe can accept data from
    • List
    • Dictionary
    • Tuple
    • String (only as an item inside a collection such as a list, not passed directly)
    • Series
    • Another dataframe
    • Numpy array
Syntax for creating dataframe

import pandas as <pandas object>
<dataframe object> = <pandas object>.DataFrame(data, index, columns, dtype)

In the above syntax, the arguments are:

  • data – Values to be stored in the dataframe. It can be any collection such as a list, NumPy array, dictionary, Series, etc.
  • index – Labels for the rows. It is optional; if not passed, numbers from 0 to n-1 are assigned to the rows.
  • columns – Labels for the columns. It is optional; if not passed, numbers from 0 to n-1 are assigned to the columns.
  • dtype – Data type to force on the columns. It is optional; if not given (None), pandas infers the data type from the data.

– index and columns are position independent when passed as keyword arguments, so they can be given in any order.
– The number of labels passed in index must match the number of rows in the data.
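
As a small sketch of the syntax above (assuming pandas is installed; the labels 'x', 'y', 'r1' and 'r2' are arbitrary), the keyword arguments can be given in any order and dtype forces a data type on the columns:

```python
import pandas as pd

data = [[1, 2], [3, 4]]

# columns is written before index here; the order of keyword arguments does not matter
df = pd.DataFrame(data, columns=['x', 'y'], index=['r1', 'r2'], dtype=float)

print(df)
print(df.dtypes)   # both columns are stored as float64 because of dtype=float
```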

Exp-1: Dataframe with default indexes
import pandas as pd
apple = [10,20,30,40]
banana = [23,34,45,54]
df = pd.DataFrame([apple,banana])
print(df)
Output:
    0   1   2   3
0  10  20  30  40
1  23  34  45  54

– Index values will be generated automatically

Exp-2: Dataframe with labelled Indexes
import pandas as pd
apple = [10,20,30,40]
banana = [23,34,45,54]
df = pd.DataFrame([apple,banana],index=['apple','banana'], columns = ['Jan','Feb','Mar','Apr'])
print(df)
Output:
        Jan  Feb  Mar  Apr
apple    10   20   30   40
banana   23   34   45   54

Creating Dataframe using List

  • We can create a dataframe by passing a list as data.
  • If lists of different lengths are passed, NaN (missing value) is assigned to the positions for which a shorter list has no value.
Exp-3: Create Dataframe using list with default indexes
import pandas as pd
apple = [10,20,30,40]
banana = [23,34,45,54]
df = pd.DataFrame([apple,banana])
print(df)
Output:
    0   1   2   3
0  10  20  30  40
1  23  34  45  54
Exp-4: Create Dataframe using list with labelled indexes
import pandas as pd
apple = [10,20,30,40]
banana = [23,34,45,54]
df = pd.DataFrame([apple,banana],index=['apple','banana'], columns = ['Jan','Feb','Mar','Apr'])
print(df)
Output:
        Jan  Feb  Mar  Apr
apple    10   20   30   40
banana   23   34   45   54
Exp-5: Create Dataframe using multiple lists of different lengths
import pandas as pd
apple = [10,20,30,40]
banana = [23,34,45,54]
orange = [43,31,21,12,19]
rw =['apple','banana','orange']
cl = ['Jan','Feb','Mar','Apr','may']
df = pd.DataFrame([apple,banana,orange],index=rw, columns =cl)
print(df)
Output:
        Jan  Feb  Mar  Apr   may
apple    10   20   30   40   NaN
banana   23   34   45   54   NaN
orange   43   31   21   12  19.0

NaN is automatically inserted if no matching value is found for a column.
– The number of column labels passed must be equal to the length of the longest list, and the number of row labels must be equal to the number of lists.
– If the number of labels does not match the data, a ValueError is raised.
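
The mismatch case can be checked with a short sketch like the one below. It assumes pandas is installed and reuses the fruit lists from Exp-5; the flag `error_raised` is only there for illustration.

```python
import pandas as pd

apple = [10, 20, 30, 40]
orange = [43, 31, 21, 12, 19]

try:
    # The longest list has 5 values, but only 4 column labels are given
    pd.DataFrame([apple, orange], index=['apple', 'orange'],
                 columns=['Jan', 'Feb', 'Mar', 'Apr'])
    error_raised = False
except ValueError as err:
    error_raised = True
    print('ValueError:', err)
```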

Creating Dataframe using Series

  • We can create a dataframe using Series as data.
  • A dataframe can be seen as a 2D collection of Series objects.
  • When different Series objects are arranged as rows or columns, they form a dataframe.
Exp-6: Create dataframe passing single Series object
import pandas as pd
S = pd.Series(['10','20','30'], index=['a','b','c'])
df = pd.DataFrame(S)
print(df)
Output:
    0
a  10
b  20
c  30

– When we pass a single Series directly (not as a list element) as data, the values of the Series form the rows of the dataframe.

Exp-7: Create dataframe passing single Series object within list (as list item)
import pandas as pd
S = pd.Series(['10','20','30'], index=['a','b','c'])
df = pd.DataFrame([S])    # the Series is wrapped in a list
print(df)
Output:
    a   b   c
0  10  20  30

When we pass a single Series within a list, the values of the Series form the columns of the dataframe.

Exp-8: Create dataframe passing multiple Series object
import pandas as pd
S1 = pd.Series([10,20,30], index=['jan','feb','mar'])
S2 = pd.Series([18,12,34], index=['jan','feb','mar'])
df = pd.DataFrame([S1, S2], index=['sale1','sale2'])
print(df)
Output:
       jan  feb  mar
sale1   10   20   30
sale2   18   12   34

The index labels of the Series objects become the column labels of the dataframe.
Each Series becomes a row of the dataframe.

Exp-9: Create dataframe passing multiple Series object (with different set of index labels)
import pandas as pd
S1 = pd.Series([10,20,30], index=['jan','feb','mar'])
S2 = pd.Series([18,12,34], index=['jan','mar','apr'])
df = pd.DataFrame([S1, S2], index=['sale1','sale2'])
print(df)
Output:
        jan   feb   mar   apr
sale1  10.0  20.0  30.0   NaN
sale2  18.0   NaN  12.0  34.0

– The total number of columns is equal to the number of distinct index labels across all the Series objects.
– NaN is inserted wherever a Series has no value for a column label.

Creating Dataframe using Dictionary

  • Dictionary can also be passed as data to create dataframe
  • By default, keys of dictionary are taken as column labels of dataframe
  • Values of dictionary are taken as input data of dataframe
  • We can specify our own indexes for dataframe using dictionary as input data
Exp-10: Create dataframe passing dictionary with scalar (single) value
import pandas as pd
d={'a':12,'b':21}
df=pd.DataFrame([d])
print(df)
Output:
    a   b
0  12  21

If you pass the dictionary directly (not as a list item), as in pd.DataFrame(d), pandas raises a ValueError, because an index must be passed when all the values are scalars.
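
A minimal sketch of both cases, assuming pandas is installed (the flag `scalar_error` is only for illustration): passing an index makes the scalar dictionary acceptable, while omitting it raises the error.

```python
import pandas as pd

d = {'a': 12, 'b': 21}

# Passing an index tells pandas how many rows to build from the scalar values
df = pd.DataFrame(d, index=[0])
print(df)

try:
    pd.DataFrame(d)   # all values are scalars and no index is given
    scalar_error = False
except ValueError as err:
    scalar_error = True
    print('ValueError:', err)
```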

Exp-11: Create dataframe passing dictionary having values as list
import pandas as pd
d = {'state': ['UP','MP','AP'], 'centers': [122,78,54], 'PCs': [2034,1506,1243]}
df = pd.DataFrame(d)
print(df)
Output:
   state  centers   PCs
0     UP      122  2034
1     MP       78  1506
2     AP       54  1243

– The values of all the keys must be of the same structure and length.

Exp-12: Create dataframe passing dictionary of list with own index
import pandas as pd
d = {'state': ['UP','MP','AP'], 'centers': [122,78,54], 'PCs': [2034,1506,1243]}
df = pd.DataFrame(d, index=['uttarpradesh','madhyapradesh','andhrapradesh'])
print(df)
Output:
               state  centers   PCs
uttarpradesh      UP      122  2034
madhyapradesh     MP       78  1506
andhrapradesh     AP       54  1243

– The number of index labels passed must be equal to the length of the lists used as the dictionary's values; otherwise Python raises a ValueError.
– The values of all the keys must be of the same structure and length.
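
Both length rules can be tried out with a sketch like this one. It assumes pandas is installed; the dictionaries `d_bad` and `d_ok` and the two flags are hypothetical names used only here.

```python
import pandas as pd

# Lists of different lengths (3 and 2) as dictionary values
d_bad = {'state': ['UP', 'MP', 'AP'], 'centers': [122, 78]}
try:
    pd.DataFrame(d_bad)
    length_error = False
except ValueError as err:
    length_error = True
    print('ValueError:', err)

# Lists of equal length (3), but only 2 index labels
d_ok = {'state': ['UP', 'MP', 'AP'], 'centers': [122, 78, 54]}
try:
    pd.DataFrame(d_ok, index=['x', 'y'])
    index_error = False
except ValueError as err:
    index_error = True
    print('ValueError:', err)
```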

Exp-13: Create a dataframe passing dictionary of Series
import pandas as pd
S= pd.Series(['kabaddi','kho kho','volly ball'])
P = pd.Series([7, 12,6])
d={'Sport':S,'Players':P}
df = pd.DataFrame(d)
print(df)
Output:
        Sport  Players
0     kabaddi        7
1     kho kho       12
2  volly ball        6
Exp-14: Create a dataframe passing dictionary of Series with labelled indexes
import pandas as pd
S= pd.Series(['kabaddi','kho kho','volly ball'],index=['r1','r2','r3'])
P = pd.Series([7, 12,6],index=['r1','r2','r3'])
d={'Sport':S,'Players':P}
df = pd.DataFrame(d)
print(df)
Output:
         Sport  Players
r1     kabaddi        7
r2     kho kho       12
r3  volly ball        6
Exp-15: Create a dataframe passing dictionary of Series of different lengths with different labelled indexes
import pandas as pd
S= pd.Series(['kabaddi','kho kho','volly ball','Base Ball'],index=['r1','r2','r3','r4'])
P = pd.Series([7, 12,6],index=['r1','r2','r5'])
d={'Sport':S,'Players':P}
df = pd.DataFrame(d)
print(df)
Output:
         Sport  Players
r1     kabaddi      7.0
r2     kho kho     12.0
r3  volly ball      NaN
r4   Base Ball      NaN
r5         NaN      6.0

– The resulting row labels (indexes) are the union of the indexes of all the Series used to create the dataframe.
– Every column in a dataframe is a Series.
– NaN is automatically inserted at the missing places.
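
These three points can be verified with a short sketch, assuming pandas is installed; the two small Series here are a cut-down version of the ones in Exp-15.

```python
import pandas as pd

S = pd.Series(['kabaddi', 'kho kho'], index=['r1', 'r2'])
P = pd.Series([7, 6], index=['r1', 'r5'])
df = pd.DataFrame({'Sport': S, 'Players': P})

print(type(df['Players']))     # a single column is a pandas Series
print(list(df.index))          # union of the row labels of S and P
print(df['Players'].isna())    # True where NaN was inserted
```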

Creating dataframe using dictionary of dictionary (nested dictionary)

  • We can also create a dataframe using a 2D (nested) dictionary whose values are themselves dictionaries.
  • The keys of the inner dictionaries become the row labels (indexes), and the keys of the outer dictionary become the column labels.
  • If the inner dictionaries have non-matching keys, the resulting row index is the union of all inner keys.
  • If a key has no matching key in the other dictionaries, NaN is automatically inserted at the missing place.
Exp-16: Creating dataframe using 2D dictionary having values as dictionary
import pandas as pd
d={'manager':{'Generation':'S K Singh','Operation':'M L Khanna'},
   'enginner':{'Generation':'S N Mandal','Operation':'B K Shukla'}}
df = pd.DataFrame(d)
print(df)
Output:
               manager    enginner
Generation   S K Singh  S N Mandal
Operation   M L Khanna  B K Shukla
Exp-17: creating dataframe using 2d dictionary having values as dictionary with non-matching keys
import pandas as pd
d={'manager':{'Generation':'S K Singh','Operation':'M L Khanna'},
   'enginner':{'Generation':'S N Mandal','maintenence':'B K Shukla'}}
df = pd.DataFrame(d)
print(df)
Output:
                manager    enginner
Generation    S K Singh  S N Mandal
Operation    M L Khanna         NaN
maintenence         NaN  B K Shukla

Creating dataframe using numpy Array

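Exp-18:
import pandas as pd
# ar1 and ar2 are plain Python lists of different lengths; the shorter row is padded with NaN
ar1 = [12,23,34]
ar2 = [23,34,54,60]
df = pd.DataFrame([ar1,ar2],columns = ['a','b','c','d'], index = ['r1','r2'])
print(df)
Output:
     a   b   c     d
r1  12  23  34   NaN
r2  23  34  54  60.0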
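
For comparison, here is a minimal sketch that passes an actual 2D NumPy array as data. It assumes both numpy and pandas are installed; the array `arr` and its values are chosen only for illustration. Each row of the array becomes a row of the dataframe.

```python
import numpy as np
import pandas as pd

# A 2D NumPy array with 2 rows and 3 columns
arr = np.array([[12, 23, 34], [23, 34, 54]])

df = pd.DataFrame(arr, index=['r1', 'r2'], columns=['a', 'b', 'c'])
print(df)
```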

Creating dataframe from another Dataframe object

Exp-19:
import pandas as pd
S= pd.Series(['kabaddi','kho kho','volly ball'],index=['r1','r2','r3'])
P = pd.Series([7, 12,6],index=['r1','r2','r3'])
d={'sports':S,'player':P}
df = pd.DataFrame(d)
df_dup=pd.DataFrame(df)
print(df_dup)
Output:
        sports  player
r1     kabaddi       7
r2     kho kho      12
r3  volly ball       6

Dataframe Attributes

Attributes are properties of a dataframe. Using dataframe attributes, we can get various kinds of information about it. The following table lists commonly used dataframe attributes (attribute names are case-sensitive):

Attribute    Description
index        Returns the row labels of the dataframe
columns      Returns the column labels of the dataframe
axes         Returns both the row and column indexes
size         Returns the total number of elements of the dataframe, including missing values
shape        Returns the number of rows and columns of the dataframe as a tuple
values       Returns the dataframe's data as a NumPy array
empty        Returns True if the dataframe is empty
T            Transposes the dataframe's index and columns
List of Dataframe attributes
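
The attributes can be tried on a small dataframe. This is a sketch assuming pandas is installed; the data is borrowed from the 'state' and 'centers' columns of Exp-11.

```python
import pandas as pd

df = pd.DataFrame({'state': ['UP', 'MP', 'AP'], 'centers': [122, 78, 54]})

print(df.index)     # row labels
print(df.columns)   # column labels
print(df.axes)      # list holding the row index and the column index
print(df.size)      # total number of elements: 3 rows x 2 columns = 6
print(df.shape)     # (rows, columns) as a tuple
print(df.values)    # data as a NumPy array
print(df.empty)     # False, because the dataframe contains data
print(df.T)         # rows and columns swapped
```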
