In this article, we will go through handy notes on the Python pandas DataFrame. pandas is one of the most important libraries in Python programming, used to manage large and complex data through data structures such as Series and DataFrame (older versions also had Panel, which has since been removed from the library). This article, titled 'Python pandas dataframe notes', explains how to declare and manage a pandas dataframe with examples.
Please refer to the following table of contents, which lists all the topics covered in these notes.
Contents
- 1 What is dataframe?
- 2 Features of dataframe
- 3 Creating dataframe
- 4 Creating Dataframe using List
- 5 Creating Dataframe using Series
- 5.0.1 Exp-6: Create dataframe passing single Series object
- 5.0.2 Exp-7: Create dataframe passing single Series object within list (as list item)
- 5.0.3 Exp-8: Create dataframe passing multiple Series object
- 5.0.4 Exp-9: Create dataframe passing multiple Series object (with different set of index labels)
- 6 Creating Dataframe using Dictionary
- 6.0.1 Exp-10: Create dataframe passing dictionary with scalar (single) value
- 6.0.2 Exp-11: Create dataframe passing dictionary having values as list
- 6.0.3 Exp-12: Create dataframe passing dictionary of list with own index
- 6.0.4 Exp-13: Create a dataframe passing dictionary of Series
- 6.0.5 Exp-14: Create a dataframe passing dictionary of Series
- 6.0.6 Exp-15: Create a dataframe passing dictionary of Series of different length with different labeled indexes
- 7 Creating dataframe using dictionary of dictionary (nested dictionary)
- 8 Creating dataframe using numpy Array
- 9 Creating dataframe from another Dataframe object
- 10 Dataframe Attributes
- 11 Objective Type Questions | MCQ Exercise on Python Pandas Dataframe
What is dataframe?
- It is a 2D (two-dimensional) data structure.
- It is used to manage large and complex data in tabular format.
- It contains both rows and columns and hence has both row and column indexes.
- It is the most commonly used pandas data structure and is similar to a spreadsheet.
Features of dataframe
- It can store data of any type (heterogeneous); each column can have its own data type
- It is size mutable (rows and columns can be added or removed)
- It is value mutable (stored values can be changed)
- Both indexes can be labelled
- Index labels may be of any type, such as number, string, character, or Boolean value
- The indexes of a dataframe are also referred to as 'axes': axis = 0 refers to the row index and axis = 1 refers to the column index
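The features listed above can be checked with a short sketch. The column names and values below are made up purely for illustration and are not part of the original notes.

```python
import pandas as pd

# Hypothetical data: each column holds a different type (heterogeneous)
df = pd.DataFrame({"name": ["Asha", "Ravi"],
                   "marks": [81, 67],
                   "passed": [True, True]})

# Size mutable: a new column can be added after creation
df["grade"] = ["A", "B"]

# Value mutable: an existing cell can be overwritten
df.loc[1, "marks"] = 70

# axis=0 works down the rows (one result per column),
# axis=1 works across the columns (one result per row)
col_totals = df[["marks"]].sum(axis=0)
row_totals = df[["marks"]].sum(axis=1)

print(df)
print(df.shape)
```

Here `sum` is used only as a convenient way to show the difference between the two axes.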
Creating dataframe
- While creating a dataframe we should remember the following points:
- We must import the pandas library in our program.
- The DataFrame() constructor of the pandas library is used to create a dataframe.
- A dataframe can accept data from
- List
- Dictionary
- Tuple
- String (as an item of a list or other collection, not as a bare string)
- Series
- Another dataframe
- Numpy array
Syntax for creating dataframe
    import pandas as <pandas object>
    <dataframe object> = <pandas object>.DataFrame(data, index, columns, dtype)
In the above syntax, the arguments used are:
- data – Values to be stored in the dataframe. It can be any collection such as a list, NumPy array, dictionary, Series etc.
- index – Used to label the rows. It is optional, and if not passed then numbers from 0 to n-1 are assigned to the rows.
- columns – Used to label the columns. It is optional, and if not passed then numbers from 0 to n-1 are assigned to the columns.
- dtype – Used to define the data type of the columns. It is optional; its default is None, in which case pandas infers the data type from the data.
– index and columns are position independent when passed as keyword arguments (index=..., columns=...)
– The number of labels passed in index must match the number of rows in the data
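As a minimal sketch of the two notes above, the following shows that keyword arguments can be written in any order and that dtype overrides the inferred type. The variable names (rows, df1, df2, df3) are chosen here only for illustration.

```python
import pandas as pd

rows = [[10, 20], [30, 40]]

# Keyword arguments may be given in any order
df1 = pd.DataFrame(rows, columns=["x", "y"], index=["r1", "r2"])
df2 = pd.DataFrame(rows, index=["r1", "r2"], columns=["x", "y"])
print(df1.equals(df2))

# dtype forces one data type on all columns;
# without it pandas infers the type from the data
df3 = pd.DataFrame(rows, dtype="float64")
print(df3.dtypes)
```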
Exp-1: Dataframe with default indexes
    import pandas as pd

    apple = [10, 20, 30, 40]
    banana = [23, 34, 45, 54]
    df = pd.DataFrame([apple, banana])
    print(df)
Output:
        0   1   2   3
    0  10  20  30  40
    1  23  34  45  54
– Index values are generated automatically
Exp-2: Dataframe with labelled Indexes
    import pandas as pd

    apple = [10, 20, 30, 40]
    banana = [23, 34, 45, 54]
    df = pd.DataFrame([apple, banana], index=['apple', 'banana'],
                      columns=['Jan', 'Feb', 'Mar', 'Apr'])
    print(df)
Output:
            Jan  Feb  Mar  Apr
    apple    10   20   30   40
    banana   23   34   45   54
Creating Dataframe using List
- We can create a dataframe by passing a list as data
- If lists of different lengths are passed to the dataframe, then NaN (missing value) is assigned to the positions that have no value.
Exp-3: Create Dataframe using list with default indexes
    import pandas as pd

    apple = [10, 20, 30, 40]
    banana = [23, 34, 45, 54]
    df = pd.DataFrame([apple, banana])
    print(df)
Output:
        0   1   2   3
    0  10  20  30  40
    1  23  34  45  54
Exp-4: Create Dataframe using list with labelled indexes
    import pandas as pd

    apple = [10, 20, 30, 40]
    banana = [23, 34, 45, 54]
    df = pd.DataFrame([apple, banana], index=['apple', 'banana'],
                      columns=['Jan', 'Feb', 'Mar', 'Apr'])
    print(df)
Output:
            Jan  Feb  Mar  Apr
    apple    10   20   30   40
    banana   23   34   45   54
Exp-5: Create Dataframe using multiple list of different length
    import pandas as pd

    apple = [10, 20, 30, 40]
    banana = [23, 34, 45, 54]
    orange = [43, 31, 21, 12, 19]
    rw = ['apple', 'banana', 'orange']
    cl = ['Jan', 'Feb', 'Mar', 'Apr', 'may']
    df = pd.DataFrame([apple, banana, orange], index=rw, columns=cl)
    print(df)
Output:
            Jan  Feb  Mar  Apr   may
    apple    10   20   30   40   NaN
    banana   23   34   45   54   NaN
    orange   43   31   21   12  19.0
– NaN is automatically inserted where a row has no value for a column.
– The number of column labels must equal the length of the longest list, and the number of index labels must equal the number of lists (rows).
– If the number of labels does not match the number of rows or columns, a ValueError is raised.
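The label-count rule can be tried out with a small sketch. The helper builds_ok below is hypothetical, written only for this illustration; it reports whether pandas accepts a given set of labels.

```python
import pandas as pd

apple = [10, 20, 30, 40]
orange = [43, 31, 21, 12, 19]   # longest row has 5 values


def builds_ok(**labels):
    """Return True if the dataframe can be built with the given labels."""
    try:
        pd.DataFrame([apple, orange], **labels)
        return True
    except ValueError as err:
        print("ValueError:", err)
        return False


# 4 column labels for data that needs 5 columns
four_columns = builds_ok(columns=["Jan", "Feb", "Mar", "Apr"])
# 1 row label for 2 rows
one_row_label = builds_ok(index=["apple"])
# Matching counts: 2 row labels and 5 column labels
matching = builds_ok(index=["apple", "orange"],
                     columns=["Jan", "Feb", "Mar", "Apr", "May"])
print(four_columns, one_row_label, matching)
```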
Creating Dataframe using Series
- We can create a dataframe using Series as data.
- A dataframe can be seen as a 2D arrangement of Series objects.
- When different Series objects are arranged as rows or columns, they form a dataframe.
Exp-6: Create dataframe passing single Series object
    import pandas as pd

    S = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
    df = pd.DataFrame(S)
    print(df)
Output:
        0
    a  10
    b  20
    c  30
– When we pass a single Series directly (not as a list element) as data, each value of the Series forms a row of the dataframe.
Exp-7: Create dataframe passing single Series object within list (as list item)
    import pandas as pd

    S = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
    df = pd.DataFrame([S])   # the Series is passed as a list item
    print(df)
Output:
        a   b   c
    0  10  20  30
– When we pass a single Series within a list, the values of the Series form the columns of the dataframe.
Exp-8: Create dataframe passing multiple Series object
    import pandas as pd

    S1 = pd.Series([10, 20, 30], index=['jan', 'feb', 'mar'])
    S2 = pd.Series([18, 12, 34], index=['jan', 'feb', 'mar'])
    df = pd.DataFrame([S1, S2], index=['sale1', 'sale2'])
    print(df)
Output:
           jan  feb  mar
    sale1   10   20   30
    sale2   18   12   34
– The index labels of the Series objects become the column labels of the dataframe
– Each Series becomes a row of the dataframe
Exp-9: Create dataframe passing multiple Series object (with different set of index labels)
    import pandas as pd

    S1 = pd.Series([10, 20, 30], index=['jan', 'feb', 'mar'])
    S2 = pd.Series([18, 12, 34], index=['jan', 'mar', 'apr'])
    df = pd.DataFrame([S1, S2], index=['sale1', 'sale2'])
    print(df)
Output:
            jan   feb   mar   apr
    sale1  10.0  20.0  30.0   NaN
    sale2  18.0   NaN  12.0  34.0
– The total number of columns is equal to the number of distinct index labels across the Series objects
– NaN is inserted wherever a Series has no value for a column label.
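A brief sketch of how the union of labels and the NaN positions can be inspected, using the same values as Exp-9. The use of isna() and .T here is an addition for illustration, not part of the original example.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=["jan", "feb", "mar"])
s2 = pd.Series([18, 12, 34], index=["jan", "mar", "apr"])

df = pd.DataFrame([s1, s2], index=["sale1", "sale2"])

# Columns are the distinct labels found across both Series
print(sorted(df.columns))

# isna() marks the cells that were filled with NaN
print(df.isna())

# .T swaps rows and columns, so each Series becomes a column instead
print(df.T)
```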
Creating Dataframe using Dictionary
- A dictionary can also be passed as data to create a dataframe
- By default, the keys of the dictionary are taken as the column labels of the dataframe
- The values of the dictionary are taken as the input data of the dataframe
- We can specify our own row indexes when using a dictionary as input data
Exp-10: Create dataframe passing dictionary with scalar (single) value
    import pandas as pd

    d = {'a': 12, 'b': 21}
    df = pd.DataFrame([d])
    print(df)
Output:
        a   b
    0  12  21
– Passing the dictionary directly (not as a list item), as in pd.DataFrame(d), raises a ValueError, because when all values are scalars an index must be passed.
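The note above can be sketched as follows: the direct call fails, while passing an index of one label succeeds. The flag name raised is used here only for illustration.

```python
import pandas as pd

d = {"a": 12, "b": 21}

raised = False
try:
    pd.DataFrame(d)            # all values are scalars and no index is given
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Passing an index tells pandas how many rows to build
df = pd.DataFrame(d, index=[0])
print(df)
```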
Exp-11: Create dataframe passing dictionary having values as list
    import pandas as pd

    d = {'state': ['UP', 'MP', 'AP'],
         'centers': [122, 78, 54],
         'PCs': [2034, 1506, 1243]}
    df = pd.DataFrame(d)
    print(df)
Output:
       state  centers   PCs
    0     UP      122  2034
    1     MP       78  1506
    2     AP       54  1243
– The values of all the keys must have the same structure and length.
Exp-12: Create dataframe passing dictionary of list with own index
    import pandas as pd

    d = {'state': ['UP', 'MP', 'AP'],
         'centers': [122, 78, 54],
         'PCs': [2034, 1506, 1243]}
    df = pd.DataFrame(d, index=['uttarpradesh', 'madhyapradesh', 'andhrapradesh'])
    print(df)
Output:
                   state  centers   PCs
    uttarpradesh      UP      122  2034
    madhyapradesh     MP       78  1506
    andhrapradesh     AP       54  1243
– The number of index labels passed must equal the length of the dictionary's values, otherwise pandas raises a ValueError.
– The values of all the keys must have the same structure and length.
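Both length rules can be checked with a short sketch. The helper try_build is hypothetical and exists only for this illustration; it returns None when pandas rejects the input.

```python
import pandas as pd

good = {"state": ["UP", "MP", "AP"], "centers": [122, 78, 54]}
bad = {"state": ["UP", "MP", "AP"], "centers": [122, 78]}   # lists differ in length


def try_build(data, index=None):
    """Return the DataFrame, or None if pandas rejects the input."""
    try:
        return pd.DataFrame(data, index=index)
    except ValueError as err:
        print("ValueError:", err)
        return None


unequal_lists = try_build(bad)                     # lists of unequal length
short_index = try_build(good, index=["x", "y"])    # 2 labels for 3 rows
valid = try_build(good, index=["x", "y", "z"])     # counts match
print(valid)
```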
Exp-13: Create a dataframe passing dictionary of Series
    import pandas as pd

    S = pd.Series(['kabaddi', 'kho kho', 'volly ball'])
    P = pd.Series([7, 12, 6])
    d = {'Sport': S, 'Players': P}
    df = pd.DataFrame(d)
    print(df)
Output:
            Sport  Players
    0     kabaddi        7
    1     kho kho       12
    2  volly ball        6
Exp-14: Create a dataframe passing dictionary of Series
    import pandas as pd

    S = pd.Series(['kabaddi', 'kho kho', 'volly ball'], index=['r1', 'r2', 'r3'])
    P = pd.Series([7, 12, 6], index=['r1', 'r2', 'r3'])
    d = {'Sport': S, 'Players': P}
    df = pd.DataFrame(d)
    print(df)
Output:
             Sport  Players
    r1     kabaddi        7
    r2     kho kho       12
    r3  volly ball        6
Exp-15: Create a dataframe passing dictionary of Series of different length with different labeled indexes
    import pandas as pd

    S = pd.Series(['kabaddi', 'kho kho', 'volly ball', 'Base Ball'],
                  index=['r1', 'r2', 'r3', 'r4'])
    P = pd.Series([7, 12, 6], index=['r1', 'r2', 'r5'])
    d = {'Sport': S, 'Players': P}
    df = pd.DataFrame(d)
    print(df)
Output:
             Sport  Players
    r1     kabaddi      7.0
    r2     kho kho     12.0
    r3  volly ball      NaN
    r4   Base Ball      NaN
    r5         NaN      6.0
– The resulting row labels (indexes) are the union of all the indexes of the Series used to create the dataframe.
– Every column in a dataframe is a Series.
– NaN is automatically inserted at the missing places.
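The three notes above can be verified with a smaller sketch. The two Series below are shortened versions of the ones in Exp-15, chosen only to keep the output brief.

```python
import pandas as pd

S = pd.Series(["kabaddi", "kho kho"], index=["r1", "r2"])
P = pd.Series([7, 6], index=["r1", "r5"])
df = pd.DataFrame({"Sport": S, "Players": P})

print(list(df.index))            # union of the two indexes
print(type(df["Players"]))       # each column is a Series
print(df["Players"].isna())      # True where a value was missing
```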
Creating dataframe using dictionary of dictionary (nested dictionary)
- We can also create a dataframe object using a 2D (nested) dictionary, whose values are themselves dictionary objects
- The keys of the inner dictionaries become the indexes (row labels) and the keys of the outer dictionary become the column labels.
- If the inner dictionaries have non-matching keys, then the resulting indexes of the dataframe are the union of all inner keys.
- If a key has no matching key in the other dictionaries, then NaN is automatically inserted at the missing place.
Exp-16: creating dataframe using 2d dictionary having values as dictionary
    import pandas as pd

    d = {'manager': {'Generation': 'S K Singh', 'Operation': 'M L Khanna'},
         'engineer': {'Generation': 'S N Mandal', 'Operation': 'B K Shukla'}}
    df = pd.DataFrame(d)
    print(df)
Output:
                   manager    engineer
    Generation   S K Singh  S N Mandal
    Operation   M L Khanna  B K Shukla
Exp-17: creating dataframe using 2d dictionary having values as dictionary with non-matching keys
    import pandas as pd

    d = {'manager': {'Generation': 'S K Singh', 'Operation': 'M L Khanna'},
         'engineer': {'Generation': 'S N Mandal', 'maintenance': 'B K Shukla'}}
    df = pd.DataFrame(d)
    print(df)
Output:
                    manager    engineer
    Generation    S K Singh  S N Mandal
    Operation    M L Khanna         NaN
    maintenance         NaN  B K Shukla
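As a rough check of how the outer and inner keys are used, the sketch below prints the column and row labels separately. The data is adapted from the example above, and the use of .T to swap the two roles is an addition for illustration.

```python
import pandas as pd

d = {
    "manager": {"Generation": "S K Singh", "Operation": "M L Khanna"},
    "engineer": {"Generation": "S N Mandal", "Maintenance": "B K Shukla"},
}
df = pd.DataFrame(d)

print(list(df.columns))   # outer keys become column labels
print(list(df.index))     # union of inner keys becomes row labels

# .T swaps the roles: outer keys become rows, inner keys become columns
print(df.T)
```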
Creating dataframe using numpy Array
Exp-18:
    import numpy as np
    import pandas as pd

    ar1 = np.array([12, 23, 34])
    ar2 = np.array([23, 34, 54, 60])
    df = pd.DataFrame([ar1, ar2], columns=['a', 'b', 'c', 'd'], index=['r1', 'r2'])
    print(df)
Output:
         a   b   c     d
    r1  12  23  34   NaN
    r2  23  34  54  60.0
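The example above builds the dataframe from rows of unequal length. When the data is already a 2D NumPy array, it can be passed directly; the following is a minimal sketch with made-up values.

```python
import numpy as np
import pandas as pd

# A 2 x 3 array: every row has the same length, so no NaN is needed
arr = np.array([[12, 23, 34], [23, 34, 54]])

df = pd.DataFrame(arr, index=["r1", "r2"], columns=["a", "b", "c"])
print(df)
print(df.shape == arr.shape)
```

The shape of the dataframe matches the shape of the array, so the number of index and column labels must match its rows and columns.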
Creating dataframe from another Dataframe object
Exp-19:
    import pandas as pd

    S = pd.Series(['kabaddi', 'kho kho', 'volly ball'], index=['r1', 'r2', 'r3'])
    P = pd.Series([7, 12, 6], index=['r1', 'r2', 'r3'])
    d = {'sports': S, 'player': P}
    df = pd.DataFrame(d)
    df_dup = pd.DataFrame(df)
    print(df_dup)
Output:
            sports  player
    r1     kabaddi       7
    r2     kho kho      12
    r3  volly ball       6
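If the new dataframe should be changed without affecting the original, one option is to request a copy explicitly. This is a small sketch using the copy parameter of the DataFrame constructor, with shortened data chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"sports": ["kabaddi", "kho kho"], "player": [7, 12]})

# copy=True asks for an independent copy of the data
df_dup = pd.DataFrame(df, copy=True)
df_dup.loc[0, "player"] = 99

print(df.loc[0, "player"])       # value in the original
print(df_dup.loc[0, "player"])   # value in the copy
```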
Dataframe Attributes
Attributes are properties of a dataframe. Using dataframe attributes we can get many kinds of information about it. The following table lists commonly used dataframe attributes:
Attribute | Description |
index | Returns the row labels of the dataframe |
columns | Returns the column labels of the dataframe |
axes | Returns a list containing both the row and column indexes |
size | Returns the total number of elements of the dataframe, including missing values |
shape | Returns the number of rows and columns of the dataframe as a tuple |
values | Returns the dataframe's data as a NumPy array |
empty | Returns True if the dataframe is empty |
T | Transposes the dataframe's index and columns |
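The attributes in the table can be tried on a small dataframe. The data below is made up for illustration; note that attributes are written without parentheses.

```python
import pandas as pd

df = pd.DataFrame({"Jan": [10, 23], "Feb": [20, 34]}, index=["apple", "banana"])

print(df.index)      # row labels
print(df.columns)    # column labels
print(df.axes)       # list of [row labels, column labels]
print(df.size)       # total number of elements (rows x columns)
print(df.shape)      # (rows, columns)
print(df.values)     # data as a NumPy array
print(df.empty)      # True only if the dataframe has no elements
print(df.T)          # rows and columns swapped
```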
Objective Type Questions | MCQ Exercise on Python Pandas Dataframe
Click Here to view Dataframe MCQ Set-1 (Q1-Q25)
Click Here to view Dataframe MCQ Set-1 (Q25-Q50)