Friends, Python is one of the most popular programming languages for data analysis. It is easy to learn, powerful, and widely used in industries like finance, healthcare, and technology. If you want to become a data analyst, knowing Python can help you get a great job.
In this article, we will go over some common Python interview questions for data analysts. These questions will test your knowledge of Python basics, data manipulation, and important libraries like Pandas and NumPy. If you are preparing for an interview or just want to improve your skills, this guide will be helpful!
Contents
- Python Interview Questions For Data Analyst
- 1. What are Python’s key features that make it useful for data analysis?
- 2. What is Pandas in Python?
- 3. How do you read a CSV file in Pandas?
- 4. How do you handle missing values in Pandas?
- 5. How do you filter rows in a DataFrame?
- 6. What is NumPy and why is it useful?
- 7. How do you create a NumPy array?
- 8. What is the difference between a list and a NumPy array?
- 9. How do you calculate the mean of a column in Pandas?
- 10. What is the difference between loc and iloc in Pandas?
- 11. How do you merge two DataFrames in Pandas?
- 12. How do you group data in Pandas?
- 13. How do you visualize data using Matplotlib?
- 14. What is the difference between a for loop and a while loop?
- 15. How do you write a function in Python?
- 16. What is the difference between a list and a tuple?
- 17. How do you convert a Pandas DataFrame column to a list?
- 18. How do you check for duplicate values in a DataFrame?
- 19. What is the difference between apply() and map() in Pandas?
- 20. How do you check the data types of each column in a DataFrame?
- 21. How do you change the data type of a column in Pandas?
- 22. How do you sort a DataFrame by a specific column?
- 23. What is a lambda function in Python?
- 24. How do you concatenate two DataFrames in Pandas?
- 25. How do you count unique values in a DataFrame column?
- 26. How do you rename a column in a Pandas DataFrame?
- 27. What is the difference between .iloc[] and .iat[]?
- 28. How do you find the correlation between columns in Pandas?
- 29. How do you create a new column in a DataFrame based on conditions?
- 30. How do you save a DataFrame to a CSV file?
Python Interview Questions For Data Analyst
1. What are Python’s key features that make it useful for data analysis?
Answer: Python is popular for data analysis because it:
- Is easy to read and write
- Is open-source and has a large community
- Supports libraries like Pandas, NumPy, and Matplotlib
- Can handle large datasets efficiently through vectorized libraries such as NumPy and Pandas
2. What is Pandas in Python?
Answer: Pandas is a library used for data manipulation and analysis. It provides two main data structures:
- Series: A one-dimensional array-like structure.
- DataFrame: A two-dimensional table with rows and columns, similar to an Excel sheet.
3. How do you read a CSV file in Pandas?
Answer:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head()) # Displays first 5 rows
Explanation: pd.read_csv() loads data from a CSV file into a Pandas DataFrame.
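To try read_csv() without a file on disk, here is a minimal sketch that reads from a string instead. The CSV text, column names, and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content, used in place of a real 'data.csv' file
csv_text = "Name,Age,Salary\nAsha,34,52000\nRavi,28,41000\nMeena,45,67000\n"

# read_csv() accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # column types inferred from the data
```

Checking shape and dtypes right after loading is a quick way to confirm the file was parsed the way you expected.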
4. How do you handle missing values in Pandas?
Answer:
df_dropped = df.dropna() # Returns a copy without rows that have missing values
df_filled = df.fillna(0) # Returns a copy with missing values replaced by 0
Explanation: Missing values can be handled by either removing them (dropna()) or filling them with a specific value (fillna()). Both return a new DataFrame, so assign the result if you want to keep it.
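An interviewer may also ask how you find missing values before handling them. The sketch below, built on a small made-up DataFrame, counts missing values per column and then fills each column with its own mean, which is one common choice among several.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame({"Age": [25, np.nan, 40], "Salary": [50000, 60000, np.nan]})

missing_per_column = df.isna().sum()  # count missing values in each column
dropped = df.dropna()                 # new DataFrame without incomplete rows
filled = df.fillna(df.mean())         # fill each column with its own mean

print(missing_per_column)
print(dropped)
print(filled)
```

Note that df itself is unchanged after these calls, because each one returns a new object.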
5. How do you filter rows in a DataFrame?
Answer:
df_filtered = df[df['Age'] > 30] # Selects rows where Age is greater than 30
Explanation: Use boolean conditions inside square brackets to filter rows.
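A common follow-up is how to filter on more than one condition. Here is a short sketch with made-up data; the column names and values are only for illustration.

```python
import pandas as pd

# Hypothetical employee data
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "Age": [34, 28, 45, 31],
    "City": ["Pune", "Delhi", "Pune", "Mumbai"],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
older_in_pune = df[(df["Age"] > 30) & (df["City"] == "Pune")]

# isin() keeps rows whose value appears in a list
metro = df[df["City"].isin(["Delhi", "Mumbai"])]

print(older_in_pune)
print(metro)
```

Using Python's plain `and` / `or` here raises an error, because they cannot compare whole columns at once.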
6. What is NumPy and why is it useful?
Answer: NumPy is a library for numerical computing in Python. It provides:
- ndarray: A fast and efficient multi-dimensional array.
- Functions for mathematical operations (e.g., mean, sum, standard deviation).
7. How do you create a NumPy array?
Answer:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr)
Explanation: np.array() creates an array from a Python list.
8. What is the difference between a list and a NumPy array?
Answer:
- Lists are flexible but slow for numerical operations.
- NumPy arrays are faster and use less memory.
- NumPy supports element-wise operations like arr * 2, while multiplying a list by 2 repeats its elements instead.
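To make the last point concrete, this small sketch applies the same * 2 to a list and to an array. A list does accept the operator, but it repeats the list rather than doing arithmetic on each element.

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

list_times_two = values * 2  # repeats the list
arr_times_two = arr * 2      # multiplies each element

print(list_times_two)  # [1, 2, 3, 4, 1, 2, 3, 4]
print(arr_times_two)   # [2 4 6 8]
```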
9. How do you calculate the mean of a column in Pandas?
Answer:
df['Salary'].mean()
Explanation: The mean() method calculates the average of a column and skips missing values by default.
10. What is the difference between loc and iloc in Pandas?
Answer:
- loc selects data by label (column/row name).
- iloc selects data by integer position.
Example:
df.loc[2, 'Salary'] # Selects Salary of row with label 2
df.iloc[2, 1] # Selects value at row index 2, column index 1
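The difference is easiest to see when row labels are not 0, 1, 2. The sketch below uses a made-up DataFrame with labels 10, 20, 30 to show both selectors side by side.

```python
import pandas as pd

# Hypothetical DataFrame whose row labels are not 0, 1, 2
df = pd.DataFrame(
    {"Name": ["Asha", "Ravi", "Meena"], "Salary": [52000, 41000, 67000]},
    index=[10, 20, 30],
)

by_label = df.loc[20, "Salary"]  # row labeled 20, column named 'Salary'
by_position = df.iloc[2, 1]      # third row, second column

# loc slices include the end label; iloc slices exclude the end position
loc_slice = df.loc[10:20]
iloc_slice = df.iloc[0:1]

print(by_label, by_position)
print(len(loc_slice), len(iloc_slice))
```

The slicing difference is a frequent interview trap: loc[10:20] returns two rows, while iloc[0:1] returns one.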
11. How do you merge two DataFrames in Pandas?
Answer:
merged_df = pd.merge(df1, df2, on='ID', how='inner')
Explanation: pd.merge() combines two DataFrames based on a common column.
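Interviewers often ask how the how argument changes the result. Here is a minimal sketch comparing an inner join with a left join, using two made-up DataFrames that share only some ID values.

```python
import pandas as pd

# Hypothetical tables: ID 1 has no salary, ID 4 has no employee record
employees = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Asha", "Ravi", "Meena"]})
salaries = pd.DataFrame({"ID": [2, 3, 4], "Salary": [41000, 67000, 58000]})

inner = pd.merge(employees, salaries, on="ID", how="inner")  # IDs found in both
left = pd.merge(employees, salaries, on="ID", how="left")    # every employee row

print(inner)
print(left)
```

The left join keeps ID 1 and fills its Salary with NaN, while the inner join drops it.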
12. How do you group data in Pandas?
Answer:
df.groupby('Department')['Salary'].mean()
Explanation: groupby() groups data based on a column and performs aggregate functions like mean().
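If you need more than one statistic per group, agg() accepts a list of aggregation names. A short sketch with made-up department data:

```python
import pandas as pd

# Hypothetical salary data
df = pd.DataFrame({
    "Department": ["HR", "IT", "IT", "HR", "IT"],
    "Salary": [40000, 60000, 80000, 50000, 70000],
})

# One row per department, one column per statistic
summary = df.groupby("Department")["Salary"].agg(["mean", "max", "count"])
print(summary)
```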
13. How do you visualize data using Matplotlib?
Answer:
import matplotlib.pyplot as plt
df['Salary'].hist()
plt.show()
Explanation: hist() creates a histogram to show salary distribution.
14. What is the difference between a for loop and a while loop?
Answer:
- For loop runs once for each item in a sequence, so the number of iterations is known in advance.
- While loop runs as long as a condition is true.
Example:
for i in range(5): # Runs 5 times
    print(i)

x = 0 # The counter must exist before the loop starts
while x < 5: # Runs until x reaches 5
    print(x)
    x += 1
15. How do you write a function in Python?
Answer:
def add_numbers(a, b):
    return a + b

print(add_numbers(3, 5)) # Output: 8
Explanation: def defines a function, and return sends back a result.
16. What is the difference between a list and a tuple?
Answer:
- List: Mutable (can be changed), defined using [].
- Tuple: Immutable (cannot be changed), defined using ().
Example:
my_list = [1, 2, 3] # Can be modified
my_tuple = (1, 2, 3) # Cannot be modified
17. How do you convert a Pandas DataFrame column to a list?
Answer:
my_list = df['ColumnName'].tolist()
Explanation: The tolist() method converts a column into a Python list.
18. How do you check for duplicate values in a DataFrame?
Answer:
df.duplicated().sum() # Returns the number of duplicate rows
df.drop_duplicates(inplace=True) # Removes duplicate rows
Explanation: duplicated() checks for duplicate rows, and drop_duplicates() removes them.
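By default both methods compare entire rows. The sketch below, on made-up data, shows how the subset and keep parameters restrict the comparison to chosen columns.

```python
import pandas as pd

# Hypothetical data: 'Asha' appears three times, once in a different city
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Asha", "Asha"],
    "City": ["Pune", "Delhi", "Pune", "Mumbai"],
})

full_row_duplicates = df.duplicated().sum()             # rows identical in every column
name_duplicates = df.duplicated(subset=["Name"]).sum()  # repeats of 'Name' only

# Keep the first row for each name and drop the later ones
deduplicated = df.drop_duplicates(subset=["Name"], keep="first")

print(full_row_duplicates, name_duplicates)
print(deduplicated)
```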
19. What is the difference between apply() and map() in Pandas?
Answer:
- apply(): Applies a function along the rows or columns of a DataFrame, or to each value of a Series.
- map(): Applies a function, dictionary, or Series lookup to each value of a single Pandas Series.
Example:
df['Salary'] = df['Salary'].apply(lambda x: x * 1.1) # Increases salary by 10%
df['Name'] = df['Name'].map(str.upper) # Converts names to uppercase
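Two cases show the difference more clearly: map() with a dictionary lookup, and apply() across rows with axis=1. This is a minimal sketch on made-up data, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical exam data
df = pd.DataFrame({
    "Grade": ["A", "B", "A"],
    "Math": [90, 70, 85],
    "Science": [80, 60, 95],
})

# map() on a Series can take a dictionary as a lookup table
df["GradeLabel"] = df["Grade"].map({"A": "Excellent", "B": "Good"})

# apply() with axis=1 passes each row to the function
df["Total"] = df.apply(lambda row: row["Math"] + row["Science"], axis=1)

print(df)
```

For simple arithmetic like the total, df["Math"] + df["Science"] gives the same result and is usually faster; apply() is used here only to show the row-wise form.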
20. How do you check the data types of each column in a DataFrame?
Answer:
df.dtypes
Explanation: The dtypes attribute displays the data type of each column.
21. How do you change the data type of a column in Pandas?
Answer:
df['Age'] = df['Age'].astype(int) # Converts 'Age' column to integer
Explanation: astype() is used to convert the data type of a column.
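astype(int) raises an error when a value cannot be converted. One alternative for messy columns is pd.to_numeric() with errors="coerce", sketched below on a made-up Series.

```python
import pandas as pd

# Hypothetical column read as text, with one value that is not a number
ages = pd.Series(["25", "31", "unknown"])

# errors="coerce" turns unparseable values into NaN instead of raising
numeric_ages = pd.to_numeric(ages, errors="coerce")

print(numeric_ages)
print(numeric_ages.dtype)
```

The result is a float column because NaN cannot be stored in a plain integer column.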
22. How do you sort a DataFrame by a specific column?
Answer:
df_sorted = df.sort_values(by='Salary', ascending=False) # Sorts in descending order
Explanation: sort_values() sorts the DataFrame by a specific column.
23. What is a lambda function in Python?
Answer: A lambda function is an anonymous, one-line function.
Example:
square = lambda x: x ** 2
print(square(5)) # Output: 25
Explanation: lambda is useful for short, simple functions.
24. How do you concatenate two DataFrames in Pandas?
Answer:
df_combined = pd.concat([df1, df2], axis=0) # Vertical concatenation
df_combined = pd.concat([df1, df2], axis=1) # Horizontal concatenation
Explanation: pd.concat() combines DataFrames along rows (axis=0) or columns (axis=1).
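One detail worth knowing is what happens to the row labels. The sketch below uses two made-up DataFrames to show the default behavior and the ignore_index option.

```python
import pandas as pd

# Two hypothetical tables with the same columns
df1 = pd.DataFrame({"ID": [1, 2], "Name": ["Asha", "Ravi"]})
df2 = pd.DataFrame({"ID": [3, 4], "Name": ["Meena", "John"]})

stacked = pd.concat([df1, df2], axis=0)                        # keeps labels 0, 1, 0, 1
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)  # labels 0, 1, 2, 3
side_by_side = pd.concat([df1, df2], axis=1)                   # 2 rows, 4 columns

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(side_by_side.shape)
```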
25. How do you count unique values in a DataFrame column?
Answer:
df['City'].nunique() # Returns the number of unique values
df['City'].unique() # Returns a NumPy array of the unique values
Explanation: nunique() counts unique values, while unique() lists them.
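A related method, value_counts(), shows how many times each unique value appears. A short sketch on a made-up Series:

```python
import pandas as pd

# Hypothetical city column
cities = pd.Series(["Pune", "Delhi", "Pune", "Mumbai", "Pune"])

unique_count = cities.nunique()  # number of distinct values
counts = cities.value_counts()   # occurrences per value, most frequent first

print(unique_count)
print(counts)
```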
26. How do you rename a column in a Pandas DataFrame?
Answer:
df.rename(columns={'OldName': 'NewName'}, inplace=True)
Explanation: rename() allows renaming columns.
27. What is the difference between .iloc[] and .iat[]?
Answer:
- .iloc[]: Used for selecting multiple rows and columns by index.
- .iat[]: Faster, used for selecting a single element by row and column index.
Example:
df.iloc[2, 1] # Selects value at row index 2, column index 1
df.iat[2, 1] # Faster method for selecting a single value
28. How do you find the correlation between columns in Pandas?
Answer:
df.corr(numeric_only=True) # Uses numeric columns only (supported in recent pandas versions)
Explanation: corr() returns a correlation matrix, showing how strongly the numeric columns are related.
29. How do you create a new column in a DataFrame based on conditions?
Answer:
df['Category'] = df['Salary'].apply(lambda x: 'High' if x > 50000 else 'Low')
Explanation: apply() is used to create a new column based on conditions.
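For a simple two-way condition like this, NumPy's np.where() is a common vectorized alternative to apply(). This is a sketch on made-up salaries, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical salary column
df = pd.DataFrame({"Salary": [30000, 50000, 75000]})

# np.where(condition, value_if_true, value_if_false) works on the whole column
df["Category"] = np.where(df["Salary"] > 50000, "High", "Low")

print(df)
```

A salary of exactly 50000 lands in 'Low' because the condition uses a strict greater-than comparison.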
30. How do you save a DataFrame to a CSV file?
Answer:
df.to_csv('output.csv', index=False)
Explanation: to_csv() exports the DataFrame to a CSV file, and index=False prevents saving the index.
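To see what index=False changes, you can call to_csv() without a file path, in which case it returns the CSV text as a string. A minimal sketch on a made-up DataFrame:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Salary": [52000, 41000]})

# With no file path, to_csv() returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```

The version with the index has an extra unnamed first column, which shows up as 'Unnamed: 0' when the file is read back with read_csv().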