Answer these Python questions for a successful data science job interview


If you want to pursue a career in data science, knowing Python is a must. Python is the most popular programming language in data science, especially when it comes to machine learning and artificial intelligence.

To help you prepare, I have compiled the main Python concepts tested in data science interviews. After that, I’ll discuss the two main types of interview questions that cover the concepts you need to know as a data scientist. I’ll also show you several sample questions and give you solutions to point you in the right direction.

Technical concepts of Python interview questions

This guide is not specific to any company. So, if you’re preparing for a data science interview, I highly recommend using this guide as a starting point for what might come up. You should also try to find company-specific questions and solve them as well. Knowing the general concepts and practicing them on real-life problems is a winning combination.

I’m not going to bother you with theoretical questions. They may appear during the interview, but they also cover technical concepts found in coding questions. After all, if you know how to use the concepts I’m about to talk about, you probably know how to explain them too.

The Python technical concepts tested in data science job interviews are:

Data types

Built-in data structures

User-defined data structures

Built-in functions

Loops and conditions

External Libraries (Pandas)

1. Data types

Data types are a concept you should be familiar with. This means that you need to know the most commonly used data types in Python, the differences between them, and when and how to use them. These are data types such as integers (int), floats (float), complex numbers (complex), strings (str), booleans (bool), and the null value (None).
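As a quick illustration of these types, here is a minimal sketch. The sample values are my own and are not taken from any interview question.

```python
import math

# One sample value for each commonly used built-in type
samples = [42, 3.14, 2 + 3j, "data", True, None]

# type(value).__name__ gives the short name of the value's type
type_names = [type(value).__name__ for value in samples]
print(type_names)

# bool is a subclass of int, so True behaves like 1 in arithmetic
total = True + True

# Floats are binary approximations, so compare them with a tolerance
exactly_equal = (0.1 + 0.2 == 0.3)
close_enough = math.isclose(0.1 + 0.2, 0.3)
print(total, exactly_equal, close_enough)
```

The last two lines show a classic interview trap: comparing floats with == can fail even when the numbers look equal on paper.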

2. Built-in data structures

These are the list, the dictionary, the tuple, and the set. Knowing these four built-in data structures will help you organize and store data in a way that allows easier access and modification.
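The sketch below shows the four structures side by side. The variable names and values are only illustrative.

```python
# list: ordered and mutable
scores = [88, 92, 79]
scores.append(95)

# tuple: ordered and immutable, handy for fixed records
point = (3, 4)

# dict: maps keys to values with fast lookup by key
ages = {"ana": 31, "ben": 27}
ages["cara"] = 45

# set: unordered collection that keeps only unique items
tags = {"python", "sql", "python"}

print(scores, point, ages, len(tags))
```

A common follow-up question is when to choose a tuple over a list: a tuple cannot be changed after creation, which also makes it usable as a dictionary key.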

3. User-defined data structures

In addition to using the built-in data structures, you should also be able to define and use some user-defined data structures. These are arrays, stacks, queues, trees, linked lists, graphs, and hash maps.
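To give an idea of what defining such a structure looks like, here is a minimal sketch of a stack and a queue. The class and method names are my own choices, and this is one of several reasonable ways to write them.

```python
from collections import deque


class Stack:
    """Last in, first out, backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        # Removes and returns the most recently pushed item
        return self._items.pop()

    def __len__(self):
        return len(self._items)


class Queue:
    """First in, first out, backed by collections.deque."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        # Removes and returns the oldest item in constant time
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


stack = Stack()
queue = Queue()
for value in (1, 2, 3):
    stack.push(value)
    queue.enqueue(value)

last_in = stack.pop()
first_in = queue.dequeue()
print(last_in, first_in)
```

The queue uses deque because removing from the front of a plain list takes time proportional to its length.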

4. Built-in functions

Python has over 60 built-in functions. You don’t have to know them all, but of course, it’s best to know as many as possible. The built-in functions that you cannot avoid are abs(), isinstance(), len(), list(), min(), max(), pow(), range(), round(), sorted(), and type(). You should also know split(), which is a string method rather than a built-in function.
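Here is a short sketch that exercises most of these functions on made-up values. Note that split() is called on a string because it is a string method.

```python
values = [3, -7, 2.5, 10]

largest = max(values)
smallest = min(values)
count = len(values)
magnitudes = [abs(v) for v in values]       # absolute values
ordered = sorted(values)                    # new sorted list, original unchanged
squares = [pow(n, 2) for n in range(1, 4)]  # range(1, 4) yields 1, 2, 3
rounded = round(3.14159, 2)
is_float = isinstance(2.5, float)
kind = type(values)
first_three = list(range(3))
words = "a,b,c".split(",")                  # str.split, a string method

print(largest, smallest, count, ordered, squares, rounded, words)
```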

5. Loops and conditions

Loops are used for repetitive tasks: they run a piece of code over and over again. They do this until a condition (a true/false test) tells them to stop.
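A minimal sketch of both loop types combined with conditions follows. The numbers are arbitrary examples.

```python
numbers = [4, 7, 10, 13, 16]

# for loop with a condition: keep only the even numbers
evens = []
for n in numbers:
    if n % 2 == 0:
        evens.append(n)

# while loop: keep halving until the value is no longer above 10
value = 100
steps = 0
while value > 10:
    value /= 2
    steps += 1

# break stops a loop early: find the first number greater than 8
first_large = None
for n in numbers:
    if n > 8:
        first_large = n
        break

print(evens, value, steps, first_large)
```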

6. External libraries (Pandas)

Although there are several external libraries in use, Pandas is probably the most popular. It is designed for practical data analysis in the fields of finance, social sciences, statistics and engineering.
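For a first taste of Pandas, here is a small sketch on a made-up table. The column names and values are hypothetical, and it assumes Pandas is installed.

```python
import pandas as pd

# Hypothetical toy data, not taken from any interview question
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "temp_c": [4.0, 6.0, 15.0, 17.0],
})

# Derive a new column from an existing one
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# Filter rows with a boolean condition
warm = df[df["temp_c"] > 10]

# Aggregate per group
mean_by_city = df.groupby("city")["temp_c"].mean()

print(warm)
print(mean_by_city)
```

Filtering, deriving columns, and grouping cover a large share of what data manipulation questions ask for.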

Types of interview questions with Python

All of these six technical concepts are mainly tested by only two types of interview questions. These are:

Data manipulation and analysis

Algorithms

Let’s take a closer look at each of them.

1. Data manipulation and analysis

These questions are designed to test the technical concepts above by having you solve ETL (extract, transform, load) problems and perform data analysis.

Here is one example from Facebook:

QUESTION: Facebook sends SMS texts when users attempt to log in to the platform with 2FA (two-factor authentication). To pass 2FA, they must confirm that they have received the SMS. Confirmation texts are only valid on the date they are sent. Unfortunately, there was an ETL issue with the database where friend requests and invalid confirmation records were inserted into the logs, which are stored in the ‘fb_sms_sends’ table. These types of messages should not appear in the table. Fortunately, the ‘fb_confirmers’ table contains valid confirmation records, so you can use this table to identify SMS text messages that have been confirmed by the user.

Calculate the percentage of confirmed text messages for August 4, 2020.

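ANSWER:

import pandas as pd

# fb_sms_sends and fb_confirmers are the DataFrames provided with the question.
# Drop the message types that should not be in the log.
df = fb_sms_sends[["ds", "type", "phone_number"]]
df1 = df[~df["type"].isin(["confirmation", "friend_request"])]

# Count the valid messages sent per day, then keep August 4, 2020
df1_grouped = df1.groupby("ds")["phone_number"].count().reset_index(name="count")
df1_grouped_0804 = df1_grouped[df1_grouped["ds"] == "08-04-2020"]

# Match each valid message to a confirmation from the same number on the same date
df2 = fb_confirmers[["date", "phone_number"]]
df3 = pd.merge(df1, df2, how="left", left_on=["phone_number", "ds"], right_on=["phone_number", "date"])

# Unconfirmed messages have no date after the left join, so they are not counted here
df3_grouped = df3.groupby("date")["phone_number"].count().reset_index(name="confirmed_count")
df3_grouped_0804 = df3_grouped[df3_grouped["date"] == "08-04-2020"]

# Percentage of valid messages that were confirmed (taken as a single number)
result = df3_grouped_0804["confirmed_count"].iloc[0] / df1_grouped_0804["count"].iloc[0] * 100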
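To check the logic without access to the real tables, you can run the same idea on toy data. The rows below are entirely made up, and the column names are assumed to match the ones used in the solution above. This sketch filters to the target date first and uses an inner join, which is a slightly different route to the same percentage.

```python
import pandas as pd

# Hypothetical rows standing in for fb_sms_sends
sends = pd.DataFrame({
    "ds": ["08-04-2020"] * 6 + ["08-03-2020"],
    "type": ["message", "message", "confirmation", "friend_request",
             "message", "message", "message"],
    "phone_number": ["111", "222", "333", "444", "555", "666", "111"],
})

# Hypothetical rows standing in for fb_confirmers
confirmers = pd.DataFrame({
    "date": ["08-04-2020", "08-04-2020", "08-03-2020", "08-04-2020"],
    "phone_number": ["111", "333", "222", "555"],
})

target = "08-04-2020"

# Valid messages sent on the target date
valid = sends[(sends["ds"] == target)
              & ~sends["type"].isin(["confirmation", "friend_request"])]

# Keep only the messages confirmed by the same number on the same date
confirmed = valid.merge(confirmers, how="inner",
                        left_on=["ds", "phone_number"],
                        right_on=["date", "phone_number"])

percentage = len(confirmed) / len(valid) * 100
print(percentage)
```

Here four valid messages were sent on the target date and two of them were confirmed on that date, so the script prints 50.0.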

One of the questions asked to test your data analysis skills is this one from Dropbox:

QUESTION: Write a query that calculates the difference between the highest salaries found in the marketing and engineering departments. Output just the difference in salaries.

ANSWER:

import pandas as pd

# db_employee and db_dept are the DataFrames provided with the question
df = pd.merge(db_employee, db_dept, how="left", left_on=["department_id"], right_on=["id"])

# Highest salary in engineering
df1 = df[df["department"] == "engineering"]
df_eng = df1.groupby("department")["salary"].max().reset_index(name="eng_salary")

# Highest salary in marketing
df2 = df[df["department"] == "marketing"]
df_mkt = df2.groupby("department")["salary"].max().reset_index(name="mkt_salary")

# Both frames hold a single row with index 0, so the subtraction lines up
result = pd.DataFrame(df_mkt["mkt_salary"] - df_eng["eng_salary"])
result.columns = ["salary_difference"]
result
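If you want to try this approach yourself, here is a compact sketch on invented data. The tables and salaries are hypothetical, and the column names are assumed to mirror the ones in the solution above.

```python
import pandas as pd

# Hypothetical stand-ins for db_employee and db_dept
employees = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "salary": [100, 150, 90, 120],
    "department_id": [10, 10, 20, 20],
})
departments = pd.DataFrame({
    "id": [10, 20],
    "department": ["engineering", "marketing"],
})

# Attach the department name to every employee
merged = employees.merge(departments, how="left",
                         left_on="department_id", right_on="id",
                         suffixes=("_emp", "_dept"))

# One groupby gives the highest salary of every department at once
top_salary = merged.groupby("department")["salary"].max()

difference = top_salary["marketing"] - top_salary["engineering"]
print(difference)
```

Grouping once and indexing by department name avoids filtering the table separately for each department.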

2. Algorithms

Python algorithm interview questions test how well you solve problems using algorithms. Since algorithms aren’t limited to just one programming language, these questions test your logic and thinking, as well as your coding in Python.

For example, you could get this question:

QUESTION: Given a string containing digits 2 through 9 inclusive, return all possible letter combinations that the number could represent. Return the answer in any order.

The mapping from digits to letters is the one found on telephone buttons. Note that 1 does not map to any letter.

ANSWER:

from typing import List

class Solution:
    def letterCombinations(self, digits: str) -> List[str]:
        # If the input is empty, immediately return an empty answer array
        if len(digits) == 0:
            return []

        # Map all the digits to their corresponding letters
        letters = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
                   "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

        def backtrack(index, path):
            # If the path is the same length as digits, we have a complete combination
            if len(path) == len(digits):
                combinations.append("".join(path))
                return  # Backtrack
            # Get the letters that the current digit maps to, and loop through them
            possible_letters = letters[digits[index]]
            for letter in possible_letters:
                # Add the letter to our current path
                path.append(letter)
                # Move on to the next digit
                backtrack(index + 1, path)
                # Backtrack by removing the letter before moving onto the next
                path.pop()

        # Initiate backtracking with an empty path and starting index of 0
        combinations = []
        backtrack(0, [])
        return combinations
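An interviewer may also ask for an alternative. One option, sketched here under my own naming, is to let itertools.product from the standard library build the combinations instead of writing the backtracking by hand.

```python
from itertools import product

# Same digit-to-letter mapping as on telephone buttons
PHONE_LETTERS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
                 "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}


def letter_combinations(digits):
    """Return every letter combination the digit string could represent."""
    if not digits:
        return []
    # One group of candidate letters per digit, in input order
    groups = [PHONE_LETTERS[d] for d in digits]
    # product picks one letter from each group in every possible way
    return ["".join(combo) for combo in product(*groups)]


print(letter_combinations("23"))
# ['ad', 'ae', 'af', 'bd', 'be', 'bf', 'cd', 'ce', 'cf']
```

This is shorter, but the backtracking version shows the interviewer that you understand how the combinations are generated.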

Or it might get even harder with the following question:

QUESTION: Write a program to solve a Sudoku puzzle by filling in the empty cells. A Sudoku solution must meet all of the following rules:

Each of the numbers 1 through 9 must appear exactly once in each row.

Each of the numbers 1 through 9 must appear exactly once in each column.

Each of the numbers 1 through 9 must appear exactly once in each of the nine 3 × 3 sub-boxes of the grid.

The ‘.’ character indicates empty cells.

ANSWER:

from collections import defaultdict

class Solution:
    def solveSudoku(self, board):
        """
        :type board: List[List[str]]
        :rtype: void Do not return anything, modify board in-place instead.
        """
        def could_place(d, row, col):
            """
            Check if one could place a number d in (row, col) cell
            """
            return not (d in rows[row] or d in columns[col] or
                        d in boxes[box_index(row, col)])

        def place_number(d, row, col):
            """
            Place a number d in (row, col) cell
            """
            rows[row][d] += 1
            columns[col][d] += 1
            boxes[box_index(row, col)][d] += 1
            board[row][col] = str(d)

        def remove_number(d, row, col):
            """
            Remove a number which didn't lead
            to a solution
            """
            del rows[row][d]
            del columns[col][d]
            del boxes[box_index(row, col)][d]
            board[row][col] = '.'

        def place_next_numbers(row, col):
            """
            Call backtrack function in recursion
            to continue to place numbers
            till the moment we have a solution
            """
            # if we're in the last cell
            # that means we have the solution
            if col == N - 1 and row == N - 1:
                nonlocal sudoku_solved
                sudoku_solved = True
            # if not yet
            else:
                # if we're in the end of the row
                # go to the next row
                if col == N - 1:
                    backtrack(row + 1, 0)
                # go to the next column
                else:
                    backtrack(row, col + 1)

        def backtrack(row=0, col=0):
            """
            Backtracking
            """
            # if the cell is empty
            if board[row][col] == '.':
                # iterate over all numbers from 1 to 9
                for d in range(1, 10):
                    if could_place(d, row, col):
                        place_number(d, row, col)
                        place_next_numbers(row, col)
                        # if sudoku is solved, there is no need to backtrack
                        # since the single unique solution is promised
                        if not sudoku_solved:
                            remove_number(d, row, col)
            else:
                place_next_numbers(row, col)

        # box size
        n = 3
        # row size
        N = n * n
        # lambda function to compute box index
        box_index = lambda row, col: (row // n) * n + col // n

        # init rows, columns and boxes
        rows = [defaultdict(int) for i in range(N)]
        columns = [defaultdict(int) for i in range(N)]
        boxes = [defaultdict(int) for i in range(N)]
        for i in range(N):
            for j in range(N):
                if board[i][j] != '.':
                    d = int(board[i][j])
                    place_number(d, i, j)

        sudoku_solved = False
        backtrack()

This is quite a complex algorithm, and it is a great sign if you know how to solve it!
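Whatever solver you write, it helps to have a way to check its output. Below is a minimal sketch of a checker for the three Sudoku rules. The function name is my own, and the completed board is generated from a simple shifting pattern purely for illustration.

```python
def is_valid_solution(board):
    """Return True if a filled 9 x 9 board of digit strings obeys all three rules."""
    expected = set("123456789")
    rows = [set(row) for row in board]
    columns = [set(column) for column in zip(*board)]
    boxes = [
        {board[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    # Every row, column, and box must contain each digit exactly once
    return all(group == expected for group in rows + columns + boxes)


# Build a completed board by shifting the digits 1-9 from row to row
solved = [
    [str((3 * (r % 3) + r // 3 + c) % 9 + 1) for c in range(9)]
    for r in range(9)
]

# Swapping two cells in one row keeps the row valid but breaks two columns
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]

print(is_valid_solution(solved), is_valid_solution(broken))
```

The board uses strings rather than integers to match the input format of the question.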

Conclusion

For a data science interview, the six technical concepts I mentioned are must-haves. Of course, I recommend that you dive even deeper into Python and broaden your knowledge, not only in theory but also by solving as many data manipulation, data analysis, and algorithm questions as possible.

For the former, there are plenty of examples on StrataScratch, where you could probably find questions from the company you applied to. LeetCode is a good choice when you decide to practice writing Python algorithms before your interviews.

