Interview Questions (Quant Research)

Jul 19, 2023

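P(X + 3Y > 0 | X > 0), where X and Y are independent standard normal random variables. Solve via a diagram: the bivariate normal is rotationally symmetric, so the probability is just the angle of the required region divided by the angle of the conditioning region. X > 0 is a half-plane (180°), and the part of it with X + 3Y > 0 spans 90° + arctan(1/3) ≈ 108.4°, giving 1/2 + arctan(1/3)/π ≈ 0.60.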
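A quick Monte Carlo check of the diagram argument can be sketched as below. The function name, trial count and seed are my own choices for illustration; the only thing being tested is the geometric answer 1/2 + arctan(1/3)/π.

```python
import math
import random

def conditional_prob(trials=200_000, seed=0):
    """Estimate P(X + 3Y > 0 | X > 0) for independent standard normals."""
    rng = random.Random(seed)
    hits = 0
    total = 0
    for _ in range(trials):
        x, y = rng.gauss(0, 1), rng.gauss(0, 1)
        if x > 0:                      # keep only samples in the conditioning region
            total += 1
            hits += (x + 3 * y > 0)
    return hits / total

exact = 0.5 + math.atan(1 / 3) / math.pi   # angle of wedge / angle of half-plane
estimate = conditional_prob()
print(round(exact, 4), round(estimate, 4))
```

The estimate should land within sampling noise of the exact value.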

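What is the significance of the Jacobian? It is the factor by which a change of variables rescales volume, so it shows up whenever a pdf is transformed. What does the joint pdf of 2 iid Normals signify? It depends on (x, y) only through x² + y², so it is constant on circles; in polar coordinates it factorizes into a pdf for the radius times a uniform pdf for the angle.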

Your money doubles on every head; on a tail you lose all the money and the game ends. Assumptions? Infinite money for the casino. How do you take care of these assumptions, and how do you make it a fair game?

Pricing of bonds -> x dollars in x years, which one to pick? How do you get the cashflows, or the present value, of the bond?

KL Divergence -> the log ratio log(p(x)/q(x)) weighted by p(x): KL(p || q) = Σ p(x) log(p(x)/q(x)).

Logistic Regression Convex Optimization -> Any local minimum is the global minimum, because the log loss is convex in the parameters. What if we take MAE as the loss function? If being off by 4 is exactly twice as bad as being off by 2, then use MAE (with MSE it would be four times as bad). MAE is less sensitive to outliers.

For linear regression, Look at the gradients for numerical stability! Probably standardize them to a unit vector and then do the updates.

What are the assumptions for MLE? I.I.D

The main difference is that MLE assumes that all solutions are equally likely beforehand, whereas MAP allows prior information about the form of the solution to be harnessed. With a uniform prior, MLE = MAP.

Why is correlation related to linearity, the intuition? For simple linear regression, the product of the two regression coefficients (slope of y on x times slope of x on y) is equal to the correlation coefficient squared.

Sum of residuals for an OLS regression with an intercept term is always equal to zero.

Autoencoder project -> We train an autoencoder on good credit card transactions only, so reconstruction error on good transactions is low while on bad (fraudulent) transactions it is high.

Linear fleet of cars (single lane, no overtaking), keep adding one car after the other, the expected number of fleets? If the new car is the slowest of all the cars so far (probability 1/n), then the number of fleets increases by 1; otherwise it catches up and the number of fleets remains the same. E(n) = (1/n)*(E(n-1) + 1) + ((n-1)/n)*E(n-1) = E(n-1) + 1/n => E(n) = 1 + 1/2 + 1/3 + ... + 1/n.
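The harmonic-number answer is easy to sanity check by simulation. This is a minimal sketch under my own modelling assumption that speeds are iid uniform and that a car starts a new fleet exactly when it is slower than every car ahead of it; the function names are hypothetical.

```python
import random

def expected_fleets_exact(n):
    """Harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1 / k for k in range(1, n + 1))

def count_fleets(speeds):
    """speeds[0] is the front car; count cars slower than all cars ahead of them."""
    fleets = 0
    slowest_ahead = float("inf")
    for s in speeds:
        if s < slowest_ahead:   # cannot catch anyone ahead, so it leads a new fleet
            fleets += 1
            slowest_ahead = s
    return fleets

def simulate_fleets(n, trials=20_000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += count_fleets([rng.random() for _ in range(n)])
    return total / trials

print(expected_fleets_exact(10), simulate_fleets(10))
```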

Double chess? Okay.

Expected draws until we see an ace? 52 cards, 4 aces, so the aces split the other 48 cards into 5 partitions.

By symmetry each partition holds 48/5 = 9.6 cards on average, so the first ace is expected at draw 10.6. By the same symmetry, 9.6 is also the expected number of cards remaining after you pick the last ace.
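The 10.6 figure can be confirmed exactly rather than by symmetry. A small sketch, assuming the ace positions form a uniformly random 4-subset of the 52 positions (the function name and defaults are mine):

```python
from fractions import Fraction
from math import comb

def expected_first_ace(deck=52, aces=4):
    """Exact expected position of the first ace in a shuffled deck."""
    total = comb(deck, aces)
    # First ace at position k means the other aces sit in the deck - k later positions.
    return sum(
        Fraction(k * comb(deck - k, aces - 1), total)
        for k in range(1, deck - aces + 2)
    )

print(expected_first_ace())   # Fraction(53, 5), i.e. 10.6
```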

Simulate event of probability 1/3 using a coin? Easy :)
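One common way to do it (a sketch, not necessarily the answer the interviewer wants): toss twice, call HH a success, HT or TH a failure, and retry on TT. Three equally likely outcomes survive, so success has probability 1/3. The fair coin is simulated here with `rng.random() < 0.5`.

```python
import random

def event_one_third(rng):
    """Return True with probability 1/3 using only fair coin tosses."""
    while True:
        first = rng.random() < 0.5    # True = heads
        second = rng.random() < 0.5
        if first and second:
            return True               # HH: success
        if first or second:
            return False              # HT or TH: failure
        # TT: discard the pair and toss again

def frequency(trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(event_one_third(rng) for _ in range(trials)) / trials

print(frequency())
```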

Probability of forming a triangle using 2 random cuts on rod? 1/4 :)

Same but second cut on larger part now? Take max concept, solve?

Inverse CDF method: for a continuous X with CDF F, F(X) ~ U(0,1), because P(F(X) <= u) = P(X <= F^-1(u)) = u. So to sample X, draw U ~ U(0,1) and return F^-1(U).
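As a concrete instance, here is a minimal sketch for the exponential distribution, where the CDF F(x) = 1 - exp(-rate*x) inverts in closed form. The choice of distribution and the function names are mine, purely for illustration.

```python
import math
import random

def sample_exponential(rate, rng):
    """Inverse CDF sampling: solve u = 1 - exp(-rate * x) for x."""
    u = rng.random()                     # u in [0, 1)
    return -math.log(1.0 - u) / rate     # 1 - u is in (0, 1], so the log is defined

def sample_mean(rate=2.0, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(sample_exponential(rate, rng) for _ in range(trials)) / trials

print(sample_mean())   # should be close to 1/rate
```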

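PCA -> Project the (centered) vectors onto a direction, then calculate the mean and variance of the projections; PCA picks the direction with maximum variance. The projection of a has length |a|cos(theta), with -1 <= cos(theta) <= 1.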

One 18-sided die vs the sum of three 6-sided dice -> the three 6-sided dice are the thin-tailed one (the sum concentrates around 10.5), while the 18-sided die is uniform.

Calculate mean and variance of U(0,1) -> mean is 0.5, variance is E(X²) - E(X)² = 1/3 - 1/4 = 1/12.

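Central limit theorem? When does it not hold? For samples that are not IID (in general), and for distributions with infinite variance (e.g. the Cauchy distribution). For small samples (rule of thumb: n < 30) the normal approximation can simply be poor.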

CLT is great for hypothesis testing, because stats is all about estimating population parameters from samples. Given a hypothesis about the population and a sample, can we say whether the data are consistent with the null hypothesis or not? If the p-value is below the significance level, then reject the null hypothesis.

Law of Large Numbers? The sample mean converges to the population mean as n grows; its standard error is sigma/sqrt(n).

SVM -> Kernel trick! It computes dot products between points in a high-dimensional feature space without ever constructing that space explicitly.

ML -> Trees (Do one side project on tabular data!!) What happens as we keep increasing the trees on random subsets of data!

For GB -> It will go down eventually. In boosting, we form an ensemble of the learners.

For LinR -> Goes up and saturates, effectively duplicating the data, just the confidence intervals will shrink, parameter estimates remain the same(from normal equation)! standard error reduces basically.

95% Confidence Interval for different samples -> about 95% of the intervals built this way will contain the true value!

P value -> reject the null hypothesis if it is lower than a threshold!

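Why does cross-validation not work on finance (time series) data? Random folds take observations out of their temporal context: the model trains on the future and is tested on the past, which leaks information.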

Binomial Distribution example -> n coin flips with X being number of heads!

Why Is Autocorrelation Problematic?

Most statistical tests assume the independence of observations. In other words, the occurrence of one tells nothing about the occurrence of the other. Autocorrelation is problematic for most statistical tests because it refers to the lack of independence between values.

simulate irrational probability using fair dice?

how to check for stationarity in time series?

Visually, global vs local…

Augmented Dickey-Fuller (ADF) test?

Potential questions

A fair die is rolled n times. What is the probability that the largest number rolled is r, for each r in 1..6?
P(max = 1) = (1/6)^n
P(max ≤ 2) = (2/6)^n
P(max = 2) = (2/6)^n - (1/6)^n, and so on: P(max = r) = (r/6)^n - ((r-1)/6)^n

A fair coin is tossed n times. Given that there were k heads in the n tosses, what is the probability that the first toss was heads?

P(first heads | k heads) = P(k heads | first heads) * P(first heads) / P(k heads)

= C(n-1, k-1) / [C(n-1, k) + C(n-1, k-1)] = C(n-1, k-1) / C(n, k) = k/n

Probability that x³ ends with 11? 1/100 :) (a + 10b)³
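Only the last two digits of x matter, so the 1/100 claim can be brute-forced over residues mod 100. A tiny sketch (the variable name is mine):

```python
# Residues x mod 100 whose cube ends in 11.
solutions = [x for x in range(100) if pow(x, 3, 100) == 11]
print(solutions)   # [71], since 71**3 = 357911
```

Exactly one residue out of 100 works, which matches 1/100 for x drawn uniformly over a full range of last-two-digit patterns.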

The variance of the x-coordinate of points distributed uniformly on a unit ball in R3?

There are n points on the plane and then find three points to form the largest angle.

Markov chains!!! -> You start at index 0 on a number line. There is a black hole at 2N and another black hole at -N. At every position there is an equal probability of going left or right. When you arrive at one of the black holes, the process ends. What is the probability that you end up in the black hole at index 2N?
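This is the classic gambler's ruin setup: for a symmetric walk the probability of hitting the upper barrier first is (distance to the lower barrier) / (total width) = N/(3N) = 1/3. A simulation sketch to check it, with N = 3 and the function names chosen by me:

```python
import random

def hits_far_hole(n, rng):
    """Run one symmetric walk from 0 until it reaches -n or 2n."""
    pos = 0
    while -n < pos < 2 * n:
        pos += 1 if rng.random() < 0.5 else -1
    return pos == 2 * n

def estimate_far_hole(n=3, trials=20_000, seed=0):
    rng = random.Random(seed)
    return sum(hits_far_hole(n, rng) for _ in range(trials)) / trials

print(estimate_far_hole())   # should be close to 1/3
```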

When to use cosine similarity and when to use Pearson correlation?

How to exponentiate a matrix? If X is diagonalizable, X = PDP^-1, so X^n = PD^nP^-1, as all the inner P^-1 P terms cancel out :) Verify! And implement the nth power of a real number in O(log n) time using divide and conquer.

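from functools import lru_cache

@lru_cache(maxsize=None)
def recurse(x, n):
    # x**n for an integer n >= 0, using O(log n) multiplications
    if n == 0:
        return 1
    half = recurse(x, n // 2)  # solve the half-size problem once
    # square it, and multiply by x once more when n is odd
    return half * half * x if n % 2 else half * half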

Distribution with infinite variance? Pareto (with shape alpha <= 2), Cauchy!

Black-Scholes ->

Covariance of a linear combination of random variables -> cov(aX, bY) = ab cov(X, Y)

How would you detect outliers in a dataset? box plots, plot histograms, quantile plots.

Compare Lasso and Ridge regression.

Math behind l1 and l2, why l1 gives sparsity?

Write python code to find max drawdown of trading strategy?
2) Compare two lotteries based on their expected payoff and standard deviation
3) Probability that n points on a circle are in one semicircle -> n/2^(n-1)

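Explanation -> fix one point as the leading point; each of the other n-1 points lands in the semicircle clockwise from it with probability 1/2, giving (1/2)^(n-1). Any of the n points can be the leading point, and these events are disjoint, so multiply by n.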

4) How to extract features using linear regression -> Take the example of least angle regression: first pick the feature with the highest correlation to y, then the next one with the highest correlation to the residuals (if the residuals still show a pattern, the model is missing something).
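For the max drawdown question at the top of this list, a minimal sketch is below. It assumes the input is a list of positive equity (or price) values in time order and reports the drawdown as a fraction of the running peak; the function name and the sample series are made up for illustration.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # drop from that peak
    return worst

curve = [100, 120, 90, 110, 80, 130]   # hypothetical equity curve
print(max_drawdown(curve))             # peak 120 to trough 80, i.e. 1/3
```

It is a single O(n) pass; an interviewer may also ask for the dates of the peak and the trough, which just means tracking indices alongside the values.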
The St Petersburg paradox ->
1. Sum of list of lists: find the sum of elements in the cols and rows at all previous indices.
2. If X is the number of 2s and Y is the number of 3s when a die is rolled 6 times, what is the covariance of X and Y? (-n/36 for n rolls => -1/6)
3. If a head is thrown it counts as 1 and if a tail is thrown it counts as 2. You keep a running sum after every throw. What is the probability that the sum equals 100 at some point? Solve by taking cases on the last throw.
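For the running-sum question, taking cases on the last throw gives p(n) = p(n-1)/2 + p(n-2)/2 with p(0) = 1 and p(1) = 1/2, whose solution is p(n) = 2/3 + (1/3)(-1/2)^n. A sketch that iterates the recurrence with exact fractions (the function name is mine):

```python
from fractions import Fraction

def prob_hit(target):
    """Probability that a running sum of 1s and 2s (fair coin) ever equals target."""
    half = Fraction(1, 2)
    prev, cur = Fraction(1), half          # p(0), p(1)
    if target == 0:
        return prev
    for _ in range(2, target + 1):
        # the sum reaches n from n-1 via a head or from n-2 via a tail
        prev, cur = cur, half * cur + half * prev
    return cur

print(float(prob_hit(100)))   # approximately 2/3
```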

Calculate probability one of two events occurs!

Gaussian processes, how to address non-stationarity (not asked in those terms but probably what they want you to do), lag analysis (auto-correlation, partial auto-correlation), correlation analysis between series. They don’t ask things clearly like this but it’s what they expect candidates to do I think.

Give an example of a Poisson distribution.

ROC/AUC -> Area under the curve of TPR against FPR (TPR on y axis, FPR on x axis) at different probability thresholds. TPR = Recall = TP/(TP + FN) -> it should be high.

FPR = FP/(FP + TN)

Precision-recall curve is used for evaluating the performance of binary classification algorithms. It is often used in situations where classes are heavily imbalanced.

SVMs are very good when you have tabular data with many features and few datapoints. They got quite famous on biomedical and gene research, which often has an overwhelming number of features and 20–50 datapoints per dataset.

More generally, think of SVMs as a “very good algorithm that is cheap to train and use”. They won’t be good at things like images or audio, but they can be your go-to tool for tabular data. The same goes for XGBoost and other similar algorithms.

Mathematically, in support vector machines, the non-linear decision boundary is constructed as a linear combination of all the training data, so that a new point is classified using the following function:
$$y = \operatorname{sign}\left(\sum_{i=1}^{m} \alpha_i y_i K(x, x_i)\right)$$
where $K$ is a kernel function which measures the similarity between points $x$ and $x_i$, and $\alpha_i$ are the dual variables.
Now, it turns out that often, only a few $\alpha_i$’s are non-zero. Therefore, you only need to keep those $\alpha_i$’s and the corresponding training points in memory.

RBF kernel vs linear kernel?

Random forest is an extension of bagging! [incorporate random feature selection also]

Spectral clustering?

consecutive sum (Leetcode)

3 sum in python (Leetcode)

How to construct alphas using daily data? Can you list five factors that can predict the return?

Predictive power of model?

What is the GMM model?

What ‘s the difference between bagging and boosting? random forest is built on top of bagging.

OLS by hand? OLS assumption? double derivative > 0

Simulate Monty hall in python?

3 doors, say the car is behind door 1. We select a door; at least one of the other two doors must hide a goat. Monty opens such a door. What is the probability of winning after switching? (2/3)

try out for 4 doors.
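A simulation sketch covering both the 3-door and the 4-door case. The 4-door rules are my assumption, since the notes do not spell them out: Monty opens exactly one goat door, and the player switches to a uniformly random remaining closed door. Under those rules switching wins with probability 2/3 for 3 doors and (3/4)*(1/2) = 3/8 for 4 doors.

```python
import random

def switch_win_rate(doors=3, trials=100_000, seed=0):
    """Win rate of always switching when the host opens one goat door."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(doors)
        pick = rng.randrange(doors)
        # Host opens a door that is neither the player's pick nor the car.
        opened = rng.choice([d for d in range(doors) if d != pick and d != car])
        # Player switches to a random door that is still closed and not their own.
        new_pick = rng.choice([d for d in range(doors) if d != pick and d != opened])
        wins += (new_pick == car)
    return wins / trials

print(switch_win_rate(3), switch_win_rate(4))
```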

How to write a function that return 4 times the input, but can not use multiply and add operators?
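One possible answer, assuming the input is an integer: shift left by two bits, since each left shift doubles the value. The function name is mine.

```python
def quadruple(x):
    """Return 4 * x for an integer x without using * or +."""
    return x << 2   # two left shifts = two doublings

print(quadruple(5), quadruple(-3))   # 20 -12
```

For floats the shift does not apply, so the interviewer may be after a different trick there.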

If we generate a uniform random number X ∈ [0,1], then generate a uniform random number Y ∈ [X,1], are the pairs (X,Y) uniformly distributed?

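Not at all. The joint density is 1/(1-x) on 0 <= x <= y <= 1, which is not constant. When X is in (0.9, 1.0), Y is squeezed into (X, 1), so those points pile up densely in the top-right corner.

But when X is in (0, 0.1), Y is spread over almost the whole of (0, 1), so the same 10% of the points cover a far larger area.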

Expected number of rolls of a fair six-sided die to get a 6, given all rolls leading up to that 6 are even numbered? 3/2

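Rolling a 1, 3 or 5 before the 6 disqualifies the sequence, so it is better to calculate the updated (conditional) probabilities and proceed: the first roll landing in {1, 3, 5, 6} arrives after 6/4 = 3/2 rolls on average, and whether that roll is the 6 is independent of when it arrives. The tempting answer of 3 (a die with only 2, 4, 6) is wrong.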
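Since 3/2 is counterintuitive, a rejection-sampling sketch helps: roll until the first 6, throw away every sequence that contained an odd number, and average the lengths of what is left. The function name, trial count and seed are mine.

```python
import random

def conditional_mean_rolls(trials=200_000, seed=0):
    """Mean number of rolls up to the first 6, given every roll was even."""
    rng = random.Random(seed)
    total = 0
    accepted = 0
    for _ in range(trials):
        rolls = 0
        while True:
            face = rng.randint(1, 6)
            rolls += 1
            if face % 2 == 1:
                break              # an odd roll appeared: discard this sequence
            if face == 6:
                total += rolls     # all rolls were even and we reached the 6
                accepted += 1
                break
    return total / accepted

print(conditional_mean_rolls())   # should be close to 1.5
```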

Given three time series data X, Y, Z, build a model of X in terms of Y and Z? Can we use greater lag?

You are given an urn with 100 balls (50 black and 50 white). You pick balls from urn one by one without replacements until all the balls are out. A black followed by a white or a white followed by a black is “a color change”. Calculate the expected number of color changes if the balls are being picked randomly from the urn.

> Expectation is linear, and there are 99 adjacent slots where a color change can happen. Each slot is a change with probability 2*(50/100)*(50/99) = 50/99, so the answer is 99*50/99 = 50 color changes.
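The linearity argument for the urn generalizes to any counts of black and white balls: the expected number of color changes is 2*b*w/(b+w). A sketch with an exact formula and a shuffle-based check (function names and trial count are mine):

```python
import random
from fractions import Fraction

def expected_color_changes(black, white):
    """Exact expectation via linearity over the n - 1 adjacent slots."""
    n = black + white
    # P(two adjacent draws differ) = 2 * (black / n) * (white / (n - 1))
    p_change = Fraction(2 * black * white, n * (n - 1))
    return (n - 1) * p_change

def simulate_color_changes(black, white, trials=5_000, seed=0):
    rng = random.Random(seed)
    balls = [0] * black + [1] * white
    total = 0
    for _ in range(trials):
        rng.shuffle(balls)
        total += sum(a != b for a, b in zip(balls, balls[1:]))
    return total / trials

print(expected_color_changes(50, 50), simulate_color_changes(50, 50))
```

Adding 1 to the number of color changes gives the number of runs, so the same function also answers run-counting variants of the question.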
There are 51 ants sitting on top of a square table with side length of 1. If you have a square card with side 1/5, can you put your card at a position on the table to guarantee that the card encompasses at least 3 ants? Yes, by the pigeonhole principle: split the table into 25 squares of side 1/5; 51 = 2*25 + 1, so some square holds at least 3 ants.
(updated: square card was originally disk of radius 1/7) ?

A hedge fund has 70 employees. For any two employees X and Y there is a language that X speaks but Y does not, and there is a language that Y speaks but X does not. At least how many different languages are spoken by the employees of this hedge fund? (8 -> 8C4 = 70; no employee's language set may contain another's, and the max of nCr occurs at r = n//2.)

For an SVM the margin is determined only by the data points having alpha > 0, which are called the support vectors. It is a Lagrangian problem: maximizing the margin is equivalent to minimizing ||w||², which is convex, subject to every data point being on the correct side of the margin for its class label.

Why are other kernels slower? Because they need an n*n matrix to store the similarity of every pair of data points, as we are effectively calculating dot products in feature space.

What is put call parity?

future vs forward, futures are regulated and traded on an exchange unlike forwards which are highly customized.

all options in india are european options.

what is stop loss?

how are ipos issued?

A limit order is used to buy or sell a security at a pre-determined price and will not execute unless the security’s price meets those qualifications.

What is a stop-limit order example?

For example, if the current price per share is $60, the trader can set a stop price at $55 and a limit order at $53. The order is activated when the price falls to $55, but not below $53. Below $53, the order will not be fulfilled.[limit loss but not that desperate]

A limit order will not execute until the price meets the limit you set.

Adani shares plummeted.

What happened to Silicon Valley Bank? Credit Suisse?

Let the grid be the same then how many paths are there from (0,0) to (8,6) with the same constraints of moving either UP or RIGHT. But add another constraint that the total number of turns taken should be even.

In a deck of 52 cards, 26 are reds and 26 are blacks. After shuffling the cards, the deck is divided into two equal stacks of 26. What is the probability that the number of red cards in stack 1 is equal to the number of black cards in stack 2 -> Answer is 1, lol (if stack 1 has r reds it has 26 - r blacks, so stack 2 holds the other r blacks).

There are 26 black(B) and 26 red(R) cards in a standard deck. A run is number of blocks of consecutive cards of the same color. For example, a sequence RRRRBBBRBRB of only 11 cards has 6 runs; namely, RRRR, BBB, R, B, R, B. Find the expected number of runs in a shuffled deck of cards.

Answer is 27: there are 51 slots between adjacent cards, each starts a new run with probability 26/51, and the first card always starts a run. So 51*26/51 + 1 = 27!

Intuition behind linearity of expectation/ independence?(IMP)

A stick is broken into 3 pieces, by randomly choosing two points along its unit length, and cutting it. What is the expected length of the middle part?(Derive this again, imp)

Given the set of numbers from 1 to n: { 1, 2, 3 .. n } We draw n numbers randomly (with uniform distribution) from this set (with replacement). What is the expected number of distinct values that we would draw?

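Each number becomes an indicator variable: is it picked in any draw or not? A given number is missed by all n draws with probability (1 - 1/n)^n, so the expected count is n*(1 - (1 - 1/n)^n), roughly n*(1 - 1/e) for large n.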

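Note that not all Markov chains have a unique stationary distribution that they converge to. A chain with several absorbing states settles into one of them and remains there indefinitely, so each absorbing state gives its own stationary distribution and the limit depends on where you start. Chains on an infinite state space (e.g. a simple random walk on the integers) may have no stationary distribution at all.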

Written by Faraz Gerrard Jamal