ML Basic Concepts Interview Questions

"Rome was not built in a day"

Posted by dliu on February 7, 2020

“Learn from yesterday, hope for tomorrow, but live for today”

Frequent interview questions

1). What is overfitting? / Please briefly describe bias vs. variance.

2). How do you overcome overfitting? Please list 3-5 practical techniques. / What is the "curse of dimensionality"? How can it be prevented?

3). Please briefly describe the Random Forest classifier. How does it work? Any pros and cons in practical implementation?

4). Please describe the difference between a GBM tree model and Random Forest.

5). What is SVM? What parameters will you need to tune during model training? How do different kernels change the classification result?

6). Briefly rephrase PCA in your own way. How does it work? And name some of its pros and cons.

7). Why doesn’t logistic regression use \(R^2\)?

8). When would you use L1 regularization rather than L2?

9). List at least 4 metrics you would use to evaluate model performance and give the advantage of each (F1 score, ROC curve, recall, etc.).

10). What would you do if you had 30% missing values in an important field before building the model?

Overall Theory Questions

Some modeling questions and answers:

  • How to find outliers
  • How to handle missing values
  • How to deal with imbalanced data
  • Model evaluation: be clear about the characteristics and applications of each metric
    a. Cross-validation, stratified cross-validation
    b. MSE, MAE, impurity functions, cross-entropy, precision, recall, AUC, ROC, F1…
  • False positives and false negatives: give examples where a false positive matters more than a false negative
  • How to choose features
  • Overfitting and underfitting: how they show up and how to fix them
  • Bias / variance trade-off
  • Out-of-bag samples
  • Explain gradient descent, stochastic gradient descent, mini-batch gradient descent…
  • The difference between statistical learning and machine learning
  • Spherical hashing (this one feels out of scope; no need to prepare for it)
  • Not tested on me, but worth understanding:
    a. Parametric / non-parametric models
    b. Generative / discriminative models
    c. Curse of dimensionality

  

  • REGRESSION
    • The basic assumptions of linear regression, and what to do if they are violated
    • How to measure collinearity; VIF
    • How to measure correlation and causation, respectively
    • In linear regression, how the model changes when various linear transformations are applied to the data: how the predicted values, \(R^2\), coefficients, etc. change
    • Why the residuals sum to zero under OLS
    • Judging goodness of fit from the residual plot and QQ-plot
    • Potential topics I have not been tested on but can anticipate:
      a. How to estimate the parameters of logistic regression
      b. The form of the loss function of logistic regression
      c. Why OLS estimation is used in linear regression; properties of the OLS estimator (BLUE)
  • REGULARIZATION
    • Compare Lasso and Ridge
    • Do different programming languages give the same Lasso results? Not necessarily, because the regularization-path grids differ (not sure why this comes up in interviews)
    • The L1 norm and L2 norm
    • Are the regularized coefficient estimates unbiased?
  • TREE & ENSEMBLE
    • Explain the tree model
    • Explain the random forest model, and contrast it with boosting models (GBT is tested more often)
    • The tunable parameters of random forest and GBT models in your programming language
    • Know that each tree in a random forest should be grown deep: deep trees have low bias and high variance, and averaging across trees reduces the variance; each tree in a boosting model, by contrast, should not be too deep
    • What is the most popular model? Why?
    • In short, it is recommended to understand the advantages and disadvantages of each model, what situations and data it applies to, and how its complexity and computation scale.
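The tunable parameters mentioned above are easiest to see in code. A sketch using scikit-learn (assuming that is the "programming language" in question; the parameter values are illustrative, not tuned recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random forest: deep trees (low bias), variance reduced by averaging.
rf = RandomForestClassifier(
    n_estimators=200,     # number of trees
    max_depth=None,       # grow each tree fully
    max_features="sqrt",  # features considered per split (decorrelates trees)
    random_state=0,
).fit(X_tr, y_tr)

# Gradient boosting: shallow trees (weak learners), bias reduced sequentially.
gbt = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,    # shrinkage; trades off against n_estimators
    max_depth=3,          # each tree is deliberately shallow
    random_state=0,
).fit(X_tr, y_tr)

print(rf.score(X_te, y_te), gbt.score(X_te, y_te))
```

Note how the settings encode the depth point above: the forest's trees are grown deep, while each boosted tree is kept shallow.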

  

  • KNN
    • Please explain KNN, and then write down its implementation code.   
  • K-MEANS
    • Please explain K-means, and then write its implementation code
    • How to choose k
    • How to evaluate the results (for unsupervised learning, interviewers often want to hear about collaboration with domain experts)
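Since both the KNN and K-means items above ask for implementation code, here is a minimal from-scratch sketch in NumPy; the function names and the empty-cluster guard are my own choices, not a canonical implementation:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.bincount(y_train[nearest]).argmax()

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest centre
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # recompute centroids; keep the old centre if a cluster goes empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

In an interview, it is worth mentioning that K-means is sensitive to initialization (hence restarts or k-means++ in practice) and that KNN as written is O(n) per query.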

  

  • SVM
    • Please explain SVM (it seems any model can come with "please explain the model")
    • What is a support vector?
    • Please explain the kernel trick, and why the kernel matrix is positive semi-definite
    • Know what the complexity of the SVM depends on: the sample size or the number of variables
    • Explain several important parameters of the SVM model
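The positive semi-definiteness of the kernel matrix can be checked numerically. A small NumPy sketch using an RBF kernel (the gamma value and the random data are arbitrary illustrations):

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Gram matrix with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_kernel_matrix(X)

# Mercer's condition implies every Gram matrix of a valid kernel is
# positive semi-definite: all eigenvalues >= 0 (up to floating-point error).
print(np.linalg.eigvalsh(K).min())
```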

  

  • ML related algorithm implementation
    • I have been tested on: a. Please write a KNN algorithm b. Please write a K-means algorithm c. Please write a mini-batch gradient descent function
  • NLP related
    • Unless you are interviewing for an NLP engineer role, you will rarely be asked about it proactively
    • Depending on the position, some roles build ML models on text data, so NLP knowledge can earn extra points
    • Some basic concepts, such as: a. BOW model, N-gram model b. Term matrix, TF-IDF c. Stemming, part-of-speech tagging, NER d. Word2Vec, topic modeling
    • There was only one real account of being tested on NLP, and it leaned toward computational linguistics: given dirty data, design rules to extract company names (personally I think this is rather advanced; just skim it)
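Item c above (a mini-batch gradient descent function) is the least standard of the three implementations; a sketch fitting linear regression with squared loss (the hyperparameters are arbitrary illustrations):

```python
import numpy as np

def minibatch_gd(X, y, lr=0.1, batch_size=16, epochs=200, seed=0):
    """Fit linear-regression weights by mini-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)               # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]  # indices of this mini-batch
            # gradient of mean squared error on the batch
            grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

# recover known weights from noiseless synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
w = minibatch_gd(X, X @ true_w)
```

Setting `batch_size=n` recovers full-batch gradient descent and `batch_size=1` recovers SGD, which is a useful point to make when asked to contrast the three.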

  

  • [Preparation experience]
    • For questions that require oral answers, it helps to search Quora, collect several people's answers, and merge them into your own. Very effective.
    • A systematic ML course helps the understanding last.
    • Practice implementing ML algorithms by hand. KNN and K-means are relatively easy to write and are often tested; Naive Bayes can also be written. Logistic regression / linear regression can be written with gradient descent. Some friends have been asked to write the EM algorithm.
  • A complete set of OLS: assumptions, estimation, tests (t test, F test), diagnostics, model selection. Beyond the basics there are many common questions, such as multicollinearity (how to identify it, how to deal with it), listing all the assumptions of OLS, writing down or deriving the beta estimates (using maximum likelihood or least squares), model selection with BIC or subsets of features, the geometric interpretation of \(R^2\), and how predictions and coefficient estimates change when an observation is duplicated.
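Two of the points above, the closed-form beta estimate and the residuals summing to zero, can be verified numerically; a short NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # intercept column
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Closed-form OLS from the normal equations: (X'X) beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# With an intercept, the normal equations force X' resid = 0,
# and the first row of that system is exactly sum(resid) = 0.
print(abs(resid.sum()))
```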

  • A complete set of logistic regression: basic knowledge of log(p/(1-p)), the likelihood, etc. Expanding a bit, using logistic regression for classification can involve how to choose the cutoff (error rate, AUC, cross-validation) and comparing logistic regression versus LDA.
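A sketch tying these pieces together: the model is log(p/(1-p)) = Xw, and the weights are fitted by gradient descent on the negative log-likelihood (the data and hyperparameters are my own illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Maximise the Bernoulli log-likelihood by gradient descent.
    Per-sample loss: -[y log p + (1-y) log(1-p)], with p = sigmoid(Xw)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # linearly separable labels
w = fit_logistic(X, y)
acc = np.mean((sigmoid(X @ w) > 0.5) == y)
```

The simple form of the gradient, X'(p - y), follows from the log-odds link and is itself a common interview question.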

  • A little more advanced: Ridge vs. Lasso. When to use ridge and when to use lasso, their advantages and disadvantages, the bias-variance trade-off, and how you choose the tuning parameter lambda.

  • PCA: it can be used as a remedy for collinearity; write out the SVD, explain what is related to the SVD, and discuss the disadvantages.
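A minimal sketch of PCA via the SVD of the centred data matrix (the synthetic data is an illustration):

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA via SVD of the centred data: rows of Vt are principal directions,
    and the squared singular values give the variance along each one."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T                      # projected data
    explained_var = S[:n_components] ** 2 / (len(X) - 1)   # component variances
    return scores, Vt[:n_components], explained_var

rng = np.random.default_rng(0)
# 2-D data varying mostly along the direction (3, 1), plus small noise
X = rng.normal(size=(200, 1)) @ np.array([[3.0, 1.0]]) + 0.1 * rng.normal(size=(200, 2))
scores, components, var = pca_svd(X, 1)
```

Working through the SVD route (rather than eigendecomposing the covariance matrix) is exactly the "write out the SVD" step interviewers ask for, and it is also the numerically preferred implementation.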

Resources

Interview Questions