Data Science interview: what is expected of you

Data Science is a very promising field. Last year, EPAM received 210 resumes from people who want to work in Data Science. Of these, we invited 43 people to a technical interview and made seven job offers. If demand is so high, why so few?

We talked with technical interviewers and found that many candidates have a poor idea of what data analysts actually do, so their knowledge and skills are not always relevant to the job. Some think that experience with Big Data is enough to work in Data Science, some are sure that taking a few machine learning courses is enough, and some believe it is not necessary to understand the algorithms well.

Dmitry Nikitko and Mikhail Kamalov, data analysts and technical interviewers at EPAM, told us what they expect from candidates at interviews, what questions they ask, what they value in a resume, and how to prepare for the interview.



Different companies understand the work of a data analyst differently. Some define the role more broadly, others more narrowly. Here is what these specialists do at EPAM:

  • Preprocess data
  • Look for patterns in data and test hypotheses
  • Build predictive models using machine learning algorithms
  • Evaluate the quality of the resulting models
  • Visualize data
  • Help integrate the solution

Data analysts work on a wide range of problems. For example, ranking applies not only to search results but also to recommendation systems and to searching for similar pictures, music, and even 3D face models. In each of these cases you need to return a relevant answer to a query. But the data types differ, and you need to know which strategy to apply in each case.

EPAM has created a test that recruiters send to candidates before the interview. The multiple-choice part is checked automatically. The part with detailed written answers is read by technical interviewers.

What you need to be able to do


In short, a data analyst is a person who can program (in most cases in Python), understands statistics, mathematics, and algorithms, and speaks English.
English is needed not only for reading specialized literature and documentation. Many analysts communicate directly with foreign customers. The ability to translate from the language of a data scientist into one that the business understands is also useful here.

Is specialized education compulsory?


It is important to know mathematics well, and a technical degree is a big plus. Most data scientists at EPAM are mathematicians, programmers, or physicists. But this is not a strict requirement: one of our employees is a linguist, and we recently hired a sociologist who, after graduating from university, processed the results of sociological studies, built models, and worked on forecasting and the analysis of social graphs. That experience is relevant to Data Science, so the candidate was interesting to us.

In general, we cannot say that a person with a technical education will suit us and a person with a humanities education will not. It all depends on skills and experience. For example, a computational linguist who has learned to write code is a more interesting candidate than a Big Data engineer who has worked with MapReduce and Hadoop but does not understand algorithms, or than someone who has a degree in statistics but no experience.

What is valued in a resume


Work experience is valued most. If you have already worked in Data Science, describe in detail what you did, which algorithms you used, and what skills you have.
If you do not have experience, the following will be a big plus in your resume:

  • A short description of your pet projects. It is important that the candidate not only knows the theory but also finds time to practice.
  • Participation in hackathons. This shows at least that you have worked in a team and (most likely) built a working solution in limited time. Hackathons are also good because employers may notice you there, and then sending a resume may not be necessary at all.
  • Participation in machine learning competitions (Kaggle, DrivenData). If you took part in, or even won, the Instacart competition on Kaggle, where the task was to build a recommendation system, you will be able to solve a business problem with similar goals faster. But, in our experience, winning such competitions does not always mean that the candidate knows, for example, how the algorithms they used actually work.

What is asked at the interview


The purpose of a Data Science interview, as of any other, is to find out how well a person knows the subject area. First, the interviewer asks questions about the basics of machine learning and statistics. The answers show the depth and breadth of the candidate's knowledge of the fundamentals. After that come more specific questions, for example on natural language processing, time series, or recommendation systems. If the candidate says they know how to work with graphs, images, or other kinds of data, they will be asked about that.

Universal soldiers are extremely rare, so the interview questions depend on the candidate's experience. Interviewers usually ask about past projects: which technologies were used and why. After that, the candidate may be asked to reason through a problem. And of course there will be a few theoretical questions.

Here are some questions that may be asked during the interview:
Neural networks
- What methods of preventing overfitting (regularization) in neural networks do you know? How do they work? Where should batch normalization be inserted?

- What is the difference between a neural network with one output and a sigmoid activation function and the same network with two outputs and softmax?

- Imagine that we have a multilayer fully connected network with a nonlinear activation function. What will happen to the neural network if we remove the nonlinearity?

- What is global pooling used for?
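One way to reason about the second and third questions is to check them numerically. The sketch below is only an illustration with made-up numbers and hypothetical helper names, not an answer the interviewers endorse. It assumes plain Python and ignores bias terms: a softmax over two logits gives the same probability as a sigmoid applied to their difference, and two stacked linear layers without a nonlinearity collapse into a single linear layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Two-output softmax vs. one-output sigmoid (illustrative logits).
z1, z2 = 1.7, -0.4
p_softmax = softmax([z1, z2])[0]
p_sigmoid = sigmoid(z1 - z2)

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def matmul(A, B):
    # Multiply two matrices given as lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Two linear layers with no activation in between (illustrative weights).
W1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # 2 inputs -> 3 hidden units
W2 = [[1.0, -1.0, 2.0]]                      # 3 hidden units -> 1 output
x = [0.3, -0.7]

two_layers = matvec(W2, matvec(W1, x))
one_layer = matvec(matmul(W2, W1), x)        # a single equivalent layer
```

Here `p_softmax` and `p_sigmoid` agree up to floating-point error, and so do `two_layers` and `one_layer`. With bias terms the collapsed network is affine rather than linear, but the conclusion is the same: depth adds no expressive power without a nonlinearity.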


Image Recognition
- How is quality assessed in object detection tasks?
- Which neural network architectures for semantic segmentation do you know?
- How and why would you use transfer learning?
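For the question about quality in object detection, a common building block is intersection over union (IoU) between a predicted and a ground-truth box; metrics such as mean average precision are usually defined on top of it. The function below is a minimal sketch, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` tuples; the function name and box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0
```

A detection is then typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold; the threshold itself depends on the benchmark and the task.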


Time series
- How do you properly evaluate model quality when working with time series?
- What should be done about seasonality in the data?
- How do you search for anomalies in time series?
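For the question about evaluating time series models, one widely used idea is to never train on observations that come after the test period. The sketch below shows an expanding-window split under that assumption; the function name and parameters are hypothetical and chosen only for illustration.

```python
def expanding_window_splits(n_samples, n_folds, test_size):
    """Return (train_indices, test_indices) pairs in chronological order.

    Every test block lies strictly after its training block, so the model
    is never evaluated on data older than what it was trained on.
    """
    splits = []
    for fold in range(n_folds):
        test_end = n_samples - (n_folds - 1 - fold) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested folds")
        train = list(range(0, test_start))
        test = list(range(test_start, test_end))
        splits.append((train, test))
    return splits

# Example: 10 observations, 3 folds, 2 test points per fold.
splits = expanding_window_splits(n_samples=10, n_folds=3, test_size=2)
```

In this example the test blocks are indices 4-5, 6-7, and 8-9, and each training set contains everything before its test block. A random shuffle, by contrast, would leak future information into training.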


Natural Language Processing
- What is topic modeling based on? How does the algorithm work? How do you choose the number of topics the algorithm will learn?

- You have review texts and ratings; users rate on a 5-point scale. How would you build a system that predicts the rating from the review text? How would you evaluate the quality of this system?
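On the evaluation part of the last question, one thing a candidate might discuss is that ratings are ordered, so predicting 4 instead of 5 is a smaller mistake than predicting 1. The sketch below compares two simple metrics on made-up labels; it is an illustration of that trade-off, not the evaluation scheme the interviewers expect.

```python
def accuracy(y_true, y_pred):
    # Share of exact matches; every miss costs the same.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    # Average distance on the rating scale; near misses cost less.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up ratings on a 5-point scale.
y_true = [5, 4, 1, 3, 2]
y_pred = [4, 4, 2, 3, 5]

acc = accuracy(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
```

Accuracy treats the task as plain classification, while mean absolute error respects the order of the ratings. Which one matters more depends on how the predictions will be used.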


While the candidate reasons and solves problems, interviewers ask many clarifying questions and try to put the candidate in “combat conditions”. For example, the candidate proposes a solution, and the interviewer adds new conditions to the task.

“What will you do if the data set is unbalanced?”
“How will you solve the problem if there are data gaps?”
“What if there are outliers in the data?”
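Each of these follow-up questions has many reasonable answers. The sketch below shows one simple baseline per question, using only the Python standard library (`statistics.quantiles` requires Python 3.8 or later). The function names and the sample data are hypothetical: inverse-frequency class weights for an imbalanced data set, median imputation for gaps, and the interquartile-range rule for outliers.

```python
from collections import Counter
import statistics

def balanced_class_weights(labels):
    # Weight each class inversely to its frequency:
    # n_samples / (n_classes * class_count).
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}

def impute_median(values):
    # Replace missing values (None) with the median of the observed ones.
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    # Flag points further than k * IQR outside the first/third quartile.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

weights = balanced_class_weights([0] * 8 + [1] * 2)
filled = impute_median([1.0, None, 3.0, None, 8.0])
outliers = iqr_outliers([10, 12, 11, 13, 12, 11, 95])
```

These are starting points rather than recommendations. In an interview it is worth explaining when each one breaks down, for example when values are not missing at random or when the "outliers" are the very events the model should detect.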


In addition, interviewers may ask how the candidate organizes their working time, how they log experiments and whether they ensure reproducibility, how they process large volumes of data, and how they build data processing pipelines.

Common Interview Mistakes


• The candidate does not understand how the algorithms they used work
Interviewers always ask about the algorithms a candidate has used: what parameters they have and how to tune them. If there is no answer, or the candidate says they tuned the algorithm "on a hunch", that is a bad sign. If you pick an algorithm, it is worth taking the time to figure out how to configure it.

• The candidate does not understand how to apply their knowledge in “combat conditions”
It happens like this: the candidate knows the theory well but has no idea how to deal with the problems that arise on real projects. It is important not only to find insights in the data, do feature engineering, and build models, but also to understand how to put all of this into production or how to arrive at a solution that runs faster.

• The candidate cannot reason independently
If a person answers questions with “I'll google it” too often, that is not a good sign. Of course, data scientists use Google, but being able to reason independently is also important: sometimes there is no ready-made solution to a problem, and you need to come up with something of your own.

• The candidate makes up how a system works
Sometimes people cannot answer a question about how a particular system works and start inventing an answer, hoping for a lucky guess. This is not recommended: the interviewer will notice. It is better to say honestly, “I don’t know”; that leaves more time for other questions and raises the chance that you will be asked about something you do understand.

Bibliography


Anyone who wants to study Data Science is advised to watch or read:
• Course “Programming in Python” on Stepik
• Course “Introduction to Machine Learning” on Coursera
• Course “Machine Learning and Data Analysis” on Coursera
• Course “Machine Learning” by Konstantin Vorontsov
• Deep learning courses on Coursera
• Course “Neural Networks” on Stepik
• Deep Learning Book
• Book “Deep learning: immersion in the world of neural networks”, the first book on deep learning in Russian
• Book on NLP “Speech and Language Processing”
• Book on information retrieval and NLP “Introduction to Information Retrieval”
• Articles on opendatascience
• Course “Algorithms and Data Structures” by Maxim Babenko