[Book Summary - CtDSI] Cracking the Data Science Interview Ch. 1
 
- What is Data Science?
- definition: deriving insights from data; these insights are then used to provide value to a business
 
- main factors
- explosion of data (and of tools to process it at scale, like MapReduce)
 
- technological advances (like GPUs)
 
- success stories
 
 
 
- Data Science ≠ Machine Learning
- Machine learning is based on the idea that algorithms can identify and learn patterns from data and make decisions with minimal human intervention
 
- two disciplines that provide suitable tools to help solve data science problems
- Operations Research: problem-solving techniques ranging from mathematical modeling and stochastic processes to mathematical optimization to simulation in order to improve decision-making
 
- Statistics: for collecting, analyzing, interpreting and presenting empirical data
 
 
- Most important: a data scientist needs a solid foundation in exploratory data analysis, data visualization, probability and statistics, optimization, mathematical modeling and computer science!
 
 
- What Makes a Good Data Scientist?
- identify the relevant questions
 
- acquire and clean the right data
 
- analyze that data to obtain results
 
- clearly communicate the findings
 
- convert the results into solutions
 
 
- The Data Science Process Workflow
- Specify Objective ⇒ Data Acquisition ⇒ Explore the Data ⇒ Establish a Baseline ⇒ Model the Data ⇒ Analyze Results ⇒ Communicate Findings ⇒ Iterate
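
To make the "Establish a Baseline ⇒ Model the Data ⇒ Analyze Results" steps concrete, here is a minimal sketch in plain Python. The data, the feature, and all function names are made up for illustration and do not come from the book; the point is only that a model is judged against a trivial baseline on held-out data.

```python
from statistics import mean

# Hypothetical toy data (invented for this sketch): (x, y) pairs.
train = [(1, 40.0), (2, 50.0), (3, 60.0), (4, 70.0)]
test = [(5, 80.0), (6, 90.0)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def fit_mean_baseline(rows):
    """Baseline: ignore x and always predict the training-set mean of y."""
    avg = mean(y for _, y in rows)
    return lambda x: avg


def fit_simple_linear(rows):
    """Model: ordinary least squares with a single feature."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


# Analyze results: compare both predictors on the held-out test rows.
actual = [y for _, y in test]
baseline = fit_mean_baseline(train)
model = fit_simple_linear(train)
baseline_mae = mean_absolute_error(actual, [baseline(x) for x, _ in test])
model_mae = mean_absolute_error(actual, [model(x) for x, _ in test])
```

Because this toy data is perfectly linear, the fitted line has zero test error while the mean baseline is off by 30 on average. On real data the gap is usually much smaller, which is exactly why the baseline is worth establishing first.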
 
 
- Data Science Deliverables: Prediction, Forecasts, Anomaly Detection, Recognition, Optimization, Segmentation, Recommendations, etc.
 
- Writing a Great Data Science Resume
- One Page
 
- Relevant Coursework
 
- Relevant Skills
 
- Relevant Projects: try not to include the common projects everyone has worked on
 
- Relevant Experience: relevant experience and employment history with impactful bullet points
 
- Include Accomplishments (put NUMBERS into your resume!): specific impact, business impact, competition ranking etc.
 
- Customize: customize your resume to specific jobs
 
 
- Data Science Interview Topics
- Probability & Statistics: Conditional probability (Bayes’ Theorem), Probability Distributions, Hypothesis Testing, Covariance and correlation
 
- Computer Science: Coding (Python or R), Data Structures (Lists, Hash Tables, Stacks, Queues, Trees, Graphs), Algorithms (searching, sorting, graph traversals), Databases (SQL, NoSQL), Distributed Computing (MapReduce, Spark, Hadoop)
 
- Machine Learning: Supervised Learning, Unsupervised Learning, Deep Learning, General Predictive Modeling (choosing the right evaluation metrics, train and test sets, cross-validation)
 
- Data Engineering: Data Wrangling, Cleaning and Visualization, Feature Engineering
 
- Domain Knowledge: depending on the company and industry
 
- Behavioral and “Fit” Questions
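
As an example of the conditional probability topic above, here is a small sketch of Bayes' Theorem in Python. The function name, its parameters, and the numbers are assumptions chosen for illustration (a classic "positive test result" style question), not something taken from the book.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) = P(E | H) * P(H) / P(E).

    prior               -- P(H), probability of the hypothesis before evidence
    likelihood          -- P(E | H), probability of the evidence if H is true
    false_positive_rate -- P(E | not H), probability of the evidence if H is false
    """
    # Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these made-up inputs the posterior is only about 0.16, because the low prior means false positives outnumber true positives. Interview questions on this topic often test exactly that intuition.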
 
 
- Data Science Interview Process
- Coding Challenge ⇒ HR Screen (asking you behavioral questions), Technical Screen (questions ranging from computer science to machine learning to statistics) ⇒ Take Home Project (testing your coding, analytical, and communication skills - Be aware of the target audience!) ⇒ Onsite ⇒ Offer & Negotiation
 
 
- Behavioral & Fit Questions: Teamwork, Ability to Adapt, Communication using the STAR (Situation, Task, Action, Result) method
 
 
 
 
          
      
 
  
 
 
 
 
 
 
 
 
 
 
 
 