Thursday, January 19, 2023

[Book Summary - CtDSI] Cracking the Data Science Interview Ch. 1

 

  1. What is Data Science?
    • definition: deriving insights from data; these insights are then used to provide value to a business
    • main factors behind its rise
      • explosion of data (and tools to process it at scale, like MapReduce)
      • technological advances (like GPUs)
      • success stories
  2. Data Science ≠ Machine Learning
    • Machine learning is based on the idea that algorithms can identify and learn patterns from data and make decisions with minimal human intervention
    • two disciplines that provide suitable tools to help solve data science problems
      • Operations Research: problem-solving techniques ranging from mathematical modeling and stochastic processes to mathematical optimization to simulation in order to improve decision-making
      • Statistics: for collecting, analyzing, interpreting and presenting empirical data
    • It is most important for a data scientist to have a solid foundation in exploratory data analysis, data visualization, probability and statistics, optimization, mathematical modeling and computer science!
  3. What Makes a Good Data Scientist?
    • identify the relevant questions
    • acquire and clean the right data
    • analyze that data to obtain results
    • clearly communicate the findings
    • convert the results into solutions
  4. The Data Science Process Workflow
    • Specify Objective ⇒ Data Acquisition ⇒ Explore the Data ⇒ Establish a Baseline ⇒ Model the Data ⇒ Analyze Results ⇒ Communicate Findings ⇒ Iterate
  5. Data Science Deliverables: Prediction, Forecasts, Anomaly Detection, Recognition, Optimization, Segmentation, Recommendations, etc.
  6. Writing a Great Data Science Resume
    • One Page
    • Relevant Coursework
    • Relevant Skills
    • Relevant Projects: try not to include the common projects everyone has worked on
    • Relevant Experience: relevant experience and employment history with impactful bullet points
    • Include Accomplishments (put NUMBERS into your resume!): specific impact, business impact, competition ranking etc.
    • Customize: customize your resume to specific jobs
  7. Data Science Interview Topics
    • Probability & Statistics: Conditional probability (Bayes’ Theorem), Probability Distributions, Hypothesis Testing, Covariance and correlation
    • Computer Science: Coding (Python or R), Data Structures (Lists, Hash Tables, Stacks, Queues, Trees, Graphs), Algorithms (searching, sorting, graph traversals), Databases (SQL, NoSQL), Distributed Computing (MapReduce, Spark, Hadoop)
    • Machine Learning: Supervised Learning, Unsupervised Learning, Deep Learning, General Predictive Modeling (choosing the right evaluation metrics, train and test sets, cross-validation)
    • Data Engineering: Data Wrangling, Cleaning and Visualization, Feature Engineering
    • Domain Knowledge: depending on the company and industry
    • Behavioral and “Fit” Questions
  8. Data Science Interview Process
    • Coding Challenge ⇒ HR Screen (asking you behavioral questions), Technical Screen (questions ranging from computer science to machine learning to statistics) ⇒ Take Home Project (testing your coding, analytical, and communication skills - Be aware of the target audience!) ⇒ Onsite ⇒ Offer & Negotiation
  9. Behavioral & Fit Questions: Teamwork, Ability to Adapt, Communication; structure answers using the STAR (Situation, Task, Action, Result) method
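
As a rough illustration of the "Establish a Baseline ⇒ Model the Data ⇒ Analyze Results" steps from item 4, here is a minimal Python sketch using only the standard library. It is not taken from the book: the synthetic data, the function names (make_data, fit_baseline, fit_linear, and so on) and the choice of mean absolute error as the metric are all assumptions made for this example. The idea is simply that a model is only worth keeping if it beats a trivial baseline on held-out data.

```python
import random
from statistics import mean


def make_data(n=200, seed=0):
    """Synthetic toy data (made up for illustration): y = 3x + 5 + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 5 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def train_test_split(xs, ys, test_ratio=0.25):
    """Hold out the last `test_ratio` share of the rows as a test set."""
    cut = int(len(xs) * (1 - test_ratio))
    return xs[:cut], ys[:cut], xs[cut:], ys[cut:]


def fit_baseline(ys_train):
    """Baseline: always predict the mean of the training targets."""
    y_bar = mean(ys_train)
    return lambda x: y_bar


def fit_linear(xs_train, ys_train):
    """Model: one-variable least-squares line fitted on the training set."""
    x_bar, y_bar = mean(xs_train), mean(ys_train)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs_train, ys_train))
    sxx = sum((x - x_bar) ** 2 for x in xs_train)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def mean_absolute_error(predict, xs, ys):
    """Evaluation metric: average absolute prediction error."""
    return mean([abs(predict(x) - y) for x, y in zip(xs, ys)])


if __name__ == "__main__":
    xs, ys = make_data()
    x_tr, y_tr, x_te, y_te = train_test_split(xs, ys)
    baseline = fit_baseline(y_tr)
    model = fit_linear(x_tr, y_tr)
    # Analyze results: compare both predictors on the held-out test set.
    print("baseline MAE:", round(mean_absolute_error(baseline, x_te, y_te), 2))
    print("model MAE:   ", round(mean_absolute_error(model, x_te, y_te), 2))
```

In practice a library such as scikit-learn would usually handle the split, the model and the metric; the hand-rolled version is used here only to keep every step of the workflow visible.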
