- What is Data Science?
- definition: deriving insights from data. These insights are then used to create value for a business
- main factors behind its rise
- explosion of data (and tools to process it at scale, like MapReduce)
- technological advances (like GPUs)
- success stories
- Data Science ≠ Machine Learning
- Machine learning is based on the idea that algorithms can identify and learn patterns from data and make decisions with minimal human intervention
- two disciplines that provide suitable tools to help solve data science problems
- Operations Research: problem-solving techniques, including mathematical modeling, stochastic processes, mathematical optimization, and simulation, used to improve decision-making
- Statistics: the science of collecting, analyzing, interpreting, and presenting empirical data
- Most importantly, a data scientist needs a solid foundation in exploratory data analysis, data visualization, probability and statistics, optimization, mathematical modeling, and computer science!
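MapReduce, mentioned above, processes large datasets in three steps: a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. The sketch below is a single-process simulation in plain Python, assuming a word-count task; the functions `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical names for illustration and are not the API of Hadoop or any other framework.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key (the framework does this in real MapReduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: sum the values for each key to get the final counts."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> dict[str, int]:
    # On a real cluster, each document would be mapped by a different worker.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


docs = ["big data needs big tools", "data science uses data"]
counts = word_count(docs)
print(counts["data"])  # 3
```

Because each map call only sees one document and each reduce call only sees one key, both steps can be spread across many machines, which is what makes the model suited to very large datasets.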
- What Makes a Good Data Scientist?
- identify the relevant questions
- acquire and clean the right data
- analyze that data to obtain results
- clearly communicate the findings
- convert the results into solutions
- The Data Science Process Workflow
- Specify Objective ⇒ Data Acquisition ⇒ Explore the Data ⇒ Establish a Baseline ⇒ Model the Data ⇒ Analyze Results ⇒ Communicate Findings ⇒ Iterate
- Data Science Deliverables: Prediction, Forecasts, Anomaly Detection, Recognition, Optimization, Segmentation, Recommendations, etc.
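The "Establish a Baseline" step in the workflow above provides a reference score that any model has to beat. A minimal sketch, assuming a toy regression problem with made-up data: the baseline ignores the input and always predicts the training-set mean, the model is an ordinary least-squares line, and both are scored by mean squared error on held-out test points. The helpers `fit_line` and `mean_squared_error` are defined here for illustration and are not imported from scikit-learn or any other library.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy data: y is roughly 2x + 1 with some noise.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0]
test_x = [7, 8]
test_y = [15.2, 16.9]

# Baseline: always predict the mean of the training targets.
baseline_value = sum(train_y) / len(train_y)
baseline_mse = mean_squared_error(test_y, [baseline_value] * len(test_y))

# Model: a least-squares line fitted on the training set only.
slope, intercept = fit_line(train_x, train_y)
model_mse = mean_squared_error(test_y, [slope * x + intercept for x in test_x])

print(f"baseline MSE={baseline_mse:.2f}, model MSE={model_mse:.2f}")
```

If a model cannot beat such a trivial baseline on held-out data, it is not yet adding value, and the "Iterate" step should start again from the data or the objective.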
- Writing a Great Data Science Resume
- One Page
- Relevant Coursework
- Relevant Skills
- Relevant Projects: avoid the common projects that everyone else has done
- Relevant Experience: employment history described with impactful bullet points
- Include Accomplishments (put NUMBERS into your resume!): specific, quantified impact such as business impact, competition rankings, etc.
- Customize: tailor your resume to each specific job
- Data Science Interview Topics
- Probability & Statistics: Conditional probability (Bayes’ Theorem), Probability Distributions, Hypothesis Testing, Covariance and correlation
- Computer Science: Coding (Python or R), Data Structures (Lists, Hash Tables, Stacks, Queues, Trees, Graphs), Algorithms (searching, sorting, graph traversals), Databases (SQL, NoSQL), Distributed Computing (MapReduce, Spark, Hadoop)
- Machine Learning: Supervised Learning, Unsupervised Learning, Deep Learning, General Predictive Modeling (choosing the right evaluation metrics, train and test sets, cross-validation)
- Data Engineering: Data Wrangling, Cleaning and Visualization, Feature Engineering
- Domain Knowledge: depends on the company and industry
- Behavioral and “Fit” Questions
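Conditional probability questions often reduce to Bayes' theorem, P(A | B) = P(B | A) P(A) / P(B), with the denominator P(B) expanded by the law of total probability. A minimal sketch, assuming a classic diagnostic-test question with hypothetical numbers (1% prevalence, 99% sensitivity, 5% false-positive rate); the function name `posterior` and its parameter names are illustrative.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes' theorem.

    prior               = P(condition)
    sensitivity         = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    # Law of total probability: P(positive) summed over both cases.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical numbers: 1% prevalence, 99% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")  # P(condition | positive) = 0.167
```

The answer of roughly 17% is the usual interview point: when the prior is small, most positive results come from the much larger healthy group, so the posterior stays low even for an accurate test.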
- Data Science Interview Process
- Coding Challenge ⇒ HR Screen (asking you behavioral questions), Technical Screen (questions ranging from computer science to machine learning to statistics) ⇒ Take-Home Project (testing your coding, analytical, and communication skills; be aware of the target audience!) ⇒ Onsite ⇒ Offer & Negotiation
- Behavioral & Fit Questions: Teamwork, Ability to Adapt, Communication; structure your answers using the STAR (Situation, Task, Action, Result) method