Thursday, January 19, 2023

[Book Summary - CtDSI] Cracking the Data Science Interview Ch. 1

 

  1. What is Data Science?
    • definition: deriving insights from data, which are then used to provide value to a business
    • main factors
      • explosion of data (and tools to process it, like MapReduce)
      • technological advances (like GPUs)
      • success stories
  2. Data Science ≠ Machine Learning
    • Machine learning is based on the idea that algorithms can identify and learn patterns from data and make decisions with minimal human intervention
    • two disciplines that provide suitable tools to help solve data science problems
      • Operations Research: problem-solving techniques ranging from mathematical modeling and stochastic processes to mathematical optimization to simulation in order to improve decision-making
      • Statistics: for collecting, analyzing, interpreting and presenting empirical data
    • The most important thing is for a data scientist to have a solid foundation in exploratory data analysis, data visualization, probability and statistics, optimization, mathematical modeling, and computer science!
  3. What Makes a Good Data Scientist?
    • identify the relevant questions
    • acquire and clean the right data
    • analyze that data to obtain results
    • clearly communicate the findings
    • convert the results into solutions
  4. The Data Science Process Workflow
    • Specify Objective ⇒ Data Acquisition ⇒ Explore the Data ⇒ Establish a Baseline ⇒ Model the Data ⇒ Analyze Results ⇒ Communicate Findings ⇒ Iterate
  5. Data Science Deliverables: Prediction, Forecasts, Anomaly Detection, Recognition, Optimization, Segmentation, Recommendations, etc.
  6. Writing a Great Data Science Resume
    • One Page
    • Relevant Coursework
    • Relevant Skills
    • Relevant Projects: try not to include the common projects everyone has worked on
    • Relevant Experience: relevant experience and employment history with impactful bullet points
    • Include Accomplishments (put NUMBERS into your resume!): specific impact, business impact, competition ranking etc.
    • Customize: customize your resume to specific jobs
  7. Data Science Interview Topics
    • Probability & Statistics: Conditional probability (Bayes’ Theorem), Probability Distributions, Hypothesis Testing, Covariance and correlation
    • Computer Science: Coding (Python or R), Data Structures (Lists, Hash Tables, Stacks, Queues, Trees, Graphs), Algorithms (searching, sorting, graph traversals), Databases (SQL, NoSQL), Distributed Computing (MapReduce, Spark, Hadoop)
    • Machine Learning: Supervised Learning, Unsupervised Learning, Deep Learning, General Predictive Modeling (choosing the right evaluation metrics, train and test sets, cross-validation)
    • Data Engineering: Data Wrangling, Cleaning and Visualization, Feature Engineering
    • Domain Knowledge: depending on the company and industry
    • Behavioral and “Fit” Questions
  8. Data Science Interview Process
    • Coding Challenge ⇒ HR Screen (asking you behavioral questions), Technical Screen (questions ranging from computer science to machine learning to statistics) ⇒ Take Home Project (testing your coding, analytical, and communication skills - Be aware of the target audience!) ⇒ Onsite ⇒ Offer & Negotiation
  9. Behavioral & Fit Questions: Teamwork, Ability to Adapt, Communication; answer using the STAR (Situation, Task, Action, Result) method
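
To make the "Establish a Baseline" step of the workflow and the cross-validation topic from the interview list more concrete, here is a minimal sketch in plain Python (my own illustration, not from the book). It splits sample indices into k folds and scores a majority-class baseline on each held-out fold. The function names `k_fold_indices` and `majority_baseline_cv` and the toy labels are made up for this example; in practice a library such as scikit-learn would likely be used instead.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def majority_baseline_cv(labels, k=5, seed=0):
    """Mean held-out accuracy of always predicting the most common training label."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        majority = Counter(labels[j] for j in train_idx).most_common(1)[0][0]
        correct = sum(labels[j] == majority for j in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical toy labels: 70 negatives and 30 positives.
labels = [0] * 70 + [1] * 30
baseline_accuracy = majority_baseline_cv(labels, k=5)
print(f"majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

On these toy labels the baseline scores 0.70 simply because 70% of the labels are 0. A real model should beat this number on the same folds before its results are worth communicating.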

Sunday, January 8, 2023

[Paper Review - Psycholinguistics] Syntactic Categories in the Speech of Young Children (Valian, 1986)

Syntactic categories in the speech of young children

Valian (1986)

 Ages 2;0-2;5, MLU 2.93-4.14 

  • Determiner, adjective, noun, noun phrase, preposition, & prepositional phrase 
  • Showed evidence of all categories 
  • Children are sensitive very early in life to abstract, formal properties of speech, and should be credited with syntactic knowledge at an earlier point than generally thought 
  1. Semantic viewpoint: treatments of semantic roles presuppose a phrasal segmentation of the sentence (e.g., NP -> agent) 
  2. Syntactic viewpoint: most treatments of grammatical relations define such relations over syntactic categories. 
  3. Developmental viewpoint: category acquisition puts temporal constraints on theories of how syntactic knowledge is acquired. 


The present study seeks a clearer answer by synthesizing different features of previous research methods. The scope is six syntactic categories (Determiner, Adjective, Noun, Noun Phrase, Preposition, and Prepositional Phrase) which are used in most descriptions of the adult language. The method involves (a) the development of category criteria against which children's spontaneous production can be evaluated and (b) the comparison of the children's performance to the criteria. Contributing to the method are Brown's distributional analysis of child corpora (e.g., Brown & Bellugi, 1964), Bloom's "rich interpretation" (1970), and Chomsky's work on the limits of taxonomic [...] (1975).

  • Subjects: 6 children (2;0-2;5), MLU 2.93-4.14, utterances 52-689 
  • Recording & transcribing: tape-recorded & transcribed by the observer 

    • Utterances: delimited by intonation and syntax 
  • Preliminary category assignment: needed because a child's error of not knowing which category a word belongs to could otherwise be misdescribed as a deficient understanding of how that category (e.g., adjectives) patterns 
  • Procedures used to test category assignment 

    • Which expressions a given expression can precede and follow 
    • Use of the single-word or single-expression "substitutability" test 
    • Multiple-appearance test: a category should show up in all its existing syntactic variations in each location where it is allowed 
    • Subcategories method: the main method used here is the restriction of different words to different subclasses (e.g., the restriction of "a" to singular Ns) 
  • Specific category criteria