Cheonkam's Deep Learning Space

Thursday, January 19, 2023

[Book Summary - CtDSI] Cracking the Data Science Interview Ch. 1

What is Data Science?
- Definition: deriving insights from data, which are then used to provide value to a business
- main factors
  - explosion of data (and frameworks for processing it at scale, like MapReduce)
  - technological advances (like GPUs)
  - success stories
Data Science ≠ Machine Learning
- Machine learning is based on the idea that algorithms can identify and learn patterns from data and make decisions with minimal human intervention
- Two disciplines that provide suitable tools for solving data science problems:
  - Operations Research: problem-solving techniques ranging from mathematical modeling and stochastic processes to mathematical optimization to simulation in order to improve decision-making
  - Statistics: for collecting, analyzing, interpreting and presenting empirical data
- Most important is for a data scientist to have a solid foundation in exploratory data analysis, data visualization, probability and statistics, optimization, mathematical modeling, and computer science!
What Makes a Good Data Scientist?
- identify the relevant questions
- acquire and clean the right data
- analyze that data to obtain results
- clearly communicate the findings
- convert the results into solutions
The Data Science Process Workflow
- Specify Objective ⇒ Data Acquisition ⇒ Explore the Data ⇒ Establish a Baseline ⇒ Model the Data ⇒ Analyze Results ⇒ Communicate Findings ⇒ Iterate
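
The "Establish a Baseline ⇒ Model the Data ⇒ Analyze Results" steps can be illustrated with a minimal sketch. The toy data, the majority-class baseline, and the threshold "model" are all hypothetical choices made for illustration and do not come from the book; the point is only that a model's score is read against a trivial baseline.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Baseline: always predict the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda x: most_common_label

def threshold_model(threshold):
    """Toy 'model': predict 1 when the feature reaches the threshold."""
    return lambda x: 1 if x >= threshold else 0

def accuracy(predict, xs, ys):
    """Fraction of examples where the prediction matches the label."""
    correct = sum(1 for x, y in zip(xs, ys) if predict(x) == y)
    return correct / len(ys)

# Hypothetical one-feature dataset (assumption, for illustration only).
train_x = [1, 2, 3, 4, 5, 6, 7, 8]
train_y = [0, 0, 0, 0, 0, 1, 1, 1]
test_x = [2, 5, 6, 9]
test_y = [0, 0, 1, 1]

baseline = majority_baseline(train_y)   # always predicts 0 here
model = threshold_model(5.5)            # threshold picked by eye from train data

baseline_acc = accuracy(baseline, test_x, test_y)
model_acc = accuracy(model, test_x, test_y)
print(f"baseline accuracy: {baseline_acc:.2f}, model accuracy: {model_acc:.2f}")
```

If the model does not clearly beat the baseline, that is a signal to revisit the data or the objective before iterating on the model.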
Data Science Deliverables: Prediction, Forecasts, Anomaly Detection, Recognition, Optimization, Segmentation, Recommendations, etc.
Writing a Great Data Science Resume
- One Page
- Relevant Coursework
- Relevant Skills
- Relevant Projects: try not to include the common projects everyone has worked on
- Relevant Experience: relevant experience and employment history with impactful bullet points
- Include Accomplishments (put NUMBERS into your resume!): specific impact, business impact, competition ranking etc.
- Customize: customize your resume to specific jobs
Data Science Interview Topics
- Probability & Statistics: Conditional probability (Bayes’ Theorem), Probability Distributions, Hypothesis Testing, Covariance and correlation
- Computer Science: Coding (Python or R), Data Structures (Lists, Hash Tables, Stacks, Queues, Trees, Graphs), Algorithms (searching, sorting, graph traversals), Databases (SQL, NoSQL), Distributed Computing (MapReduce, Spark, Hadoop)
- Machine Learning: Supervised Learning, Unsupervised Learning, Deep Learning, General Predictive Modeling (choosing the right evaluation metrics, train and test sets, cross-validation)
- Data Engineering: Data Wrangling, Cleaning and Visualization, Feature Engineering
- Domain Knowledge: depending on the company and industry
- Behavioral and “Fit” Questions
Data Science Interview Process
- Coding Challenge ⇒ HR Screen (asking you behavioral questions), Technical Screen (questions ranging from computer science to machine learning to statistics) ⇒ Take Home Project (testing your coding, analytical, and communication skills - Be aware of the target audience!) ⇒ Onsite ⇒ Offer & Negotiation
Behavioral & Fit Questions: Teamwork, Ability to Adapt, Communication using the STAR (Situation, Task, Action, Result) method

Sunday, January 8, 2023

[Paper Review - Psycholinguistics] Syntactic Categories in the Speech of Young Children (Valian, 1986)

Syntactic categories in the speech of young children

Valian (1986)

Ages 2;0 to 2;5, MLU 2.93-4.14

Categories examined: determiner, adjective, noun, noun phrase, preposition, & prepositional phrase
The children showed evidence of all six categories
Children are sensitive very early in life to abstract, formal properties of speech, which suggests syntactic knowledge at an earlier point

Semantic viewpoint: treatments of semantic roles presuppose a phrasal segmentation of the sentence (e.g., NP -> agent)
Syntactic viewpoint: most treatments of grammatical relations define such relations over syntactic categories.
Developmental viewpoint: category acquisition puts temporal constraints on theories of how syntactic knowledge is acquired.

The present study seeks a clearer answer by synthesizing different features of previous research methods. The scope is six syntactic categories (Determiner, Adjective, Noun, Noun Phrase, Preposition, and Prepositional Phrase) which are used in most descriptions of the adult language. The method involves (a) the development of category criteria against which children's spontaneous production can be evaluated and (b) the comparison of the children's performance to the criteria. Contributing to the method are Brown's distributional analysis of child corpora (e.g., Brown & Bellugi, 1964), Bloom's "rich interpretation" (1970), and Chomsky's work on the limits of taxonomic [...] (1975).

Subjects: 6 children (2;0-2;5), MLU 2.93-4.14, utterances 52-689
Recording & transcribing: tape-recorded & transcribed by the observer

Utterances: by intonation and syntax
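
The MLU (mean length of utterance) range reported for the subjects can be made concrete with a small sketch. Note the assumptions: MLU is conventionally counted in morphemes, whereas this simplified version counts whitespace-separated words, and the sample utterances are invented for illustration rather than taken from the paper.

```python
def mean_length_of_utterance(utterances):
    """Return the mean number of whitespace-separated tokens per utterance.

    Simplification (assumption): counts words, not morphemes.
    """
    lengths = [len(u.split()) for u in utterances if u.strip()]
    if not lengths:
        raise ValueError("no non-empty utterances")
    return sum(lengths) / len(lengths)

# Hypothetical child utterances, already segmented one per string.
sample = ["want cookie", "the big dog", "I see the ball", "more juice"]
mlu = mean_length_of_utterance(sample)
print(f"MLU (in words): {mlu:.2f}")
```

A morpheme-based count would need a morphological segmentation step, which is outside the scope of this sketch.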

Preliminary category assignment: needed because otherwise a child's not knowing which category a word belongs to would be misdescribed as a deficient understanding of how a category (e.g., adjectives) patterns.
Procedures used to test category assignment

What expressions another expression can precede and follow
Use of the single-word or single-expression "substitutability" test
Multiple-appearance test: a category should show up in all its existing syntactic variations in each location where it is allowed
Subcategories method: the main method used here is the restriction of different words to different subclasses (e.g., the restriction of “a” to singular nouns).
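
The distributional tests listed above (what an expression can precede and follow, and subclass restrictions such as “a” occurring only with singular nouns) can be sketched roughly as follows. This is not Valian's actual procedure: the utterances, the `PLURAL_NOUNS` lexicon, and the helper name `context_profile` are hypothetical, chosen only to show the idea of collecting contexts and checking a restriction.

```python
from collections import defaultdict

def context_profile(utterances):
    """Map each word to the sets of words seen directly before and after it."""
    profile = defaultdict(lambda: {"before": set(), "after": set()})
    for utt in utterances:
        # Pad with boundary markers so edge words also get a context.
        tokens = ["<s>"] + utt.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            profile[tokens[i]]["before"].add(tokens[i - 1])
            profile[tokens[i]]["after"].add(tokens[i + 1])
    return profile

# Hypothetical corpus and plural-noun lexicon (assumptions for illustration).
sample = ["a dog", "the dogs", "a big ball", "the ball", "those dogs"]
PLURAL_NOUNS = {"dogs"}

profile = context_profile(sample)

# Subcategory check: does a determiner ever directly precede a plural noun?
a_violations = profile["a"]["after"] & PLURAL_NOUNS
the_plurals = profile["the"]["after"] & PLURAL_NOUNS

print("words following 'a':", sorted(profile["a"]["after"]))
print("'a' before plural nouns:", sorted(a_violations))
print("'the' before plural nouns:", sorted(the_plurals))
```

In this toy corpus “a” never precedes a plural noun while “the” does, which is the kind of pattern a subclass restriction would predict. A real analysis would also have to handle intervening adjectives and much larger samples.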

Specific category criteria