# STAT 406 – Statistical Learning Course Reflection

All I can say is props to Professor Barrera for not only conveying SO MANY concepts to us clearly, but also for making it interesting with his weird sense of humor. This is one of the courses that made me think that EVERY SINGLE concept taught in this class is going to somehow benefit me or be used in the future.

Topics covered:

• Supervised and unsupervised learning
• K-fold cross validation
• Prediction models (linear, non-linear) and non-parametric models
• Variable selection: step-wise, sequencing, shrinkage
• LASSO, Ridge regression, Elastic net
• Smoothers (local regression, kernel, splines)
• Regression and classification trees
• K-nearest neighbors, QDA, LDA
• Logistic Regression
• Bagging
• Curse of Dimensionality
• Boosting
• Random Forests
• Neural Networks
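Of all of these, K-fold cross-validation is the tool I reach for most often. A minimal sketch of the idea (my own illustration, using a trivial sample-mean "model" rather than anything from the course): shuffle the data, split it into K folds, and average the held-out squared errors.

```python
import random

def k_fold_mse(y, k=5, seed=1):
    """Estimate prediction MSE of the sample-mean predictor via K-fold CV."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    errors = []
    for fold in folds:
        train = [y[i] for i in idx if i not in fold]
        pred = sum(train) / len(train)                 # "fit" on the training folds
        errors += [(y[i] - pred) ** 2 for i in fold]   # score on the held-out fold
    return sum(errors) / len(errors)

data = [2.1, 1.9, 3.0, 2.5, 2.2, 2.8, 1.7, 2.4, 2.6, 2.0]
print(k_fold_mse(data))
```

Swapping the mean predictor for any fitted model (a LASSO fit, a tree, etc.) is the same loop; that is what makes CV such a general tool.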

# STAT 443 – Time Series and Forecasting Course Reflection

After finishing STAT 443, I feel that the topics covered in this course will be very useful in my career. The professor, Natalia Nolde, was great, and I felt that she genuinely cared for her students. We were required to submit our assignments as PDF versions of R Markdown. It was fun to learn, and seeing the results after knitting our code was super rewarding. My friend Shangeeth described our finished R Markdown assignments as works of art because they looked so professional and took a lot of hard work.

Topics covered in this course:

• Autocorrelation/Autocovariance and correlogram
• White noise / error
• Yule-Walker
• Stationarity
• Stochastic models including AR, MA, ARMA, ARIMA, SARIMA models
• Exponential smoothing
• Holt-Winters methods
• Box-Jenkins prediction approach
• Frequency domain
• Fourier transforms
• Spectral density
• Models for changing variance: GARCH processes
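The AR model is the one I found easiest to build intuition for by simulation. A quick sketch of my own (not from the course): an AR(1) process X_t = φ·X_{t−1} + ε_t has lag-1 autocorrelation φ, which we can check against the sample correlogram value.

```python
import random

def simulate_ar1(phi, n, seed=443):
    """Simulate X_t = phi * X_{t-1} + eps_t with standard normal noise."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        series.append(x)
    return series

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation (one point on the correlogram)."""
    m = sum(xs) / len(xs)
    num = sum((xs[t] - m) * (xs[t - 1] - m) for t in range(1, len(xs)))
    den = sum((v - m) ** 2 for v in xs)
    return num / den

xs = simulate_ar1(phi=0.6, n=20000)
print(lag1_autocorr(xs))  # close to 0.6
```

Computing this at every lag h gives the whole correlogram, and for AR(1) the theoretical curve decays as φ^h.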

# STAT 344 – Sample Surveys Course Reflection

STAT 344 – Sample Surveys was actually a very enjoyable course for me. Unlike many theory-based statistics courses, STAT 344 gives concrete and practical examples of how surveys are conducted. This gave me a feeling of accomplishment because I could get a sense of how what I was learning could be applied to real-life situations. A common question asked on practice exams: we are given a table of data and asked to treat the data as a

a) stratified sample

b) panel study

c) aggregation of polls

d) cluster sample

and find their respective estimates and standard errors.
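For the stratified-sample version of that question, the recipe is standard: weight each stratum mean by its population share, and combine the within-stratum variances with finite-population corrections. A sketch with made-up numbers (my own, not from an actual exam):

```python
from math import sqrt
from statistics import mean, variance

def stratified_estimate(strata):
    """strata: list of (N_h, sample_h) pairs, where N_h is the stratum
    population size. Returns the stratified mean estimate and its SE."""
    N = sum(N_h for N_h, _ in strata)
    est = sum((N_h / N) * mean(y) for N_h, y in strata)
    # Sum of W_h^2 * (1 - n_h/N_h) * s_h^2 / n_h, with fpc = 1 - n_h/N_h
    var = sum((N_h / N) ** 2 * (1 - len(y) / N_h) * variance(y) / len(y)
              for N_h, y in strata)
    return est, sqrt(var)

# Two hypothetical strata: 100 units (2 sampled) and 200 units (3 sampled)
est, se = stratified_estimate([(100, [2, 4]), (200, [3, 5, 7])])
print(est, se)
```

The cluster-sample and panel-study versions reuse the same skeleton but with different variance formulas (the panel study adds a covariance term, as noted below).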

Some topics or concepts covered in this course (Off the top of my head):

• Recommending a sample size in order to satisfy an employer’s preferred accuracy level
• Bias (an example that helped me understand bias was “say you were sampling random people on the street and asking them the number of people in their household” but this is a biased way of sampling because larger households have a better chance of being approached by you)
• Ratio vs. Regression vs. “Vanilla” estimation
• Panel study (has a co-variance term)
• Stratified sampling
• One-stage cluster sampling (Simple random sample clusters, then sample everyone in selected cluster)
• Two-stage cluster sampling (Simple random sample of clusters, then another random sampling within the cluster)
• Aggregate polls (poll of polls)
• House-effects (τ)
• Weighted sampling
• Proportional/optimal allocation
• Cluster sampling with probability-proportional-to-size (this was tricky!)
• Non-responders
• 3 types of missing data – missing at random (MAR): the chance of participation varies with the helper variables but not with the variable of interest; missing completely at random (MCAR): the chance of participating is constant and does not depend on the variable of interest; non-ignorable missing (NMAR): the chance of participation varies with the variable of interest and the helper variables.
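The street-sampling example above can actually be quantified: asking random *people* selects each household with probability proportional to its size, so the expected reported size is E[X²]/E[X] rather than E[X]. A quick check of my own, with a hypothetical population:

```python
# Hypothetical population: equal numbers of households of size 1, 2, 3, 4
sizes = [1, 2, 3, 4]

# Sampling households uniformly gives the true mean household size
true_mean = sum(sizes) / len(sizes)

# Sampling a random *person* picks a household w.p. proportional to its
# size, so the expected reported size is sum(s*s) / sum(s) = E[X^2]/E[X]
size_biased_mean = sum(s * s for s in sizes) / sum(sizes)

print(true_mean, size_biased_mean)  # 2.5 vs 3.0 -- the size bias
```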

# STAT 305 – Statistical Inference Course Reflection

STAT 305 – Introduction to Statistical Inference was a pretty difficult course in my opinion. It was very theory-based, with not many concrete examples. My favorite unit was probably likelihood estimators. I felt as if I could just follow the same game-plan for most questions:

1) Find the likelihood function by taking the product of n probability density functions

2) Then log it to make it the log likelihood which is easier to proceed with

3) Take the first derivative of the log likelihood, equate it to zero, and solve for the parameter of interest to find the MLE

4) Take the second derivative of the log likelihood; if it is < 0, this ensures that you are maximizing

5) Fisher information is -E(second derivative)

6) Variance estimate is just 1/(Fisher Info)
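To make the game-plan concrete, here is how it plays out for an Exponential(λ) sample (my own worked sketch, not a course example): steps 1–3 give λ̂ = 1/ȳ, step 5 gives Fisher information n/λ², and step 6 gives variance estimate λ̂²/n. A numeric sanity check that λ̂ really maximizes the log likelihood:

```python
import math
import random

rng = random.Random(305)
y = [rng.expovariate(2.0) for _ in range(1000)]  # simulated sample, true lambda = 2
n = len(y)

def loglik(lam):
    # Steps 1-2: l(lambda) = n*log(lambda) - lambda * sum(y)
    return n * math.log(lam) - lam * sum(y)

mle = n / sum(y)        # step 3: l'(lambda) = n/lambda - sum(y) = 0  =>  1/ybar
fisher = n / mle ** 2   # step 5: -E[l''(lambda)] = n / lambda^2
var_est = 1 / fisher    # step 6: = mle^2 / n

# Step 4 check: nearby values of lambda give a lower log likelihood
print(mle, loglik(mle) > max(loglik(mle * 0.9), loglik(mle * 1.1)))
```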

Some topics or concepts covered in this course (Off the top of my head):

• Moment generating functions. The first derivative, evaluated at t = 0, gives E(Y) (the mean), while the second derivative at t = 0 gives E(Y²). Var(Y) = E(Y²) − E(Y)²
• Likelihood functions
• Maximum likelihood estimators (MLEs)
• Bayesian prior/posterior
• Hessian matrix
• Fisher information
• Wilks’s and Pearson’s statistics
• Paired comparisons/comparing 2 multinomial distributions
• Hypothesis testing using Neyman Pearson Lemma. Significance level, power, and p-value.
• Pooled samples
• Categorical data with free parameters

# STAT 302 – Introduction to Probability Course Reflection

STAT 302 – Introduction to Probability

First off, I would just like to say that this was the hardest course I have ever taken. Statistics 302 – Introduction to Probability covered so much material and drew concepts from Calculus 1, 2, and 3 and STAT 200. I found myself studying not only for the course itself, but also reviewing integration, multivariable calculus, and introductory statistical analysis techniques. I truly do think that the material I learned will be useful in the future. Oftentimes I would relate what I was learning to various real-life situations.

Just a short list (in no particular order) of what was covered in this course:

• Advanced combinatorics / permutations and combinations (probably one of the hardest chapters)
• Probability laws which was almost the same as set theory (union, intersection, partition, commutative associative distributive and DeMorgan’s laws, complement, subset, disjoint)
• Conditional probability (Bayes’ formula, odds, independence of events, conditional independence)
• Discrete random variables (probability mass function, cumulative distribution function, expectation, variance/standard deviation)
• Common discrete random variables: Bernoulli, binomial, geometric, negative binomial, Poisson, hypergeometric
• Continuous random variables (probability density function, cumulative distribution function, gamma/uniform/normal/exponential distributions)
• Joint probability (this chapter was also really difficult and covered so much)
• Markov’s and Chebyshev’s inequalities
• Moment generating functions

# First Impressions of STAT 302 – Probability

I’m going to start this entry off by saying this course is incredibly interesting, but is by far one of the hardest classes I have ever had to take.

The class started off simple. It felt like a review of MATH 220 to me: the union, intersection, and complement of events behave much like those of sets. For example:

Sets: Let A = {1,2,3} and B = {3,4,5}

A∪B = {1,2,3,4,5}, so |A∪B| = |A| + |B| − |A∩B| = 3 + 3 − 1 = 5

Probability: Let P(A) = 1/3 and P(B) = 1/2, and suppose A is contained in B, so P(A∩B) = P(A) = 1/3

P(A∪B) = P(A) + P(B) − P(A∩B) = 1/3 + 1/2 − 1/3 = 1/2

The next section was Combinatorics: counting, permutations, and combinations. I remember learning about this in grade 12, but we did not go in depth. The questions we were expected to be able to do in this course were extremely complicated, and I still believe that this section is one of the most difficult ones in the whole textbook. It forces us to think critically and even creatively, as these questions usually have more than one way of being solved.

One of the questions on the FIRST assignment: A quiz consists of 10 true/false questions. A student decides that he will not answer FALSE for any two consecutive questions. In how many ways can he answer all 10 questions?

The question seems quite simple to begin with, but as soon as I tried to solve it, it was as if its difficulty was increasing at an exponential rate. A classmate of mine and I discussed our strategies for solving it; his solution involved drawing out the possible answer sequences.

So for this question, drawing out all the combinations is possible but not very efficient. There was also talk among other classmates that it followed a Fibonacci sequence. My thought process was that there must be fewer than six false answers in order for none to be consecutive, and then to use the nCr (n choose r) formula. But the solution is much more complicated than that, which I will not go into in my blog post.
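I will say that the Fibonacci rumour among classmates checks out numerically. A brute-force count over all 2¹⁰ answer sheets (my own check, not the official solution) agrees with a Fibonacci-style recurrence:

```python
from itertools import product

def count_no_consecutive_false(n):
    """Count T/F sequences of length n with no two consecutive F (brute force)."""
    return sum(1 for seq in product("TF", repeat=n)
               if "FF" not in "".join(seq))

def count_fib(n):
    """Same count via the recurrence f(n) = f(n-1) + f(n-2): a valid sheet of
    length n ends in T (any valid length n-1 sheet) or in TF (length n-2)."""
    a, b = 2, 3  # f(1) = 2 ("T", "F"); f(2) = 3 ("TT", "TF", "FT")
    if n == 1:
        return a
    for _ in range(n - 2):
        a, b = b, a + b
    return b

print(count_no_consecutive_false(10), count_fib(10))  # both print 144
```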