fb_QUANTITATIVE ANALYSIS.docx
QUANTITATIVE ANALYSIS
This interview is designed to evaluate quantitative reasoning and applied statistics. Quantitative reasoning tests knowledge of relevant mathematical, probabilistic, and statistical concepts and how they relate to Facebook products. Applied statistics tests problems drawn from real-world data or estimation.
Scope:
- Estimation and logical reasoning in the context of a real-world product.
- Elements of descriptive statistics (mean/expected value, median, mode, percentiles).
- Common distributions such as the binomial or normal distribution.
- What does real-world data typically look like?
- Law of Large Numbers, Central Limit Theorem, linear regression.
- Conditional probabilities, including Bayes' Theorem.
Sample question: What do you think the distribution of time spent per day on Facebook looks like? What metrics would you use to describe that distribution?
What won't be covered: advanced stats/math concepts such as calculus or advanced statistical/ML models; more complex distributions like the exponential, Weibull, Beta, etc.; brainteasers or contrived estimation problems (e.g. how many golf balls fit in a 747).
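Setup (implied by the numbers below): 20% of raters are lazy (L) and rate every ad as good (G); the other 80% are careful (C) and rate an ad as good with probability 0.6, independently across ads.
1.) Probability that an ad is rated good: P(G) = P(G|L)P(L) + P(G|C)P(C) = 1*0.2 + 0.6*0.8 = 0.68
2.) Expected number of good ratings among 100 ads: E(X) = 100*0.68 = 68
3.) Probability that a rater is lazy, given that all five of their ads are rated good: P(L|5G) = P(5G|L)P(L) / (P(5G|C)P(C) + P(5G|L)P(L)) = 1*0.2 / (0.6^5*0.8 + 1*0.2) ≈ 0.763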
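The three rater answers can be checked numerically. This is a minimal Python sketch; it assumes C stands for a careful rater and that ratings are independent across ads (which is what makes P(5G|C) = 0.6^5). The function names are hypothetical. Evaluating the step-3 formula gives about 0.763.

```python
def p_good(p_lazy=0.2, p_good_lazy=1.0, p_good_careful=0.6):
    """Law of total probability: P(G) = P(G|L)P(L) + P(G|C)P(C)."""
    return p_good_lazy * p_lazy + p_good_careful * (1 - p_lazy)


def p_lazy_given_all_good(k, p_lazy=0.2, p_good_lazy=1.0, p_good_careful=0.6):
    """Bayes' theorem: P(L | k good ratings), assuming independent ratings."""
    joint_lazy = p_good_lazy ** k * p_lazy                # P(kG|L)P(L)
    joint_careful = p_good_careful ** k * (1 - p_lazy)    # P(kG|C)P(C)
    return joint_lazy / (joint_lazy + joint_careful)


print(round(p_good(), 2))                  # 1) 0.68
print(round(100 * p_good(), 1))            # 2) expected good ratings out of 100: 68.0
print(round(p_lazy_given_all_good(5), 3))  # 3) 0.763
```

With k = 0 the posterior falls back to the prior P(L) = 0.2; each additional good rating shifts it toward "lazy", since a lazy rater always rates good.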
Variance measures the spread of data around the mean: it is the average squared deviation of the data points from the mean, Var(X) = E[(X - μ)^2]. When the spread is small, all data points lie close to the mean and the variance is small; when they are spread out, the variance is large.
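A small Python sketch of the definition, computing the population variance directly and cross-checking it with the standard library's `statistics.pvariance`. The two datasets are hypothetical and share the same mean (10), so only their spread differs.

```python
from statistics import pvariance


def population_variance(data):
    """Average squared deviation from the mean: Var(X) = E[(X - mu)^2]."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


tight = [9, 10, 10, 11]   # hypothetical values close to the mean (10)
spread = [0, 5, 15, 20]   # hypothetical values far from the same mean

print(population_variance(tight))   # 0.5
print(population_variance(spread))  # 62.5
# Cross-check against the standard library implementation
assert population_variance(spread) == pvariance(spread)
```

This divides by n (population variance); the sample variance used for estimation, `statistics.variance`, divides by n - 1 instead.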
The distribution of the number of posts per user per day should be right-skewed, because most users are passive: they view a lot of content but are unlikely to post, or post only occasionally. A small proportion of users post every day, and most of them would likely be business accounts. An even smaller group creates multiple posts every day. The median and mode should be 0, while the mean would be higher, around 1, because of these outliers.
The distribution of the number of comments per user per day should also be right-skewed: users view many posts but are less likely to comment, or comment only occasionally, e.g. on birthdays or special events. A small proportion of users comment every day, and most of them would be very active users such as teenagers. An even smaller group writes many comments every day. Again, the median and mode should be 0, while the outliers pull the mean up to around 1.
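The median/mode/mean argument for both distributions can be illustrated in Python. The counts below are made up to mirror the shape described (mostly zeros, a thin tail of heavy users); they are not real Facebook data.

```python
from statistics import fmean, median, mode

# Hypothetical per-user daily counts for 100 users (NOT real data):
# most users post nothing, a few post a lot.
counts = [0] * 70 + [1] * 15 + [2] * 8 + [5] * 5 + [20] * 2

print(median(counts))  # 0.0
print(mode(counts))    # 0
print(fmean(counts))   # 0.96: pulled above the median by the heavy right tail
```

Because the mean is so sensitive to the tail, the median together with a few upper percentiles usually describes this kind of distribution better than the mean alone.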
Conditional probability and Bayes' theorem (for P(B) > 0): P(A|B) = P(A&B)/P(B) = P(B|A)P(A)/P(B)
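Approach a: let X denote the number of ads in 100 posts when each post is independently an ad with probability p = 0.05. Then X follows a binomial distribution, X ~ Binomial(n = 100, p = 0.05), with q = 1 - p = 0.95:
E(X) = np = 100*0.05 = 5
σ² = Var(X) = np(1-p) = 100*0.05*(1-0.05) = 4.75
Approach b: show one ad every 20 posts. Then E(X) = 100/20 = 5, and since the number of ads is fixed, Var(X) = 0.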
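The binomial moments in approach a can be verified by summing the probability mass function directly rather than trusting the closed forms. A minimal Python sketch (standard library only; `binomial_pmf` is a hypothetical helper name):

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 100, 0.05
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * prob for k, prob in enumerate(pmf))
var = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))

print(round(mean, 6), round(var, 6))  # 5.0 4.75  (matches np and np(1-p))
```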
4) Which is better: showing one ad every 25 posts, or showing an ad on each post with probability 4%?
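Both schemes average the same ad rate, so the difference is in variability. This Python sketch quantifies it for one session of `num_posts` consecutive posts, assuming the fixed scheme places an ad at posts 25, 50, ... and the random scheme makes an independent 4% draw per post. `compare_schemes` is a hypothetical helper.

```python
def compare_schemes(num_posts, every=25, p=0.04):
    """Ad counts in one session of num_posts posts under each scheme."""
    fixed_count = num_posts // every  # deterministic: one ad per 25 posts
    return {
        "fixed_count": fixed_count,
        "fixed_variance": 0.0,
        "random_mean": num_posts * p,                 # binomial mean np
        "random_variance": num_posts * p * (1 - p),   # binomial variance np(1-p)
        "random_p_zero_ads": (1 - p) ** num_posts,    # chance of seeing no ads at all
    }


print(compare_schemes(100))  # long session: both average 4 ads, only random varies
print(compare_schemes(20))   # short session: fixed shows 0 ads, random averages 0.8
```

The sketch only measures the spread of the ad count; whether predictability (fixed) or simplicity with occasional clusters and gaps (random) is preferable depends on the product goal.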
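Setup (implied by the numbers below): there are two rooms, and 0, 1, or 2 of them are occupied, each with probability 1/3. Let F1 = the first room I try is occupied and S1 = the second room is occupied.
1) If I pick one at random, what is the probability that it is occupied?
P(F1) = P(both occupied)*1 + P(exactly one occupied)*(1/2) + P(none occupied)*0 = 1/3 + (1/3)*(1/2) = 1/2
2) Follow-up: if the first one I go to is occupied and I decide to try the other one, what is the probability that the second one is also occupied?
P(S1|F1) = P(S1&F1)/P(F1) = P(both occupied)/P(F1) = (1/3)/(1/2) = 2/3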
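Both answers can be confirmed by exact enumeration with fractions. This Python sketch assumes the setup inferred from the arithmetic (0, 1, or 2 rooms occupied with probability 1/3 each) and additionally assumes that, when exactly one room is occupied, it is equally likely to be either room. The names `STATES` and `occupancy_probabilities` are hypothetical.

```python
from fractions import Fraction

# Assumed setup: 0, 1 or 2 of the two rooms are occupied, each with probability 1/3;
# if exactly one is occupied, it is equally likely to be either room.
THIRD = Fraction(1, 3)
STATES = {  # (room A occupied, room B occupied) -> probability
    (0, 0): THIRD,
    (1, 0): THIRD / 2,
    (0, 1): THIRD / 2,
    (1, 1): THIRD,
}


def occupancy_probabilities():
    """Return P(first occupied) and P(second occupied | first occupied)."""
    p_first = Fraction(0)
    p_both = Fraction(0)
    for (a, b), prob in STATES.items():
        # The first room is picked uniformly at random; the second is the other one
        for first, second in ((a, b), (b, a)):
            weight = prob / 2
            if first:
                p_first += weight
                if second:
                    p_both += weight
    return p_first, p_both / p_first


print(occupancy_probabilities())  # (Fraction(1, 2), Fraction(2, 3))
```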
Based on the collected data, we compute a test statistic. The p-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one observed. If this probability is small (below a significance level α chosen in advance, e.g. 0.05), the observed result would be very unlikely under the null hypothesis, so the null hypothesis can be rejected.
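A minimal Python sketch of this procedure for a one-sample, two-sided z-test, assuming the population standard deviation is known. The sample numbers are hypothetical; `NormalDist` is from the standard `statistics` module.

```python
from statistics import NormalDist


def two_sided_z_test(sample_mean, null_mean, sigma, n):
    """z statistic and two-sided p-value for H0: mu = null_mean (sigma assumed known)."""
    standard_error = sigma / n ** 0.5
    z = (sample_mean - null_mean) / standard_error
    # P(|Z| >= |z|) under H0: the chance of a result at least this extreme
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.05  # significance level, chosen before looking at the data
z, p = two_sided_z_test(sample_mean=10.4, null_mean=10.0, sigma=2.0, n=100)
print(round(z, 2), round(p, 4), p < alpha)  # 2.0 0.0455 True
```

With an unknown σ and a small sample, a t-test would replace the normal distribution with Student's t.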
A confidence interval (CI) is an interval estimate that is expected to contain the true parameter with a stated level of confidence. A 95% CI means that if we repeated the experiment many times and computed a CI each time, about 95% of those intervals would contain the true value of the parameter.
At the same confidence level, a wider CI is more likely to contain the null-hypothesis value, which makes us less likely to reject the null hypothesis when it is false. In other words, a wider CI corresponds to a higher type II error rate and lower power. A wide CI also means a less precise estimate of the effect. The width is driven by variability: the more variable the data (or the smaller the sample), the wider the interval.
The interviewer also asked how to confirm that a change in a metric was caused by a particular factor.
A designed experiment applies a treatment to a group and records the effects. It establishes causality by randomly assigning units to control and treatment groups and then comparing the groups. Random assignment makes it unlikely that units with something in common all end up in the same group; it creates roughly similar groups by approximately balancing potential confounding variables between them. Since we already ran the experiment and got a significant result, if we can confirm that the assignment was truly random (the key assumption of an A/B test) and the experiment passed its sanity checks, we can be reasonably confident that the effect is due to the new feature.
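The repeated-experiment meaning of a 95% CI, and the link between variability and width, can be checked by simulation. This is a minimal Python sketch assuming normally distributed data with a known σ, so a z-interval applies; the parameter values and helper names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def ci_half_width(sigma, n, level=0.95):
    """Half-width of a z-interval for a mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return z * sigma / n ** 0.5


def ci_coverage(true_mean=5.0, sigma=2.0, n=50, reps=2000, level=0.95, seed=0):
    """Fraction of repeated experiments whose CI contains the true mean."""
    rng = random.Random(seed)
    half_width = ci_half_width(sigma, n, level)
    hits = 0
    for _ in range(reps):
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / reps


print(ci_coverage())                                   # close to 0.95
print(ci_half_width(2.0, 50), ci_half_width(4.0, 50))  # doubling sigma doubles the width
```

The half-width scales as σ/√n, so higher variability widens the interval and more samples narrow it.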
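K-means
The second question was similar: you are given two bags of coins, with the diameter of every coin in each bag, and asked to decide whether the two bags were made by the same factory. If the histograms look roughly normal, consider a two-sample t-test (note that it only compares the means). If they are not normal (though that is unlikely), measure the distance between the two empirical distributions with a Kolmogorov-Smirnov (KS) test or one of the other nonparametric methods.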
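For the coin question, a nonparametric check can be sketched with the standard library alone: compute the two-sample KS statistic D (the largest gap between the two empirical CDFs) and get a p-value from a permutation test that randomly reassigns coins to bags. This swaps in a permutation p-value for the asymptotic KS distribution; with SciPy installed, `scipy.stats.ks_2samp` and `scipy.stats.ttest_ind` provide the KS test and the two-sample t-test directly. Function names and the diameters are hypothetical.

```python
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties)
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Share of random relabelings whose D is at least the observed D."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # +1 keeps the estimate away from exactly 0


# Hypothetical coin diameters in mm (not real data)
rng = random.Random(1)
bag_a = [rng.gauss(24.0, 0.1) for _ in range(100)]
bag_b = [rng.gauss(24.2, 0.1) for _ in range(100)]
print(ks_statistic(bag_a, bag_b), permutation_p_value(bag_a, bag_b))
```

A small p-value suggests the two bags' diameter distributions differ, which is evidence against a single factory; unlike the t-test, D also reacts to differences in spread or shape.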