Fresh write-up: bombed the Facebook DS phone interview

I just did the Facebook data scientist video interview with no preparation at all, and failed it completely.




45 min: SQL + A/B testing + product


table 1: date, receiver_id, caller_id, duration, country

Q1: For the users who first used the new feature on 2018-07-01, how many of them used it again X days later?

(* Whether they are the caller or the receiver, they all count as active users.)

expected output:

Day UserCount

0 99999

1 65789

2 35689


99 123
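Looking back, here is a rough sketch of how Q1 could be approached. It is not the interviewer's reference answer. It assumes table 1 lives in a table I call `calls`, that it only contains usage of the new feature, that dates are ISO strings, and that "first used" means the earliest date a user shows up as caller or receiver. The sample rows and the `retention_by_day` helper are made up, and I run the SQL through Python's sqlite3 just so it can be executed.

```python
import sqlite3

# Assumption: table 1 is stored as `calls`, with ISO-formatted date strings.
RETENTION_SQL = """
WITH activity AS (
    -- a user is active on a day whether they called or received
    SELECT date, caller_id AS user_id FROM calls
    UNION
    SELECT date, receiver_id AS user_id FROM calls
),
cohort AS (
    -- users whose earliest recorded use falls on the cohort date
    SELECT user_id
    FROM activity
    GROUP BY user_id
    HAVING MIN(date) = :cohort_date
)
SELECT CAST(julianday(a.date) - julianday(:cohort_date) AS INTEGER) AS day_offset,
       COUNT(DISTINCT a.user_id) AS user_count
FROM activity AS a
JOIN cohort AS c ON c.user_id = a.user_id
GROUP BY day_offset
ORDER BY day_offset;
"""


def retention_by_day(conn, cohort_date):
    """Return (day_offset, user_count) rows for the cohort that started on cohort_date."""
    return conn.execute(RETENTION_SQL, {"cohort_date": cohort_date}).fetchall()


# Made-up sample data, columns in the order given in the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (date TEXT, receiver_id INTEGER, caller_id INTEGER, "
    "duration INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
    [
        ("2018-06-30", 1, 2, 10, "US"),  # users 1 and 2 started earlier
        ("2018-07-01", 3, 4, 5, "US"),   # users 3 and 4 join the cohort
        ("2018-07-01", 5, 1, 5, "US"),   # user 5 joins the cohort
        ("2018-07-02", 3, 6, 5, "US"),   # user 3 returns on day 1
        ("2018-07-03", 4, 5, 7, "US"),   # users 4 and 5 return on day 2
        ("2018-07-03", 1, 3, 7, "US"),   # user 3 returns on day 2
    ],
)

rows = retention_by_day(conn, "2018-07-01")
for day_offset, user_count in rows:
    print(day_offset, user_count)
```

Days on which nobody from the cohort came back simply do not appear in the output; filling them with zeros would need a join against a generated list of days.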

Q2: This new feature is only available to specific users for testing. For each day, the users the feature is exposed to are listed in the following table:

Users who have access to this feature might not use it at all.

(* receiver_id and caller_id in the table above are both user_ids.)

table 2: date, user_id

Question: if a user uses this feature for at least 3 seconds at least once on a given day, we call that user an active user for that day.

What's the percentage of active users among all the users who have access to the feature, for each day?
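A possible sketch for Q2, again only my own attempt after the fact. It assumes table 1 is called `calls` and table 2 is called `access` (both names are mine), that `duration` is in seconds, and that a single call of at least 3 seconds makes both the caller and the receiver active. The sample rows are made up.

```python
import sqlite3

# Assumptions: `calls` is table 1, `access` is table 2, duration is in seconds.
ACTIVE_PCT_SQL = """
WITH active AS (
    -- one call of at least 3 seconds makes both participants active that day
    SELECT date, caller_id AS user_id FROM calls WHERE duration >= 3
    UNION
    SELECT date, receiver_id AS user_id FROM calls WHERE duration >= 3
)
SELECT acc.date,
       100.0 * COUNT(DISTINCT act.user_id) / COUNT(DISTINCT acc.user_id) AS pct_active
FROM access AS acc
LEFT JOIN active AS act
       ON act.date = acc.date AND act.user_id = acc.user_id
GROUP BY acc.date
ORDER BY acc.date;
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (date TEXT, receiver_id INTEGER, caller_id INTEGER, "
    "duration INTEGER, country TEXT)"
)
conn.execute("CREATE TABLE access (date TEXT, user_id INTEGER)")

# Made-up sample data.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
    [
        ("2018-07-01", 1, 2, 10, "US"),  # users 1 and 2 are active
        ("2018-07-01", 3, 9, 2, "US"),   # too short, nobody becomes active
        ("2018-07-02", 1, 5, 3, "US"),   # user 1 active; user 5 has no access
    ],
)
conn.executemany(
    "INSERT INTO access VALUES (?, ?)",
    [(d, u) for d in ("2018-07-01", "2018-07-02") for u in (1, 2, 3, 4)],
)

pct_rows = conn.execute(ACTIVE_PCT_SQL).fetchall()
for date, pct_active in pct_rows:
    print(date, pct_active)
```

The LEFT JOIN starts from `access` so that users who have the feature but never use it still count in the denominator, and users who appear in `calls` without access are ignored.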


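1. How do you choose the sample for A/B testing? What factors do you consider when picking the test group and the control group? Do you use random sampling?

2. If we are deciding whether to launch a product and run an A/B test, which metrics would you choose?

3. If the two groups show a significant difference, what factors would I consider?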
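For the last question, one thing worth pinning down first is how "significant" gets checked at all. Below is a minimal sketch of a two-proportion z-test on a conversion-style metric. This is my own illustration rather than anything the interviewer asked for: the function name and the counts are made up, and it assumes large, independent samples.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions.

    x1/n1 and x2/n2 are successes/trials in the test and control groups.
    Uses the pooled standard error, which assumes large independent samples.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 120 of 1000 active in test, 100 of 1000 in control.
z, p_value = two_proportion_z_test(120, 1000, 100, 1000)
print(f"z = {z:.3f}, p = {p_value:.3f}")
```

Even when the p-value is small, the follow-up in an interview is usually about whether the difference is real and useful: sample size, how long the test ran, novelty effects, and whether the two groups were comparable to begin with.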