和地里的面经大同小异,一上来先相互介绍一下自己,然后说一下为什么想来FB
介绍完就开始介绍今天的大致背景,SQL和product题目都是围绕ad4ad这个平台。简单来说ad4ad就是FB会找到在FB post里面卖东西的一些人,然后联系这些人给他们创造一个广告空间,在给这些user创造广告空间的同时FB会有potential loss或者cost because FB could spent money on other advertisers, instead of targeting potential advertisers
介绍完背景就开始SQL了,两个table如下:
table a: date|event(’‘impression’’,’‘click’’,’‘create_ad’’)|user_id|item_id|cost|spend
table b: user_id|country|date
table a介绍:event有三个值,其中impression就是FB为user创造的广告space,click代表user点进去了,create_ad代表user点进去之后正式create了ad。然后cost就是FB为这个user的这个item创造广告space所花费的cost,spend就是在user create_ad之后pay给FB的,所以只有create_ad那一行的spend不是null
第一题:number of users that FB has invited in the last 30 days for each country
不难,只需要a left join b 然后group by country就可以
第二题:avg number of impressions per user per item before user creates ad
这道题稍微有点绕,但是仔细想一下也就想出来。先写个subquery,需要将table a自己join一下,然后找同时是impression和create_ad的record,最后group by user,item;写完subquery再用一个AVG就可以了。做完之后有个follow up question,问avg 有什么不好的地方
product:list few metrics related to ad4ad, why these metrics. If one metric goes down, what is the reason?
能回忆起来的差不多就这些了,希望能对大家有帮助,也希望自己onsite能好好发挥
补充内容 (2018-11-1 02:31):
再说一下关于product的问题,如果一个metric go down了,大家不仅仅要break down the metric from user perspective(e.g. country, device, etc), 同时要想到这个metric是怎么算的,denominator和numerator是什么
select avg(num_impression) as avg_impression
(select user_id, item_id, sum(case when event=’‘impression’’ then 1 else 0 end) as num_impression
from table a
where user_id in
(select distinct user_id from table a where event = ‘‘create_ad’’)
group by user_id, item_id) temp;
SELECT a.country, COUNT(DISTINCT a.user_id) AS num_users
FROM a LEFT JOIN b
ON a.user_id = b.user_id AND DATEDIFF(a.date, CURDATE()) <= 30
GROUP BY a.country
SELECT AVG(sub.num_impression) AS avg_imp
FROM (SELECT a.user_id, a.item_id, SUM(CASE WHEN a.event = ‘‘impression’’ AND b.event IS NOT NULL THEN 1 ELSE 0 END AS) num_impression
FROM table_a a JOIN table_a b
ON a.user_id = b.user_id AND a.item_id = b.item_id AND a.event = ‘‘impression’’ AND b.event = ‘‘create_ad’’
WHERE a.date < b.date
GROUP BY a.user_id, a.item_id) sub
大致就是这样的,然后avg就说不够robust to outliers,可以考虑用median
第一题:
以tablea为基准,left join b来得到country。
select b.country, count(distinct a.use_id)
from tablea a left join tableb b on a.user_id = b.user_id
where a.date >= date_sub(current_Date(), 30) group by b.country;
第二题:
我就count了有CREATE AD record的 UserID&ItemId pair 的Impression record数量。
select avg(sub.cnt) from
(select user_id, item_id, count(case when event = ‘‘impression’’ then 1 else 0 end) as cnt from tablea where concat(user_id, item_id) in (select concat(t.user_id, t.item_id) from tablea t where t.event = ‘create_ad’)) sub;