讨论一道脸书的SQL面经

来自隔壁贴FB的SQL面经

听歌问题

table1: time|userid|songid|
talbe2: userid1|userid2

SQL第二问是要求找出含有pairs of friends的list, 里面的每对朋友都有两首以上共同听过的歌。我试着做了一下,我希望其实是user1, user2的两个column的结果,我这里用了concat, 不是很理想,但想不出来更好的。抛砖引玉,请大家多多post更好的解法。

select
concat(userid1, userid2) as friend_pairs
count() as song_ct
from
(select
distinct
userid1, userid2, b.songid as song1, c.songid as song2
from table2 a
left outer join table1 b on a.userid1 = b.userid
left outer join table1 c on a.userid2 = c.userid) a
where a.song1 = a.song2 and a.song1 is not null and a.song2 is not null
group by concat(userid1, userid2)
having count(
) >= 2

WITH CTE(userid1, userid2, songid)
AS(
SELECT t2.userid1, t2.userid2, t3.songid
FROM table2 as t2
LEFT JOIN table1 as t1
ON t2.userid1 = t1.userid
INNER JOIN table1 as t3
ON t2.userid2 = t3.userid AND t1.songid = t3.songid
)
SELECT userid1, userid2
FROM CTE
GROUPBY userid1, userid2
HAVING COUNT(songid) >= 2

it depends on if we have duplicate pairs in table2, like A B, BA. if we only have unique pairs in table2. My strategy is to left join table1 to table2 to have songid for each userid1, then inner join table1 again to find out the same song userid1 and userid2 both listen to, finally group by userid1, userid2, having count(songid) >=2. you will get the pairs as required.

为啥都是用left join 我觉得能用inner join 啊

Select b.userid1, b.userid2, count(distinct songid) as song_cnt
from
(From table1 a
Inner join table2 b
On a.userid = b.userid1
Inner join table1 c
On b.userid2 = c.userid and a.songid = c.songid)
Group by b.userid1, b.userid2
Having song_cnt > 2

table1 self join 然后count 可不可以呢

有点长,大家帮看看 可不可以work. 多谢

SELECT
b.Userid1,
b.Userid2
FROM
(SELECT
b.Userid1,
b.Userid2,
COUNT(DISTINCT a.songid) AS num_song
FROM
(SELECT
b.Userid1,
b.Userid2,
A.songid,
d.songid
FROM
(SELECT
b.Userid1,
b. Userid2,
a. songid
FROM table 2 b
LEFT JOIN table 1 a
ON b.userid1 = a.userid) c
LEFT JOIN table 1 d
On c.userid2 = d.userid)
WHERE a.songid = d.songid
GROUP BY b.Userid1, b.Userid2
HAVING num_song >= 2)

这样可以吗? 当然我的列表里 AB 跟 BA 都列出来了,如果要diduplicate, 得再左右Self Join一下,然后Sum 除以2
Select F.userid1 as userid, F.userid2 as friendid, SUM(IF(UserA.songid=UserB.songid, 1,0)) as songs
From
Table2 F
Left Join
(Select userid, songid from table1Group by 1,2) UserA
On F.userid1=UserA.userid
Left Join
(Select userid, songif from table1 Group By 1,2) UserB
On F.userid2=UserB AND UserA.songid=UserB.songid
Group By 1
Having songs>=2

思路:先把userid2 LEFT JOIN到table1,然后再INNER JOIN table1, 这样就得出来俩人听歌一样的table,再count

SELECT t1.userid AS userid, new.freindid AS friendid, COUNT(t1.songid) AS songcount
FROM (SELECT t1.userid AS userid, t2.userid2 AS friendid, t1.songid AS songid
FROM table1 t1 LEFT JOIN table2 t2 on t1.userid = t2.userid1) new, t1
WHERE t1.userid = new.friendid AND t1.songid = new.songid
GROUP BY userid, friendid

楼主请教一下,你的solution里两次left join之后,where clause里的and a.song1 is not null and a.song2 is not null部分是必须的吗,where a.song1 = a.song2这个条件不足以exclude null?对null经常有些困惑。。。

my solution is:

WITH s as(
SELECT t2.userid1, t2.userid2, t11.songid as songid1
FROM table2 t2 LEFT JOIN
table1 t11
ON t2.userid1 = t11.userid LEFT JOIN
table1 t12
ON t2.userid2 = t12.userid
WHERE t11.songid = t12.songid
)

SELECT s.userid1, s.userid2
FROM s
GROUP BY s.userid1, s.userid2
HAVING COUNT(s.songid1) > 2

同意ls的思路,需要两次join,第一次songtable join friends table找到每个userid的friends,第二次刚刚join成的table join songtable找到friends的song list,然后限定条件为userid1和userid2(也就是friend)所听歌曲一样,在group by 和having 限定,具体code和test table 见下面。

CREATE TABLE songlist
(TIME datetime,
USERID integer,
SONGID integer)

INSERT INTO songlist VALUES
(‘‘2018-06-01’’, 1, 101),
(‘‘2018-06-01’’, 1, 102),
(‘‘2018-06-02’’, 1, 103),
(‘‘2018-06-03’’, 2, 104),
(‘‘2018-06-01’’, 3, 101),
(‘‘2018-06-01’’, 4, 102),
(‘‘2018-06-01’’, 4, 103),
(‘‘2018-06-01’’, 5, 101),
(‘‘2018-06-01’’, 5, 102),
(‘‘2018-06-01’’, 5, 103),
(‘‘2018-06-01’’, 5, 104)

CREATE TABLE friends
(USERID1 integer,
USERID2 integer)

INSERT INTO friends VALUES
(1,2),(2,3),(1,3),(2,4),(1,4),(1,5),(2,5),(3,5)

SELECT s1.userid, s2.userid, COUNT(s1.songid)
FROM songlist s1 JOIN friends f JOIN songlist s2
ON s1.userid = f.userid1
AND f.userid2 = s2.userid
WHERE s1.songid = s2.songid
GROUP BY s1.userid, s2.userid
HAVING COUNT(s1.songid) >= 2;

null和null是不相等的,所以我同意你的想法,a.song1=a.song2会自动exclude null。