LinkedIn Data Engineer Interview Question

// Question2:
/* Write SQL considering scalability to find the count of New Unique Users logging in by Date?
Assume that this is history data and size is around 1000 GB.

Traffic Data

date userId. activity
2017-01-01 u1 ‘login’
2017-01-01 u2 ‘jobs’
2017-01-01 u3 ‘groups’
2017-01-01 u4 ‘homepage’
2017-01-01 u5 ‘homepage’
2017-01-01 u1 ‘homepage’
2017-01-02 u1 ‘jobs’
2017-01-02 u6 ‘groups’
2017-01-02 u7 ‘groups’
2017-01-02 u2 ‘login’
2017-01-03 u2 ‘logout’
2017-01-03 u8 ‘groups’
2017-01-03 u9 ‘jobs’
2017-01-03 u10 ‘logout’
*/

// Answer 2: # e.g. within 90 days count

Hive QL

with a as (
select userid, min(date) as min_date
from traffic
where activity = ‘login’
group by userid
)

select date, count(distinct userId)
from traffic u left join a on u.userid = a.userid
where u.date = a.min_date
and to_date(date)>= date_sub(current_date, 90)
group by date