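A couple of days ago I covered common misconceptions about DA/DS roles and shared part of what you need to prepare when applying for them.
To recap, preparing for a Data Scientist/Data Analyst role usually centers on the following areas: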
- Machine Learning
- Statistics, probability, and A/B testing
- Online coding(Python + R)
- SQL
- Product sense
- Project
- Extra Skills
Today I'll continue with how to prepare for SQL, Product Sense, Project, and Extra Skills.
IV. SQL
- Common interview questions
- What is the difference between union and union all? where and having?
- Table【in_app_purchase】:
uid: unique user id.
timestamp: specific timestamp detailed to seconds.
purchase amount: the amount of a one-time purchase.
This is a table containing in-app purchase data. A certain user could have multiple purchases on the same day.
Question 1: List out the top 3 names of the users who have the most purchase amount on ‘2018-01-01’
Question 2: Sort the table by timestamp for each user. Create a new column named “cum amount” which calculates the cumulative amount of a certain user of purchase on the same day.
Question 3: For each day, calculate the growth rate of purchase amount compared to the previous day. If there is no result for the previous day, show ‘Null’.
Question 4: For each day, calculate a 30-day rolling average purchase amount.
- Table【Friending】
time = timestamp of the action
date = human-readable timestamp, e.g., 2018-01-01
action = {‘send’, ‘accept’}
actor_id = uid of the person pressing the button to take the action
target_id = uid of another person who is involved in the action.
Question: What was the friend request acceptance rate for requests sent out on 2018-01-01?
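- Question 2 (the in_app_purchase table) covers simple aggregate problems, cumulative problems, rolling window problems, and so on. Once you can handle these, the rest are just simple variations.
Question 3 (the Friending table) covers self-join and involves some tricky use of the greater-than-or-equal sign. If you're interested, look up the answers in the Facebook interview write-ups on the 1point3acres forum.
Other questions just add more tables, messier joins, or a few CASE WHEN expressions; the difficulty doesn't change much. Work through a few classic problems well and organize your own notes, and you can handle whatever variation comes up.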
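Here is a minimal sketch of how Questions 2 to 4 on the in_app_purchase table could be answered with window functions. It runs the SQL through Python's built-in `sqlite3` module so it is easy to try locally. The column name `purchase_amount`, the sample rows, and the reading of Question 4 as "average of daily totals over the current and 29 preceding rows" are my own assumptions, and it assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; the original post does not provide any rows.
conn.executescript("""
CREATE TABLE in_app_purchase (uid INTEGER, timestamp TEXT, purchase_amount REAL);
INSERT INTO in_app_purchase VALUES
  (1, '2018-01-01 09:00:00', 10.0),
  (1, '2018-01-01 18:30:00', 5.0),
  (2, '2018-01-01 12:00:00', 20.0),
  (1, '2018-01-02 08:00:00', 7.0),
  (2, '2018-01-02 10:00:00', 63.0);
""")

# Question 2: running total per user per day, ordered by timestamp.
cumulative = conn.execute("""
SELECT uid, timestamp, purchase_amount,
       SUM(purchase_amount) OVER (
           PARTITION BY uid, DATE(timestamp)
           ORDER BY timestamp
       ) AS cum_amount
FROM in_app_purchase
ORDER BY uid, timestamp
""").fetchall()

# Shared CTE: one row per day with that day's total purchase amount.
DAILY = """daily AS (
    SELECT DATE(timestamp) AS day, SUM(purchase_amount) AS total
    FROM in_app_purchase
    GROUP BY DATE(timestamp)
)"""

# Question 3: day-over-day growth rate. LAG yields NULL on the first day,
# so the growth rate for that day is NULL as well.
growth = conn.execute(f"""
WITH {DAILY},
with_prev AS (
    SELECT day, total, LAG(total) OVER (ORDER BY day) AS prev_total
    FROM daily
)
SELECT day, total, (total - prev_total) / prev_total AS growth_rate
FROM with_prev
ORDER BY day
""").fetchall()

# Question 4: rolling average over the current row and the 29 rows before it.
# A ROWS frame counts rows, not calendar days, so this assumes no missing days.
rolling = conn.execute(f"""
WITH {DAILY}
SELECT day,
       AVG(total) OVER (
           ORDER BY day ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
       ) AS rolling_avg
FROM daily
ORDER BY day
""").fetchall()

for row in cumulative + growth + rolling:
    print(row)
```

If days can be missing from the data, you would first join against a calendar table so that each day has a row before applying the `ROWS` frame.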
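- Preparation resources
- Primer sites: SQL ZOO and W3schools. Very practical and good for browsing.
- Two Udemy SQL courses: SQL - MySQL for Data Analytics and Business Intelligence, and The Ultimate MySQL Bootcamp
- For practice problems, Leetcode has some you can work through. A kind soul has also put together a summary here: summary of sql in leetcode.
- Definitely finish all the HackerRank problems: they are very easy, and if you're quick you might get through them in a day or two.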
- DataCamp:
Data Gathering - Why API - Medium: What Is an API and Why Should I Use One? | by Tyler Elliot Bettilyon | Medium
Intro to SQL: https://lnkd.in/giWs-3N
Complete SQL Bootcamp: https://lnkd.in/gsgf_fF
Data Visualization - Medium: The 7 Kinds of Data Visualization People | by Elijah Meeks | Nightingale | Medium
- More sites: 18 best sql online learning resources
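- I recommend installing MySQL on your own computer so you can learn in a realistic SQL environment. MySQL tutorials on window functions and the frame clause: Window function, Frame Clause. This is very important: window functions are arguably the killer tool in SQL interviews, saving a lot of time and keeping your reasoning clear.
- I also recommend learning WITH common_table_expression. It makes your SQL look much tidier and easier to understand.
- Most important of all: if you think finishing the practice problems or the material above means you're all set, it really doesn't. I fell for this at first too. In practice, finishing HackerRank won't help you quickly solve the sample questions I gave. What does help you prepare for SQL interviews is understanding metrics and the product, because essentially all SQL interview questions are built around business-related metrics. For example, a gaming company will very likely ask about DAU (daily active users) or purchase rate, Facebook will ask about friend requests, and so on. So get familiar with the business of the company you're applying to and then prepare SQL accordingly; you'll get much more out of the same effort.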
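To make the Facebook-style metric concrete, here is a rough sketch of the friend request acceptance rate question written with WITH common table expressions and a self-join, again run through Python's `sqlite3`. The sample rows are made up, and two assumptions are mine: an accept row has the roles swapped (actor_id is the receiver, target_id is the sender), and each request is accepted at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the column names come from the question.
conn.executescript("""
CREATE TABLE friending (
    time INTEGER, date TEXT, action TEXT, actor_id INTEGER, target_id INTEGER
);
INSERT INTO friending VALUES
  (100, '2018-01-01', 'send',   1, 2),
  (150, '2018-01-01', 'send',   1, 3),
  (200, '2018-01-01', 'send',   4, 5),
  (260, '2018-01-01', 'send',   6, 7),
  (300, '2018-01-02', 'accept', 2, 1),
  (400, '2018-01-03', 'accept', 5, 4),
  (500, '2018-01-02', 'send',   8, 9),
  (600, '2018-01-02', 'accept', 9, 8);
""")

query = """
WITH sent AS (
    -- Requests sent on the target date.
    SELECT actor_id AS sender, target_id AS receiver, time AS sent_time
    FROM friending
    WHERE action = 'send' AND date = '2018-01-01'
),
accepted AS (
    -- Accept rows, with roles flipped back to the sender's point of view.
    SELECT target_id AS sender, actor_id AS receiver, time AS accepted_time
    FROM friending
    WHERE action = 'accept'
)
SELECT 1.0 * COUNT(a.accepted_time) / COUNT(*) AS acceptance_rate
FROM sent s
LEFT JOIN accepted a
       ON a.sender = s.sender
      AND a.receiver = s.receiver
      AND a.accepted_time >= s.sent_time  -- the accept may land on a later day
"""

acceptance_rate = conn.execute(query).fetchone()[0]
print(acceptance_rate)
```

The LEFT JOIN keeps every sent request in the denominator, and the `>=` condition is what lets an acceptance on a later date still count toward requests sent on 2018-01-01.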
V. Product sense
- Common interview questions
- Today you immediately notice that our app’s new users have doubled. What could be the reason? Do you think it’s good or not?
- If we have an app with in-app purchase, name at least 4 metrics you would like to monitor in your dashboard.
- You run an A/B test, find that the result is very positive, and decide to launch it. In the first 2 weeks, the performance of our website is very positive. However, as time goes by, all metrics seem to go back to normal. How will you explain this result?
- Assume we are Facebook and we would like to add a new ‘love’ button. Should we do this?
- We are running 30 tests at the same time, trying different versions of our home page. In only one case does the test win against the old home page. P-value is 0.04. Would you make the change?
- If after running an A/B test you find that the desired metric (e.g., Click Through Rate) is going up while another metric (e.g., Clicks) is decreasing, how would you make a decision?
- Assume that you are assigned to estimate the LTV (lifetime value) of our game app's players. What kind of metrics would you like to calculate so as to make a good prediction? Assume that you have already collected everything you want. How would you make this prediction/estimation?
- If you got a chance to add new features to our app to increase our profit in the very short term, what would you do?
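Most of these revolve around metrics and how to improve product performance. Honestly, they are very unfriendly to new grads, who have no work experience to draw on.
- Preparation resources
- Product school: https://www.productschool.com/. It appears to feature senior DS and PM folks from Bay Area tech companies sharing case studies and lessons learned.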
- Metrics: The 19 Metrics Every Mobile Game Needs to Track | Behavioral Data Analysis and Visualization | CoolaData
- Gaming Analytics: a roundup of game metrics
- Critical metrics every product manager must track (Critical Metrics Every Product Manager Must Track | by Evgeny Lazarenko | Product Coalition)
- A/B testing: 20 must-know questions about A/B testing (A/B 测试中 20 个必须知道的问题)
- A Collection of Data Science Takehome Challenges: gives you a chance to solve many real DS problems, and it is also product-related.
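Some personal thoughts:
Most product questions revolve around a product's metrics or problems that come up in operations. In the internet industry, a typical product goes through the following stages after launch: user acquisition → user engagement / retention → monetization.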
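Here's a parallel example to help. Anyone familiar with the Toutiao family of companies knows they run an assembly-line "product factory" model internally. They have only three core functions: engineering, user growth, and monetization. Engineering keeps the whole operation running, so once the product is stable overall, they look at user retention. If retention or engagement turns out to be very high, the product moves into the user growth push, with large-scale marketing to drive new user growth (user acquisition). Finally, monetization (ads and so on) is plugged into the product to generate revenue quickly. The order differs a bit from what I described, but in practice the product is iterated continuously across these same three parts, so metrics can't be separated from them either. User acquisition is about new user sign-up rate, user engagement is about DAU (daily active users), and monetization is about LTV, ARPDAU (average revenue per daily active user), and so on. As your understanding of internet products deepens, you'll handle metrics questions better. At bottom, every metrics question in an interview is about product iteration, user growth, and monetization.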
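As a small illustration of the engagement and monetization metrics just mentioned, here is a sketch that computes DAU and ARPDAU from a toy event log. The event format `(date, uid, revenue)` and the function name `daily_metrics` are hypothetical; a real pipeline would read from whatever event tables the company has.

```python
from collections import defaultdict

# Hypothetical event log: (date, user id, revenue from that event).
events = [
    ("2018-01-01", "u1", 0.0),
    ("2018-01-01", "u2", 4.0),
    ("2018-01-01", "u2", 2.0),
    ("2018-01-01", "u3", 0.0),
    ("2018-01-02", "u1", 5.0),
    ("2018-01-02", "u4", 0.0),
]


def daily_metrics(events):
    """Return {date: {"dau": ..., "arpdau": ...}} for each date in the log."""
    users = defaultdict(set)      # distinct active users per day
    revenue = defaultdict(float)  # total revenue per day
    for day, uid, amount in events:
        users[day].add(uid)
        revenue[day] += amount
    return {
        day: {
            "dau": len(users[day]),
            # ARPDAU = that day's revenue divided by that day's active users.
            "arpdau": revenue[day] / len(users[day]),
        }
        for day in sorted(users)
    }


metrics = daily_metrics(events)
for day, values in metrics.items():
    print(day, values)
```

Note that a user with several events on the same day counts once toward DAU, while all of their revenue counts toward ARPDAU.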
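As for the A/B testing questions that come up in product interviews, many are the interviewer's way of checking whether you can analyze A/B testing results in a specific situation. Looking only at the p-value is very much a student's mindset. In practice, A/B testing is an end-to-end process: from the initial idea for a new feature, to designing the overall experiment, to analyzing the results, to making a final recommendation. The interviewer wants critical thinking: you need to scrutinize every step closely enough to make sure there is no gap between the experiment and your conclusion.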
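One way to show why the p-value alone is not enough is the "30 tests, one winner at p = 0.04" question from the list above. The sketch below assumes the 30 tests are independent and that each uses a 0.05 significance level (neither is stated in the question), then computes the chance of at least one false positive and a Bonferroni-corrected threshold. Bonferroni is just one common correction, chosen here for simplicity.

```python
ALPHA = 0.05      # assumed per-test significance level
NUM_TESTS = 30    # number of simultaneous tests, from the interview question
P_VALUE = 0.04    # p-value of the single winning test

# Probability that at least one of the tests looks significant purely by
# chance, assuming all tests are independent and no variant has a real effect.
family_wise_error_rate = 1 - (1 - ALPHA) ** NUM_TESTS

# Bonferroni correction: divide the significance level by the number of tests.
bonferroni_threshold = ALPHA / NUM_TESTS

still_significant = P_VALUE < bonferroni_threshold

print(f"Chance of at least one false positive: {family_wise_error_rate:.2f}")
print(f"Bonferroni threshold: {bonferroni_threshold:.4f}")
print(f"Winner still significant after correction: {still_significant}")
```

Under these assumptions a single win at p = 0.04 is about what you would expect from noise, so a reasonable answer is to rerun or replicate that one variant before launching it.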
VI. Project
- Projects/Competitions - Kaggle Kernels: https://www.kaggle.com/
- Problem Solving Challenges - HackerRank
VII. Extra Skills
- Communication - Data Storytelling: https://lnkd.in/gtiCSNT
- Business Analytics - Geckoboard: How to Analyze Data: A Basic Guide | Geckoboard blog
- NLP - How to solve 90% of NLP: https://lnkd.in/gh8bKe4
- Recommendation Systems - Spotify: How Does Spotify Know You So Well? | by Sophia Ciocca | Medium
- Time Series Analysis - Complete Guide: https://lnkd.in/gFZU2Rb
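[Urgent Referral Openings - New Grads]
1. A global institutional investment firm. By fostering a culture of research, innovation, and collaboration, it strives to deliver consistent, uncorrelated absolute returns in all market environments. The firm works at the intersection of finance and technology, combining deep industry knowledge of leading portfolio and financial analytics with software engineering and quantitative research. It draws on the team's collective expertise to find new investment opportunities, analyze market conditions, minimize risk, and provide high-quality service to its investment partners. The firm has more than 500 employees worldwide and champions the free exchange of ideas, career development, and world-class benefits.
Hiring: Entry Level [Investment Data Analyst]
Full-time starting salary $75000, Sponsor OPT/Ext/H1b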
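2. A Southern California brand strategy company. It brings different worlds and cultures together, building a bridge between brands and media creators. From China to Los Angeles to Dubai to Italy, by bringing together marketers from all industries, the company has become a leader in the international influencer marketing industry and has earned the right to be "ambitious" in achieving the seemingly impossible. We take care of our creators and believe in growing together, because we know brand marketing works best only when led by outstanding people.
Hiring: Entry Level [Tech Product Manager]
Full-time starting salary $80000, Sponsor OPT/Ext/H1b/GC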
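3. A Silicon Valley technology investment firm working across venture capital, private equity, investment banking, and consulting. It mainly invests in growth-stage and late-stage companies roughly 1-4 years before IPO. The team includes an advisory board made up of distinguished, successful entrepreneurs and venture capitalists. Joining the firm gives you firsthand exposure to venture capital, private equity, investment banking, and consulting, and you will evaluate market potential and investment opportunities through research and due diligence.
Hiring: Entry Level [Project Analyst]
Full-time starting salary $66000, Sponsor OPT/Ext/H1b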
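4. The largest e-grocer in the Bay Area on the US West Coast. By partnering with local suppliers, redesigning the value chain, and harnessing social buying power, it is transforming the grocery business. It is currently growing very fast across regions, categories, and ethnic groups (6x year over year). To date it has raised more than $400 million from major investors including DST, Blackstone, Tiger Global, Lightspeed Ventures, Goodwater Capital, XVC, and iFly.
Hiring: Entry Level [Data Scientist]
Full-time starting salary $90000, Sponsor OPT/Ext/H1b/GC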
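5. An AI financial services startup with a car finance platform that helps people optimize the cost and experience of owning a car. Starting with optimizing car insurance costs, it is building a machine-learning-driven personalized service that looks for ways to cut fixed costs, negotiates better rates and handles the paperwork, and switches automatically to save money. The company was founded by serial entrepreneurs who previously built and scaled YourMechanic, and it has raised more than $50 million to date.
Hiring: Entry Level [Data Analyst]
Full-time starting salary $72000, Sponsor OPT/Ext/H1b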
[Job Descriptions/Requirements]
Investment Data Analyst
- Partnering with investors to respond to and address their data needs;
- Developing a keen understanding of how data is utilized in our investment processes to generate insights and ideas;
- Designing and implementing programmatic data accuracy, outlier detection, error correction and remediation processes;
- Evaluating new and differentiated data within the firm and helping strategically prioritize new data initiatives;
- Working closely with our Data Engineering teams and defining the on-boarding and production requirements for all new data;
- Must have a passion for data and experience in applying that passion to high quality data products;
- Must have the ability to perform data analysis and wrangle the data using Python and a strong understanding of time-series data, third party data vendors, and how they apply to quant and fundamental analysis;
- Prior experience with quantitative investors (either as a quant or vendor) is strongly preferred;
- Serve as an in-house expert on data, leveraging your knowledge of vendor and market data collection;
- Experience working in an agile environment and with development teams. Strong understanding of SQL and relational databases and familiar with AWS is a huge plus.
Tech Product Manager
- Monitor and analyze market trends;
- Study competitors’ services and products;
- Explore new ways of improving existing services and products;
- Provide product training and technical expertise;
- Identify and present innovative product solutions;
- Work with development leads so that product requirements are understood;
- Work with project management software;
- Work within a software development methodology like AGILE;
- Coordinate product releases with marketing, sales, and development teams;
- Answer product related inquiries.
Project Analyst
- Performing financial projections through the input and review of income, operating expenses, capital budgets;
- Evaluating potential investments with respect to the financial return on investment;
- Assisting in the preparation of preliminary investment summaries;
- Assisting with due diligence review and coordinating project closings;
- Experience working with minimal direction from a supervisor and taking initiative to follow up on projects and/or assignments; able to make decisions and resolve problems with minimal supervision and to exercise good judgment about priorities;
- Experience using a range of organizational and time management skills to coordinate and prioritize a diverse, complex workload and to meet competing deadlines in a fast paced environment with high attention to detail.
- Strong communication and interpersonal skills;
- One or more of the following is a plus: 1) outstanding skill in designing, editing and reviewing professional documents, etc.; 2) fluency in Chinese (translation to and from English required); 3) strong research (company research, market research, etc.) capabilities.
Data Scientist
- Apply advanced knowledge of SQL and the ability to build complex modeling features;
- Build machine learning models that leverage our unique data sources to recommend optimal product, offer, content, and information;
- Build end-to-end infrastructure from exploring your data, designing, deploying, testing, to monitoring your own models;
- Help identify new opportunities by applying machine learning and statistical models for improved business outcomes;
- MS or Ph.D. or equivalent experience in a quantitative field such as computer or data science, math, statistics, or physics;
- Trackable experience in developing and deploying machine learning or deep learning models in a professional setting;
- Domains of expertise should include at least one of the following: collaborative filtering, content based recommender systems, link-click prediction, predictive customer targeting;
- Strong communication skills, and ability to work with multiple stakeholders.
Data Analyst
- Owner of the core company data pipeline, responsible for scaling up data processing flow to meet the rapid data growth;
- Consistently evolve data model & data schema based on business and engineering needs;
- Implement systems tracking data quality and consistency;
- SQL and MapReduce job tuning to improve data processing performance;
- Proficient in SQL, especially with Postgres dialect preferred;
- Expertise in Python, BI software (preferably Metabase or Tableau), Hadoop preferred.
For more job-search materials and more New Grads Friendly referral openings, add WeChat: Gary1988Oct for details.