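① On 1.22, I started doing content filtering by following 709's notebook. The simplest approach is linear regression: predict utility from article and user features. Categorical features are one-hot encoded, which made the size of the test set explode and often brought down the AWS server. 709 ranted that AWS was garbage, and added that econ people would probably be stuck in this situation, which is why the lab needs CS students.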
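To make the size explosion concrete, here is a minimal sketch in plain Python. The rows, column names, and values are made up for illustration and are not from the S2M data. It counts how many cells a dense one-hot matrix needs versus a sparse form that stores only the positions of the 1s.

```python
# Toy illustration only: rows, column names, and values are hypothetical.

def build_vocab(rows, columns):
    """Assign one one-hot column index to every (column, value) pair seen."""
    vocab = {}
    for row in rows:
        for col in columns:
            key = (col, row[col])
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def one_hot_sparse(row, columns, vocab):
    """Sparse form: keep only the indices whose value is 1.

    Assumes every value in `row` was seen by build_vocab.
    """
    return [vocab[(col, row[col])] for col in columns]


def one_hot_dense(row, columns, vocab):
    """Dense form: a full 0/1 vector with one slot per (column, value) pair."""
    vec = [0] * len(vocab)
    for idx in one_hot_sparse(row, columns, vocab):
        vec[idx] = 1
    return vec


rows = [
    {"user_id": "u1", "topic": "econ"},
    {"user_id": "u2", "topic": "cs"},
    {"user_id": "u3", "topic": "econ"},
]
columns = ["user_id", "topic"]
vocab = build_vocab(rows, columns)

# Dense storage grows with rows x distinct values; sparse grows with rows x columns.
dense_cells = len(rows) * len(vocab)
sparse_cells = sum(len(one_hot_sparse(r, columns, vocab)) for r in rows)
print("dense cells:", dense_cells, "sparse cells:", sparse_cells)
```

With a high-cardinality column such as a user ID, the dense count grows roughly with the square of the data size, while the sparse count stays linear. Whether a sparse representation would have been enough to keep the server alive in our case is a guess.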
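② 709 said he would work over the weekend to finish this. At the time I thought, oh no, I had finally found something I could work on, and now I would have to look for a new topic again. But on Monday (1.25) it turned out he hadn't done it, lol. The problem with linear regression and decision trees is that every feature enters as an independent variable (z = ax + by + c), with no interaction between features (for example, comparing similarity with a dot product: z = x^T y). So 709 later switched to matrix factorization after all.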
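A tiny sketch of the difference, under made-up assumptions: two hypothetical users and two hypothetical items, where u1 only likes item A and u2 only likes item B. The additive model z = a_user + b_item + c is fitted by least squares (on a complete, balanced grid this reduces to user mean + item mean - grand mean). The latent vectors for the dot-product model are picked by hand here rather than learned, so this is not 709's matrix factorization code, just an illustration of what the interaction term buys.

```python
# Hypothetical 2x2 example: u1 likes item A, u2 likes item B.
users = ["u1", "u2"]
items = ["A", "B"]
utility = {
    ("u1", "A"): 1.0, ("u1", "B"): 0.0,
    ("u2", "A"): 0.0, ("u2", "B"): 1.0,
}


def additive_fit(utility, users, items):
    """Least-squares fit of z = a_user + b_item + c on a complete, balanced grid.

    For such a grid the fitted value is: user mean + item mean - grand mean.
    """
    grand = sum(utility.values()) / len(utility)
    user_mean = {u: sum(utility[(u, i)] for i in items) / len(items) for u in users}
    item_mean = {i: sum(utility[(u, i)] for u in users) / len(users) for i in items}
    return {(u, i): user_mean[u] + item_mean[i] - grand for u in users for i in items}


def dot(x, y):
    """Dot product z = x^T y of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


# Hand-picked (not learned) latent vectors, purely for illustration.
user_vec = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_vec = {"A": [1.0, 0.0], "B": [0.0, 1.0]}

additive_pred = additive_fit(utility, users, items)
interaction_pred = {(u, i): dot(user_vec[u], item_vec[i]) for u in users for i in items}

for pair in utility:
    print(pair, "true:", utility[pair],
          "additive:", additive_pred[pair],
          "dot product:", interaction_pred[pair])
```

The additive model can only say "everyone likes everything about equally" on this data, while the dot product can express that a particular user matches a particular item.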
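③ On 1.26, 709 sent me a paper Susan wrote 3 years ago (1801.07826) and asked me to write up my thoughts on it. So on 1.27 I wrote a short essay of 500-plus words, analyzing from 3 angles the ideas S2M could borrow from TTFM: the model, data processing, and evaluation methods. That was the day I felt the greatest sense of accomplishment since joining the lab; it felt like I had opened up many interesting directions that could lead to results.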
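④ On 1.28, my task was to add a module to 709's code. It should have been a fairly simple task, but a bug came up along the way and I had to read his source code. His code is written rather "casually", so reading it was very slow going; I even started searching for papers on "code-to-text" (which could serve as a backup project for 224n).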
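⑤ I remembered D5P320-200527: "The SVL test requirements stress code quality over performance." I now understood more deeply how much coding habits matter for teamwork. Also, the 142 project requires running JSHint, which reminded me of how 3251 required running a linter to standardize the formatting of C++ code.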
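⑥ On 2.2, 710 and 709 proposed an online testing plan to the S2M team: 3 groups in total (the original system, new model 1, and new model 2), compared against each other, though the difference between the two new models may not be very noticeable. We also discussed possible problems (how to limit the damage if a major bug appears).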
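One common way to split users into three arms is to hash the user ID, so the same user always lands in the same group. The sketch below is my own illustration, not the plan that was actually proposed: the arm names, the salt string, and the `kill_switch` flag (a simple way to send everyone back to the original system if a major bug shows up) are all assumptions.

```python
import hashlib

# Hypothetical arm names; the first one is the original system.
ARMS = ("control", "new_model_1", "new_model_2")


def assign_arm(user_id, salt="s2m-online-test", kill_switch=False):
    """Deterministically map a user ID to one of the three arms.

    If kill_switch is True, everyone is routed to the original system.
    The salt is a made-up experiment name; changing it reshuffles users.
    """
    if kill_switch:
        return ARMS[0]
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]


# Check on synthetic IDs that the split is roughly even.
counts = {arm: 0 for arm in ARMS}
for n in range(3000):
    counts[assign_arm(f"user-{n}")] += 1
print(counts)
```

Hashing avoids storing an assignment table and keeps each user's experience consistent across sessions. It says nothing about whether the two new models will differ enough to detect, which was the concern raised in the meeting.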
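⑦ 2.4 was my first 1-on-1 with Susan. 709 suggested that I should ask for her opinion on my overall career plan rather than for feedback on the current project. I disagreed: I didn't think I needed career advice from Susan; on the contrary, her advice on the work in front of me would be the most useful for my personal growth.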
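⑧ Before the meeting, I spent 1.5 hours writing what amounts to "a letter", bringing together my thinking on the problems I had run into since joining the lab and possible ways to solve them. The main point was that I was mostly modifying 709's code, which was inefficient and produced little real progress, so I wanted to start a project of my own and ask whether the ideas from 1.27 were feasible. I had discussed the issues it raises with friends who are already working, and they are reasonable; The PhD Grind also mentions somewhat similar problems. The original text follows: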
⑨ Hi Susan! I know your time is valuable, so thank you for taking the time to provide individual feedback for me on the S2M project.
⑩ After I did the visualization tool for Path Analysis in December, since January I’ve been mostly working on tasks such as “help 709 figure out why something is not working,” or “try to change something in 709’s code and see the results.”
⑪ I feel that it’s not very efficient, especially when his code is a work in progress, or when I need to understand a lot of legacy code. In addition, this working style means that I often get assigned disparate tasks on a daily basis, and I usually don’t have a clear picture of what to work on beyond a few days.
⑫ Now that I have joined for nearly two months, I have begun to develop some project ideas of my own. At some point in the future, I may want to start a relatively independent project, instead of letting 709 assign tasks to me.
⑬ For example, a week ago I read your TTFM paper, and wrote about how its ideas can be applied to the S2M project. 709 and I agree that the most important takeaway is to include time aspects in the output. In other words, predict not only a user’s utility for an item, but that utility at a specific timestamp.
⑭ Another idea is to use more advanced deep learning methods to replace our current models. I can try out Wide&Deep or other neural-network based methods, and write a full report about the model, result, and analysis, similar to doing a final project of a CS course.
⑮ Some of my friends at big tech companies like Facebook agree that they need to come up with individual projects, and take full responsibility for them. For example, design a new Facebook feature, and implement, test, and deploy it.
⑯ Of course, I would be happy to help 709 if he needs a hand on something, but I feel that the team may be more productive overall if I do a relatively self-contained project.
⑰ If I start my own project, there are also some challenges.
⑱ (1) It’s difficult to know what others have done before. I know 709 and Rina have already done extensive work, so I don’t want to duplicate it. However, when I look at the documents, they are often difficult to navigate and understand.
⑲ (2) It might be difficult to get your feedback in group meetings, since 709 and 710 already need extensive feedback on their work.
⑳ Therefore, I would like to know your thoughts about whether I should propose my own project, and if so, what are some good tasks I can work on that fit into the larger goals.
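㉑ Susan took only 45 seconds to read it, and then said that my proposal was beyond the scope of what her funding supports. She mentioned that in the past GSB gave her some money that let her support students in some free exploration (such as extending TTFM and other earlier projects), but now her only source of funding is external.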
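㉒ Afterwards I relayed what Susan said to 709 and 710. They were very supportive and said they would try to set me up with a project that would feel more rewarding to work on. Before 2.4 I was actively thinking every day about which direction I should push this project in; after that I stopped worrying about it and mostly spent my time thinking about how to do the 224n project.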