
Data Mining Research

Goal

At its core, the goal of this research internship was to develop an algorithm that can clean up a noisy, unreliable dataset. By "clean up", we mean removing outliers and imputing (i.e., filling in) missing data in the dataset.
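
To make "clean up" concrete, here is a minimal pandas sketch on a toy series. It is not the algorithm used in the lab. It swaps in a simple interquartile-range (IQR) rule for outlier removal and linear interpolation for imputation, and the function name `clean_series`, the threshold `k`, and the toy readings are all made up for illustration.

```python
import numpy as np
import pandas as pd


def clean_series(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Mask IQR outliers as NaN, then fill every NaN by linear interpolation.

    Illustrative stand-in only; not the method used in the internship.
    """
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Step 1: treat anything outside [lower, upper] as missing.
    masked = values.where(values.between(lower, upper))
    # Step 2: impute missing values from their neighbors.
    return masked.interpolate(method="linear", limit_direction="both")


# Toy readings: one missing value (NaN) and one obvious outlier (95.0).
raw = pd.Series([20.0, 21.0, np.nan, 23.0, 95.0, 25.0, 26.0])
cleaned = clean_series(raw)
print(cleaned.tolist())
```

In this toy case the outlier is first turned into a gap, so both problems end up being handled by the same imputation step.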

Task Summary

The algorithm was already implemented by my PhD mentor (Yue Hu), and we aimed to apply it to a new dataset (the AoT dataset).

My tasks primarily consisted of:

  1. analyzing and visualizing the training dataset (matplotlib)
  2. organizing the dataset into the correct input format to feed into the algorithm (pandas)
  3. running experiments under various input conditions (MATLAB)
  4. summarizing and visualizing the experiment results (matplotlib)
For more details, please see this Diary entry.
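
As a rough illustration of the pandas step in the list above, the sketch below reshapes long-format sensor readings into a wide time-by-node table in which missing readings are explicit. The input format the algorithm actually expects is not described here, so the column names (`timestamp`, `node_id`, `value`), the hourly grid, and the readings themselves are all assumptions.

```python
import pandas as pd

# Hypothetical long-format readings: one row per (timestamp, node) measurement.
long_df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-01-01 00:00", "2019-01-01 00:00",
        "2019-01-01 01:00",
        "2019-01-01 03:00", "2019-01-01 03:00",
    ]),
    "node_id": ["A", "B", "A", "A", "B"],
    "value": [20.5, 19.8, 21.0, 21.4, 20.1],
})

# Pivot to a wide time-by-node table; readings that were never recorded become NaN.
wide = long_df.pivot(index="timestamp", columns="node_id", values="value")

# Reindex onto a regular hourly grid so gaps in time also show up as NaN rows.
full_index = pd.date_range(wide.index.min(), wide.index.max(), freq=pd.Timedelta(hours=1))
wide = wide.reindex(full_index)

# Fraction of missing entries per node, a quick check before running experiments.
missing_rate = wide.isna().mean()
print(wide)
print(missing_rate)
```

A table like this makes the missing entries easy to count per node before the data is handed to any cleaning algorithm.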

The Story

This section covers my feelings about this internship, as well as what I learned from it.

Relevant Diary entries:
Technical Summary: D5P227
Internship experience: D5P218, D5P222, D5P223

Sci-fi vs. real research

This excerpt from D5P248 describes how I came into the lab with the ambition to build smart cities but was struck by the reality of how research works:

I had been interested in applying ML techniques to build smarter cities and transportation systems. However, after experiencing what research really is like, I find that sometimes, technologies described in sci-fi movies or documentaries aren’t being developed at all. My ambition before entering Prof. Work’s lab was gradually disillusioned as I worked on things that have little to do with my envisions of better transportation systems, i.e. AoT nodes.

Role in my career path

Up until my grad school applications, the Work Lab RAship was almost the only research or internship experience I had, so it was crucial for those applications. The 250 hours I spent in the lab possibly outweighed the 5,000 hours I spent on coursework during college. In October 2019, as I wrote my Statement of Purpose for grad school, I had to go back and "scrutinize every single detail of what I did" in the lab (D5P251).

Luckily, my advisor, Prof. Dan Work, gave me a great deal of support for my applications, even though most of what I did in the lab was pretty basic data cleaning and analysis. To this day I'm very grateful for that.

When I applied for internships in spring 2020, this was still one of the only projects I could refer to whenever I was asked about my past experience.

All in all, it's no exaggeration to say this internship opened the door for my career development.
