Amazon SDE Intern
SUMMER 2022
For my 12-week internship, I built Alexa Intelligent Semantic Search (AlexISS), a low-cost, low-latency semantic search framework designed to work with any kind of dataset at the massive scale of Alexa. I reduced search latency by 55% when searching over 80,000 queries and cut storage costs by 97% compared with existing tools. Toward the end, I also collaborated with cross-functional teams and scientists to build a data quality tool with AlexISS integration that can identify data that may be causing model confusion or failures.
Experience
Week 1-2
I spent the first two weeks onboarding: learning about Amazon’s culture, the internal tools it offers, and the documentation my mentor had compiled to prepare me for the project. Getting the correct permissions to access everything was the main bottleneck, but my manager and mentor both helped me get through it.
Week 3-6
Next, I built a minimum viable product (MVP) of AlexISS with the basic functionality to query and search for semantically similar results on a medium-sized dataset in a reasonable amount of time. Even at this stage, AlexISS outperformed existing solutions, which demonstrated the value the project could provide. Together with my mentor, my direct manager, and the applied scientist I was working with, I also presented the project in many customer meetings (with Amazon SDEs and scientists) to see how our search framework could help them and to gather feedback on our initial designs.
Week 7-10
During these weeks, I researched and implemented optimizations for AlexISS to further improve search latency and decrease storage costs. I spent the first week researching, testing, and benchmarking different machine learning methods provided by FAISS to find the optimal configurations. Due to lack of time, I was only able to verify the correctness of the optimizations for datasets of up to 10,000,000 items. Nevertheless, I still achieved a 55% search latency reduction and a 97% storage reduction over existing methods.
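One common way to check that an optimized search configuration is still correct is to compare its results against an exact brute-force search and measure recall. The sketch below is only an illustration of that idea in plain Python; it is not the AlexISS code, and the toy vectors, the function names, and the `approx_ids` stand-in for the output of an optimized index are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force search: ids of the k vectors most similar to the query."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate search also found."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy data: hypothetical 2-D "embeddings", not real project data.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.05]

exact_ids = exact_top_k(query, vectors, k=2)
approx_ids = [0, 2]  # hypothetical ids returned by an optimized index
recall = recall_at_k(approx_ids, exact_ids)
print(f"exact={exact_ids} approx={approx_ids} recall@2={recall}")
```

A recall close to 1.0 suggests the faster configuration returns nearly the same neighbors as the exact search; how low a recall is acceptable depends on the use case.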
I also spent this time integrating AlexISS into a “Data Quality Analysis Tool” that my team can use for cleaning up data. Based on the criteria you select, the tool produces a report of data that could be causing model confusion or failures. From the generated report, scientists can better understand the data and how to clean it to create a more accurate model. For ease of use, I also built it as a command-line tool, similar to AlexISS.
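As a rough illustration of what a command-line interface for such a tool could look like, here is a minimal `argparse` sketch. Everything in it is hypothetical: the program name, the flags, the criterion name, and the file names are assumptions made for the example and do not describe the internal tool.

```python
import argparse


def build_parser():
    """Build the argument parser for a hypothetical data quality CLI."""
    parser = argparse.ArgumentParser(
        prog="data-quality",  # hypothetical program name
        description="Report data that may be causing model confusion or failures.",
    )
    parser.add_argument("dataset", help="path to the dataset to analyze")
    parser.add_argument(
        "--criterion",
        action="append",
        default=None,
        help="criterion to check; repeat the flag to select several",
    )
    parser.add_argument(
        "--top-k",
        type=int,
        default=10,
        help="number of semantically similar neighbors to inspect per item",
    )
    parser.add_argument(
        "--output", default="report.json", help="where to write the report"
    )
    return parser


def main(argv=None):
    """Parse arguments and describe the planned analysis."""
    args = build_parser().parse_args(argv)
    args.criterion = args.criterion or []
    # A real tool would run the selected checks here; this sketch only
    # prints what it was asked to do.
    print(f"Analyzing {args.dataset} with criteria: {args.criterion}")
    print(f"Inspecting {args.top_k} neighbors per item; report -> {args.output}")
    return args


# Example invocation with explicit arguments (all values are made up).
args = main(["reviews.csv", "--criterion", "conflicting-labels", "--top-k", "5"])
```

Passing the argument list explicitly to `main` keeps the parser easy to test without touching `sys.argv`.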
Week 11-12
For the last two weeks, I worked on my final presentation and tidied everything, from documents to code comments, so that others who have never worked on AlexISS or the Data Quality tool can still understand my code. The final presentation to my managers went well, and I learned a lot from this experience.
Takeaways
- Generalize code to work for any use case.
- Write clean, tidy code.
- Improved Python skills.
- Building command line interfaces in Python.
- Writing scripts.
- Internal Amazon tools.
- AWS S3 buckets.
- Developing with a customer-first approach.
- Communication and teamwork.