Hi everyone, I’m Padma, a rising senior at UC Irvine studying computer science and engineering. This summer, I had the opportunity to be a technology fellow at Caylent! I was also a hackNY Fellow, which connected me with a community of tech interns in New York City. Between my internship, the fellowship, and NYC, I had an incredible summer. I’ll share more about my experience in this blog post!
Coming into the internship at Caylent, I had some exposure to AWS through my prior internships but hadn’t really explored its services beyond storage. In particular, I knew nothing about its generative AI features, which Caylent leverages to improve clients’ infrastructure. Knowing the impact of cloud computing and generative AI, I was excited to be working at this company. Here are some things I worked on:
Benchmarking LLMs
I worked on benchmarking tools in Python designed to assess foundation models (such as Claude Sonnet) deployed on various cloud platforms for scalability and cost efficiency. The existing benchmarking tools supported only a limited set of cloud platforms, made calls to deprecated APIs, and didn’t take advantage of multi-threading to reduce evaluation time, so I worked on closing those gaps. I also added support for adjusting different hyperparameters, looking for ways to reduce latency without sacrificing model performance. Below, I’ve included one result from a preliminary draft of some tests I wrote to see whether different methods of clamping output tokens could reduce latency. In building these tools and writing these tests, I gained insight into large language models, their use cases, and generative AI workflows.
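For readers curious what a latency test like this might look like, here’s a minimal sketch. It is not the actual Caylent tooling: `fake_invoke` is a hypothetical stand-in for a real model call (it just sleeps in proportion to the number of tokens it "generates"), and the function names, token caps, and worker count are all assumptions made for illustration. It shows the two ideas from the paragraph above: running requests in parallel with a thread pool, and comparing latency across different caps on output tokens.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fake_invoke(prompt: str, max_tokens: int) -> str:
    """Hypothetical stand-in for a real model call.

    Pretends the model wants to write 50 tokens and that each token
    costs a fixed amount of time, so a lower cap finishes sooner.
    """
    n_tokens = min(max_tokens, 50)
    time.sleep(0.0005 * n_tokens)
    return " ".join(["tok"] * n_tokens)


def timed_call(prompt: str, max_tokens: int) -> tuple[float, int]:
    """Return (latency in seconds, number of output tokens) for one request."""
    start = time.perf_counter()
    output = fake_invoke(prompt, max_tokens)
    latency = time.perf_counter() - start
    return latency, len(output.split())


def benchmark(prompts: list[str], max_tokens: int, workers: int = 4) -> dict:
    """Send all prompts through a thread pool and summarize the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: timed_call(p, max_tokens), prompts))
    latencies = [latency for latency, _ in results]
    token_counts = [count for _, count in results]
    return {
        "max_tokens": max_tokens,
        "mean_latency_s": mean(latencies),
        "mean_tokens": mean(token_counts),
    }


if __name__ == "__main__":
    prompts = [f"Summarize document {i}" for i in range(8)]
    for cap in (10, 25, 50):
        print(benchmark(prompts, cap))
```

Threads are a reasonable fit here because a real model call spends most of its time waiting on the network, so many requests can be in flight at once. In a real benchmark, `fake_invoke` would be replaced by the cloud provider’s client call, and you’d also want to score the outputs to check that a tighter cap isn’t hurting quality.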