GPT 32k, Open-source, DeepLake, GPT Functions Search and more!
Author: Kevin Lu
We're excited to announce our migrations to GPT 32k and ActiveLoop Deep Lake, the open-sourcing of Sweep, and improvements to Sweep's file search mechanism.
GPT 32k Migration
GPT 32k (June 13th edition) shows significantly more consistent code generation and instruction-following capabilities, at the cost of a higher price tag and slower PR generation times (2 - 3 min → 5 - 10 min). This drastically reduces two issues we were facing: failure to generate requested changes, and failure to follow instructions and respond in specified formats. Expect to see fewer "An error has occurred" messages and more well-written PRs. More reliable output formats also speed up our prompt engineering.
Open-sourcing Sweep
A lot of users were concerned about how we store their source code. We decided that open-sourcing Sweep would provide transparency into how we handle data, on top of showing some of the algorithms we use for chunking, indexing, querying and prompting.
ActiveLoop Deep Lake
Migrating to Deep Lake drastically improved our vector DB's consistency and reliability compared with our previous system, another open-source vector DB library. Deep Lake's developer interface was also much easier to use, and its built-in locking features work well with our serverless Modal backend. There's still work to be done to improve efficiency, such as caching embeddings.
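To make the embedding-caching idea concrete, here is a minimal sketch of one way it could work: key each chunk by a hash of its text so that unchanged chunks are never re-embedded. This is not Sweep's implementation and does not use the Deep Lake API; `EmbeddingCache` and `toy_embed` are hypothetical names, and the in-memory dict stands in for whatever persistent store would be used in practice.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """Memoize embeddings by a hash of the chunk text (hypothetical helper)."""

    def __init__(self, embed_fn: Callable[[str], Vector]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, Vector] = {}  # assumption: in-memory only
        self.misses = 0  # number of times the embedding function was called

    def get(self, text: str) -> Vector:
        # Identical chunk text always maps to the same key.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


def toy_embed(text: str) -> Vector:
    # Stand-in for a real embedding model call (assumption, for illustration).
    return [float(len(text)), float(text.count("\n"))]
```

With this shape, re-indexing a repository after a small commit would only call the embedding model for chunks whose text actually changed.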
GPT Functions Search
GPT Functions (https://openai.com/blog/function-calling-and-other-api-updates), released yesterday, is essentially OpenAI's interface for more easily creating agents (like ReAct). We integrated GPT Functions into our retrieval system to allow Sweep to decide at runtime whether more search queries should be made. This improves the system by retrieving additional information only when it is needed.
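The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Sweep's actual code: the function name `search_codebase`, the helper `retrieve`, and the stubs `fake_search` and `fake_model` are all hypothetical. The schema follows the JSON-schema style of OpenAI function definitions, and the stubbed reply mirrors the shape of a `function_call` message (a name plus a JSON string of arguments), so no API call is made here.

```python
import json
from typing import Callable, Dict, List

# Function definition in the JSON-schema style used for function calling.
# The name and fields are hypothetical.
SEARCH_FUNCTION = {
    "name": "search_codebase",
    "description": "Search the repository for snippets relevant to a query.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def retrieve(
    issue: str,
    search: Callable[[str], List[str]],
    ask_model: Callable[[str, List[str], List[Dict]], Dict],
    max_rounds: int = 3,
) -> List[str]:
    """Run an initial search, then let the model request more searches."""
    snippets = list(search(issue))
    for _ in range(max_rounds):
        reply = ask_model(issue, snippets, [SEARCH_FUNCTION])
        call = reply.get("function_call")
        if call is None:
            break  # the model decided it has enough context
        args = json.loads(call["arguments"])
        for snippet in search(args["query"]):
            if snippet not in snippets:
                snippets.append(snippet)
    return snippets


def fake_search(query: str) -> List[str]:
    # Stub index standing in for the vector DB (assumption).
    index = {
        "fix login bug": ["auth/login.py"],
        "session handling": ["auth/session.py"],
    }
    return index.get(query, [])


def fake_model(issue: str, snippets: List[str], functions: List[Dict]) -> Dict:
    # Stub model: asks for one more search until it sees the session file.
    if "auth/session.py" not in snippets:
        return {
            "function_call": {
                "name": "search_codebase",
                "arguments": json.dumps({"query": "session handling"}),
            }
        }
    return {"content": "enough context"}
```

Calling `retrieve("fix login bug", fake_search, fake_model)` runs one extra search and then stops. Capping the loop with `max_rounds` is a design choice in this sketch to bound cost and latency if the model keeps asking for more queries.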