January 2, 2020


Best practices for scaling modeling platforms

Companies are increasingly building modeling platforms to empower their researchers to efficiently scale the development and productionalization of their models. Scott Clark and Matt Greenwood share a case study from a leading algorithmic trading firm to illustrate best practices for building these types of platforms in any industry.

Talk Title: Best practices for scaling modeling platforms
Speakers: Scott Clark (SigOpt), Matt Greenwood (Two Sigma Investments)
Conference: O’Reilly Artificial Intelligence Conference
Conf Tag: Put AI to Work
Location: New York, New York
Date: April 16-18, 2019
URL: Talk Page
Slides: Talk Slides
Video:

Join in to learn how Two Sigma, a leading quantitative investment and technology firm, solved its model optimization problem. Algorithmic trading firms leverage massive amounts of data, advanced engineering, and quantitative research at every step of the investment process to maximize returns for their customers. Parameterized models sit at the heart of each stage, and finding the optimal settings for these models is an ongoing challenge.

Some models are simple or well studied enough to have closed-form analytic solutions. Others, like increasingly popular deep learning models, have analytic mathematical formulations that make them good targets for powerful gradient descent methods. Unfortunately, many models require full market simulations or machine learning algorithms where none of these fast optimization methods can be used. Two Sigma tried both unsophisticated grid search and more sophisticated open source Bayesian optimization libraries (like GPyOpt) to solve this problem. The former was far too expensive for even moderately complex models, and the latter were too brittle and inconsistent in their performance to use across modeling pipelines at scale. Furthermore, the cost of building, updating, and maintaining these systems was a greater tax on Two Sigma’s resources than expected.

In a departure from its preference for open source or internally built tools, Two Sigma trialed SigOpt as the optimization engine in a component of its modeling platform. The company first tested it against other methods to benchmark performance and quickly standardized on SigOpt as the preferred optimization engine powering the platform. In the process, the Two Sigma team realized a few benefits.

First, SigOpt drove significant performance gains. In testing against alternatives like GPyOpt, SigOpt delivered better results much faster. To put this gain in context, consider one machine learning model with particularly lengthy training cycles: tuning it with GPyOpt took 24 days, while SigOpt produced a more accurate model in only 3 days. In other words, it delivered a better-performing model 8x faster. (A minimal sketch of the suggest-and-observe tuning loop involved appears below.)

Second, SigOpt offered advanced optimization features that allowed Two Sigma to solve entirely new business problems with modeling. One of the more intuitive examples is multimetric optimization, which empowers teams to optimize multiple metrics at the same time and analyze the Pareto-optimal frontier of solutions. This is useful in traditional machine learning scenarios where, for example, teams may sacrifice accuracy for inference time. (The second sketch below shows how a Pareto frontier is read off a set of candidate models.)

Finally, SigOpt offers asynchronous parallelization of compute. Other solutions take advantage of massive clusters but evaluate tasks in batches, waiting for every task within the batch to complete before launching the next set. SigOpt’s algorithm provides a new task to evaluate as soon as one completes, so 100% of machines stay utilized throughout the optimization process. (The third sketch below contrasts this with batch-synchronous scheduling.)
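To make the tuning workflow concrete, here is a minimal sketch of a suggest-and-observe loop in the style of SigOpt’s documented Python client. The objective `evaluate_model`, the parameter names, and the budget are hypothetical placeholders for illustration, not details from the talk:

```python
# Minimal sketch of a Bayesian optimization loop against SigOpt's
# suggest/observe API. Replace evaluate_model with a real training
# or simulation run; here it is a toy placeholder objective.
from sigopt import Connection

def evaluate_model(assignments):
    # Placeholder: in practice, train/simulate with these
    # hyperparameters and return a metric to maximize.
    return -(assignments["learning_rate"] - 0.01) ** 2

conn = Connection(client_token="SIGOPT_API_TOKEN")  # your API token
experiment = conn.experiments().create(
    name="model-tuning-sketch",
    parameters=[
        dict(name="learning_rate", type="double",
             bounds=dict(min=1e-5, max=1e-1)),
        dict(name="num_layers", type="int",
             bounds=dict(min=1, max=8)),
    ],
    observation_budget=60,  # total evaluations, far fewer than a full grid
)

for _ in range(experiment.observation_budget):
    # Ask the optimizer for the next configuration to try...
    suggestion = conn.experiments(experiment.id).suggestions().create()
    value = evaluate_model(suggestion.assignments)
    # ...and report the result so the next suggestion improves.
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        value=value,
    )
```

The key contrast with grid search is the budget: the loop spends a fixed, small number of evaluations, with each suggestion informed by all prior observations.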
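To illustrate what the Pareto-optimal frontier means in the multimetric setting, this small self-contained sketch (with made-up accuracy and latency numbers) keeps only the configurations that no other configuration beats on both metrics:

```python
# Sketch of reading a Pareto-optimal frontier off candidate models,
# maximizing accuracy while minimizing inference time (ms).
def pareto_frontier(points):
    """points: list of (accuracy, inference_ms) tuples."""
    frontier = []
    for acc, ms in points:
        # A point is dominated if some other point is at least as good
        # on both metrics and strictly better on at least one.
        dominated = any(
            (a >= acc and m <= ms) and (a > acc or m < ms)
            for a, m in points
        )
        if not dominated:
            frontier.append((acc, ms))
    return sorted(frontier)

candidates = [(0.90, 12.0), (0.92, 30.0), (0.88, 8.0),
              (0.92, 25.0), (0.85, 9.0)]
print(pareto_frontier(candidates))
# -> [(0.88, 8.0), (0.9, 12.0), (0.92, 25.0)]
```

Every point on the frontier represents a defensible trade-off; which one a team ships depends on how much accuracy it is willing to exchange for inference speed.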
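And to illustrate the scheduling difference, this sketch keeps a fixed pool of workers saturated by launching a new evaluation the moment one finishes, rather than waiting for a whole batch. `propose_next` and `evaluate` are hypothetical stand-ins for the optimizer’s suggestion step and an expensive model evaluation:

```python
# Sketch of asynchronous parallel tuning: refill the worker pool as
# soon as any task completes, so no machine idles waiting for the
# slowest task in a batch.
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def propose_next():
    # Stand-in for asking the optimizer for a new configuration.
    return {"learning_rate": random.uniform(1e-5, 1e-1)}

def evaluate(config):
    # Stand-in for a model evaluation with highly variable runtime.
    time.sleep(random.uniform(0.1, 0.5))
    return -(config["learning_rate"] - 0.01) ** 2

BUDGET, WORKERS = 20, 4
results = []
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    pending = {pool.submit(evaluate, propose_next()) for _ in range(WORKERS)}
    launched = WORKERS
    while pending:
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            results.append(future.result())
            if launched < BUDGET:
                # Refill immediately instead of waiting for the batch.
                pending.add(pool.submit(evaluate, propose_next()))
                launched += 1
print(max(results))
```

A batch-synchronous version would instead submit WORKERS tasks, wait for all of them, and only then submit the next set, leaving finished machines idle in the meantime.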
Scott and Matt explore each of these scenarios in depth, take a closer look at this particular benchmark, and explain what the faster time to tune means in practice for teams building modeling platforms. They then discuss how techniques like multimetric optimization and asynchronous parallelization combine to empower teams to implement entirely new modeling strategies with significantly greater asset utilization.
