H2O is a relatively new open-source library for efficiently modeling large amounts of data. It supports a wide array of machine learning models and integrates with common data processing platforms. The team worked with H2O.ai to use the company's Sparkling Water platform, which runs machine learning models directly on top of Apache Spark for highly scalable data processing. The goal was to predict the score and the probability of gilding for Reddit comments, based on each comment's content and on metadata such as the parent comment and the time and date posted.
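To make the inputs concrete, here is a minimal sketch of how one comment and its metadata might be flattened into a feature row before modeling. It is not the team's actual pipeline: the function name `comment_features`, the chosen features, and the field names (`body`, `created_utc`, `score`) are assumptions made for illustration, and it uses only the Python standard library rather than H2O or Spark.

```python
from datetime import datetime, timezone


def comment_features(comment, parent=None):
    """Flatten one comment (and optionally its parent) into a feature row.

    Hypothetical helper: the field names and features are illustrative
    assumptions, not the schema or feature set used in the project.
    """
    # Interpret the posting time as a UTC Unix timestamp.
    posted = datetime.fromtimestamp(comment["created_utc"], tz=timezone.utc)
    body = comment["body"]
    return {
        # Content-based features
        "body_chars": len(body),
        "body_words": len(body.split()),
        # Time and date posted
        "hour_utc": posted.hour,
        "weekday": posted.weekday(),  # Monday is 0
        # Parent-comment metadata (zeros for top-level comments)
        "has_parent": parent is not None,
        "parent_score": parent["score"] if parent else 0,
        "parent_words": len(parent["body"].split()) if parent else 0,
    }


example = comment_features(
    {"body": "Great explanation, thanks!", "created_utc": 1420070400},
    parent={"body": "Here is how it works.", "score": 12},
)
print(example)
```

Rows like this, paired with each comment's score and a gilded/not-gilded label, could then be handed to a regression model for the score and a classifier for the gilding probability.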

