ML@B Blog

Machine Learning Crash Course: Part 2

Perceptrons, Logistic Regression, and SVMs

In this post we’ll talk about one of the most fundamental machine learning algorithms: the perceptron. It forms the basis for many modern-day ML algorithms, most notably neural networks. We’ll also discuss the perceptron’s cousin, logistic regression, and conclude with an introduction to support vector machines (SVMs), which are perhaps among the most flexible algorithms in use today.

SAP Hana Vora Stock Trading

One crucial capability that every statistical learning algorithm needs is handling a tremendous amount of data very quickly. People have used different frameworks for querying, or fetching, data, including Hadoop’s MapReduce framework and Apache Spark. SAP Hana Vora (HV), with its in-memory Hadoop query engine for MapReduce frameworks, is a promising new tool for big data and for performing distributed analysis on large databases. We demonstrate HV’s potential as a powerful resource for ML by examining its performance on tasks such as stock prediction on market data. In the process, we also contributed some additional functionality to the SAP HV library.

Grand Rounds Medical Data Classification

Since our last demo day, the Grand Rounds team has been busy working on many aspects of the project. Grand Rounds provided our team with a database of anonymized Medicare information, including information about doctor visits. Each visit record included claim numbers, the total charge, the complication code (essentially the reason for the visit), and the NPI number (which identifies the physician who performed the procedure), as well as other information.

GitHub Programming Language Classification

GitHub doesn’t only make version control software; it’s also tasked with the unique challenge of analyzing programming languages. With over 10 million repositories, GitHub has an incredible corpus of files written by over 3 million users… that’s a lot of files. Our team had the unique opportunity to identify which programming language each file was written in. Our dataset has 50,000+ files spanning over 600 languages, a nontrivial multi-class problem. We’ve improved on GitHub’s existing classifier, Linguist, and, in the process, learned what makes each programming language distinct from the others.

Code Synthesis Update 2

Imagine being able to just tell a computer what you want it to do, rather than programming it and having to deal with annoying syntax, semicolons, and debugging. For the last few months, we, the Code Synthesis team, have been working on just that. Our approach is similar to automated theorem proving, the process of using computers to find proofs, which has recently seen significant breakthroughs through the use of artificial neural networks. Automated theorem proving uses a description of the proof to derive a way to reach the desired end product. The same idea applies to our project, where we use a description of a program to generate the code for that program.

© Machine Learning at Berkeley 2018 • Please contact us for permission to republish any part of this website