<p>The pandas library, the de facto standard DataFrame implementation in Python, is very popular among data scientists, but it does not scale well to big data: it was designed for data sets small enough for a single machine to handle. Apache Spark, on the other hand, has emerged as the de facto standard for big data workloads. Today many data scientists use pandas for coursework, pet projects, and small data tasks, but when they work with very large data sets, they must either migrate to PySpark to leverage Spark or downsample their data so that they can keep using pandas.</p><p>Now with Koalas, an open-source implementation of the pandas API on Apache Spark, data scientists can make the transition from a single machine to a distributed environment without needing to learn a new framework. In this talk, we'll go through the basics of Koalas, along with demos.</p>