Compiled, automatically parallel Python for data science
Any code you run with pyfora also works as-is in standard Python, but with pyfora it can run hundreds or thousands of times faster and operate on datasets many times larger than the RAM of a single machine. You can speed up your computations by running them on hundreds of CPU cores with terabytes of RAM, with hardly any changes to your code.
pyfora consists of two main components:
- A distributed backend that runs on one or more machines in your local network or in the cloud.
- A Python package that sends your code to the backend for compilation and execution.
The following program uses pyfora to sum math.sin() over the first billion integers:

    import math, pyfora

    executor = pyfora.connect('http://localhost:30000')

    with executor.remotely.downloadAll():
        x = sum(math.sin(i) for i in xrange(10**9))

    print x
This program runs in 13.76 seconds on a 3.40 GHz Intel(R) Core(TM) i7-2600 quad-core CPU (8 hyperthreads) and uses all 8 logical cores. The same program takes 185.95 seconds in the local Python interpreter, which uses one core, so the pyfora run is about 13.5 times faster.
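To get a feel for the single-core baseline quoted above without a pyfora backend, the same sum can be timed locally on a smaller range. The following is only a rough sketch: the helper names local_sin_sum and closed_form_sin_sum are made up for this illustration and are not part of pyfora, and the closed-form expression is included purely as a sanity check on the result.

```python
import math
import time

try:
    range_ = xrange  # Python 2.7: lazy range, as in the pyfora example
except NameError:
    range_ = range   # Python 3: range is already lazy


def local_sin_sum(n):
    """Sum math.sin(i) for i = 0 .. n-1 in the local interpreter (one core)."""
    return sum(math.sin(i) for i in range_(n))


def closed_form_sin_sum(n):
    """Closed form of the same sum: sin(n/2) * sin((n-1)/2) / sin(1/2).

    Hypothetical helper used only to sanity-check the loop above.
    """
    return math.sin(n / 2.0) * math.sin((n - 1) / 2.0) / math.sin(0.5)


if __name__ == "__main__":
    n = 10 ** 6  # assumption: much smaller than the 10**9 used in the text
    start = time.time()
    total = local_sin_sum(n)
    elapsed = time.time() - start
    print("sum = %.6f, closed form = %.6f, elapsed = %.3f s"
          % (total, closed_form_sin_sum(n), elapsed))
```

Because the work grows linearly with n, the time measured for 10**6 terms gives a rough estimate of the local cost of the full 10**9 run on a given machine.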
To install the pyfora Python package, run:

    pip install pyfora
pyfora requires Python 2.7. Python 3 is not supported yet.
Only official CPython distributions from python.org are supported at this time. This is what OS X and most Linux distributions include by default as their “system” Python.
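A script that depends on pyfora could check these interpreter requirements before going further. The sketch below assumes a hypothetical helper, is_supported_interpreter, which is not part of pyfora; it uses only the standard sys and platform modules. Note that it can confirm the implementation is CPython 2.7, but it cannot tell whether the build came from python.org or from another distributor.

```python
import platform
import sys


def is_supported_interpreter(version=None, implementation=None):
    """Return True for CPython 2.7, the interpreter the text lists as supported.

    Hypothetical helper for illustration. Both arguments default to the
    running interpreter; pass explicit values to check another one.
    """
    if version is None:
        version = sys.version_info[:2]
    if implementation is None:
        implementation = platform.python_implementation()
    # Compare only (major, minor) so that any 2.7.x release is accepted.
    return tuple(version[:2]) == (2, 7) and implementation == "CPython"


if __name__ == "__main__":
    if is_supported_interpreter():
        print("Interpreter looks compatible: CPython 2.7")
    else:
        print("Unsupported interpreter: %s %d.%d"
              % (platform.python_implementation(),
                 sys.version_info[0], sys.version_info[1]))
```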
The pyfora backend is distributed as a Docker image that can be run in any Docker-supported environment. The Setup Guides below contain instructions for setting up the backend in various environments.