Introduction to pyfora¶
pyfora sends your Python code to a local or remote cluster, where it is compiled and can be automatically run in parallel on many machines. With pyfora you can run distributed computations on a cluster without having to modify your code. You can speed up your computations by running them on hundreds of CPU cores, and you can operate on datasets many times larger than the RAM of a single machine.
Setting up a Cluster¶
To get started with pyfora you will need a backend cluster with at least one worker. The backend is available as a docker image that can be run locally on your machine in a single-node setup, or on a cluster of machines on a local network or in the cloud.
Connecting to a Cluster¶
Once you have a running cluster, you can connect to it and start executing code:
import pyfora
executor = pyfora.connect('http://localhost:30000')
The variable executor
is now bound to a pyfora Executor
that can be used
to submit and execute code on the cluster.
There are two ways to run code with an Executor
:
- Asynchronously using
Future
objects. - Synchronously by enclosing code to be run remotely in a special python
with
block.
In this tutorial we will use the synchronous model.
First, we’ll define a function that we are going to work with:
def isPrime(p):
x = 2
while x*x <= p:
if p%x == 0:
return 0
x = x + 1
return 1
Computing Remotely¶
Now we can use the Executor
to run some remote computations with the function
we defined:
with executor.remotely.downloadAll():
result = sum(isPrime(x) for x in xrange(10 * 1000 * 1000))
print result
What just happened?¶
The code in the body of the with
block was shipped to the cluster, along with any dependent
objects and code (like isPrime()
in this case) that it refers to, directly or indirectly.
The python code was translated into pyfora bitcode and executed on the cluster. The resulting
objects (result
in the example above) were downloaded from the cluster and copied back
back into the local python environment because we used
executor.remotely.downloadAll()
.
Depending on the code you are running, and the number of CPU cores available in your cluster, the
runtime looks for opportunities to parallelize the execution of your code.
In the example above, the runtime can see that individual invocations of isPrime()
within the generator
expression isPrime(x) for x in xrange(10 * 1000 * 1000)
are independent of each other can therefore
be run in parallel.
In fact, what the runtime does in this example is to split the xrange()
iteration across all available
cores in the cluster. If a particular subrange completes while others are still running, the runtime
dynamically subdivides a range that is still computing to maximize the utilization of CPUs.
This is something that is bound to happen in problems like this when time-complexity of a computation
is not constant across the entire input space (determining whether a large number is prime is much
harder than a small number).
Important
Not all python code can be converted to pyfora bitcode and run in this way. In order to benefit form the performance and scalability advantages of pyfora, your code must be:
- Side-effectless: data structures cannot be mutated.
- Deterministic: running with the same input must always yield the same result.
See Pure Python for more details.
Working with proxies¶
In the previous example, the result of the computation was the number of prime numbers in the specified
range. That’s a single int
that can be easily downloaded from the cluster and copied into
the local python environment.
Now consider this variation of the code:
with executor.remotely.remoteAll():
primes = [x for x in xrange(10 * 1000 * 1000) if isPrime(x)]
This time the result of the computation, the variable primes
,
is a list
of all prime numbers in the range. But because we used
executor.remotely.remoteAll()
,
the variable primes
is a proxy to a list of primes that lives in-memory on the
cluster (it is actually an instance of RemotePythonObject
).
There are two things you can do with remote python objects:
- Download them into the local python scope where they become regular python objects.
- Use them in subsequent remote computations on the cluster.
Downloading a remote object is done using its toLocal()
method, which returns a Future
that resolves to the downloaded object.
To do it all synchronously in one line you can write:
primes = primes.toLocal().result()
This call downloads all the data in the remote primes
list from the cluster to the
client machine where it is converted back into python. If the list is very large, or the connection
to the cluster is slow, this can be a slow operation.
Furthermore, the size of the list may be greater than the amount of memory available on the local
machine, in which case it is impossible to download it this way.
As an alternative to downloading the entire result, we may choose to compute with it inside of
another with executor.remotely
block. For example:
with executor.remotely.downloadAll():
lastFewPrimes = primes[-100:]
The backend recognizes that primes
is a proxy to data that lives remotely in the cluster,
and lets us refer to it in dependent computations, the result of which we then return as regular
python objects.
For convenience, it also possible to write:
with executor.remotely.downloadSmall(bytecount=100*1024):
...
In this case, objects larger than bytecount
are left in the cluster and returned as proxies,
while smaller objects are downloaded and copied into the local scope.
Pure Python¶
The pyfora runtime supports a restricted, “purely functional” subset of python that we call “Pure Python”. By “purely functional” we mean code in which:
- All data-structures are immutable (e.g. no modification of lists).
- No operations have side-effects (e.g. no sockets, no
print
). - All operations are deterministic - running them on the same input always yields the same result (e.g. no access to system time, amount of available memory, etc.)
These restrictions are essential to the kinds of reasoning that pyfora applies to your code. Some of these restrictions may be relaxed in the future under certain circumstances, but at this time the following constraints are enforced:
Objects are immutable (except for
self
within an object’s__init__()
). Expressions likeobj.x = 10
are disallowed, as they would modifyobj
. The exception to this rule isself
within__init__()
, where assignments are used to provide initial values to object members.Lists are immutable. Expressions like
a[0] = 10
won’t work, nor willa.append(10)
.However, given a list
a
, “appending” a valuex
to it by producing a new list usinga + [x]
results in effecient code without superflous copying of data.Dictionaries are immutable. In the future, updates to dictionaries will be allowed in cases where pyfora can prove that there is exactly one reference to the dictionary. But for the moment dictionaries can only be constructed from iterators, as in:
dict((x, x**2) for x in xrange(100))
Also note that at the moment, dictionaries can be quite slow, so use them sparingly.
No augmented assignment. Expressions like
x += 10
are disallowed since they modifyx
.print
is disabled.locals()
andglobals()
are disabled.del
is disabled.No
eval
orexec
.
Note
Regular variable assignments do work as expected. The following code, for example, is allowed:
x = x + 5
v = [x]
v = v + [12]
Violation of Constraints¶
Whenever you invoke pyfora on a block of python code, the runtime attempts to give you either (a) the exact same answer you would have received had you run the same code in your python interpreter locally, or (b) an exception [1].
Constraint checking happens in two places. Some of the constraints are enforced at parse-time.
As soon as you enter a with executor.remotely
block, pyfora tries to determine all the code your
invocation can touch. If any of that code contains syntatic elements that pyfora knows are invalid
(such as print()
statements), it will generate an exception.
Other constraints are enforced at runtime. For instance, the append method of lists, when invoked in pyfora,
raises a InvalidPyforaOperation
exception that’s not catchable by python
code running inside of pyfora. This indicates that the program has attempted to execute semantics that
pyfora can’t faithfully reproduce.
Footnotes
[1] | Currently, the only intended exception to this rule is integer arithmetic:
on the occurrence of an integer arithmetic overflow, pyfora will give you the semantics of the
underyling hardware, whereas python will produce an object of type long with the correct value.
Eventually, we will make this tradeoff configurable, but it has pretty serious performance implications,
so for the moment we’re just ignoring this difference. |