pyfora¶

pyfora.connect(url, timeout=30.0)¶

Opens a connection to a pyfora cluster

Parameters:

url (str) – The HTTP URL of the cluster’s manager (e.g. `http://192.168.1.200:30000`).
timeout (Optional float) – A timeout for the operation in seconds, or None to wait indefinitely.

Returns:

An `Executor` that can be used to submit work to the cluster.

Exceptions¶

exception pyfora.PyforaError¶: Base class for all pyfora exceptions.

exception pyfora.ConnectionError¶: Raised when a connection to the pyfora backend cannot be established.

exception pyfora.NotCallableError¶: Raised when an attempt is made to call a non-callable object.

exception pyfora.ComputationError(remoteException, trace)¶

Raised when a remote computation results in an exception.

Parameters:

remoteException (Exception) – The exception raised by the remote computation.
trace (Optional[List]) – A representation of the stack trace in which the exception was raised. It takes the form: `[{'path':str, 'line': int}, ... ]`
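As a rough illustration of the documented trace shape, the sketch below turns a list of `{'path': ..., 'line': ...}` entries into readable `path:line` strings. The helper `format_trace` and the sample trace are hypothetical and not part of pyfora; only the shape of the list comes from the description above.

```python
def format_trace(trace):
    """Render a trace of {'path': str, 'line': int} dicts as 'path:line' strings."""
    if trace is None:
        # The trace is documented as optional, so tolerate its absence.
        return []
    return ["%s:%d" % (frame['path'], frame['line']) for frame in trace]

# Hypothetical trace in the documented shape (made-up paths and line numbers).
example_trace = [
    {'path': 'model.py', 'line': 12},
    {'path': 'util.py', 'line': 48},
]

formatted = format_trace(example_trace)
print("\n".join(formatted))
```

A helper like this could be applied to the `trace` of a caught `ComputationError` when logging failures, assuming the exception exposes the trace it was constructed with.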

exception pyfora.PythonToForaConversionError(message, trace=None)¶

Raised when an attempt is made to use a Python object that cannot be remoted by pyfora.

This may happen when, for example:

A function attempts to mutate state or produce side-effect (i.e. it is not “purely functional”).

A call is made to a Python builtin that is not supported by pyfora (e.g. open())

Parameters:

message (str) – Error message.
trace (Optional[List]) – A representation of the stack trace in which the exception was raised. It takes the form: `[{'path':str, 'line': int}, ... ]`

exception pyfora.ForaToPythonConversionError¶: Raised when attempting to download a remote object that cannot be converted to Python.

exception pyfora.PyforaNotImplementedError¶: Feature not yet implemented in pyfora.

exception pyfora.InvalidPyforaOperation¶: Raised when a running computation performs an operation that cannot be faithfully executed with pyfora.

exception pyfora.ResultExceededBytecountThreshold¶: Raised when attempting to download a remote object whose size exceeds the specified maximum.

Executor¶

class pyfora.Executor.Executor(connection, pureImplementationMappings=None)¶

Submits computations to a pyfora cluster and marshals data to/from the local Python.

The Executor is the main point of interaction with a pyfora cluster. It is responsible for sending computations to the cluster and returning the result as a RemotePythonObject future.

It is modeled after the same-named abstraction in the concurrent.futures module that is part of the Python 3 standard library.

All interactions with the remote cluster are asynchronous and return Future objects that represent the in-progress operation.

Python objects are sent to the server using the define() method, which returns a Future that resolves to a RemotePythonObject corresponding to the submitted object.

Similarly, functions and their arguments can be submitted using the submit() method which returns a Future that resolves to a RemotePythonObject of the evaluated expression or raised exception.
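For readers unfamiliar with the pattern, here is a minimal sketch of the standard-library `concurrent.futures` workflow that the Executor is modeled after. It uses `ThreadPoolExecutor` only, not pyfora, so it runs without a cluster. The functions `add` and `fail` are made up for the illustration. With pyfora, the future would resolve to a `RemotePythonObject` rather than to the value itself.

```python
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

def fail():
    raise ValueError("boom")

with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately with a Future for the pending call.
    ok_future = pool.submit(add, 2, 3)
    bad_future = pool.submit(fail)

    # result() blocks until the computation finishes.
    total = ok_future.result()

    # A raised exception is captured by the future instead of propagating here.
    error = bad_future.exception()

print(total)
print(repr(error))
```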

Note

This class is not intended to be constructed explicitly. Instances of it are created by calling connect().

Parameters:

connection (pyfora.Connection.Connection) – an open connection to a cluster.
pureImplementationMappings (optional) – a `PureImplementationMappings` that defines the mapping between Python libraries and their “pure” `pyfora` implementations.

close()¶: Closes the connection to the pyfora cluster.

define(obj)¶

Create a remote representation of an object.

Sends the specified object to the server and returns a Future that resolves to a RemotePythonObject representing the object on the server.

Parameters:	obj – A python object to send
Returns:	A `Future` that resolves to a `RemotePythonObject` representing the object on the server.

exportS3Dataset(valueAsString, bucketname, keyname)¶

Write a ComputedRemotePythonObject representing a pyfora string to S3

Parameters:

valueAsString (RemotePythonObject.ComputedRemotePythonObject) – a computed string.
bucketname (str) – The name of the S3 bucket to write to.
keyname (str) – The S3 key to write to.

Returns:

A `Future` representing the completion of the export operation. It resolves either to `None` (success) or to an instance of `PyforaError`.

getWorkerCount()¶

Returns the number of workers connected to the cluster.

Returns:	The number of workers currently available in the cluster.
Return type:	int

importRemoteFile(path)¶

Loads the content of a file as a string

Note

The file must be available to all machines in the cluster using the specified path. If you run multiple workers you must either copy the file to all machines, or if using a network file-system, mount it into the same path on all machines.

In addition, pyfora may cache the content of the file. Changes to the file’s content made after it has been loaded may have no effect.

Parameters:	path (str) – Full path to the file. This must be a valid path on all worker machines in the cluster.
Returns:	A `Future` that resolves to a `RemotePythonObject` representing the content of the file as a string.

importS3Dataset(bucketname, keyname, verify=True)¶

Creates a RemotePythonObject that represents the content of an S3 key as a string.

Parameters:

bucketname (str) – The S3 bucket to read from.
keyname (str) – The S3 key to read.
verify – Throw an exception immediately if the key or bucket cannot be read.

Returns:

A `Future` that resolves to a `RemotePythonObject` representing the content of the S3 key.

isClosed()¶

Determine whether the Executor has been closed.

Returns:	`True` if `close()` has been called, `False` otherwise.
Return type:	bool

remotely¶

Returns a WithBlockExecutor.WithBlockExecutor that can be used to enter a block of “pure” Python code.

The with executor.remotely: syntax allows you to automatically submit an entire block of python code for remote execution. All the code nested in the remotely with block is submitted.

Returns:	A `WithBlockExecutor` that extracts python code from a with block and submits it to the pyfora cluster for remote execution. Results of the remote execution are returned as RemotePythonObject and are automatically reassigned to their corresponding local variables in the with block.

submit(fn, *args)¶

Submits a callable to be executed on the cluster with the provided arguments.

This function is shorthand for calling define() on the callable and all arguments and then invoking the remote callable with the remoted arguments.

Returns:	A `Future` representing the given call. The future eventually resolves to a `RemotePythonObject` instance or an exception.

WithBlockExecutor¶

class pyfora.WithBlockExecutor.WithBlockExecutor(executor)¶

A helper object used to synchronously run blocks of code on a cluster.

When entering a with block using a WithBlockExecutor, the body of the block is extracted and submitted to the pyfora cluster for execution, along with all its local dependencies. Variable assignments within the block are returned as RemotePythonObject and reassigned to their corresponding local variables when exiting the block.

Use downloadAll(), remoteAll(), and downloadSmall() to modify the default behavior and select which objects should be downloaded from the server and which objects should be returned as RemotePythonObject futures.

Note

Instances of WithBlockExecutor are only intended to be created by Executor. User code typically uses remotely to get a WithBlockExecutor.

downloadAll()¶

Modify the executor to download all results into the local namespace.

Returns:	`self` to support chaining.

downloadSmall(bytecount=10000)¶

Modify the executor to download small results into the local namespace and return proxies for everything else.

Returns:	`self` to support chaining.

remoteAll()¶

Modify the executor to leave all results on the server and only return proxies (default).

Returns:	`self` to support chaining.

withStatusCallback(callback)¶

Modify the executor to call ‘callback’ with status updates while computations are blocked.

‘callback’ will receive a JSON package from the server containing information about the current computation. This overrides the default callback, which attempts to determine whether we’re in a Jupyter notebook.

RemotePythonObject¶

A proxy for some object, data or callable that lives in memory on a pyfora cluster

class pyfora.RemotePythonObject.RemotePythonObject(executor)¶

A local proxy for a python object that lives in memory on a pyfora cluster.

This is an abstract class and should not be used directly, but through its two subclasses: DefinedRemotePythonObject and ComputedRemotePythonObject.

Parameters:	executor – An `Executor`

toLocal()¶

Downloads the remote object.

Returns:	A `Future` that resolves to the python object that this `RemotePythonObject` represents.

RemotePythonObject.DefinedRemotePythonObject¶

class pyfora.RemotePythonObject.DefinedRemotePythonObject(objectId, executor)¶

A proxy that represents a local object, which has been uploaded to a pyfora cluster.

Note

Only Executor is intended to create instances of DefinedRemotePythonObject. They are created by calling define().

Parameters:	objectId (int) – a value that uniquely identifies the remote object that this `DefinedRemotePythonObject` represents. executor – the `Executor` that created this `DefinedRemotePythonObject`.

RemotePythonObject.ComputedRemotePythonObject¶

class pyfora.RemotePythonObject.ComputedRemotePythonObject(computedValue, executor, isException)¶

A proxy that represents a remote object created on a pyfora cluster as a result of some computation.

Note

Only Executor is intended to create instances of ComputedRemotePythonObject. They are created by calling submit().

Parameters:

computedValue – an instance of a SubscribableWebObject computedValue representing the computation that produced this `ComputedRemotePythonObject`.
executor – the `Executor` that created this `ComputedRemotePythonObject`.

Future¶

class pyfora.Future.Future(onCancel=None)¶

Bases: concurrent.futures._base.Future

This pyfora.Future object subclasses the standard Python concurrent.futures._base.Future object. See: http://pythonhosted.org/futures/ https://pypi.python.org/pypi/futures

Futures wrap the result of an asynchronous computation, which can be accessed by a blocking call to result().

The pyfora Future object extends the concurrent.futures object by supporting cancellation with the cancel() method.

cancel()¶: Cancel a running computation

Algorithms¶

Linear Regression¶

pyfora.algorithms.linearRegression(predictors, responses)¶

Compute the regression coefficients (with intercept) for a set of predictors against responses.

Parameters:

predictors (DataFrame) – a pandas.DataFrame with the predictor columns.
responses (DataFrame) – a pandas.DataFrame whose first column is used as the regression’s target.

Returns:

A numpy.array with the regression coefficients. The last element in the array is the intercept.
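To make the "intercept last" convention concrete, the following is a small pure-Python least-squares fit for a single predictor. It is only a sketch of the output layout, not pyfora's implementation. The function name `simple_linear_regression` and the data are invented for the illustration.

```python
def simple_linear_regression(xs, ys):
    """Ordinary least squares for one predictor; returns [slope, intercept]."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficients first, intercept as the last element.
    return [slope, intercept]

# Points lying exactly on y = 2x + 1.
coefficients = simple_linear_regression([0.0, 1.0, 2.0, 3.0],
                                        [1.0, 3.0, 5.0, 7.0])
print(coefficients)
```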

Logistic Regression¶

class pyfora.algorithms.BinaryLogisticRegressionFitter(C, hasIntercept=True, method='newton-cg', interceptScale=1.0, tol=0.0001, maxIter=100000.0, splitLimit=1000000)¶

A logistic regression “fitter” that holds fitting parameters used to fit logit models.

Parameters:

C (float) – Inverse of regularization strength; must be a positive float.
hasIntercept (bool) – If True, include an intercept (aka bias) term in the fitted models.
method (string) – one of ‘newton-cg’ (default) or ‘majorization’
interceptScale (float) – When hasIntercept is True, feature vectors become [x, interceptScale], i.e. we add a “synthetic” feature with constant value interceptScale to all of the feature vectors. This synthetic feature is subject to regularization as all other features. To lessen the effect of regularization, users should increase this value.
tol (float) – Tolerance for the stopping criterion. Fitting stops when the l2-norm of the parameter update is no more than tol.
maxIter (int) – A hard limit on the number of update cycles allowed.
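The effect of `interceptScale` can be pictured as appending a constant column to the feature vectors before fitting. The sketch below shows only that augmentation step in plain Python. The helper `augment_with_intercept` is hypothetical and is not a pyfora function.

```python
def augment_with_intercept(rows, intercept_scale=1.0):
    """Append a constant 'synthetic' feature to every feature vector."""
    return [list(row) + [intercept_scale] for row in rows]

features = [[-1, 0], [0, 1], [1, 1]]
augmented = augment_with_intercept(features, intercept_scale=10.0)
print(augmented)
```

Because the effective intercept is the fitted weight times the constant, a larger constant lets a smaller weight express the same intercept, and a smaller weight incurs a smaller regularization penalty.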

fit(X, y)¶

Fit a (regularized) logit model to the predictors X and responses y.

Parameters:

X – a dataframe of feature vectors.
y – a dataframe (with one column) which contains the “target” values, corresponding to the feature vectors in `X`.

Returns:

A `BinaryLogisticRegressionModel` which represents the fit model.

Example:

# fit a logit model without intercept using regularizer 1.0

import pandas
from pyfora.algorithms import BinaryLogisticRegressionFitter

fitter = BinaryLogisticRegressionFitter(1.0, False)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

model = fitter.fit(x, y)

class pyfora.algorithms.logistic.BinaryLogisticRegressionModel.BinaryLogisticRegressionModel(coefficients, classZeroLabel, classOneLabel, intercept, interceptScale, iters)¶

Represents a fit logit model.

coefficients¶: numpy.array – The regression coefficients.

intercept¶: float – The fitted model’s intercept

Note

This class is not intended to be constructed directly. Instances of it are returned by fit().

predict(X)¶

Predict the class labels of X.

Parameters:	X (DataFrame, or numpy.array) – a set of feature vectors
Returns:	array containing the predicted class labels.
Return type:	numpy.array

predict_probability(X)¶

Estimate the conditional class-zero probability for the features in X.

Parameters:	X (DataFrame, or numpy.array) – a set of feature vectors
Returns:	array containing the predicted probabilities.
Return type:	numpy.array

score(X, y)¶

Returns the mean accuracy on the given test data and labels.

Parameters:

X (DataFrame) – Feature vectors.
y (DataFrame) – Target labels, corresponding to the vectors in `X`.

Returns:

The mean accuracy of `predict()` with respect to `y`.

Return type:

float
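Mean accuracy is simply the fraction of predictions that match the true labels. A minimal plain-Python sketch follows. The function `mean_accuracy` is an illustrative stand-in and does not reproduce how pyfora computes the score internally.

```python
def mean_accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / float(len(actual))

# Three of the four labels agree.
accuracy = mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
print(accuracy)
```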

Regression Trees¶

class pyfora.algorithms.regressionTrees.RegressionTree.RegressionTreeBuilder(maxDepth, minSamplesSplit=2, numBuckets=10000, minSplitThresh=1000000)¶

Fits regression trees to data using specified tree parameters.

Parameters:

maxDepth (int) – The maximum depth of a fit tree
minSamplesSplit (int) – The minimum number of samples required to split a node
numBuckets (int) – The number of buckets used in the estimation of optimal column splits.
minSplitThresh (int) – an “internal” argument, not generally of interest to casual users, giving the splitting rule in computeBucketedSampleSummaries.

Returns:

A RegressionTree instance.

Examples:

import pandas
from pyfora.algorithms import RegressionTreeBuilder

builder = RegressionTreeBuilder(2)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})
regressionTree = builder.fit(x, y)

static buildTree(x, y, minSamplesSplit, maxDepth)¶

Fit a regression tree to predictors x and responses y using parameters minSamplesSplit and maxDepth.

Parameters:

x (`pandas.DataFrame`) – the predictors.
y (`pandas.DataFrame`) – the responses.
maxDepth – The maximum depth of a fit tree.
minSamplesSplit – The minimum number of samples required to split a node.

fit(x, y)¶

Using a RegressionTreeBuilder, fit a regression tree to predictors x and responses y.

Parameters:	x (`pandas.DataFrame`) – of the predictors. y (`pandas.DataFrame`) – giving the responses.
Returns:	a `RegressionTree` instance.

Examples:

import pandas
import pyfora.algorithms.regressionTrees.RegressionTree

builder = pyfora.algorithms.regressionTrees.RegressionTree.RegressionTreeBuilder(2)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})
regressionTree = builder.fit(x, y)

class pyfora.algorithms.regressionTrees.RegressionTree.RegressionTree(rules, numDimensions=None, columnNames=None)¶

A class representing a regression tree.

A regression tree is represented, essentially, as a list of “rules”, which are either SplitRule, giving “split” nodes, which divide the domain by a hyperplane, or RegressionLeafRule, which just hold a prediction value.
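The idea of a tree stored as a flat list of rules can be sketched as follows. The tuple layout here is purely hypothetical: the fields of pyfora's `SplitRule` and `RegressionLeafRule` are not documented in this reference, and the sketch assumes axis-aligned splits on a single named column.

```python
# Hypothetical encoding, NOT pyfora's internal layout:
#   ('split', column, threshold, left_index, right_index)
#   ('leaf', prediction_value)
rules = [
    ('split', 'x0', -0.5, 1, 2),  # go to rules[1] if x0 <= -0.5, else rules[2]
    ('leaf', 0.0),
    ('leaf', 1.0),
]

def predict_row(rules, row):
    """Walk the rule list from index 0 until a leaf is reached."""
    index = 0
    while True:
        rule = rules[index]
        if rule[0] == 'leaf':
            return rule[1]
        _, column, threshold, left, right = rule
        index = left if row[column] <= threshold else right

predictions = [predict_row(rules, {'x0': value}) for value in (-1, 0, 1)]
print(predictions)
```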

Note

This class is not generally instantiated directly by users. Instead, instances are normally returned by RegressionTreeBuilder.

predict(x, depth=None)¶

Predicts the responses corresponding to pandas.DataFrame x.

Returns:	A `pandas.Series` giving the predictions of the rows of `x`.

Examples:

import pandas
from pyfora.algorithms import RegressionTreeBuilder

builder = RegressionTreeBuilder(2)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})
regressionTree = builder.fit(x, y)

# predict `regressionTree` on `x` itself
regressionTree.predict(x)

score(x, yTrue)¶

Returns the coefficient of determination R² of the prediction.

The coefficient R² is defined as (1 - u / v), where u is the residual sum of squares ((yTrue - yPredicted) ** 2).sum() and v is the total sum of squares ((yTrue - yTrue.mean()) ** 2).sum(). The best possible score is 1.0; lower values are worse.

Returns:	(float) the R² value
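The definition can be checked by hand with a few lines of plain Python. In this sketch, `u` is the sum of squared prediction errors and `v` is the sum of squared deviations of the true values from their mean, matching the two formulas given for the score. The function name `r_squared` is illustrative only.

```python
def r_squared(y_true, y_predicted):
    """Coefficient of determination, computed as 1 - u / v."""
    mean_true = sum(y_true) / float(len(y_true))
    # u: squared errors of the predictions.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_predicted))
    # v: squared deviations of the true values from their mean.
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v

y = [0.0, 1.0, 1.0]
perfect = r_squared(y, [0.0, 1.0, 1.0])   # exact predictions
baseline = r_squared(y, [2.0 / 3] * 3)    # always predicting the mean
print(perfect, baseline)
```

Exact predictions give a score of 1.0, while always predicting the mean gives a score of 0. Predictions worse than the mean produce negative values.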

Examples:

import pandas
from pyfora.algorithms import RegressionTreeBuilder

builder = RegressionTreeBuilder(2)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})
regressionTree = builder.fit(x, y)

# score `regressionTree` on `x` and `y`
regressionTree.score(x, y)

Gradient Boosting¶

class pyfora.algorithms.regressionTrees.GradientBoostedRegressorBuilder.GradientBoostedRegressorBuilder(maxDepth=3, nBoosts=100, learningRate=1.0, minSamplesSplit=2, numBuckets=10000, loss='l2')¶

A class which builds (or “fits”) gradient-boosted regression trees to data with specified parameters.

Parameters:

maxDepth (int) – The max depth allowed of each constituent regression tree.
nBoosts (int) – The number of “boosting iterations” used.
learningRate (float) – The learning rate of the model, used for regularization. Each successive tree from the boosting stages is added with multiplier learningRate.
minSamplesSplit (int) – The minimum number of samples required to split a regression tree node.
numBuckets (int) – The number of buckets used in the estimation of optimal column splits for building regression trees.
loss – the loss used when forming gradients. Defaults to l2, for least-squares loss. The only other allowed value currently is lad, for “least absolute deviation” (aka l1-loss).
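To show how `nBoosts` and `learningRate` interact under least-squares loss, here is a deliberately tiny boosting loop in plain Python. It is a sketch under stated assumptions, not pyfora's algorithm: it uses one feature, depth-1 trees ("stumps"), an exhaustive threshold search instead of bucketed split estimation, and the mean response as the initial prediction. The names `fit_stump` and `boost` are invented.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 tree on one feature by minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / float(len(left))
        right_mean = sum(right) / float(len(right))
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_boosts, learning_rate):
    """Return a predictor built from n_boosts stumps fit to residuals."""
    base = sum(ys) / float(len(ys))  # assumed initial constant prediction
    stumps = []

    def predict(x):
        # Each stage contributes with multiplier learning_rate.
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_boosts):
        # For l2 loss the negative gradient is the plain residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

xs = [-1.0, 0.0, 1.0]
ys = [0.0, 1.0, 1.0]
model_full = boost(xs, ys, n_boosts=1, learning_rate=1.0)
model_slow = boost(xs, ys, n_boosts=5, learning_rate=0.5)
print([model_full(x) for x in xs])
print([model_slow(x) for x in xs])
```

With a learning rate of 1.0 a single stump already fits this toy data, whereas a rate of 0.5 closes only half of the remaining residual at each stage, which is the regularizing effect the parameter description refers to.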

fit(X, y)¶

Fits predictors X to responses y.

Parameters:	X (`pandas.DataFrame`) – predictors. y (`pandas.DataFrame`) – responses.
Returns:	A `RegressionModel` instance.

Examples:

import pandas
from pyfora.algorithms import GradientBoostedRegressorBuilder

builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

model = builder.fit(x, y)

iterativeFitter(X, y)¶

Returns an IterativeFitter instance which can iteratively fit boosting models.

Parameters:	X (`pandas.DataFrame`) – predictors. y (`pandas.DataFrame`) – responses.

Examples:

import pandas
from pyfora.algorithms import GradientBoostedRegressorBuilder

builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

fitter = builder.iterativeFitter(x, y)

# compute scores vs number of boosts
numBoosts = 5
scores = []
for ix in xrange(numBoosts):
    fitter = fitter.next()
    scores = scores + [fitter.model.score(x, y)]

class pyfora.algorithms.regressionTrees.RegressionModel.RegressionModel(additiveRegressionTree, X, XDimensions, yAsSeries, loss, regressionTreeBuilder, learningRate)¶

A class representing a gradient-boosted regression tree model fit to data.

Note

These classes are not normally instantiated directly. Instead, they are typically returned by GradientBoostedRegressorBuilder instances.

predict(df, nEstimators=None)¶

Predict on the pandas.DataFrame df.

Example:

import pandas
from pyfora.algorithms import GradientBoostedRegressorBuilder

builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

model = builder.fit(x, y)

# predict `x` using the model `model`:
model.predict(x)

score(X, yTrue)¶

Return the coefficient of determination (R²) of the prediction.

The coefficient R² is defined as (1 - u / v), where u is the residual sum of squares ((yTrue - yPredicted) ** 2).sum() and v is the total sum of squares ((yTrue - yTrue.mean()) ** 2).sum(). The best possible score is 1.0; lower values are worse.

Parameters:	X – the predictor DataFrame. yTrue – the (true) responses DataFrame.
Returns:	(float) the R² value.

Example:

import pandas
from pyfora.algorithms import GradientBoostedRegressorBuilder

builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

model = builder.fit(x, y)

# compute the score of the fit model:
model.score(x, y)

class pyfora.algorithms.regressionTrees.GradientBoostedRegressorBuilder.IterativeFitter(model, predictions)¶

A sort of iterator class which is capable of fitting subsequent boosting models.

model¶: the current regression model.

predictions¶: the current predictions of the regression model (with respect to the training set implicit in model).

Note

This class is typically not instantiated directly. Instead, instances are returned from iterativeFitter().

next()¶

Fit one boosting stage, returning a new IterativeFitter object that holds the next regression model and predictions.

Examples:

import pandas
from pyfora.algorithms import GradientBoostedRegressorBuilder

builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

fitter = builder.iterativeFitter(x, y)

# compute scores vs number of boosts
numBoosts = 5
scores = []
for ix in xrange(numBoosts):
    fitter = fitter.next()
    scores = scores + [fitter.model.score(x, y)]

class pyfora.algorithms.regressionTrees.GradientBoostedClassifierBuilder.GradientBoostedClassifierBuilder(maxDepth=3, nBoosts=100, learningRate=1.0, minSamplesSplit=2, numBuckets=10000)¶

A class which builds (or “fits”) gradient boosted (regression) trees to form classification models.

Parameters:

maxDepth (int) – The max depth allowed of each constituent regression tree.
nBoosts (int) – The number of boosting iterations used.
learningRate (float) – The learning rate of the model, used for regularization. Each successive tree from the boosting stages is added with multiplier learningRate.
minSamplesSplit (int) – The minimum number of samples required to split a regression tree node.
numBuckets (int) – The number of buckets used in the estimation of optimal column splits for building regression trees.
loss – the loss used when forming gradients. Defaults to l2, for least-squares loss. The only other allowed value currently is lad, for “least absolute deviation” (aka l1-loss).

Note

Only nClasses = 2 cases are currently supported.

fit(X, y)¶

Fit predictors X to responses y.

Parameters:	X (`pandas.DataFrame`) – predictors. y (`pandas.DataFrame`) – responses.
Returns:	a `BinaryClassificationModel`

Examples:

import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

model = builder.fit(x, y)

iterativeFitter(X, y)¶

Create an IterativeFitter instance which can iteratively fit boosting models.

Parameters:	X (`DataFrame`) – predictors. y (`DataFrame`) – responses.

Examples:

import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

fitter = builder.iterativeFitter(x, y)

numBoosts = 5
for ix in xrange(numBoosts):
    fitter = fitter.next()

class pyfora.algorithms.regressionTrees.BinaryClassificationModel.BinaryClassificationModel(additiveRegressionTree, X, classes, XDimensions, yAsSeries, loss, baseModelBuilder, learningRate)¶

A class representing a gradient-boosted binary classification tree model fit to data.

Note

These classes are not normally instantiated directly. Instead, they are typically returned by GradientBoostedClassifierBuilder instances.

deviance(x, yTrue)¶

Compute the binomial deviance (average negative log-likelihood) of the instances in predictors x with responses yTrue.

Parameters:	x – the predictor DataFrame. yTrue – the (true) responses DataFrame.
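As a worked illustration of "average negative log-likelihood", the sketch below computes it in plain Python for 0/1 labels. It assumes the probabilities passed in are the estimated probabilities of class one and lie strictly between 0 and 1. pyfora's exact conventions (class ordering, any constant scaling factor) are not specified here, so treat `binomial_deviance` as a hypothetical helper.

```python
import math

def binomial_deviance(y_true, prob_one):
    """Average negative log-likelihood of 0/1 labels given P(class == 1)."""
    total = 0.0
    for y, p in zip(y_true, prob_one):
        # Log-likelihood of one Bernoulli observation, negated.
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# An uninformative model that always predicts 0.5.
deviance = binomial_deviance([0, 1, 1], [0.5, 0.5, 0.5])
print(deviance)
```

Confident correct predictions drive the value toward 0, while a constant 0.5 prediction gives log(2) regardless of the labels.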

Examples:

import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

model = builder.fit(x, y)

# compute the deviance:
model.deviance(x, y)

predict(df)¶

Predict the class labels of the rows of df.

Parameters:	df (`pandas.DataFrame`) – input DataFrame.
Returns:	A `pandas.Series` giving the row-wise predictions.

Examples:

import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

model = builder.fit(x, y)

# use the fit model to predict `x` itself:
model.predict(x)

predictProbability(df)¶

Return class-zero probability estimates of the rows of a DataFrame df.

Parameters:	df (`pandas.DataFrame`) – input DataFrame.
Returns:	A `pandas.Series` giving the row-wise estimated class-zero probability estimates

Examples:

import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

model = builder.fit(x, y)

# use the fit model to estimate class-zero probabilities for `x` itself:
model.predictProbability(x)

score(x, yTrue)¶

Compute the mean accuracy in predicting x with respect to yTrue.

Parameters:	x – the predictor DataFrame. yTrue – the (true) responses DataFrame.

Examples:

import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

model = builder.fit(x, y)

# compute the mean accuracy of the fit model on `x` and `y`:
model.score(x, y)

class pyfora.algorithms.regressionTrees.GradientBoostedClassifierBuilder.IterativeFitter(model, previousRegressionValues)¶

A sort of iterator class which is capable of fitting subsequent boosting models.

model: the current regression model.

predictions: the current predictions of the regression model (with respect to the training set implicit in model).

Note

This class is typically not instantiated directly. Instead, instances are returned from iterativeFitter().

next()¶

Boost once and return a new IterativeFitter

Returns:	An `IterativeFitter` instance.

Example:

import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
y = pandas.DataFrame({'y': [0,1,1]})

fitter = builder.iterativeFitter(x, y)

# compute scores vs number of boosts
numBoosts = 5
scores = []
for ix in xrange(numBoosts):
    fitter = fitter.next()
    scores = scores + [fitter.model.score(x, y)]

Data Frames¶

pyfora.pandas_util.read_csv_from_string(data)¶

Reads a string in CSV format into a DataFrame. This function is similar to pandas.read_csv() but it takes a string as input instead of a file.

This function is intended to be used in pyfora code that runs remotely in a pyfora cluster.

Parameters:	data (str) – a string of comma-separated values
Returns:	A `pandas.DataFrame` that holds the parsed data.

Note

This function currently assumes that all values are of type float (or floatifiable), and that the first row contains column headers. This limitation will be removed in the near future.
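The stated assumptions (a header row, every value convertible to float) can be illustrated with a plain-Python parser. This is a sketch of the described input format only. It returns a dictionary of columns rather than a DataFrame, does not handle quoted fields, and `parse_float_csv` is a hypothetical name, not a pyfora function.

```python
def parse_float_csv(data):
    """Parse CSV text with a header row into {column name: [floats]}."""
    lines = [line for line in data.splitlines() if line.strip()]
    # First non-empty line holds the column headers.
    header = [name.strip() for name in lines[0].split(',')]
    columns = dict((name, []) for name in header)
    for line in lines[1:]:
        for name, value in zip(header, line.split(',')):
            # Every value is assumed to be "floatifiable".
            columns[name].append(float(value))
    return header, columns

header, columns = parse_float_csv("x0,x1,y\n-1,0,0\n0,1,1\n1,1,1\n")
print(header)
print(columns['x0'])
```

A value that cannot be converted, such as a text label, raises `ValueError` in this sketch, which mirrors the float-only limitation described in the note.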