pyfora¶
-
pyfora.connect(url, timeout=30.0)¶ Opens a connection to a pyfora cluster.
Parameters: - url (str) – The HTTP URL of the cluster’s manager (e.g. http://192.168.1.200:30000).
- timeout (Optional float) – A timeout for the operation in seconds, or None to wait indefinitely.
Returns: An Executor that can be used to submit work to the cluster.
Exceptions¶
-
exception
pyfora.PyforaError¶ Base class for all pyfora exceptions.
-
exception
pyfora.ConnectionError¶ Raised when a connection to the pyfora backend cannot be established.
-
exception
pyfora.NotCallableError¶ Raised when an attempt is made to call a non-callable object.
-
exception
pyfora.ComputationError(remoteException, trace)¶ Raised when a remote computation results in an exception.
Parameters: - remoteException (Exception) – The exception raised by the remote computation.
- trace (Optional[List]) – A representation of the stack trace in which the exception was raised.
It takes the form:
[{'path':str, 'line': int}, ... ]
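The trace structure described above can be turned into readable text with a few lines of Python. The following is a minimal sketch: format_trace is a hypothetical helper, not part of pyfora, and it assumes only the documented [{'path': str, 'line': int}, ...] shape.

```python
def format_trace(trace):
    """Render a pyfora-style trace as one 'path:line' entry per line.

    `trace` is assumed to follow the documented shape
    [{'path': str, 'line': int}, ...]. The trace is optional, so None
    (or an empty list) yields an empty string.
    """
    if not trace:
        return ""
    return "\n".join("%s:%d" % (frame['path'], frame['line']) for frame in trace)


# Hypothetical trace data, for illustration only.
example = [{'path': 'model.py', 'line': 12}, {'path': 'util.py', 'line': 40}]
print(format_trace(example))
```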
-
exception
pyfora.PythonToForaConversionError(message, trace=None)¶ Raised when an attempt is made to use a Python object that cannot be remoted by pyfora.
This may happen when, for example:
- A function attempts to mutate state or produce a side effect (i.e. it is not “purely functional”).
- A call is made to a Python builtin that is not supported by pyfora (e.g. open()).
Parameters: - message (str) – Error message.
- trace (Optional[List]) – A representation of the stack trace in which the exception was raised.
It takes the form:
[{'path':str, 'line': int}, ... ]
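To illustrate the first case, the sketch below contrasts a function that mutates its argument with a purely functional equivalent. Both are plain Python and the names are hypothetical. Exactly which constructs pyfora rejects is not spelled out in this reference, so treat the first function only as an example of the kind of in-place mutation described above.

```python
def append_in_place(values, item):
    # Mutates the caller's list: a side effect, so not purely functional.
    values.append(item)
    return values


def appended(values, item):
    # Builds and returns a new list; the input is left untouched.
    return values + [item]


original = [1, 2]
result = appended(original, 3)
print(original, result)
```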
-
exception
pyfora.ForaToPythonConversionError¶ Raised when attempting to download a remote object that cannot be converted to Python.
-
exception
pyfora.PyforaNotImplementedError¶ Feature not yet implemented in
pyfora.
-
exception
pyfora.InvalidPyforaOperation¶ Raised when a running computation performs an operation that cannot be faithfully executed with
pyfora.
-
exception
pyfora.ResultExceededBytecountThreshold¶ Raised when attempting to download a remote object whose size exceeds the specified maximum.
Executor¶
-
class
pyfora.Executor.Executor(connection, pureImplementationMappings=None)¶ Submits computations to a pyfora cluster and marshals data to/from the local Python.
The Executor is the main point of interaction with a pyfora cluster. It is responsible for sending computations to the cluster and returning the result as a RemotePythonObject future.
It is modeled after the same-named abstraction in the concurrent.futures module that is part of the Python3 standard library.
All interactions with the remote cluster are asynchronous and return Future objects that represent the in-progress operation.
Python objects are sent to the server using the define() method, which returns a Future that resolves to a RemotePythonObject corresponding to the submitted object.
Similarly, functions and their arguments can be submitted using the submit() method, which returns a Future that resolves to a RemotePythonObject of the evaluated expression or raised exception.
Note
This class is not intended to be constructed explicitly. Instances of it are created by calling connect().
Parameters: - connection (pyfora.Connection.Connection) – an open connection to a cluster.
- pureImplementationMappings (optional) – a PureImplementationMappings that defines the mapping between Python libraries and their “pure” pyfora implementation.
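Because the Executor is modeled on concurrent.futures, the standard-library version is a useful mental model. The sketch below uses only concurrent.futures.ThreadPoolExecutor (standard library in Python 3), not pyfora: submit() returns a Future at once, and result() blocks until the value is available.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


# The pool is shut down when the with block exits.
with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(square, 7)   # returns immediately
    value = future.result()           # blocks until the call has finished

print(value)
```

With pyfora, the Future resolves to a RemotePythonObject rather than to the value itself, so an additional toLocal() step is needed to download the result.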
-
close()¶ Closes the connection to the pyfora cluster.
-
define(obj)¶ Create a remote representation of an object.
Sends the specified object to the server and returns a Future that resolves to a RemotePythonObject representing the object on the server.
Parameters: obj – A Python object to send.
Returns: A Future that resolves to a RemotePythonObject representing the object on the server.
-
exportS3Dataset(valueAsString, bucketname, keyname)¶ Write a ComputedRemotePythonObject representing a pyfora string to S3.
Parameters: - valueAsString (RemotePythonObject.ComputedRemotePythonObject) – a computed string.
- bucketname (str) – The name of the S3 bucket to write to.
- keyname (str) – The S3 key to write to.
Returns: A Future representing the completion of the export operation. It resolves either to None (success) or to an instance of PyforaError.
-
getWorkerCount()¶ Returns the number of workers connected to the cluster.
Returns: The number of workers currently available in the cluster. Return type: int
-
importRemoteFile(path)¶ Loads the content of a file as a string
Note
The file must be available to all machines in the cluster using the specified path. If you run multiple workers you must either copy the file to all machines, or if using a network file-system, mount it into the same path on all machines.
In addition, pyfora may cache the content of the file. Changes to the file’s content made after it has been loaded may have no effect.
Parameters: path (str) – Full path to the file. This must be a valid path on all worker machines in the cluster.
Returns: A Future that resolves to a RemotePythonObject representing the content of the file as a string.
-
importS3Dataset(bucketname, keyname, verify=True)¶ Creates a RemotePythonObject that represents the content of an S3 key as a string.
Parameters: - bucketname (str) – The S3 bucket to read from.
- keyname (str) – The S3 key to read.
- verify – Throw an exception immediately if the key or bucket cannot be read.
Returns: A Future that resolves to a RemotePythonObject representing the content of the S3 key.
-
isClosed()¶ Determine if the Executor is connected to the cluster.
Returns: True if close() has been called, False otherwise.
Return type: bool
-
remotely¶ Returns a WithBlockExecutor.WithBlockExecutor that can be used to enter a block of “pure” Python code.
The with executor.remotely: syntax allows you to automatically submit an entire block of Python code for remote execution. All the code nested in the remotely with block is submitted.
Returns: A WithBlockExecutor that extracts Python code from a with block and submits it to the pyfora cluster for remote execution. Results of the remote execution are returned as RemotePythonObject and are automatically reassigned to their corresponding local variables in the with block.
-
submit(fn, *args)¶ Submits a callable to be executed on the cluster with the provided arguments.
This function is shorthand for calling define() on the callable and all arguments and then invoking the remote callable with the remoted arguments.
Returns: A Future representing the given call. The future eventually resolves to a RemotePythonObject instance or an exception.
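Putting connect(), submit(), toLocal() and close() together gives the round trip sketched below. compute_and_download is a hypothetical helper, not part of pyfora. It takes the connect callable as a parameter (in real use, pyfora.connect) so that the sketch relies only on the calls documented in this reference. It assumes the remote call succeeds; error handling beyond closing the executor is omitted.

```python
def compute_and_download(connect, url, fn, *args):
    """Run fn(*args) on a cluster and return the result as a local object.

    `connect` is expected to behave like pyfora.connect: it returns an
    Executor whose submit() yields a Future of a RemotePythonObject.
    """
    executor = connect(url)
    try:
        remote_future = executor.submit(fn, *args)   # Future
        remote_object = remote_future.result()       # RemotePythonObject
        return remote_object.toLocal().result()      # local Python object
    finally:
        # Always release the connection, even if the computation failed.
        executor.close()


# Intended use (requires a running cluster; the URL is a placeholder and
# some_pure_function stands for any function pyfora can remote):
#   import pyfora
#   value = compute_and_download(pyfora.connect,
#                                'http://192.168.1.200:30000',
#                                some_pure_function, 10)
```

Each result() call blocks, so this helper trades the asynchronous interface for simplicity. Code that submits many computations would normally keep the futures and collect them later.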
WithBlockExecutor¶
-
class
pyfora.WithBlockExecutor.WithBlockExecutor(executor)¶ A helper object used to synchronously run blocks of code on a cluster.
When entering a with block using a WithBlockExecutor, the body of the block is extracted and submitted to the pyfora cluster for execution, along with all its local dependencies. Variable assignments within the block are returned as RemotePythonObject and reassigned to their corresponding local variables when exiting the block.
Use downloadAll(), remoteAll(), and downloadSmall() to modify the default behavior and select which objects should be downloaded from the server and which objects should be returned as RemotePythonObject futures.
Note
Instances of WithBlockExecutor are only intended to be created by Executor. User code typically uses remotely to get a WithBlockExecutor.
-
downloadAll()¶ Modify the executor to download all results into the local namespace.
Returns: self to support chaining.
-
downloadSmall(bytecount=10000)¶ Modify the executor to download small results into the local namespace and return proxies for everything else.
Returns: self to support chaining.
-
remoteAll()¶ Modify the executor to leave all results on the server and only return proxies (default).
Returns: self to support chaining.
-
withStatusCallback(callback)¶ Modify the executor to call ‘callback’ while computations are blocked with status updates.
‘callback’ will receive a json package from the server containing information about the current computation. This will override the default callback, which attempts to determine whether we’re in a jupyter notebook.
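A typical use of remotely chained with downloadAll() looks like the sketch below. sum_of_squares is a hypothetical function. It assumes that executor was obtained from connect() and that a with block inside a function body is handled the same way as one at module level, which this reference does not confirm.

```python
def sum_of_squares(executor, n):
    # Everything inside the with block is submitted for remote execution.
    # downloadAll() asks for results to be copied back into local variables.
    with executor.remotely.downloadAll():
        total = sum(i * i for i in range(n))
    return total
```

Replacing downloadAll() with remoteAll() would leave total as a RemotePythonObject proxy, to be fetched later with toLocal().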
-
RemotePythonObject¶
A proxy for some object, data or callable that lives in memory on a pyfora cluster
-
class
pyfora.RemotePythonObject.RemotePythonObject(executor)¶ A local proxy for a python object that lives in memory on a pyfora cluster.
This is an abstract class and should not be used directly, but through its two subclasses:
DefinedRemotePythonObject and ComputedRemotePythonObject.
Parameters: executor – An Executor
-
toLocal()¶ Downloads the remote object.
Returns: A Future that resolves to the Python object that this RemotePythonObject represents.
-
RemotePythonObject.DefinedRemotePythonObject¶
-
class
pyfora.RemotePythonObject.DefinedRemotePythonObject(objectId, executor)¶ A proxy that represents a local object, which has been uploaded to a pyfora cluster.
Note
Only Executor is intended to create instances of DefinedRemotePythonObject. They are created by calling define().
Parameters: - objectId (int) – a value that uniquely identifies the remote object that this DefinedRemotePythonObject represents.
- executor – the Executor that created this DefinedRemotePythonObject.
RemotePythonObject.ComputedRemotePythonObject¶
-
class
pyfora.RemotePythonObject.ComputedRemotePythonObject(computedValue, executor, isException)¶ A proxy that represents a remote object created on a pyfora cluster as a result of some computation.
Note
Only Executor is intended to create instances of ComputedRemotePythonObject. They are created by calling submit().
Parameters: - computedValue – an instance of a SubscribableWebObject computedValue representing the computation that produced this ComputedRemotePythonObject.
- executor – the Executor that created this ComputedRemotePythonObject.
Future¶
-
class
pyfora.Future.Future(onCancel=None)¶ Bases: concurrent.futures._base.Future
This pyfora.Future object subclasses the standard Python concurrent.futures._base.Future object. See: http://pythonhosted.org/futures/ https://pypi.python.org/pypi/futures
Futures wrap the result of an asynchronous computation, which can be accessed by a blocking call to result().
The pyfora Future object extends the concurrent.futures object by supporting cancellation with the cancel() method.
-
cancel()¶ Cancel a running computation
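For comparison, the standard-library Future only allows cancellation before a computation starts running. The sketch below uses concurrent.futures.Future directly (constructing one by hand is done here purely for illustration) to show that baseline. According to the description above, the pyfora Future goes further and can cancel a running computation.

```python
from concurrent.futures import Future

pending = Future()
cancelled_pending = pending.cancel()      # a pending future can be cancelled

running = Future()
running.set_running_or_notify_cancel()    # mark the future as running
cancelled_running = running.cancel()      # the stdlib refuses to cancel it

print(cancelled_pending, cancelled_running)
```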
-
Algorithms¶
Linear Regression¶
-
pyfora.algorithms.linearRegression(predictors, responses)¶ Compute the regression coefficients (with intercept) for a set of predictors against responses.
Parameters: - predictors (DataFrame) – a pandas.DataFrame with the predictor columns.
- responses (DataFrame) – a pandas.DataFrame whose first column is used as the regression’s target.
Returns: A numpy.array with the regression coefficients. The last element in the array is the intercept.
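The layout of the return value, coefficients first and intercept last, can be illustrated with ordinary least squares on a single predictor. This is a plain-Python sketch of the textbook closed form, not pyfora's implementation, and simple_linear_regression is a hypothetical name.

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor.

    Returns [slope, intercept], mirroring the 'intercept last' layout.
    """
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return [slope, mean_y - slope * mean_x]


# Data generated from y = 2 * x + 1.
coefficients = simple_linear_regression([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(coefficients)
```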
Logistic Regression¶
-
class
pyfora.algorithms.BinaryLogisticRegressionFitter(C, hasIntercept=True, method='newton-cg', interceptScale=1.0, tol=0.0001, maxIter=100000.0, splitLimit=1000000)¶ A logistic regression “fitter” that holds fitting parameters used to fit logit models.
Parameters: - C (float) – Inverse of regularization strength; must be a positive float.
- hasIntercept (bool) – If True, include an intercept (aka bias) term in the fitted models.
- method (string) – one of ‘newton-cg’ (default) or ‘majorization’
- interceptScale (float) – When hasIntercept is True, feature vectors become [x, interceptScale], i.e. we add a “synthetic” feature with constant value interceptScale to all of the feature vectors. This synthetic feature is subject to regularization like all other features. To lessen the effect of regularization, users should increase this value.
- tol (float) – Tolerance for the stopping criterion. Fitting stops when the l2-norm of the parameters being updated does not change by more than tol.
- maxIter (int) – A hard limit on the number of update cycles allowed.
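The effect of interceptScale on the inputs can be sketched in plain Python. add_synthetic_feature is a hypothetical helper that only mirrors the [x, interceptScale] description above; it is not how pyfora builds its feature matrix internally.

```python
def add_synthetic_feature(rows, intercept_scale=1.0):
    """Append a constant column to every feature vector.

    Each row x becomes [x..., intercept_scale], the form described
    for hasIntercept=True.
    """
    return [list(row) + [intercept_scale] for row in rows]


features = [[-1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
augmented = add_synthetic_feature(features, intercept_scale=10.0)
print(augmented)
```

The weight w learned for the synthetic column contributes w * interceptScale to the model's intercept. A larger scale therefore reaches the same intercept with a smaller weight, which incurs a smaller regularization penalty.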
-
fit(X, y)¶ Fit a (regularized) logit model to the predictors X and responses y.
Parameters: - X – a dataframe of feature vectors.
- y – a dataframe (with one column) which contains the “target” values, corresponding to the feature vectors in X.
Returns: A BinaryLogisticRegressionModel which represents the fit model.
Example:
    # fit a logit model without intercept using regularizer 1.0
    import pandas
    from pyfora.algorithms import BinaryLogisticRegressionFitter

    fitter = BinaryLogisticRegressionFitter(1.0, False)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    model = fitter.fit(x, y)
-
class
pyfora.algorithms.logistic.BinaryLogisticRegressionModel.BinaryLogisticRegressionModel(coefficients, classZeroLabel, classOneLabel, intercept, interceptScale, iters)¶ Represents a fit logit model.
-
coefficients¶ numpy.array – The regressions coefficients.
-
intercept¶ float – The fitted model’s intercept
Note
This class is not intended to be constructed directly. Instances of it are returned by fit().
-
predict(X)¶ Predict the class labels of X.
Parameters: X (DataFrame, or numpy.array) – a set of feature vectors.
Returns: array containing the predicted class labels.
Return type: numpy.array
-
predict_probability(X)¶ Estimate the conditional class-zero probability for the features in X.
Parameters: X (DataFrame, or numpy.array) – a set of feature vectors.
Returns: array containing the predicted probabilities.
Return type: numpy.array
-
Regression Trees¶
-
class
pyfora.algorithms.regressionTrees.RegressionTree.RegressionTreeBuilder(maxDepth, minSamplesSplit=2, numBuckets=10000, minSplitThresh=1000000)¶ Fits regression trees to data using specified tree parameters.
Parameters: - maxDepth (int) – The maximum depth of a fit tree
- minSamplesSplit (int) – The minimum number of samples required to split a node
- numBuckets (int) – The number of buckets used in the estimation of optimal column splits.
- minSplitThresh (int) – an “internal” argument, not generally of interest to casual users, giving the splitting rule in computeBucketedSampleSummaries.
Returns: A RegressionTree instance.
Examples:
    import pandas
    from pyfora.algorithms import RegressionTreeBuilder

    builder = RegressionTreeBuilder(2)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    regressionTree = builder.fit(x, y)
-
static
buildTree(x, y, minSamplesSplit, maxDepth)¶ Fit a regression tree to predictors x and responses y using parameters minSamplesSplit and maxDepth.
Parameters: - x (pandas.DataFrame) – the predictors.
- y (pandas.DataFrame) – the responses.
- maxDepth – The maximum depth of a fit tree.
- minSamplesSplit – The minimum number of samples required to split a node.
-
fit(x, y)¶ Using a RegressionTreeBuilder, fit a regression tree to predictors x and responses y.
Parameters: - x (pandas.DataFrame) – the predictors.
- y (pandas.DataFrame) – the responses.
Returns: a RegressionTree instance.
Examples:
    import pandas
    from pyfora.algorithms.regressionTrees.RegressionTree import RegressionTreeBuilder

    builder = RegressionTreeBuilder(2)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    regressionTree = builder.fit(x, y)
-
class
pyfora.algorithms.regressionTrees.RegressionTree.RegressionTree(rules, numDimensions=None, columnNames=None)¶ A class representing a regression tree.
A regression tree is represented, essentially, as a list of “rules”, which are either SplitRule, giving “split” nodes, which divide the domain by a hyperplane, or RegressionLeafRule, which just hold a prediction value.
Note
This class is not generally instantiated directly by users. Instead, instances are normally returned by RegressionTreeBuilder.
-
predict(x, depth=None)¶ Predicts the responses corresponding to pandas.DataFrame x.
Returns: A pandas.Series giving the predictions of the rows of x.
Examples:
    import pandas
    from pyfora.algorithms import RegressionTreeBuilder

    builder = RegressionTreeBuilder(2)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    regressionTree = builder.fit(x, y)
    # predict `regressionTree` on `x` itself
    regressionTree.predict(x)
-
score(x, yTrue)¶ Returns the coefficient of determination R2 of the prediction.
The coefficient R2 is defined as (1 - u / v), where u is the residual sum of squares ((yTrue - yPredicted) ** 2).sum() and v is the total sum of squares ((yTrue - yTrue.mean()) ** 2).sum(). The best possible score is 1.0; lower values are worse.
Returns: (float) the R2 value.
Examples:
    import pandas
    from pyfora.algorithms import RegressionTreeBuilder

    builder = RegressionTreeBuilder(2)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    regressionTree = builder.fit(x, y)
    # score `regressionTree` on the training data itself
    regressionTree.score(x, y)
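The (1 - u / v) formula can be checked by hand with a few lines of plain Python. This is a sketch of the formula only, using the standard statistical names for the two sums (residual and total sum of squares); r_squared is a hypothetical helper and not pyfora's implementation.

```python
def r_squared(y_true, y_predicted):
    """Coefficient of determination, 1 - u / v."""
    mean = sum(y_true) / float(len(y_true))
    # u: squared error of the predictions (residual sum of squares).
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_predicted))
    # v: squared deviation of the targets from their mean (total sum of squares).
    v = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - u / v


# Perfect predictions, and predictions that always return the mean.
perfect = r_squared([0.0, 1.0, 1.0], [0.0, 1.0, 1.0])
baseline = r_squared([0.0, 2.0, 4.0], [2.0, 2.0, 2.0])
print(perfect, baseline)
```

A model that always predicts the mean of the targets scores 0.0, and a model that does worse than that scores below zero.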
-
Gradient Boosting¶
-
class
pyfora.algorithms.regressionTrees.GradientBoostedRegressorBuilder.GradientBoostedRegressorBuilder(maxDepth=3, nBoosts=100, learningRate=1.0, minSamplesSplit=2, numBuckets=10000, loss='l2')¶ A class which builds (or “fits”) gradient-boosted regression trees to data with the parameters specified below.
Parameters: - maxDepth (int) – The max depth allowed of each constituent regression tree.
- nBoosts (int) – The number of “boosting iterations” used.
- learningRate (float) – The learning rate of the model, used for regularization. Each successive tree from the boosting stages is added with multiplier learningRate.
- minSamplesSplit (int) – The minimum number of samples required to split a regression tree node.
- numBuckets (int) – The number of buckets used in the estimation of optimal column splits for building regression trees.
- loss – the loss used when forming gradients. Defaults to l2, for least-squares loss. The only other allowed value currently is lad, for “least absolute deviation” (aka l1-loss).
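The role of learningRate as a multiplier on each boosting stage can be sketched in plain Python. This is a toy under stated assumptions, not pyfora's algorithm: the base learner fit_mean is a hypothetical stand-in that predicts a constant where pyfora would fit a regression tree, and only the l2 loss is covered.

```python
def fit_mean(xs, residuals):
    """Stand-in base learner: always predicts the mean residual."""
    mean = sum(residuals) / float(len(residuals))
    return lambda x: mean


def boost(xs, ys, n_boosts, learning_rate, fit_base=fit_mean):
    """Fit an additive model under l2 loss and return a predict function."""
    stages = []
    predictions = [0.0] * len(ys)
    for _ in range(n_boosts):
        # For l2 loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stage = fit_base(xs, residuals)
        stages.append(stage)
        # Each stage is added with multiplier learning_rate.
        predictions = [p + learning_rate * stage(x)
                       for x, p in zip(xs, predictions)]
    return lambda x: sum(learning_rate * stage(x) for stage in stages)


xs = [0, 1]
ys = [2.0, 4.0]
model = boost(xs, ys, n_boosts=2, learning_rate=0.5)
print(model(0))
```

With a learning rate below 1.0 each stage corrects only part of the remaining error, so more boosting iterations are needed to reach the same fit.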
-
fit(X, y)¶ Fits predictors X to responses y.
Parameters: - X (pandas.DataFrame) – predictors.
- y (pandas.DataFrame) – responses.
Returns: A RegressionModel instance.
Examples:
    import pandas
    from pyfora.algorithms import GradientBoostedRegressorBuilder

    builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    model = builder.fit(x, y)
-
iterativeFitter(X, y)¶ Returns an IterativeFitter instance which can iteratively fit boosting models.
Parameters: - X (pandas.DataFrame) – predictors.
- y (pandas.DataFrame) – responses.
Examples:
    import pandas
    from pyfora.algorithms import GradientBoostedRegressorBuilder

    builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    fitter = builder.iterativeFitter(x, y)
    # compute scores vs number of boosts
    numBoosts = 5
    scores = []
    for ix in xrange(numBoosts):
        fitter = fitter.next()
        scores = scores + [fitter.model.score(x, y)]
-
class
pyfora.algorithms.regressionTrees.RegressionModel.RegressionModel(additiveRegressionTree, X, XDimensions, yAsSeries, loss, regressionTreeBuilder, learningRate)¶ A class representing a gradient-boosted regression tree model fit to data.
Note
These classes are not normally instantiated directly. Instead, they are typically returned by GradientBoostedRegressorBuilder instances.
-
predict(df, nEstimators=None)¶ Predict on the pandas.DataFrame df.
Example:
    import pandas
    from pyfora.algorithms import GradientBoostedRegressorBuilder

    builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    model = builder.fit(x, y)
    # predict `x` using the model `model`:
    model.predict(x)
-
score(X, yTrue)¶ Return the coefficient of determination (R2) of the prediction.
The coefficient R2 is defined as (1 - u / v), where u is the residual sum of squares ((yTrue - yPredicted) ** 2).sum() and v is the total sum of squares ((yTrue - yTrue.mean()) ** 2).sum(). The best possible score is 1.0; lower values are worse.
Parameters: - X – the predictor DataFrame.
- yTrue – the (true) responses DataFrame.
Returns: (float) the R2 value.
Example:
    import pandas
    from pyfora.algorithms import GradientBoostedRegressorBuilder

    builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    model = builder.fit(x, y)
    # compute the score of the fit model:
    model.score(x, y)
-
-
class
pyfora.algorithms.regressionTrees.GradientBoostedRegressorBuilder.IterativeFitter(model, predictions)¶ A sort of iterator class which is capable of fitting subsequent boosting models.
-
model¶ the current regression model.
-
predictions¶ the current predictions of the regression model (with respect to the training set implicit in model).
Note
This class is typically not instantiated directly. Instead, instances are returned from iterativeFitter().
-
next()¶ Fit one boosting stage, returning a new IterativeFitter object that holds the next regression model and predictions.
Examples:
    import pandas
    from pyfora.algorithms import GradientBoostedRegressorBuilder

    builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    fitter = builder.iterativeFitter(x, y)
    # compute scores vs number of boosts
    numBoosts = 5
    scores = []
    for ix in xrange(numBoosts):
        fitter = fitter.next()
        scores = scores + [fitter.model.score(x, y)]
-
-
class
pyfora.algorithms.regressionTrees.GradientBoostedClassifierBuilder.GradientBoostedClassifierBuilder(maxDepth=3, nBoosts=100, learningRate=1.0, minSamplesSplit=2, numBuckets=10000)¶ A class which builds (or “fits”) gradient boosted (regression) trees to form classification models.
Parameters: - maxDepth (int) – The max depth allowed of each constituent regression tree.
- nBoosts (int) – The number of boosting iterations used.
- learningRate (float) – The learning rate of the model, used for regularization. Each successive tree from the boosting stages is added with multiplier learningRate.
- minSamplesSplit (int) – The minimum number of samples required to split a regression tree node.
- numBuckets (int) – The number of buckets used in the estimation of optimal column splits for building regression trees.
- loss – the loss used when forming gradients. Defaults to l2, for least-squares loss. The only other allowed value currently is lad, for “least absolute deviation” (aka l1-loss).
Note
Only nClasses = 2 cases are currently supported.
-
fit(X, y)¶ Fit predictors X to responses y.
Parameters: - X (pandas.DataFrame) – predictors.
- y (pandas.DataFrame) – responses.
Returns: A BinaryClassificationModel instance.
Examples:
    import pandas
    from pyfora.algorithms import GradientBoostedClassifierBuilder

    builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    model = builder.fit(x, y)
-
iterativeFitter(X, y)¶ Create an IterativeFitter instance which can iteratively fit boosting models.
Parameters: - X (DataFrame) – predictors.
- y (DataFrame) – responses.
Examples:
    import pandas
    from pyfora.algorithms import GradientBoostedClassifierBuilder

    builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    fitter = builder.iterativeFitter(x, y)
    numBoosts = 5
    for ix in xrange(numBoosts):
        fitter = fitter.next()
-
class
pyfora.algorithms.regressionTrees.BinaryClassificationModel.BinaryClassificationModel(additiveRegressionTree, X, classes, XDimensions, yAsSeries, loss, baseModelBuilder, learningRate)¶ A class representing a gradient-boosted binary classification tree model fit to data.
Note
These classes are not normally instantiated directly. Instead, they are typically returned by GradientBoostedClassifierBuilder instances.
-
deviance(x, yTrue)¶ Compute the binomial deviance (average negative log-likelihood) of the instances in predictors x with responses yTrue.
Parameters: - x – the predictor DataFrame.
- yTrue – the (true) responses DataFrame.
Examples:
    import pandas
    from pyfora.algorithms import GradientBoostedClassifierBuilder

    builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    model = builder.fit(x, y)
    # compute the deviance:
    model.deviance(x, y)
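An average negative log-likelihood for 0/1 labels can be written out in a few lines of plain Python. This is a sketch of the textbook quantity only. Whether pyfora applies an additional scaling factor, or works from class-zero rather than class-one probabilities, is not stated here, so average_negative_log_likelihood and its argument convention are assumptions.

```python
import math


def average_negative_log_likelihood(y_true, p_one):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all rows.

    y_true holds 0/1 labels; p_one holds the predicted probability of
    class one, assumed to lie strictly between 0 and 1.
    """
    total = 0.0
    for y, p in zip(y_true, p_one):
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# A model that always answers 0.5 carries no information about the labels.
uninformative = average_negative_log_likelihood([0, 1, 1], [0.5, 0.5, 0.5])
print(uninformative)
```

Lower values indicate predicted probabilities that agree more closely with the observed labels.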
-
predict(df)¶ Predict the class labels of the rows of df.
Parameters: df (pandas.DataFrame) – input DataFrame.
Returns: A pandas.Series giving the row-wise predictions.
Examples:
    import pandas
    from pyfora.algorithms import GradientBoostedClassifierBuilder

    builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    model = builder.fit(x, y)
    # use the fit model to predict `x` itself:
    model.predict(x)
-
predictProbability(df)¶ Return class-zero probability estimates of the rows of a DataFrame df.
Parameters: df (pandas.DataFrame) – input DataFrame.
Returns: A pandas.Series giving the row-wise class-zero probability estimates.
Examples:
    import pandas
    from pyfora.algorithms import GradientBoostedClassifierBuilder

    builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    model = builder.fit(x, y)
    # use the fit model to estimate probabilities for `x` itself:
    model.predictProbability(x)
-
score(x, yTrue)¶ Compute the mean accuracy in predicting x with respect to yTrue.
Parameters: - x – the predictor DataFrame.
- yTrue – the (true) responses DataFrame.
Examples:
    import pandas
    from pyfora.algorithms import GradientBoostedClassifierBuilder

    builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    model = builder.fit(x, y)
    # compute the mean accuracy of the fit model on `x` itself:
    model.score(x, y)
-
-
class
pyfora.algorithms.regressionTrees.GradientBoostedClassifierBuilder.IterativeFitter(model, previousRegressionValues)¶ A sort of iterator class which is capable of fitting subsequent boosting models.
-
model the current regression model.
-
predictions the current predictions of the regression model (with respect to the training set implicit in model).
Note
This class is typically not instantiated directly. Instead, instances are returned from iterativeFitter().
-
next()¶ Boost once and return a new IterativeFitter.
Returns: An IterativeFitter instance.
Example:
    import pandas
    from pyfora.algorithms import GradientBoostedClassifierBuilder

    builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
    x = pandas.DataFrame({'x0': [-1,0,1], 'x1': [0,1,1]})
    y = pandas.DataFrame({'y': [0,1,1]})
    fitter = builder.iterativeFitter(x, y)
    # compute scores vs number of boosts
    numBoosts = 5
    scores = []
    for ix in xrange(numBoosts):
        fitter = fitter.next()
        scores = scores + [fitter.model.score(x, y)]
-
Data Frames¶
-
pyfora.pandas_util.read_csv_from_string(data)¶ Reads a string in CSV format into a DataFrame. This function is similar to pandas.read_csv() but it takes a string as input instead of a file.
This function is intended to be used in pyfora code that runs remotely in a pyfora cluster.
Parameters: data (str) – a string of comma-separated values.
Returns: A pandas.DataFrame that holds the parsed data.
Note
This function currently assumes that all values are of type float (or convertible to float), and that the first row contains column headers. This limitation will be removed in the near future.
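The two documented assumptions, a header row followed by float-convertible values, can be made concrete with a standard-library sketch. parse_float_csv is a hypothetical helper written for Python 3; it returns a plain dict of columns rather than a DataFrame and is not pyfora's implementation.

```python
import csv
import io


def parse_float_csv(data):
    """Parse CSV text into {column name: [float, ...]}.

    The first row is taken as the header and every other value is
    converted with float(), matching the documented assumptions.
    """
    rows = list(csv.reader(io.StringIO(data)))
    header, body = rows[0], rows[1:]
    return {name: [float(row[i]) for row in body]
            for i, name in enumerate(header)}


columns = parse_float_csv(u"a,b\n1,2\n3,4.5\n")
print(columns)
```

A value that cannot be converted, such as a text label, makes float() raise ValueError in this sketch, which is the kind of input the note above excludes.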