pyfora
pyfora.connect(url, timeout=30.0)
Opens a connection to a pyfora cluster.
Parameters:
- url (str) – The HTTP URL of the cluster’s manager (e.g. http://192.168.1.200:30000).
- timeout (Optional float) – A timeout for the operation in seconds, or None to wait indefinitely.
Returns: An Executor that can be used to submit work to the cluster.
Exceptions
exception pyfora.PyforaError
Base class for all pyfora exceptions.
exception pyfora.ConnectionError
Raised when a connection to the pyfora backend cannot be established.
exception pyfora.NotCallableError
Raised when an attempt is made to call a non-callable object.
exception pyfora.ComputationError(remoteException, trace)
Raised when a remote computation results in an exception.
Parameters:
- remoteException (Exception) – The exception raised by the remote computation.
- trace (Optional[List]) – A representation of the stack trace in which the exception was raised. It takes the form:
  [{'path': str, 'line': int}, ...]
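Because the trace is a plain list of dictionaries, client code can render it without any pyfora machinery. A minimal sketch, assuming only the documented [{'path': str, 'line': int}, ...] shape; the helper name format_trace is hypothetical and not part of pyfora:

```python
def format_trace(trace):
    """Render a pyfora-style trace (hypothetical helper) as one 'path:line' entry per line.

    Assumes each entry is a dict with a 'path' (str) and a 'line' (int) key,
    matching the documented [{'path': str, 'line': int}, ...] shape.
    """
    if trace is None:  # the trace is optional and may be missing
        return "<no trace available>"
    return "\n".join("  %s:%d" % (frame["path"], frame["line"]) for frame in trace)


example = [{"path": "model.py", "line": 12}, {"path": "helpers.py", "line": 40}]
print(format_trace(example))
```

The same helper applies to the trace carried by PythonToForaConversionError, which documents the identical format.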
exception pyfora.PythonToForaConversionError(message, trace=None)
Raised when an attempt is made to use a Python object that cannot be remoted by pyfora. This may happen when, for example:
- A function attempts to mutate state or produce side effects (i.e. it is not “purely functional”).
- A call is made to a Python builtin that is not supported by pyfora (e.g. open()).
Parameters:
- message (str) – Error message.
- trace (Optional[List]) – A representation of the stack trace in which the exception was raised. It takes the form:
  [{'path': str, 'line': int}, ...]
exception pyfora.ForaToPythonConversionError
Raised when attempting to download a remote object that cannot be converted to Python.
exception pyfora.PyforaNotImplementedError
Feature not yet implemented in pyfora.
exception pyfora.InvalidPyforaOperation
Raised when a running computation performs an operation that cannot be faithfully executed with pyfora.
exception pyfora.ResultExceededBytecountThreshold
Raised when attempting to download a remote object whose size exceeds the specified maximum.
Executor
class pyfora.Executor.Executor(connection, pureImplementationMappings=None)
Submits computations to a pyfora cluster and marshals data to/from the local Python.
The Executor is the main point of interaction with a pyfora cluster. It is responsible for sending computations to the cluster and returning the result as a RemotePythonObject future.
It is modeled after the same-named abstraction in the concurrent.futures module that is part of the Python 3 standard library.
All interactions with the remote cluster are asynchronous and return Future objects that represent the in-progress operation. Python objects are sent to the server using the define() method, which returns a Future that resolves to a RemotePythonObject corresponding to the submitted object. Similarly, functions and their arguments can be submitted using the submit() method, which returns a Future that resolves to a RemotePythonObject of the evaluated expression or raised exception.
Note
This class is not intended to be constructed explicitly. Instances of it are created by calling connect().
Parameters:
- connection (pyfora.Connection.Connection) – an open connection to a cluster.
- pureImplementationMappings (optional) – a PureImplementationMappings that defines the mapping between Python libraries and their “pure” pyfora implementation.
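Because the Executor is modeled after concurrent.futures, the submit-then-block pattern can be previewed with the standard library alone. The sketch below uses concurrent.futures.ThreadPoolExecutor purely as a local stand-in: it does not talk to a pyfora cluster, and a real pyfora Executor is obtained from connect() rather than constructed like this.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


# Local stand-in for a pyfora Executor: submit() returns immediately with a Future.
with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(square, 7)
    # result() blocks until the computation finishes (or re-raises its exception).
    value = future.result()

print(value)  # 49
```

With pyfora, the analogous result() call yields a RemotePythonObject rather than a plain value; calling toLocal() on it returns another Future that resolves to the downloaded Python object.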
close()
Closes the connection to the pyfora cluster.
define(obj)
Creates a remote representation of an object.
Sends the specified object to the server and returns a Future that resolves to a RemotePythonObject representing the object on the server.
Parameters: obj – A Python object to send.
Returns: A Future that resolves to a RemotePythonObject representing the object on the server.
exportS3Dataset(valueAsString, bucketname, keyname)
Writes a ComputedRemotePythonObject representing a pyfora string to S3.
Parameters:
- valueAsString (RemotePythonObject.ComputedRemotePythonObject) – a computed string.
- bucketname (str) – The name of the S3 bucket to write to.
- keyname (str) – The S3 key to write to.
Returns: A Future representing the completion of the export operation. It resolves either to None (success) or to an instance of PyforaError.
getWorkerCount()
Returns the number of workers connected to the cluster.
Returns: The number of workers currently available in the cluster.
Return type: int
importRemoteFile(path)
Loads the content of a file as a string.
Note
The file must be available to all machines in the cluster at the specified path. If you run multiple workers, you must either copy the file to all machines or, if using a network file system, mount it at the same path on all machines.
In addition, pyfora may cache the content of the file. Changes made to the file’s content after it has been loaded may have no effect.
Parameters: path (str) – Full path to the file. This must be a valid path on all worker machines in the cluster.
Returns: A Future that resolves to a RemotePythonObject representing the content of the file as a string.
importS3Dataset(bucketname, keyname, verify=True)
Creates a RemotePythonObject that represents the content of an S3 key as a string.
Parameters:
- bucketname (str) – The S3 bucket to read from.
- keyname (str) – The S3 key to read.
- verify – Raise an exception immediately if the key or bucket cannot be read.
Returns: A Future that resolves to a RemotePythonObject representing the content of the S3 key.
isClosed()
Determines whether the Executor has been closed, i.e. is no longer connected to the cluster.
Returns: True if close() has been called, False otherwise.
Return type: bool
remotely
Returns a WithBlockExecutor.WithBlockExecutor that can be used to enter a block of “pure” Python code.
The with executor.remotely: syntax allows you to automatically submit an entire block of Python code for remote execution. All the code nested in the remotely with block is submitted.
Returns: A WithBlockExecutor that extracts Python code from a with block and submits it to the pyfora cluster for remote execution. Results of the remote execution are returned as RemotePythonObject and are automatically reassigned to their corresponding local variables in the with block.
submit(fn, *args)
Submits a callable to be executed on the cluster with the provided arguments.
This function is shorthand for calling define() on the callable and all arguments and then invoking the remote callable with the remoted arguments.
Returns: A Future representing the given call. The future eventually resolves to a RemotePythonObject instance or an exception.
WithBlockExecutor
class pyfora.WithBlockExecutor.WithBlockExecutor(executor)
A helper object used to synchronously run blocks of code on a cluster.
When entering a with block using a WithBlockExecutor, the body of the block is extracted and submitted to the pyfora cluster for execution, along with all its local dependencies. Variable assignments within the block are returned as RemotePythonObject and reassigned to their corresponding local variables when exiting the block.
Use downloadAll(), remoteAll(), and downloadSmall() to modify the default behavior and select which objects should be downloaded from the server and which objects should be returned as RemotePythonObject futures.
Note
Instances of WithBlockExecutor are only intended to be created by Executor. User code typically uses remotely to get a WithBlockExecutor.
downloadAll()
Modifies the executor to download all results into the local namespace.
Returns: self, to support chaining.
downloadSmall(bytecount=10000)
Modifies the executor to download small results into the local namespace and return proxies for everything else.
Returns: self, to support chaining.
remoteAll()
Modifies the executor to leave all results on the server and only return proxies (default).
Returns: self, to support chaining.
withStatusCallback(callback)
Modifies the executor to call callback with status updates while computations are blocked.
callback receives a JSON package from the server containing information about the current computation. This overrides the default callback, which attempts to determine whether it is running in a Jupyter notebook.
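Since downloadAll(), downloadSmall(), and remoteAll() each return self, a call such as executor.remotely.downloadAll() still yields the WithBlockExecutor that the with statement needs. The class below is a hypothetical, self-contained sketch of that chaining pattern and of one plausible size-based download rule; the class name, the shouldDownload method, and the strict "smaller than bytecount" comparison are assumptions, not pyfora’s implementation.

```python
class DownloadPolicy(object):
    """Hypothetical sketch of a chainable download policy (not pyfora's code)."""

    def __init__(self):
        # None means "download everything"; 0 means "keep everything remote".
        self._maxBytes = 0  # remoteAll() is the documented default

    def downloadAll(self):
        self._maxBytes = None
        return self  # returning self enables chaining

    def downloadSmall(self, bytecount=10000):
        self._maxBytes = bytecount
        return self

    def remoteAll(self):
        self._maxBytes = 0
        return self

    def shouldDownload(self, sizeInBytes):
        """Assumed rule: download results strictly smaller than the threshold."""
        if self._maxBytes is None:
            return True
        return sizeInBytes < self._maxBytes


policy = DownloadPolicy().downloadSmall(100)
print(policy.shouldDownload(50), policy.shouldDownload(500))  # True False
```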
RemotePythonObject
A proxy for some object, data, or callable that lives in memory on a pyfora cluster.
class pyfora.RemotePythonObject.RemotePythonObject(executor)
A local proxy for a Python object that lives in memory on a pyfora cluster.
This is an abstract class and should not be used directly, but through its two subclasses: DefinedRemotePythonObject and ComputedRemotePythonObject.
Parameters: executor – An Executor.
toLocal()
Downloads the remote object.
Returns: A Future that resolves to the Python object that this RemotePythonObject represents.
RemotePythonObject.DefinedRemotePythonObject
class pyfora.RemotePythonObject.DefinedRemotePythonObject(objectId, executor)
A proxy that represents a local object which has been uploaded to a pyfora cluster.
Note
Only Executor is intended to create instances of DefinedRemotePythonObject. They are created by calling define().
Parameters:
- objectId (int) – a value that uniquely identifies the remote object that this DefinedRemotePythonObject represents.
- executor – the Executor that created this DefinedRemotePythonObject.
RemotePythonObject.ComputedRemotePythonObject
class pyfora.RemotePythonObject.ComputedRemotePythonObject(computedValue, executor, isException)
A proxy that represents a remote object created on a pyfora cluster as a result of some computation.
Note
Only Executor is intended to create instances of ComputedRemotePythonObject. They are created by calling submit().
Parameters:
- computedValue – an instance of a SubscribableWebObject computedValue representing the computation that produced this ComputedRemotePythonObject.
- executor – the Executor that created this ComputedRemotePythonObject.
Future
class pyfora.Future.Future(onCancel=None)
Bases: concurrent.futures._base.Future
This pyfora.Future object subclasses the standard Python concurrent.futures._base.Future object. See: http://pythonhosted.org/futures/ and https://pypi.python.org/pypi/futures
Futures wrap the result of an asynchronous computation, which can be accessed by a blocking call to result().
The pyfora Future object extends the concurrent.futures object by supporting cancellation with the cancel() method.
cancel()
Cancels a running computation.
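The onCancel constructor argument suggests how extra cancellation support can be layered on the standard class: a hook is notified when cancel() is called, for example to stop the remote computation. The subclass below is a hypothetical sketch of that idea built only on concurrent.futures.Future. Its name, and the rule that the hook runs only while the future is not yet done, are assumptions.

```python
import concurrent.futures


class CancellableFuture(concurrent.futures.Future):
    """Hypothetical sketch: a Future that notifies an onCancel hook."""

    def __init__(self, onCancel=None):
        super(CancellableFuture, self).__init__()
        self._onCancel = onCancel

    def cancel(self):
        # Finished futures are left alone; otherwise notify the owner first,
        # e.g. to ask the remote side to stop computing.
        if self._onCancel is not None and not self.done():
            self._onCancel()
        # The base class only marks *pending* futures as cancelled; it returns
        # False for running or finished ones.
        return super(CancellableFuture, self).cancel()


calls = []
pending = CancellableFuture(onCancel=lambda: calls.append("cancel"))
print(pending.cancel(), pending.cancelled(), calls)  # True True ['cancel']

finished = CancellableFuture(onCancel=lambda: calls.append("late"))
finished.set_result(42)
print(finished.cancel(), finished.result())  # False 42
```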
Algorithms
Linear Regression
pyfora.algorithms.linearRegression(predictors, responses)
Computes the regression coefficients (with intercept) for a set of predictors against responses.
Parameters:
- predictors (DataFrame) – a pandas.DataFrame with the predictor columns.
- responses (DataFrame) – a pandas.DataFrame whose first column is used as the regression’s target.
Returns: A numpy.array with the regression coefficients. The last element in the array is the intercept.
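The return convention (one coefficient per predictor column, intercept last) can be illustrated with an ordinary least-squares fit in plain Python. This is a sketch of the textbook normal-equations approach on small lists, not pyfora’s algorithm. The function name and the list-of-columns input are hypothetical stand-ins for the DataFrame arguments.

```python
def linear_regression(predictorColumns, response):
    """Ordinary least squares with an intercept; returns [b_1, ..., b_k, intercept]."""
    n = len(response)
    # Design-matrix rows: the predictor values followed by a constant 1.0 for the intercept.
    rows = [[float(col[i]) for col in predictorColumns] + [1.0] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X^T X | X^T y].
    a = [
        [sum(r[p] * r[q] for r in rows) for q in range(k)]
        + [sum(r[p] * y for r, y in zip(rows, response))]
        for p in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [x - factor * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


# y = 2 * x + 1 exactly, so the fit should recover slope 2 and intercept 1.
coefficients = linear_regression([[0, 1, 2, 3]], [1, 3, 5, 7])
print([round(c, 6) for c in coefficients])  # [2.0, 1.0]
```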
Logistic Regression
class pyfora.algorithms.BinaryLogisticRegressionFitter(C, hasIntercept=True, method='newton-cg', interceptScale=1.0, tol=0.0001, maxIter=100000.0, splitLimit=1000000)
A logistic regression “fitter” that holds the fitting parameters used to fit logit models.
Parameters:
- C (float) – Inverse of regularization strength; must be a positive float.
- hasIntercept (bool) – If True, include an intercept (aka bias) term in the fitted models.
- method (string) – one of ‘newton-cg’ (default) or ‘majorization’.
- interceptScale (float) – When hasIntercept is True, feature vectors become [x, interceptScale], i.e. a “synthetic” feature with constant value interceptScale is added to all of the feature vectors. This synthetic feature is subject to regularization like all other features. To lessen the effect of regularization on the intercept, increase this value.
- tol (float) – Tolerance for stopping criteria. Fitting stops when the l2-norm of the parameter update is no greater than tol.
- maxIter (int) – A hard limit on the number of update cycles allowed.
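The interceptScale advice follows from a little arithmetic. If the fitted weight on the synthetic feature is w, the effective intercept is w * interceptScale, so a given intercept b needs w = b / interceptScale, and an L2 penalty on that weight grows with (b / interceptScale) ** 2. The sketch below uses hypothetical helper names in plain Python. It shows the augmentation and how the squared weight needed for the same intercept shrinks as interceptScale grows; how pyfora combines this with C internally is not shown.

```python
def augment(featureRows, interceptScale):
    """Append the constant synthetic feature described for hasIntercept=True."""
    return [list(row) + [interceptScale] for row in featureRows]


def interceptPenalty(targetIntercept, interceptScale):
    """Squared weight the synthetic feature needs to produce targetIntercept."""
    weight = targetIntercept / float(interceptScale)
    return weight ** 2


print(augment([[-1, 0], [1, 1]], 1.0))  # [[-1, 0, 1.0], [1, 1, 1.0]]
for scale in (1.0, 10.0):
    print(scale, interceptPenalty(2.0, scale))
```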
fit(X, y)
Fits a (regularized) logit model to the predictors X and responses y.
Parameters:
- X – a dataframe of feature vectors.
- y – a dataframe (with one column) which contains the “target” values, corresponding to the feature vectors in X.
Returns: A BinaryLogisticRegressionModel which represents the fit model.
Example:
```python
import pandas
from pyfora.algorithms import BinaryLogisticRegressionFitter

# fit a logit model without intercept using regularizer 1.0
fitter = BinaryLogisticRegressionFitter(1.0, False)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
model = fitter.fit(x, y)
```
class pyfora.algorithms.logistic.BinaryLogisticRegressionModel.BinaryLogisticRegressionModel(coefficients, classZeroLabel, classOneLabel, intercept, interceptScale, iters)
Represents a fit logit model.
coefficients
numpy.array – The regression coefficients.
intercept
float – The fitted model’s intercept.
Note
This class is not intended to be constructed directly. Instances of it are returned by fit().
predict(X)
Predicts the class labels of X.
Parameters: X (DataFrame, or numpy.array) – a set of feature vectors.
Returns: array containing the predicted class labels.
Return type: numpy.array
predict_probability(X)
Estimates the conditional class-zero probability for the features in X.
Parameters: X (DataFrame, or numpy.array) – a set of feature vectors.
Returns: array containing the predicted probabilities.
Return type: numpy.array
Regression Trees
class pyfora.algorithms.regressionTrees.RegressionTree.RegressionTreeBuilder(maxDepth, minSamplesSplit=2, numBuckets=10000, minSplitThresh=1000000)
Fits regression trees to data using specified tree parameters.
Parameters:
- maxDepth (int) – The maximum depth of a fit tree.
- minSamplesSplit (int) – The minimum number of samples required to split a node.
- numBuckets (int) – The number of buckets used in the estimation of optimal column splits.
- minSplitThresh (int) – an “internal” argument, not generally of interest to casual users, giving the splitting rule in computeBucketedSampleSummaries.
Returns: A RegressionTree instance.
Examples:
```python
import pandas
from pyfora.algorithms import RegressionTreeBuilder

builder = RegressionTreeBuilder(2)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
regressionTree = builder.fit(x, y)
```
static buildTree(x, y, minSamplesSplit, maxDepth)
Fits a regression tree to predictors x and responses y using parameters minSamplesSplit and maxDepth.
Parameters:
- x (pandas.DataFrame) – the predictors.
- y (pandas.DataFrame) – the responses.
- maxDepth – The maximum depth of a fit tree.
- minSamplesSplit – The minimum number of samples required to split a node.
fit(x, y)
Using a RegressionTreeBuilder, fits a regression tree to predictors x and responses y.
Parameters:
- x (pandas.DataFrame) – the predictors.
- y (pandas.DataFrame) – the responses.
Returns: a RegressionTree instance.
Examples:
```python
import pandas
import pyfora.algorithms.regressionTrees.RegressionTree

builder = pyfora.algorithms.regressionTrees.RegressionTree.RegressionTreeBuilder(2)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
regressionTree = builder.fit(x, y)
```
class pyfora.algorithms.regressionTrees.RegressionTree.RegressionTree(rules, numDimensions=None, columnNames=None)
A class representing a regression tree.
A regression tree is represented, essentially, as a list of “rules”, which are either SplitRule, giving “split” nodes, which divide the domain by a hyperplane, or RegressionLeafRule, which just hold a prediction value.
Note
This class is not generally instantiated directly by users. Instead, instances are normally returned by RegressionTreeBuilder.
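To make the rule-list representation concrete, here is a hypothetical sketch in plain Python. Each split rule compares one coordinate against a threshold (an axis-aligned hyperplane) and points to two child rules by index, and each leaf rule holds a prediction. The class names, fields, and the "less than or equal goes left" convention are illustrative assumptions, not pyfora’s SplitRule and RegressionLeafRule definitions.

```python
from collections import namedtuple

# Hypothetical stand-ins for SplitRule / RegressionLeafRule.
Split = namedtuple("Split", ["dimension", "threshold", "leftIndex", "rightIndex"])
Leaf = namedtuple("Leaf", ["value"])


def predict_row(rules, row):
    """Walk the rule list from the root (index 0) until a leaf is reached."""
    rule = rules[0]
    while isinstance(rule, Split):
        if row[rule.dimension] <= rule.threshold:
            rule = rules[rule.leftIndex]
        else:
            rule = rules[rule.rightIndex]
    return rule.value


# A depth-1 tree: split on x0 at -0.5, predicting 0.0 on the left and 1.0 on the right.
rules = [Split(0, -0.5, 1, 2), Leaf(0.0), Leaf(1.0)]
print([predict_row(rules, row) for row in [(-1, 0), (0, 1), (1, 1)]])  # [0.0, 1.0, 1.0]
```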
predict(x, depth=None)
Predicts the responses corresponding to the pandas.DataFrame x.
Returns: A pandas.Series giving the predictions of the rows of x.
Examples:
```python
import pandas
from pyfora.algorithms import RegressionTreeBuilder

builder = RegressionTreeBuilder(2)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
regressionTree = builder.fit(x, y)
# predict `regressionTree` on `x` itself
regressionTree.predict(x)
```
score(x, yTrue)
Returns the coefficient of determination R2 of the prediction.
The coefficient R2 is defined as (1 - u / v), where u is the residual sum of squares ((yTrue - yPredicted) ** 2).sum() and v is the total sum of squares ((yTrue - yTrue.mean()) ** 2).sum(). The best possible score is 1.0; lower values are worse.
Returns: (float) the R2 value.
Examples:
```python
import pandas
from pyfora.algorithms import RegressionTreeBuilder

builder = RegressionTreeBuilder(2)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
regressionTree = builder.fit(x, y)
# compute the R2 of `regressionTree` on its own training data
regressionTree.score(x, y)
```
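The formula above, (1 - u / v) with u = ((yTrue - yPredicted) ** 2).sum() and v = ((yTrue - yTrue.mean()) ** 2).sum(), can be checked by hand. A minimal plain-Python sketch of the same computation (the function name is hypothetical; in pyfora, score() does this for you):

```python
def r_squared(yTrue, yPredicted):
    """R2 = 1 - u / v, with u the residual and v the total sum of squares."""
    mean = sum(yTrue) / float(len(yTrue))
    u = sum((t - p) ** 2 for t, p in zip(yTrue, yPredicted))
    v = sum((t - mean) ** 2 for t in yTrue)
    return 1.0 - u / v


print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0 (perfect fit)
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0 (no better than predicting the mean)
print(r_squared([1, 2, 3], [1, 2, 4]))  # 0.5
```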
Gradient Boosting
class pyfora.algorithms.regressionTrees.GradientBoostedRegressorBuilder.GradientBoostedRegressorBuilder(maxDepth=3, nBoosts=100, learningRate=1.0, minSamplesSplit=2, numBuckets=10000, loss='l2')
A class which builds (or “fits”) gradient-boosted regression trees to data with the parameters below.
Parameters:
- maxDepth (int) – The maximum depth allowed for each constituent regression tree.
- nBoosts (int) – The number of “boosting iterations” used.
- learningRate (float) – The learning rate of the model, used for regularization. Each successive tree from the boosting stages is added with multiplier learningRate.
- minSamplesSplit (int) – The minimum number of samples required to split a regression tree node.
- numBuckets (int) – The number of buckets used in the estimation of optimal column splits for building regression trees.
- loss – the loss used when forming gradients. Defaults to l2, for least-squares loss. The only other currently allowed value is lad, for “least absolute deviation” (aka l1-loss).
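The loss parameter determines the "gradients" that each new tree is fit to. Under the standard gradient-boosting formulation (assumed here, not taken from pyfora’s source), the negative gradient of the squared-error loss (y - F) ** 2 / 2 is the residual y - F, and the negative gradient of the absolute loss |y - F| is sign(y - F). A plain-Python sketch with a hypothetical function name:

```python
def pseudo_residuals(y, predictions, loss="l2"):
    """Negative gradients of the loss at the current predictions (standard boosting)."""
    if loss == "l2":
        # d/dF of (y - F)**2 / 2 is -(y - F), so the negative gradient is the residual.
        return [yi - fi for yi, fi in zip(y, predictions)]
    if loss == "lad":
        # |y - F| has negative gradient sign(y - F), taken as 0 where y == F.
        return [(yi > fi) - (yi < fi) for yi, fi in zip(y, predictions)]
    raise ValueError("unsupported loss: %r" % (loss,))


print(pseudo_residuals([0, 1, 1], [0.5, 0.5, 1.0]))         # [-0.5, 0.5, 0.0]
print(pseudo_residuals([0, 1, 1], [0.5, 0.5, 1.0], "lad"))  # [-1, 1, 0]
```

Each tree fit to these values is then added to the model scaled by learningRate.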
fit(X, y)
Fits predictors X to responses y.
Parameters:
- X (pandas.DataFrame) – predictors.
- y (pandas.DataFrame) – responses.
Returns: A RegressionModel instance.
Examples:
```python
import pandas
from pyfora.algorithms import GradientBoostedRegressorBuilder

builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
model = builder.fit(x, y)
```
iterativeFitter(X, y)
Returns an IterativeFitter instance which can iteratively fit boosting models.
Parameters:
- X (pandas.DataFrame) – predictors.
- y (pandas.DataFrame) – responses.
Examples:
```python
import pandas
from pyfora.algorithms import GradientBoostedRegressorBuilder

builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
fitter = builder.iterativeFitter(x, y)

# compute scores vs number of boosts
numBoosts = 5
scores = []
for ix in xrange(numBoosts):
    fitter = fitter.next()
    scores = scores + [fitter.model.score(x, y)]
```
class pyfora.algorithms.regressionTrees.RegressionModel.RegressionModel(additiveRegressionTree, X, XDimensions, yAsSeries, loss, regressionTreeBuilder, learningRate)
A class representing a gradient-boosted regression tree model fit to data.
Note
These classes are not normally instantiated directly. Instead, they are typically returned by GradientBoostedRegressorBuilder instances.
predict(df, nEstimators=None)
Predicts on the pandas.DataFrame df.
Example:
```python
import pandas
from pyfora.algorithms import GradientBoostedRegressorBuilder

builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
model = builder.fit(x, y)
# predict `x` using the model `model`:
model.predict(x)
```
score(X, yTrue)
Returns the coefficient of determination (R2) of the prediction.
The coefficient R2 is defined as (1 - u / v), where u is the residual sum of squares ((yTrue - yPredicted) ** 2).sum() and v is the total sum of squares ((yTrue - yTrue.mean()) ** 2).sum(). The best possible score is 1.0; lower values are worse.
Parameters:
- X – the predictor DataFrame.
- yTrue – the (true) responses DataFrame.
Returns: (float) the R2 value.
Example:
```python
import pandas
from pyfora.algorithms import GradientBoostedRegressorBuilder

builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
model = builder.fit(x, y)
# compute the score of the fit model:
model.score(x, y)
```
class pyfora.algorithms.regressionTrees.GradientBoostedRegressorBuilder.IterativeFitter(model, predictions)
A sort of iterator class which is capable of fitting subsequent boosting models.
model
the current regression model.
predictions
the current predictions of the regression model (with respect to the training set implicit in model).
Note
This class is typically not instantiated directly. Instead, instances are returned from iterativeFitter().
next()
Fits one boosting stage, returning a new IterativeFitter object that holds the next regression model and predictions.
Examples:
```python
import pandas
from pyfora.algorithms import GradientBoostedRegressorBuilder

builder = GradientBoostedRegressorBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
fitter = builder.iterativeFitter(x, y)

# compute scores vs number of boosts
numBoosts = 5
scores = []
for ix in xrange(numBoosts):
    fitter = fitter.next()
    scores = scores + [fitter.model.score(x, y)]
```
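The IterativeFitter pattern, where each next() returns a new object holding the updated model and training-set predictions, can be mimicked in a few lines. The sketch below is hypothetical and deliberately small. It boosts one-split "stumps" on a single numeric feature under l2 loss with a learningRate multiplier, rather than pyfora’s bucketed regression trees. The class name and stump fitter are illustrative only.

```python
class StumpBoostingFitter(object):
    """Hypothetical immutable fitter: next() returns a new fitter with one more stage."""

    def __init__(self, x, y, learningRate, predictions=None, stages=()):
        self.x, self.y, self.learningRate = x, y, learningRate
        # Start from the zero model if no predictions are given.
        self.predictions = predictions if predictions is not None else [0.0] * len(y)
        self.stages = stages  # tuple of (threshold, leftValue, rightValue)

    def sse(self):
        return sum((yi - pi) ** 2 for yi, pi in zip(self.y, self.predictions))

    def next(self):
        # l2 loss: fit the stump to the plain residuals y - F.
        residuals = [yi - pi for yi, pi in zip(self.y, self.predictions)]
        threshold, left, right = fitStump(self.x, residuals)
        step = [left if xi <= threshold else right for xi in self.x]
        newPredictions = [p + self.learningRate * s for p, s in zip(self.predictions, step)]
        return StumpBoostingFitter(self.x, self.y, self.learningRate, newPredictions,
                                   self.stages + ((threshold, left, right),))


def fitStump(x, r):
    """Least-squares stump: best single threshold, each side predicts its mean residual."""
    best = None
    for threshold in sorted(set(x)):
        leftR = [ri for xi, ri in zip(x, r) if xi <= threshold]
        rightR = [ri for xi, ri in zip(x, r) if xi > threshold]
        left = sum(leftR) / len(leftR)
        right = sum(rightR) / len(rightR) if rightR else 0.0
        err = sum((ri - left) ** 2 for ri in leftR) + sum((ri - right) ** 2 for ri in rightR)
        if best is None or err < best[0]:
            best = (err, threshold, left, right)
    return best[1:]


fitter = StumpBoostingFitter([-1, 0, 1], [0.0, 1.0, 1.0], learningRate=0.5)
errors = []
for _ in range(5):
    fitter = fitter.next()
    errors.append(fitter.sse())
print(errors)  # [0.5, 0.125, 0.03125, 0.0078125, 0.001953125]
```

Keeping each fitter immutable means earlier stages remain available for comparison, for example to plot training error against the number of boosts.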
class pyfora.algorithms.regressionTrees.GradientBoostedClassifierBuilder.GradientBoostedClassifierBuilder(maxDepth=3, nBoosts=100, learningRate=1.0, minSamplesSplit=2, numBuckets=10000)
A class which builds (or “fits”) gradient-boosted (regression) trees to form classification models.
Parameters:
- maxDepth (int) – The maximum depth allowed for each constituent regression tree.
- nBoosts (int) – The number of boosting iterations used.
- learningRate (float) – The learning rate of the model, used for regularization. Each successive tree from the boosting stages is added with multiplier learningRate.
- minSamplesSplit (int) – The minimum number of samples required to split a regression tree node.
- numBuckets (int) – The number of buckets used in the estimation of optimal column splits for building regression trees.
- loss – the loss used when forming gradients. Defaults to l2, for least-squares loss. The only other currently allowed value is lad, for “least absolute deviation” (aka l1-loss).
Note
Only nClasses = 2 cases are currently supported.
fit(X, y)
Fits predictors X to responses y.
Parameters:
- X (pandas.DataFrame) – predictors.
- y (pandas.DataFrame) – responses.
Returns: A BinaryClassificationModel instance.
Examples:
```python
import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
model = builder.fit(x, y)
```
iterativeFitter(X, y)
Creates an IterativeFitter instance which can iteratively fit boosting models.
Parameters:
- X (DataFrame) – predictors.
- y (DataFrame) – responses.
Examples:
```python
import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
fitter = builder.iterativeFitter(x, y)
numBoosts = 5
for ix in xrange(numBoosts):
    fitter = fitter.next()
```
class pyfora.algorithms.regressionTrees.BinaryClassificationModel.BinaryClassificationModel(additiveRegressionTree, X, classes, XDimensions, yAsSeries, loss, baseModelBuilder, learningRate)
A class representing a gradient-boosted binary classification tree model fit to data.
Note
These classes are not normally instantiated directly. Instead, they are typically returned by GradientBoostedClassifierBuilder instances.
deviance(x, yTrue)
Computes the binomial deviance (average negative log-likelihood) of the instances in predictors x with responses yTrue.
Parameters:
- x – the predictor DataFrame.
- yTrue – the (true) responses DataFrame.
Examples:
```python
import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
model = builder.fit(x, y)
# compute the deviance:
model.deviance(x, y)
```
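Read as an average negative log-likelihood, the binomial deviance for 0/1 labels y and predicted class-one probabilities p is -mean(y * log(p) + (1 - y) * log(1 - p)). A plain-Python sketch of that formula follows; the function name is hypothetical. It takes class-one probabilities, whereas predictProbability documents class-zero estimates, so 1 minus those estimates would be passed in, assuming the 0/1 labels match the model’s class ordering.

```python
import math


def binomial_deviance(yTrue, probClassOne):
    """Average negative log-likelihood of 0/1 labels under Bernoulli probabilities."""
    total = 0.0
    for y, p in zip(yTrue, probClassOne):
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(yTrue)


print(binomial_deviance([1, 0], [0.5, 0.5]))  # log(2), about 0.693
```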
predict(df)
Predicts the class labels of the rows of df.
Parameters: df (pandas.DataFrame) – input DataFrame.
Returns: A pandas.Series giving the row-wise predictions.
Examples:
```python
import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
model = builder.fit(x, y)
# use the fit model to predict `x` itself:
model.predict(x)
```
predictProbability(df)
Returns class-zero probability estimates for the rows of a DataFrame df.
Parameters: df (pandas.DataFrame) – input DataFrame.
Returns: A pandas.Series giving the row-wise estimated class-zero probabilities.
Examples:
```python
import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
model = builder.fit(x, y)
# estimate class-zero probabilities for `x` itself:
model.predictProbability(x)
```
score(x, yTrue)
Computes the mean accuracy in predicting x with respect to yTrue.
Parameters:
- x – the predictor DataFrame.
- yTrue – the (true) responses DataFrame.
Examples:
```python
import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
model = builder.fit(x, y)
# compute the mean accuracy of the fit model on `x` itself:
model.score(x, y)
```
class pyfora.algorithms.regressionTrees.GradientBoostedClassifierBuilder.IterativeFitter(model, previousRegressionValues)
A sort of iterator class which is capable of fitting subsequent boosting models.
model
the current regression model.
predictions
the current predictions of the regression model (with respect to the training set implicit in model).
Note
This class is typically not instantiated directly. Instead, instances are returned from iterativeFitter().
next()
Boosts once and returns a new IterativeFitter.
Returns: An IterativeFitter instance.
Example:
```python
import pandas
from pyfora.algorithms import GradientBoostedClassifierBuilder

builder = GradientBoostedClassifierBuilder(1, 1, 1.0)
x = pandas.DataFrame({'x0': [-1, 0, 1], 'x1': [0, 1, 1]})
y = pandas.DataFrame({'y': [0, 1, 1]})
fitter = builder.iterativeFitter(x, y)

# compute scores vs number of boosts
numBoosts = 5
scores = []
for ix in xrange(numBoosts):
    fitter = fitter.next()
    scores = scores + [fitter.model.score(x, y)]
```
Data Frames
pyfora.pandas_util.read_csv_from_string(data)
Reads a string in CSV format into a DataFrame. This function is similar to pandas.read_csv(), but it takes a string as input instead of a file. It is intended to be used in pyfora code that runs remotely in a pyfora cluster.
Parameters: data (str) – a string of comma-separated values.
Returns: A pandas.DataFrame that holds the parsed data.
Note
This function currently assumes that all values are of type float (or convertible to float), and that the first row contains column headers. This limitation will be removed in the near future.
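Outside a cluster, the documented input contract (a header row followed by rows of float-convertible values) can be mirrored with the standard csv module. This sketch returns a plain dict of column lists rather than a pandas.DataFrame. It only illustrates the input contract and is not pyfora’s parser; the function name is hypothetical.

```python
import csv
import io


def parse_float_csv(data):
    """Parse CSV text with a header row into {columnName: [float, ...]}."""
    reader = csv.reader(io.StringIO(data))
    header = next(reader)
    columns = dict((name, []) for name in header)
    for row in reader:
        if not row:  # skip blank lines
            continue
        for name, value in zip(header, row):
            columns[name].append(float(value))  # every value must be float-convertible
    return columns


print(parse_float_csv("x0,x1\n-1,0\n0,1\n1,1\n"))
# {'x0': [-1.0, 0.0, 1.0], 'x1': [0.0, 1.0, 1.0]}
```

A non-numeric value such as "foo" makes float() raise ValueError, which matches the documented all-floats limitation.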