Package 'binst'

Title: Data Preprocessing, Binning for Classification and Regression
Description: Supervised and unsupervised binning tools, including entropy-based methods, recursive partitioning, and clustering.
Authors: Chapman Siu
Maintainer: Chapman Siu <[email protected]>
License: MIT + file LICENSE
Version: 0.2.1
Built: 2025-01-08 06:13:41 UTC
Source: https://github.com/sourdoughcat/binst

Creates bins given breaks

Description

Creates bins given breaks

Usage

create_bins(x, breaks, method = "cuts")

Arguments

x

X is a numeric vector to be discretized

breaks

Breaks are the interior cut points at which the vector X is split. The endpoints are excluded.

method

The approach used to bin the variable; either "cuts" or "hinge".

Value

A vector of the same length as X containing the numeric discretization

See Also

create_breaks

Examples

create_bins(1:10, c(3, 5))
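
As a rough illustration of what binning at interior breaks means, the following base R sketch uses findInterval to assign each value to an interval index. It is not taken from the binst source: the left-closed interval convention and the 1-based numbering are assumptions, and create_bins may close or label intervals differently.

```r
# Base R sketch (not binst code): assign each value to an interval index.
x <- 1:10
breaks <- c(3, 5)   # interior breaks only, no endpoints

# findInterval() counts how many breaks are <= each value (0, 1 or 2 here);
# adding 1 turns that count into a 1-based bin number.
bins <- findInterval(x, breaks) + 1L
bins
```

Two interior breaks give three bins, which is why the endpoints do not need to be supplied.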

A convenience function for creating breaks with various methods.

Description

A convenience function for creating breaks with various methods.

Usage

create_breaks(x, y = NULL, method = "kmeans", control = NULL, ...)

Arguments

x

X is a numeric vector to be discretized

y

Y is the response vector used for calculating metrics for discretization

method

Method is the type of discretization approach used. Possible methods include "dt", "entropy", "kmeans" and "jenks"; the examples in this manual also use "earth".

control

Control is a list of optional parameters passed to the chosen method

...

Instead of passing a list to control, arguments can be passed directly.

Value

A vector containing the breaks

See Also

get_control, create_bins

Examples

kmeans_breaks <- create_breaks(1:10)
create_bins(1:10, kmeans_breaks)

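# passing the k-means parameter "centers" = 4 through the control argument
# (control is named explicitly, since the second positional argument is y)
kmeans_breaks <- create_breaks(1:10, control = list(centers = 4))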
create_bins(1:10, kmeans_breaks)

entropy_breaks <- create_breaks(1:10, rep(c(1,2), each = 5), method="entropy")
create_bins(1:10, entropy_breaks)

dt_breaks <- create_breaks(iris$Sepal.Length, iris$Species, method="dt")
create_bins(iris$Sepal.Length, dt_breaks)

Create breaks using decision trees (recursive partitioning)

Description

Create breaks using decision trees (recursive partitioning)

Usage

create_dtbreaks(x, y, control = NULL)

Arguments

x

X is a numeric vector to be discretized

y

Y is the response vector used for calculating metrics for discretization

control

Control is used for optional parameters for the method

Value

A vector containing the breaks

See Also

create_breaks

Examples

dt_breaks <- create_breaks(iris$Sepal.Length, iris$Species, method="dt")
create_bins(iris$Sepal.Length, dt_breaks)
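
To make the idea of tree-based breaks concrete, here is a minimal sketch that fits a single-predictor classification tree with the rpart package and reads the split points off the fitted object. It assumes rpart is installed and that a one-variable tree stores only primary splits in fit$splits; it is not the binst implementation, and the name dt_breaks_sketch is illustrative.

```r
# Sketch only (not binst code): split points of a one-variable rpart tree.
library(rpart)

df <- data.frame(x = iris$Sepal.Length, y = iris$Species)
fit <- rpart(y ~ x, data = df, method = "class")

# Each row of fit$splits describes a split on x; the "index" column holds the
# split point. fit$splits is NULL when the tree has no splits at all.
if (is.null(fit$splits)) {
  dt_breaks_sketch <- numeric(0)
} else {
  dt_breaks_sketch <- sort(unique(unname(fit$splits[, "index"])))
}
dt_breaks_sketch
```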

Create breaks using earth (i.e. MARS)

Description

Create breaks using earth (i.e. MARS)

Usage

create_earthbreaks(x, y, control = NULL)

Arguments

x

X is a numeric vector to be discretized

y

Y is the response vector used for calculating metrics for discretization

control

Control is used for optional parameters for the method

Value

A vector containing the breaks

See Also

create_breaks

Examples

earth_breaks <- create_breaks(x=iris$Sepal.Length, y=iris$Sepal.Width, method="earth")
create_bins(iris$Sepal.Length, earth_breaks)

Create Jenks breaks

Description

Create Jenks breaks

Usage

create_jenksbreaks(x, control = NULL)

Arguments

x

X is a numeric vector to be discretized

control

Control is used for optional parameters for the method

Value

A vector containing the breaks

See Also

create_breaks

Examples

jenks_breaks <- create_breaks(1:10, method="jenks")
create_bins(1:10, jenks_breaks)

Create kmeans breaks.

Description

Create kmeans breaks.

Usage

create_kmeansbreaks(x, control = NULL)

Arguments

x

X is a numeric vector to be discretized

control

Control is used for optional parameters for the method

Value

A vector containing the breaks

See Also

create_breaks

Examples

kmeans_breaks <- create_breaks(1:10)
create_bins(1:10, kmeans_breaks)
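
One plausible way to turn one-dimensional k-means clusters into breaks is to sort the cluster centers and cut halfway between neighbours. The base R sketch below shows that idea under the assumption of three clusters; whether binst derives its breaks from midpoints in this way is not stated in this manual.

```r
# Base R sketch (not binst code): derive breaks from 1-D k-means clusters.
set.seed(42)                      # k-means starts from random centers
x <- 1:10
km <- kmeans(x, centers = 3, nstart = 10)

# Sort the cluster centers and take the midpoint between each adjacent pair.
centers <- sort(as.numeric(km$centers))
breaks <- (head(centers, -1) + tail(centers, -1)) / 2
breaks
```

With k centers this yields k - 1 interior breaks, matching the convention that breaks exclude the endpoints.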

Create breaks using MDLP (minimum description length principle)

Description

Create breaks using MDLP (minimum description length principle)

Usage

create_mdlpbreaks(x, y)

Arguments

x

X is a numeric vector to be discretized

y

Y is the response vector used for calculating metrics for discretization

Value

A vector containing the breaks

See Also

create_breaks

Examples

entropy_breaks <- create_breaks(1:10, rep(c(1,2), each = 5), method="entropy")
create_bins(1:10, entropy_breaks)
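
The entropy criterion behind this kind of discretization can be sketched in base R by scoring every candidate cut by its information gain. The code below shows only that scoring step for a single cut; the recursive splitting and the MDLP stopping rule are omitted, and the helper name entropy is illustrative rather than part of binst.

```r
# Base R sketch (not binst code): find the single cut with the highest
# information gain. Recursion and the MDLP stopping rule are not shown.
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

x <- 1:10
y <- rep(c(1, 2), each = 5)

# Candidate cuts: midpoints between adjacent distinct sorted values of x.
ux <- sort(unique(x))
cuts <- (head(ux, -1) + tail(ux, -1)) / 2

# Information gain = parent entropy minus the weighted child entropies.
gain <- sapply(cuts, function(cp) {
  left <- y[x <= cp]
  right <- y[x > cp]
  entropy(y) -
    (length(left) / length(y)) * entropy(left) -
    (length(right) / length(y)) * entropy(right)
})

best_cut <- cuts[which.max(gain)]
best_cut  # 5.5 for this toy data, where the cut separates the classes exactly
```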

Gets the default parameters for each method.

Description

Gets the default parameters for each method.

Usage

get_control(method = "kmeans", control = NULL)

Arguments

method

Method is the type of discretization approach used

control

Control is a list of control parameters for the algorithm

Value

List of default parameters
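
A common pattern for this kind of helper is to keep a list of defaults and overlay whatever the user supplies. The sketch below illustrates that pattern with utils::modifyList; the function merge_control and the default values shown are hypothetical and are not binst's actual defaults or code.

```r
# Hypothetical sketch (not binst code): overlay user options on defaults.
merge_control <- function(defaults, control = NULL) {
  if (is.null(control)) control <- list()   # modifyList() requires a list
  utils::modifyList(defaults, control)
}

# Illustrative defaults only; binst's real defaults may differ.
kmeans_defaults <- list(centers = 3, iter.max = 10)

merged <- merge_control(kmeans_defaults, list(centers = 4))
str(merged)
```

Entries the user does not mention keep their default values, so a partial control list is enough.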