API Reference
This page is automatically generated from the docstrings in our code.
pyecoacc.util.analytics
compare_models_cv(X, y, model_dict, cv=5, cv_method='stratified', individuals=None, random_state=42, round_digits=3)
Compute summary tables to compare models.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | array | feature matrix | required |
| y | array | labels vector | required |
| model_dict | dict | a dictionary of models to evaluate with format {"model_name": model} | required |
| cv | int | number of cross-validation splits. Defaults to 5. | 5 |
| cv_method | str | cross-validation method. Options include stratified, animal-groups, and LOIO. Defaults to "stratified". | 'stratified' |
| individuals | array | individual identifiers for grouping. Defaults to None. | None |
| random_state | int | random state for reproducibility. Defaults to 42. | 42 |
| round_digits | int | number of decimal places to round results. Defaults to 3. | 3 |
Returns:
| Name | Type | Description |
|---|---|---|
| accuracy | DataFrame | summary table of overall accuracy across CV splits |
| recall | DataFrame | summary table of recall per model |
| precision | DataFrame | summary table of precision per model |
| f1 | DataFrame | summary table of F1-score per model |
| all_data | dict | detailed results per model |
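
The `individuals` argument only matters for the group-based `cv_method` options. As a rough illustration, and assuming that "LOIO" stands for leave-one-individual-out and that `individuals` holds one identifier per row of `X`, the splits could look like the sketch below. The helper `leave_one_individual_out` is hypothetical and is not part of pyecoacc.

```python
def leave_one_individual_out(individuals):
    """Yield (held_out_id, train_indices, test_indices), one split per individual.

    Hypothetical helper for illustration only; not pyecoacc's implementation.
    """
    # dict.fromkeys drops duplicate identifiers while keeping first-seen order.
    for held_out in dict.fromkeys(individuals):
        test_idx = [i for i, ind in enumerate(individuals) if ind == held_out]
        train_idx = [i for i, ind in enumerate(individuals) if ind != held_out]
        yield held_out, train_idx, test_idx


# One identifier per sample (assumed layout).
individuals = ["a1", "a1", "a2", "a3", "a3", "a3"]
splits = list(leave_one_individual_out(individuals))

for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Under this reading, no individual ever contributes samples to both the training and the test side of a split.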
Source code in pyecoacc/util/analytics.py
compute_confusion_matrix(y_true, y_pred, normalize='true', round=2)
Compute confusion matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | array or list | Ground truth labels. | required |
| y_pred | array or list | Predicted labels. | required |
| normalize | str | When set to 'true', normalize the confusion matrix over its rows (the true labels). Passed to sklearn.metrics.confusion_matrix. Defaults to 'true'. | 'true' |
| round | int | Number of decimal places to keep. Defaults to 2. | 2 |
Returns:
| Name | Type | Description |
|---|---|---|
| confusion_matrix | DataFrame | Confusion matrix as a pandas DataFrame. |
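
To make the effect of `normalize='true'` concrete, here is a small pure-Python sketch of row normalization: each row (one true label) is divided by its total, so the entries in a row are the fractions of that label predicted as each class. The function `row_normalized_confusion` and the example labels are hypothetical. The library itself delegates to `sklearn.metrics.confusion_matrix` and returns a DataFrame rather than nested dicts.

```python
from collections import Counter


def row_normalized_confusion(y_true, y_pred, ndigits=2):
    """Return {true_label: {predicted_label: fraction}} normalized per true label.

    Illustrative stand-in only; not the pyecoacc implementation.
    """
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = {}
    for t in labels:
        row_total = sum(counts[(t, p)] for p in labels)
        matrix[t] = {
            # Rows with no true samples are reported as zeros in this sketch.
            p: round(counts[(t, p)] / row_total, ndigits) if row_total else 0.0
            for p in labels
        }
    return matrix


y_true = ["rest", "rest", "rest", "rest", "walk", "walk"]
y_pred = ["rest", "rest", "rest", "walk", "walk", "walk"]
cm = row_normalized_confusion(y_true, y_pred)
print(cm)
```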
Source code in pyecoacc/util/analytics.py
model_analytics_cv(X, y, model, cv=5, cv_method='stratified', individuals=None, random_state=42)
Computes model analytics table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | array | feature matrix | required |
| y | array | labels vector | required |
| model | Pipeline or Estimator | model to evaluate | required |
| cv | int | number of cross-validation splits. Defaults to 5. | 5 |
| cv_method | str | cross-validation method. Options include stratified, animal-groups, and LOIO. Defaults to "stratified". | 'stratified' |
| individuals | array | individual identifiers for grouping. Defaults to None. | None |
| random_state | int | random state for reproducibility. Defaults to 42. | 42 |
Returns:
| Name | Type | Description |
|---|---|---|
| overall_accuracy | DataFrame | overall accuracy per CV split |
| mean_report | DataFrame | mean classification report across CV splits |
| std_report | DataFrame | standard deviation of classification report across CV splits |
| splits_output | dict | detailed classification reports per CV split |
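
The relationship between the per-split reports and the mean and standard deviation tables can be sketched as follows. The numbers are made up, the nested-dict layout is an assumption (the library returns DataFrames), and the choice of the sample standard deviation is also an assumption, since this page does not state which estimator is used.

```python
from statistics import mean, stdev

# Hypothetical per-split F1 scores, keyed by split and then by behavior label.
splits_output = {
    "split_0": {"rest": 0.50, "walk": 0.80},
    "split_1": {"rest": 0.75, "walk": 0.80},
    "split_2": {"rest": 1.00, "walk": 0.80},
}

labels = sorted(next(iter(splits_output.values())))

# Average each label's score over the CV splits.
mean_report = {
    lab: mean(report[lab] for report in splits_output.values()) for lab in labels
}
# Sample standard deviation (n - 1 denominator); an assumption of this sketch.
std_report = {
    lab: stdev(report[lab] for report in splits_output.values()) for lab in labels
}

print(mean_report)
print(std_report)
```

A label with a large standard deviation relative to its mean is one whose performance depends strongly on which samples end up in the test split.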
Source code in pyecoacc/util/analytics.py
pyecoacc.util.time_budget
compute_time_budget(raw_acc, clf, cm=None, apply_cm_correction=True)
Use a trained classifier to compute a time budget.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw_acc | array | the raw accelerometer data to compute the time budget from | required |
| clf | Pipeline | the trained classifier | required |
| cm | DataFrame | the confusion matrix used for correction. Defaults to None. | None |
| apply_cm_correction | bool | If False, no correction is applied. Defaults to True. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| budget | Series | the time budget |
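
As a minimal sketch of the uncorrected case, and assuming that a time budget is the proportion of segments the classifier assigns to each behavior, the computation reduces to counting predicted labels. The function `raw_time_budget` and the label names are hypothetical. The library returns a pandas Series, whereas this sketch uses a plain dict.

```python
from collections import Counter


def raw_time_budget(predicted_labels):
    """Proportion of segments assigned to each behavior (no correction applied).

    Illustrative only; assumes one predicted label per segment.
    """
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {behavior: n / total for behavior, n in counts.items()}


# Stand-in for the classifier's predictions on four segments.
predicted = ["rest", "rest", "walk", "rest"]
budget = raw_time_budget(predicted)
print(budget)
```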
Source code in pyecoacc/util/time_budget.py
confusion_matrix_correction(budget, cm)
Apply the confusion matrix correction for time budgets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| budget | Series | the raw time budget | required |
| cm | DataFrame | the confusion matrix used for correction | required |
Returns:
| Name | Type | Description |
|---|---|---|
| corrected_budget | Series | the corrected time budget |
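
This page does not describe how the correction is computed. One plausible approach, shown here purely as an assumption and not as pyecoacc's method, treats the observed budget as the true budget mixed through a row-normalized confusion matrix and then inverts that relationship. The two-class sketch below uses Cramer's rule; the function `correct_two_class_budget` and all numbers are hypothetical.

```python
def correct_two_class_budget(observed, cm):
    """Solve cm^T @ true = observed for a 2-class budget.

    cm[i][j] is the assumed probability that true class i is predicted as j
    (each row sums to 1). Illustrative sketch, not the library's algorithm.
    """
    (a, b), (c, d) = cm  # rows are true classes, columns are predicted classes
    q0, q1 = observed
    # Linear system:  a*p0 + c*p1 = q0   and   b*p0 + d*p1 = q1
    det = a * d - b * c
    if det == 0:
        raise ValueError("confusion matrix is singular; correction is undefined")
    p0 = (q0 * d - c * q1) / det
    p1 = (a * q1 - b * q0) / det
    return [p0, p1]


cm = [[0.9, 0.1],
      [0.2, 0.8]]
# Observed budget produced by a true 50/50 budget under the matrix above.
observed = [0.55, 0.45]
corrected = correct_two_class_budget(observed, cm)
print(corrected)
```

With noisy inputs, an inversion of this kind can produce values slightly outside the range 0 to 1, so a practical implementation would likely need to clip or renormalize the result.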
Source code in pyecoacc/util/time_budget.py
pyecoacc.util.training
train_compute_cm(model, X, y, cm_estimation_percent=0.2, round=2)
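Train a model and estimate its confusion matrix, using a fraction of the data for confusion matrix estimation.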
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Pipeline or Estimator | the model to train and evaluate | required |
| X | array | features to train and evaluate on | required |
| y | array | labels to train and evaluate on | required |
| cm_estimation_percent | float | the fraction of data used for confusion matrix estimation. Defaults to 0.2. | 0.2 |
| round | int | the number of decimal places to round the confusion matrix. Defaults to 2. | 2 |
Returns:
| Name | Type | Description |
|---|---|---|
| confusion_matrix | DataFrame | the estimated confusion matrix |
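
The role of `cm_estimation_percent` can be illustrated with a simple index split. The sketch assumes that the held-out fraction is used only to estimate the confusion matrix and that the remainder is used for training. Whether the library shuffles or stratifies the split is not stated on this page, and the helper `split_for_cm_estimation` and its `seed` parameter are hypothetical.

```python
import random


def split_for_cm_estimation(n_samples, cm_estimation_percent=0.2, seed=0):
    """Return (train_indices, cm_indices) for a shuffled hold-out split.

    Illustrative only; the actual splitting strategy in pyecoacc may differ.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_cm = round(n_samples * cm_estimation_percent)
    return indices[n_cm:], indices[:n_cm]


train_idx, cm_idx = split_for_cm_estimation(100, cm_estimation_percent=0.2)
print(len(train_idx), len(cm_idx))
```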
Source code in pyecoacc/util/training.py
pyecoacc.util.preprocessing
long_to_wide_multi_animal(df, id_col='AnimalID', segment_duration='1s', xcol='accX', ycol='accY', zcol='accZ', timestamp_col='Timestamp', sort_by_time=True)
Make a wide dataframe: one row per non-overlapping segment. Appropriate to use for multiple animals with continuous data for each one.
Assumes (approximately) constant sampling interval; drops the last partial segment. Assumes no gaps in the data of each animal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | input dataframe | required |
| segment_duration | str | encodes the duration of each segment. Defaults to "1s". | '1s' |
| xcol | str | the name of the column for the X acceleration axis. Defaults to "accX". | 'accX' |
| ycol | str | the name of the column for the Y acceleration axis. Defaults to "accY". | 'accY' |
| zcol | str | the name of the column for the Z acceleration axis. Defaults to "accZ". | 'accZ' |
| timestamp_col | str | the name of the column for the timestamp. Defaults to "Timestamp". | 'Timestamp' |
| sort_by_time | bool | if True, sort the data by timestamp. Defaults to True. | True |
| id_col | str | the name of the column identifying each animal. Defaults to "AnimalID". | 'AnimalID' |
Returns:
| Name | Type | Description |
|---|---|---|
| segment_table | DataFrame | a wide dataframe with one row per non-overlapping segment. |
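
The multi-animal behavior can be sketched in plain Python: samples are grouped by animal first, and each animal is then cut into non-overlapping segments, so no segment mixes samples from two animals. The sketch assumes a known number of samples per segment instead of a duration string such as "1s", and the function `segment_per_animal` and the flattened row layout are hypothetical rather than the library's actual output format.

```python
from collections import defaultdict


def segment_per_animal(records, samples_per_segment):
    """records: (animal_id, x, y, z) tuples, in time order within each animal.

    Returns a list of (animal_id, flattened_segment) rows. Illustrative only.
    """
    by_animal = defaultdict(list)
    for animal_id, x, y, z in records:
        by_animal[animal_id].append((x, y, z))

    rows = []
    for animal_id, samples in by_animal.items():
        # Integer division drops the last partial segment of each animal.
        n_full = len(samples) // samples_per_segment
        for s in range(n_full):
            chunk = samples[s * samples_per_segment:(s + 1) * samples_per_segment]
            rows.append((animal_id, [v for sample in chunk for v in sample]))
    return rows


records = [("A", i, 10 + i, 20 + i) for i in range(5)]
records += [("B", i, 10 + i, 20 + i) for i in range(2)]
rows = segment_per_animal(records, samples_per_segment=2)
for row in rows:
    print(row)
```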
Source code in pyecoacc/util/preprocessing.py
long_to_wide_segments(df, segment_duration='1s', xcol='accX', ycol='accY', zcol='accZ', timestamp_col='Timestamp', sort_by_time=True)
Make a wide dataframe: one row per non-overlapping segment. Appropriate to use for a single animal with continuous data.
The input format uses the original long-shape columns: X, Y, Z, timestamp. Assumes (approximately) constant sampling interval; drops the last partial segment. Assumes no gaps.
We thank an anonymous reviewer for contributing this function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | input dataframe | required |
| segment_duration | str | encodes the duration of each segment. Defaults to "1s". | '1s' |
| xcol | str | the name of the column for the X acceleration axis. Defaults to "accX". | 'accX' |
| ycol | str | the name of the column for the Y acceleration axis. Defaults to "accY". | 'accY' |
| zcol | str | the name of the column for the Z acceleration axis. Defaults to "accZ". | 'accZ' |
| timestamp_col | str | the name of the column for the timestamp. Defaults to "Timestamp". | 'Timestamp' |
| sort_by_time | bool | if True, sort the data by timestamp. Defaults to True. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| segment_table | DataFrame | a wide dataframe with one row per non-overlapping segment. |
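
The long-to-wide reshaping described above can be sketched without pandas. Under a constant sampling interval, a segment duration corresponds to a fixed number of samples, which this sketch takes directly as `samples_per_segment`. The function `long_to_wide_rows` is hypothetical, and the interleaved column order of each wide row is an assumption, not the library's documented layout.

```python
def long_to_wide_rows(samples, samples_per_segment):
    """samples: list of (x, y, z) tuples in time order for one animal.

    Returns one flattened row per complete, non-overlapping segment.
    Illustrative only; not the pyecoacc implementation.
    """
    # Integer division drops the last partial segment.
    n_full = len(samples) // samples_per_segment
    rows = []
    for s in range(n_full):
        chunk = samples[s * samples_per_segment:(s + 1) * samples_per_segment]
        # Flatten [(x0, y0, z0), (x1, y1, z1), ...] into a single wide row.
        rows.append([value for sample in chunk for value in sample])
    return rows


# Seven samples with three samples per segment: the seventh sample is dropped.
samples = [(i, 10 + i, 20 + i) for i in range(7)]
rows = long_to_wide_rows(samples, samples_per_segment=3)
for row in rows:
    print(row)
```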
Source code in pyecoacc/util/preprocessing.py