API
- class src.autofeatinsights.autofeat_class.FeatureDiscovery[source]
Bases: object
- add_relationship(table1: str, col1: str, table2: str, col2: str, weight: float)[source]
Adds a relationship between two columns in different tables.
- Parameters:
table1 (str) – The name of the first table.
col1 (str) – The name of the column in the first table.
table2 (str) – The name of the second table.
col2 (str) – The name of the column in the second table.
weight (float) – The weight of the relationship.
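A minimal sketch of registering several known join relationships in one pass. It uses only the `add_relationship` signature documented above. The helper name `link_tables`, the `fd` argument (assumed to be a `FeatureDiscovery` instance), and the table and column names in the usage comment are hypothetical.

```python
def link_tables(fd, links):
    """Register user-supplied relationships on a FeatureDiscovery-like object.

    `links` is a list of (table1, col1, table2, col2, weight) tuples.
    Returns the number of relationships that were registered.
    """
    for table1, col1, table2, col2, weight in links:
        # Positional order follows the documented add_relationship signature.
        fd.add_relationship(table1, col1, table2, col2, weight)
    return len(links)


# Hypothetical usage (table and column names are placeholders):
# link_tables(fd, [("orders.csv", "customer_id", "customers.csv", "id", 0.9)])
```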
- add_table(table: str)[source]
Adds an extra table to the list of tables used for feature generation.
- Parameters:
table (str) – The name of the table to be added.
- adjust_non_null_ratio(tree_id: int, table: str, value: float)[source]
Adjusts the non-null ratio for a specific tree and table.
- Parameters:
tree_id (int) – The ID of the tree.
table (str) – The name of the table.
value (float) – The new non-null ratio value.
- adjust_redundancy_value(tree_id: int, feature: str, value: float)[source]
Adjusts the redundancy value for a specific feature in a given tree.
- Parameters:
tree_id (int) – The ID of the tree.
feature (str) – The name of the feature.
value (float) – The new redundancy value.
- adjust_relevance_value(tree_id: int, feature: str, value: float)[source]
Adjusts the relevance value of a feature for a specific tree.
- Parameters:
tree_id (int) – The ID of the tree.
feature (str) – The name of the feature.
value (float) – The new relevance value.
- Returns:
None
- augment_dataset(algorithm='GBM', relation_threshold: float = 0.5, non_null_threshold=0.5, matcher='coma', top_k_features: int = 10, top_k_paths: int = 3, explain=True, verbose=True, use_cache=True)[source]
Augments the dataset by finding relationships between features, computing join trees, and evaluating the trees.
- Parameters:
algorithm (str) – The algorithm to use for tree evaluation. Default is “GBM”.
relation_threshold (float) – The threshold for considering a relationship between features. Default is 0.5.
non_null_threshold (float) – The threshold for considering a feature as non-null. Default is 0.5.
matcher (str) – The matcher to use for finding relationships. Default is “coma”.
top_k_features (int) – The number of top features to select. Default is 10.
top_k_paths (int) – The number of top paths to select. Default is 3.
explain (bool) – Whether to explain the process. Default is True.
verbose (bool) – Whether to print verbose output. Default is True.
use_cache (bool) – Whether to use cached relationship weights. Default is True.
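The description above names three stages: finding relationships, computing join trees, and evaluating the trees. Each stage is also exposed as its own method on this class, so the sketch below runs them one at a time with the same arguments. Whether `augment_dataset` makes exactly these calls internally is an assumption. The helper name `augment_in_stages` is hypothetical, and `fd` is assumed to be a configured `FeatureDiscovery` instance.

```python
def augment_in_stages(fd, algorithm="GBM", relation_threshold=0.5,
                      non_null_threshold=0.5, matcher="coma",
                      top_k_features=10, top_k_paths=3,
                      explain=True, verbose=True, use_cache=True):
    """Run the three documented stages separately on a FeatureDiscovery-like object."""
    # Stage 1: discover candidate relationships between tables.
    # Note the keyword is `relationship_threshold` here, not `relation_threshold`.
    fd.find_relationships(matcher=matcher,
                          relationship_threshold=relation_threshold,
                          explain=explain, use_cache=use_cache, verbose=verbose)
    # Stage 2: build join trees and select features along them.
    fd.compute_join_trees(top_k_features=top_k_features,
                          non_null_threshold=non_null_threshold,
                          explain=explain, verbose=verbose)
    # Stage 3: evaluate the resulting trees with the chosen algorithm.
    fd.evaluate_trees(algorithm=algorithm, top_k_paths=top_k_paths,
                      verbose=verbose, explain=explain)
```

Running the stages separately leaves room to inspect or adjust relationships and trees between steps, for example with `display_table_relationship` or `show_features`.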
- base_dataset: str
- compute_join_trees(top_k_features: int = 10, non_null_threshold=0.5, explain=False, verbose=True)[source]
Compute join trees for feature selection.
- Parameters:
top_k_features (int) – Number of top features to select. Defaults to 10.
non_null_threshold (float) – Threshold for non-null ratio. Defaults to 0.5.
explain (bool) – Whether to explain the join trees. Defaults to False.
verbose (bool) – Whether to print verbose output. Defaults to True.
- display_join_tree(tree_id)[source]
Display the join tree with the given tree_id.
- Parameters:
tree_id (int) – The ID of the join tree to display.
- display_join_trees(top_k: int | None = None)[source]
Display the join trees for the FeatureDiscovery instance.
- Parameters:
top_k (int) – The number of join trees to display. If None, display all join trees.
- display_table_relationship(table1: str, table2: str)[source]
Display the relationship between two tables.
- Parameters:
table1 (str) – The name of the first table.
table2 (str) – The name of the second table.
- evaluate_augmented_table(tree_id: int, algorithm='GBM', verbose=False)[source]
Evaluate the augmented table using the specified algorithm and tree ID.
- Parameters:
tree_id (int) – The ID of the tree to use for evaluation.
algorithm (str) – The algorithm to use for evaluation. Default is ‘GBM’.
verbose (bool) – Whether to print verbose output. Default is False.
- evaluate_trees(algorithm='GBM', top_k_paths: int = 3, verbose=True, explain=False)[source]
Evaluate the performance of the generated trees.
- Parameters:
algorithm (str) – The algorithm to use for evaluation. Default is ‘GBM’.
top_k_paths (int) – The number of top paths to consider. Default is 3.
verbose (bool) – Whether to print verbose output. Default is True.
explain (bool) – Whether to explain the evaluation results. Default is False.
- exlude_tables: [(<class 'str'>, <class 'str'>)]
- explain_relationship(table1: str, table2: str)[source]
Explains the relationship between two tables.
- Parameters:
table1 (str) – The name of the first table.
table2 (str) – The name of the second table.
- explain_result(tree_id: int, model: str = 'GBM')[source]
Explain the result of a specific tree in the AutoFeat pipeline.
- Parameters:
tree_id (int) – The ID of the tree to explain.
model (str, optional) – The model to use for explanation. Defaults to ‘GBM’.
- explain_tree(tree_id: int)[source]
Explain the tree identified by the given tree_id.
- Parameters:
tree_id (int) – The ID of the tree to explain.
- explore: bool
- extra_tables: [(<class 'str'>, <class 'str'>)]
- find_relationships(matcher='coma', relationship_threshold: float = 0.5, explain=False, use_cache=True, verbose=True)[source]
Finds relationships between features in the dataset.
- Parameters:
matcher (str, optional) – The name of the matcher to use for finding relationships. Defaults to “coma”.
relationship_threshold (float, optional) – The threshold value for determining the strength of a relationship. Defaults to 0.5.
explain (bool, optional) – Whether to provide an explanation for the relationships found. Defaults to False.
use_cache (bool, optional) – Whether to use a cache for storing previously computed relationships. Defaults to True.
verbose (bool, optional) – Whether to print verbose output during the process. Defaults to True.
- get_tables_repository()[source]
Retrieves the tables from the repository.
- Returns:
A list of table paths.
- Return type:
list
- get_weights_from_and_to_table(from_table, to_table)[source]
Returns a list of weights that have the specified ‘from_table’ and ‘to_table’ values.
- Parameters:
from_table (str) – The source table name.
to_table (str) – The destination table name.
- Returns:
A list of weights that match the specified ‘from_table’ and ‘to_table’ values.
- Return type:
list
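The filtering described above can be illustrated with a stand-alone sketch. The `WeightRecord` dataclass is a hypothetical stand-in for the library's `Weight` class, whose fields are not documented here. Only `from_table` and `to_table` are implied by the description; the remaining fields are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WeightRecord:
    """Hypothetical stand-in for the library's Weight class."""
    from_table: str
    to_table: str
    from_col: str   # assumed field, not confirmed by the documentation
    to_col: str     # assumed field, not confirmed by the documentation
    weight: float   # assumed field, not confirmed by the documentation


def weights_between(weights, from_table, to_table):
    """Return the records whose from_table and to_table both match."""
    return [w for w in weights
            if w.from_table == from_table and w.to_table == to_table]


# Placeholder data: two relationships from "a.csv" to "b.csv", one elsewhere.
example_weights = [
    WeightRecord("a.csv", "b.csv", "id", "a_id", 0.9),
    WeightRecord("a.csv", "b.csv", "name", "a_name", 0.6),
    WeightRecord("a.csv", "c.csv", "id", "a_id", 0.8),
]
matches = weights_between(example_weights, "a.csv", "b.csv")
```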
- get_weights_from_table(table: str)[source]
Returns a list of weights from the specified table.
- Parameters:
table (str) – The name of the table.
- Returns:
A list of weights from the specified table.
- Return type:
list
- inspect_join_tree(tree_id: int)[source]
Inspects the join tree with the given tree_id.
- Parameters:
tree_id (int) – The ID of the join tree to inspect.
- join_keys: dict = {}
- materialise_join_tree(tree_id: int)[source]
Materializes the join tree with the given tree_id.
- Parameters:
tree_id (int) – The ID of the join tree to materialize.
- Returns:
The materialized join tree.
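A short sketch of inspecting and then materialising several join trees, using only the `inspect_join_tree` and `materialise_join_tree` signatures documented above. The helper name `materialise_trees` is hypothetical, `fd` is assumed to be a `FeatureDiscovery` instance with computed trees, and the type of the returned object is not stated in this reference, so the sketch stores whatever the method returns.

```python
def materialise_trees(fd, tree_ids):
    """Inspect and materialise each tree, returning a dict keyed by tree ID."""
    materialised = {}
    for tree_id in tree_ids:
        # Show the tree's structure before building the joined table.
        fd.inspect_join_tree(tree_id)
        # Keep the result as-is; its type is not specified in this reference.
        materialised[tree_id] = fd.materialise_join_tree(tree_id)
    return materialised
```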
- move_features_to_discarded(tree_id: int, features: [<class 'str'>])[source]
Moves the specified features to the discarded list for the given tree.
- Parameters:
tree_id (int) – The ID of the tree.
features (list[str]) – The list of features to be moved to the discarded list.
- move_features_to_selected(tree_id: int, features: [<class 'str'>])[source]
Moves the specified features from the discarded list to the selected features list for the given tree.
- Parameters:
tree_id (int) – The ID of the tree.
features (list[str]) – The list of features to be moved.
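A minimal sketch of curating a tree's features by hand, combining `move_features_to_discarded`, `move_features_to_selected`, and `show_features` as documented in this reference. The helper name `curate_features` and its `discard` and `restore` arguments are hypothetical, and `fd` is assumed to be a `FeatureDiscovery` instance.

```python
def curate_features(fd, tree_id, discard=(), restore=()):
    """Discard and restore features on one tree, then display the result."""
    if discard:
        # Drop features the user does not want in the selected set.
        fd.move_features_to_discarded(tree_id, list(discard))
    if restore:
        # Bring previously discarded features back into the selected set.
        fd.move_features_to_selected(tree_id, list(restore))
    # Show both selected and discarded features to confirm the change.
    fd.show_features(tree_id, show_discarded_features=True)
```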
- non_null_ratio_threshold: float
- partial_join: DataFrame
- partial_join_selected_features: dict = {}
- paths: [<class 'src.autofeatinsights.functions.classes.Tree'>]
- read_relationships(file_path)[source]
Reads the relationships from a file and updates the object’s internal state.
- Parameters:
file_path (str) – The path to the file containing the relationships.
- remove_join_path_from_tree(tree_id: int, table: str)[source]
Removes a join path from the tree.
- Parameters:
tree_id (int) – The ID of the tree.
table (str) – The name of the table to remove the join path from.
- remove_relationship(table1: str, col1: str, table2: str, col2: str)[source]
Removes a relationship between two columns in different tables.
- Parameters:
table1 (str) – The name of the first table.
col1 (str) – The name of the column in the first table.
table2 (str) – The name of the second table.
col2 (str) – The name of the column in the second table.
- remove_table(table: str)[source]
Removes a table from the list of extra tables and adds it to the list of excluded tables.
- Parameters:
table (str) – The name of the table to be removed.
- results: [<class 'src.autofeatinsights.functions.classes.Result'>]
- set_base_table(base_table: str, target_column: str)[source]
Sets the base table and target column for feature generation.
- Parameters:
base_table (str) – The name of the base table.
target_column (str) – The name of the target column.
- Returns:
None
- set_dataset_repository(dataset_repository: List[str] = [], all_tables: bool = False)[source]
Sets the dataset repository for the FeatureDiscovery object.
- Parameters:
dataset_repository (List[str]) – A list of dataset paths.
all_tables (bool) – Flag indicating whether to use all tables in the repository.
- Raises:
Exception – If both dataset_repository and all_tables are specified.
Exception – If neither dataset_repository nor all_tables is specified.
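The two error conditions above amount to an exactly-one-of rule. The stand-alone sketch below illustrates that rule. It is not the library's implementation: the function name `check_repository_args` is hypothetical, and it raises `ValueError` (a subclass of `Exception`) where the documentation only says `Exception`.

```python
from typing import List, Optional


def check_repository_args(dataset_repository: Optional[List[str]] = None,
                          all_tables: bool = False) -> None:
    """Raise unless exactly one of the two arguments is specified."""
    # Treat None and an empty list alike: no repository was given.
    repository = dataset_repository or []
    if repository and all_tables:
        raise ValueError("Specify dataset_repository or all_tables, not both.")
    if not repository and not all_tables:
        raise ValueError("Specify either dataset_repository or all_tables.")


# Valid: an explicit list of dataset paths (placeholder name).
check_repository_args(dataset_repository=["some_dataset"])
# Valid: use every table in the repository.
check_repository_args(all_tables=True)
```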
- show_features(tree_id: int, show_discarded_features: bool = False)[source]
Display the features for a given tree ID.
- Parameters:
tree_id (int) – The ID of the tree.
show_discarded_features (bool) – Whether to show discarded features or not. Default is False.
- targetColumn: str
- threshold: float
- update_relationship(table1: str, col1: str, table2: str, col2: str, weight: float)[source]
Update the relationship between two tables and their respective columns with a given weight.
- Parameters:
table1 (str) – The name of the first table.
col1 (str) – The name of the column in the first table.
table2 (str) – The name of the second table.
col2 (str) – The name of the column in the second table.
weight (float) – The weight of the relationship.
- weights: [<class 'src.autofeatinsights.functions.classes.Weight'>]