Use Cases

These use cases show, by example, how to use the package. They do not cover every method; for the full list, see the API reference.

Run entire algorithm

This code runs the entire algorithm on the repository “school” and the base table “school/base.csv” with the target column “class”.

from autofeatinsights.autofeat_class import FeatureDiscovery
autofeat = FeatureDiscovery()
autofeat.set_base_table(base_table="school/base.csv", target_column="class")
autofeat.set_dataset_repository(dataset_repository=["school"])
autofeat.augment_dataset(explain=True)

Run algorithm with intermediate steps

This code runs the algorithm one step at a time: it finds the relationships, calculates the join trees, and evaluates the trees.

from autofeatinsights.autofeat_class import FeatureDiscovery
autofeat = FeatureDiscovery()
autofeat.set_base_table(base_table="school/base.csv", target_column="class")
autofeat.set_dataset_repository(dataset_repository=["school"])

autofeat.find_relationships()
autofeat.calculate_join_trees()
autofeat.evaluate_trees()

Add and Remove tables from the repository

This code adds one table to the dataset repository and removes another.

from autofeatinsights.autofeat_class import FeatureDiscovery
autofeat = FeatureDiscovery()
autofeat.set_base_table(base_table="school/base.csv", target_column="class")
autofeat.set_dataset_repository(dataset_repository=["school"])
autofeat.add_table(table="dataset/table.csv")
autofeat.remove_table(table="dataset/table2.csv")

Calculate Relationships with Insights and Control

This code calculates the relationships between the columns of the tables in the repository and displays the results. It then adjusts the relationships by removing, updating, and adding individual ones.

from autofeatinsights.autofeat_class import FeatureDiscovery
autofeat = FeatureDiscovery()
autofeat.set_base_table(base_table="school/base.csv", target_column="class")
autofeat.set_dataset_repository(dataset_repository=["school"])
autofeat.find_relationships()
# Shows the best relationships
autofeat.display_best_relationships()
# Shows the best relationship between 2 tables.
autofeat.display_table_relationship(table1="school/base.csv", table2="school/qe.csv")
# Removes the relationship between 2 columns in different tables.
autofeat.remove_relationship(table1="school/ap.csv", col1="SchoolName", table2="school/qe.csv", col2="School Name")
# Adjusts the weight of the relationship between 2 columns in different tables.
autofeat.update_relationship(table1="school/ap.csv", col1="SchoolName", table2="school/qe.csv", col2="School Name", weight=0.2)
# Adds a relationship between 2 columns in different tables.
autofeat.add_relationship(table1="school/ap.csv", col1="SchoolName", table2="school/qe.csv", col2="School Name", weight=0.2)
# Explains the relationships between 2 tables.
autofeat.explain_relationship(table1="school/ap.csv", table2="school/qe.csv")
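
To give an intuition for what a relationship weight such as 0.2 could express, the following is a minimal sketch that scores two columns by how many of their values overlap. It is only an illustration: the function name containment, the sample values, and the choice of measure are assumptions, and this section does not state how the package itself computes its weights.

```python
# Illustrative only: this is NOT the package's own scoring code.
def containment(left_values, right_values):
    """Return the fraction of distinct left values that also appear on the right."""
    # Normalise case and surrounding whitespace so near-identical names match.
    left = {str(v).strip().lower() for v in left_values if v is not None}
    right = {str(v).strip().lower() for v in right_values if v is not None}
    if not left:
        return 0.0
    return len(left & right) / len(left)


# Hypothetical sample values for the two school-name columns.
ap_school_names = ["Lincoln High", "Roosevelt High", "Jefferson Academy", "Adams Prep"]
qe_school_names = ["lincoln high", "Roosevelt High ", "Madison School"]

weight = containment(ap_school_names, qe_school_names)
print(f"weight = {weight:.2f}")  # prints "weight = 0.50"
```

Under this assumed measure, a low weight means that few values of the first column can be matched in the second, which is one reason a user might lower or remove a relationship by hand.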

Calculate Join Trees with Insights and Control

This code calculates the join trees that start at the base table and displays the results. It then adjusts a join tree and shows its features.

from autofeatinsights.autofeat_class import FeatureDiscovery
autofeat = FeatureDiscovery()
autofeat.set_base_table(base_table="school/base.csv", target_column="class")
autofeat.set_dataset_repository(dataset_repository=["school"])
autofeat.find_relationships()
autofeat.calculate_join_trees()
# Shows all the trees
autofeat.display_join_trees()
# Shows the best tree
autofeat.display_join_trees(top_k=1)
# Shows a single tree by id
autofeat.display_join_tree(tree_id=1)
# Shows all the details of a tree by id
autofeat.inspect_join_tree(tree_id=3)
# Remove a table from a tree
autofeat.remove_join_path_from_tree(tree_id=1, table="school/ap.csv")
# Shows the selected features of tree 1, including the discarded features
autofeat.show_features(tree_id=1, show_discarded=True)
# Explains the join tree
autofeat.explain_tree(tree_id=1)
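
The idea of building join paths outwards from a base table can be sketched in plain Python. The example below is a simplified illustration, not the package's algorithm: the relationships dictionary, the function enumerate_join_paths, and the min_weight and max_depth parameters are all hypothetical.

```python
from collections import deque

# Hypothetical relationship weights between tables; not produced by the package.
relationships = {
    ("school/base.csv", "school/qe.csv"): 0.9,
    ("school/base.csv", "school/ap.csv"): 0.7,
    ("school/ap.csv", "school/qe.csv"): 0.2,
}


def enumerate_join_paths(base_table, relationships, min_weight=0.5, max_depth=2):
    """List join paths that start at the base table, breadth first."""
    # Build an undirected adjacency list, keeping only strong enough edges.
    neighbours = {}
    for (left, right), weight in relationships.items():
        if weight >= min_weight:
            neighbours.setdefault(left, []).append(right)
            neighbours.setdefault(right, []).append(left)

    paths = []
    queue = deque([[base_table]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            paths.append(path)
        if len(path) - 1 == max_depth:
            continue  # do not extend paths that already use max_depth joins
        for table in sorted(neighbours.get(path[-1], [])):
            if table not in path:  # never join the same table twice
                queue.append(path + [table])
    return paths


paths = enumerate_join_paths("school/base.csv", relationships)
for path in paths:
    print(" -> ".join(path))
```

With the default threshold, the weak link between school/ap.csv and school/qe.csv is ignored, so only the two direct joins from the base table are listed. Lowering min_weight to 0.1 also yields the two-step paths.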

Evaluate Join Trees with Insights and Control

This code evaluates the join trees and shows results.

from autofeatinsights.autofeat_class import FeatureDiscovery
autofeat = FeatureDiscovery()
autofeat.set_base_table(base_table="school/base.csv", target_column="class")
autofeat.set_dataset_repository(dataset_repository=["school"])
autofeat.find_relationships()
autofeat.calculate_join_trees()
# Evaluate all trees
autofeat.evaluate_trees(algorithm='GBM', top_k_paths=3, verbose=True, explain=False)
# Explains results
autofeat.explain_result(tree_id=1, model="GBM")
# Returns the best result
best_result = autofeat.get_best_result()
# Evaluate a single tree
autofeat.evaluate_augmented_table(tree_id=1, algorithm='GBM', verbose=False)
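
Picking the best result after evaluation amounts to ranking the trees by a score. The sketch below shows that ranking on made-up data. The results list, its field names, the accuracy values, and the helpers pick_best and top_k are hypothetical, and the structure the package actually returns may differ.

```python
# Hypothetical evaluation results; the values are invented for illustration.
results = [
    {"tree_id": 1, "algorithm": "GBM", "accuracy": 0.81},
    {"tree_id": 2, "algorithm": "GBM", "accuracy": 0.86},
    {"tree_id": 3, "algorithm": "GBM", "accuracy": 0.79},
]


def pick_best(results, metric="accuracy"):
    """Return the result with the highest value for the given metric."""
    return max(results, key=lambda result: result[metric])


def top_k(results, k, metric="accuracy"):
    """Return the k results with the highest metric, best first."""
    return sorted(results, key=lambda result: result[metric], reverse=True)[:k]


best = pick_best(results)
print(f"best tree: {best['tree_id']}")  # prints "best tree: 2"
print([result["tree_id"] for result in top_k(results, 2)])  # prints "[2, 1]"
```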