Pandas

Responsible for standardizing the use of DataFrame and Series across the project.

class PandasDataframe(path, df, **kwargs)

Bases: object

Class responsible for the standardization and manipulation of Pandas DataFrames.

path

Absolute path to the CSV file.

Type: str

df

Data stored in the dataframe.

Type: pandas.DataFrame

list

List of dataframes to concatenate with list_to_df().

Type: list, optional

dict

Dictionary used to build a dataframe with dict_to_df().

Type: dict, optional

csv_to_df()

Reads the CSV file at the specified path and loads it into the dataframe.

df_to_csv(path)

Exports the current dataframe to a CSV file.

Parameters: path (str) – Destination path for the CSV file.
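
The CSV import and export above can be sketched with plain pandas calls. This is a minimal illustration, not the class's actual implementation: the sample data is made up, an in-memory buffer stands in for a file path, and `index=False` is an assumption about how the index is handled.

```python
import io

import pandas as pd

# Sample data standing in for the wrapped dataframe (illustrative values).
df = pd.DataFrame({"fund": ["A", "B"], "value": [10.5, 20.0]})

# Export: an in-memory buffer is used here in place of a path on disk.
# index=False is an assumption; the class may keep the index column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Import: rewind the buffer and read the same CSV text back.
buffer.seek(0)
loaded = pd.read_csv(buffer)
```

If the index were written out on export, reading the file back would produce an extra unnamed column, which is why the sketch omits it.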

dict_to_df()

Converts the stored dictionary into a dataframe.
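
As a rough sketch of what a dictionary-to-dataframe conversion looks like in pandas, assuming the dictionary maps column names to equal-length lists (the shape of the stored dictionary is not documented here):

```python
import pandas as pd

# Hypothetical dictionary: keys become column names, values become column data.
data = {"fund": ["A", "B", "C"], "value": [1, 2, 3]}

df = pd.DataFrame(data)
```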

drop_column(column, direction)

Drops rows or columns from the dataframe.

Parameters:

column (str or list) – Column name(s) or row label(s) to drop.
direction (int) – Axis to drop from (0 for rows, 1 for columns).
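
The two values of direction map naturally onto the `axis` argument of `DataFrame.drop`. The sketch below assumes that mapping; the column names and data are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# direction=1: drop columns by name.
without_b = df.drop(["b"], axis=1)

# direction=0: drop rows by index label.
without_first_row = df.drop([0], axis=0)
```

Note that `DataFrame.drop` returns a new dataframe by default; whether the method updates df in place is not stated in this reference.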

find_row_data(row)

Retrieves a row from the dataframe by index.

Parameters: row (int) – Row index.
Returns: Row data.
Return type: pandas.Series
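
"By index" can mean either position or index label in pandas, and this reference does not say which one the method uses. The sketch below shows both lookups on illustrative data so the difference is visible.

```python
import pandas as pd

# Index labels deliberately differ from positions.
df = pd.DataFrame(
    {"fund": ["A", "B", "C"], "value": [10, 20, 30]},
    index=[5, 6, 7],
)

by_position = df.iloc[1]  # second row, regardless of index labels
by_label = df.loc[6]      # row whose index label is 6
```

Both lookups return the same row here, as a pandas.Series keyed by column name.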

find_row_date_greater_or_equals_than_indicated(date_str) → tuple[bool, int]

Finds the first row where the date is greater than or equal to the given date.

Parameters: date_str (str or datetime) – Reference date.
Returns: (True, index) if found, otherwise (False, 0).
Return type: tuple[bool, int]
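
One way to obtain this (found, index) pair is sketched below. The helper name `first_row_on_or_after` and the explicit `date_column` argument are hypothetical; the documented method takes only date_str, so the date column it inspects is presumably fixed elsewhere. The sketch returns a positional index.

```python
import pandas as pd


def first_row_on_or_after(df, date_column, date_str):
    """Return (True, position) of the first row with date >= date_str, else (False, 0)."""
    dates = pd.to_datetime(df[date_column])
    mask = (dates >= pd.to_datetime(date_str)).to_numpy()
    if not mask.any():
        return False, 0
    # argmax on a boolean array gives the position of the first True.
    return True, int(mask.argmax())


df = pd.DataFrame(
    {"date": ["2024-01-01", "2024-01-15", "2024-02-01"], "value": [1, 2, 3]}
)

found = first_row_on_or_after(df, "date", "2024-01-10")
missing = first_row_on_or_after(df, "date", "2025-01-01")
```

Because 0 is also a valid row index, callers should check the boolean before using the second element.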

get_column_in_list(column)

Returns a dataframe column as a Python list.

Parameters: column (str) – Column name.
Returns: Column values as a list.
Return type: list
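
In plain pandas, this conversion is typically a single call to `Series.tolist()`. A short sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"fund": ["A", "B"], "value": [1.5, 2.5]})

# Select the column, then convert it to a plain Python list.
values = df["value"].tolist()
```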

group_element(group_element)

Groups the dataframe by the specified column(s).

Parameters: group_element (str or list) – Column(s) to group by.
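
Grouping in pandas goes through `DataFrame.groupby`. This reference does not say what the method does with the resulting groups, so the sum below is only an assumed example of a follow-up aggregation.

```python
import pandas as pd

df = pd.DataFrame({"fund": ["A", "A", "B"], "value": [1, 2, 3]})

# Group rows that share the same value in the "fund" column.
grouped = df.groupby("fund")

# Assumed follow-up: aggregate each group (here, a sum of "value").
totals = grouped["value"].sum()
```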

list_to_df()

Concatenates a list of dataframes into a single dataframe.
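
Concatenating a list of dataframes is usually done with `pd.concat`. The sketch assumes the frames share the same columns and that a fresh index is wanted (`ignore_index=True`); neither detail is confirmed by this reference.

```python
import pandas as pd

# Hypothetical list of dataframes with matching columns.
frames = [
    pd.DataFrame({"value": [1, 2]}),
    pd.DataFrame({"value": [3]}),
]

# Stack the frames vertically and renumber the rows from 0.
combined = pd.concat(frames, ignore_index=True)
```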

order_columns(order_list)

Reorders the dataframe columns.

Parameters: order_list (list) – Desired column order.
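
Reordering columns can be expressed as selecting them in the desired order. A minimal sketch, assuming order_list names every column that should be kept:

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2], "fund": ["A", "B"], "date": ["2024-01-01", "2024-01-02"]})

order_list = ["date", "fund", "value"]

# Indexing with a list of names returns the columns in that order.
reordered = df[order_list]
```

With this approach, any column left out of order_list is dropped from the result.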

query_date(start_date, end_date, column_name)

Filters rows between two dates based on a specified date column.

Parameters:

start_date (str or datetime) – Start date for filtering.
end_date (str or datetime) – End date for filtering.
column_name (str) – Name of the date column.
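
A date-range filter like this one is commonly written as a boolean mask. The sketch treats both bounds as inclusive, which is an assumption; the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-02-01"]),
        "value": [1, 2, 3],
    }
)

start_date = pd.Timestamp("2024-01-10")
end_date = pd.Timestamp("2024-01-31")

# Keep rows whose date falls within [start_date, end_date].
in_range = df[(df["date"] >= start_date) & (df["date"] <= end_date)]
```

If the date column is stored as strings, converting it with `pd.to_datetime` first avoids comparing text against timestamps.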

query_date_and_element(start_date, end_date, date_column_name, investment_cnpj, investment_column_cnpj)

Filters rows based on both a date range and a specific element.

Parameters:

start_date (str or datetime) – Start date for filtering.
end_date (str or datetime) – End date for filtering.
date_column_name (str) – Name of the date column.
investment_cnpj (str) – Value to filter in the investment column.
investment_column_cnpj (str) – Column name containing the investment identifier.

Returns: Filtered dataframe.
Return type: pandas.DataFrame
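
Combining the date range with an equality test on the identifier column could look like the sketch below. The CNPJ strings are placeholders rather than real identifiers, and the inclusive bounds of `Series.between` are an assumption about the method's behavior.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-01-25"]),
        "cnpj": ["00.000.000/0001-00", "11.111.111/0001-11", "00.000.000/0001-00"],
        "value": [1, 2, 3],
    }
)

start_date = pd.Timestamp("2024-01-10")
end_date = pd.Timestamp("2024-01-31")
investment_cnpj = "00.000.000/0001-00"  # placeholder identifier

# Both conditions must hold: date within range AND matching identifier.
mask = df["date"].between(start_date, end_date) & (df["cnpj"] == investment_cnpj)
filtered = df[mask]
```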

query_element_in(column, collection)

Filters rows where column values are within a given collection.

Parameters:

column (str) – Column name to filter.
collection (list or set) – Collection of values to match.
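
Membership filters of this kind map onto `Series.isin`, which accepts lists and sets alike. A short sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"fund": ["A", "B", "C", "A"], "value": [1, 2, 3, 4]})

collection = {"A", "C"}

# Keep only rows whose "fund" value appears in the collection.
selected = df[df["fund"].isin(collection)]
```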

reset_index()

Resets the dataframe index.
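
Resetting the index is most useful after filtering, when the surviving rows keep their original labels. The sketch uses `drop=True` to discard the old index; whether the method keeps it as a column instead is not documented here.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40]})

# Filtering leaves gaps in the index (labels 1 and 3 remain).
filtered = df[df["value"] > 15].iloc[::2]

# Renumber the rows from 0 and discard the old labels.
renumbered = filtered.reset_index(drop=True)
```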

sort_elements_list(sort_list)

Sorts the dataframe by the specified columns.

Parameters: sort_list (list) – List of column names to sort by.
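
Sorting by several columns corresponds to `DataFrame.sort_values` with a list of names. The sketch assumes ascending order, since no sort direction parameter is documented.

```python
import pandas as pd

df = pd.DataFrame(
    {"fund": ["B", "A", "A"], "date": ["2024-01-02", "2024-01-03", "2024-01-01"]}
)

sort_list = ["fund", "date"]

# Rows are ordered by "fund" first, then by "date" within each fund.
ordered = df.sort_values(by=sort_list)
```

The ISO-formatted date strings used here sort correctly as text; other date formats would need conversion with `pd.to_datetime` first.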