Pandas

Responsible for standardizing the use of DataFrame and Series across the project.

class PandasDataframe(path, df, **kwargs)

Bases: object

Class responsible for the standardization and manipulation of Pandas DataFrames.

path

Absolute path to the CSV file.

Type:

str

df

Data stored in the dataframe.

Type:

pandas.DataFrame

list

List of dataframes used for concatenation.

Type:

list, optional

dict

Dictionary used to create a dataframe.

Type:

dict, optional

csv_to_df()

Reads a CSV file from the specified path and loads it into the dataframe.

df_to_csv(path)

Exports the current dataframe to a CSV file.

Parameters:

path (str) – Destination path for the CSV file.
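
The implementation of csv_to_df() and df_to_csv(path) is not shown in this reference. As a rough sketch of the described behavior in plain pandas, assuming the wrapper delegates to `DataFrame.to_csv` and `pandas.read_csv` (the sample data and `index=False` are illustrative assumptions):

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"date": ["2024-01-02", "2024-01-03"], "value": [10.5, 11.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "example.csv")

    # Roughly what df_to_csv(path) is described as doing.
    # index=False is an assumption; the wrapper may keep the index column.
    df.to_csv(csv_path, index=False)

    # Roughly what csv_to_df() is described as doing with the stored path.
    loaded = pd.read_csv(csv_path)
```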

dict_to_df()

Converts the stored dictionary into a dataframe.
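
In plain pandas, building a dataframe from a dictionary usually looks like the sketch below. Whether dict_to_df() calls the constructor directly is an assumption, and the dictionary contents are made up for illustration:

```python
import pandas as pd

# Hypothetical dictionary: keys become column names, values become columns.
data = {"fund": ["A", "B"], "value": [100.0, 250.0]}

# Plain-pandas equivalent of the conversion described for dict_to_df().
df = pd.DataFrame(data)
```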

drop_column(column, direction)

Drops rows or columns from the dataframe.

Parameters:
  • column (str or list) – Column name(s) or row label(s) to drop.

  • direction (int) – Axis to drop from (0 for rows, 1 for columns).
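
The direction values follow the pandas axis convention. A minimal sketch using `DataFrame.drop` directly, assuming the method forwards `column` and `direction` as `labels` and `axis` (the sample frame is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# direction=1: drop a column by name.
without_b = df.drop("b", axis=1)

# direction=0: drop rows by index label (here labels 0 and 2).
without_rows = df.drop([0, 2], axis=0)
```

`DataFrame.drop` returns a new dataframe by default. Whether the wrapper reassigns the result to its own `df` attribute is not stated here.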

find_row_data(row)

Retrieves a row from the dataframe by index.

Parameters:

row (int) – Row index.

Returns:

Row data.

Return type:

pandas.Series
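
"By index" can mean either a position or an index label in pandas, and this reference does not say which one find_row_data() uses. The sketch below shows both lookups on an illustrative frame whose labels differ from positions:

```python
import pandas as pd

# Index labels 5, 6, 7 are deliberately different from positions 0, 1, 2.
df = pd.DataFrame({"value": [10, 20, 30]}, index=[5, 6, 7])

# Position-based lookup: the first row, regardless of its label.
by_position = df.iloc[0]

# Label-based lookup: the row whose index label is 6.
by_label = df.loc[6]
```

Both lookups return a pandas.Series whose index is the dataframe's column names.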

find_row_date_greater_or_equals_than_indicated(date_str) → tuple[bool, int]

Finds the first row where the date is greater than or equal to the given date.

Parameters:

date_str (str or datetime) – Reference date.

Returns:

(True, index) if found, otherwise (False, 0).

Return type:

tuple[bool, int]
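
A minimal sketch of this return contract in plain pandas. The function name and the explicit `date_column` argument are hypothetical, since the reference does not say which column the method inspects, and returning a positional index is also an assumption:

```python
import pandas as pd


def find_first_on_or_after(df, date_column, date_str):
    """Return (True, position) of the first row dated on or after date_str."""
    dates = pd.to_datetime(df[date_column])
    mask = dates >= pd.to_datetime(date_str)
    if not mask.any():
        # No match: mirror the documented (False, 0) fallback.
        return False, 0
    # argmax on a boolean array gives the position of the first True.
    return True, int(mask.to_numpy().argmax())


df = pd.DataFrame({"date": ["2024-01-01", "2024-01-05", "2024-01-10"]})
found = find_first_on_or_after(df, "date", "2024-01-03")
missing = find_first_on_or_after(df, "date", "2025-01-01")
```

Because 0 is also a valid row position, callers should check the boolean flag before using the index.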

get_column_in_list(column)

Returns a dataframe column as a Python list.

Parameters:

column (str) – Column name.

Returns:

Column values as a list.

Return type:

list
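
In plain pandas this conversion is typically a single `Series.tolist()` call. The sketch assumes the method does the equivalent; the sample data is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"fund": ["A", "B", "A"], "value": [1, 2, 3]})

# Select the column as a Series, then convert it to a plain Python list.
funds = df["fund"].tolist()
```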

group_element(group_element)

Groups the dataframe by the specified column(s).

Parameters:

group_element (str or list) – Column(s) to group by.
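
The reference does not state what group_element() returns or whether it applies an aggregation. As a sketch of grouping in plain pandas, assuming a `DataFrame.groupby` call followed by an aggregation chosen by the caller (the `sum` here is only an example):

```python
import pandas as pd

df = pd.DataFrame({"fund": ["A", "B", "A"], "value": [100.0, 250.0, 10.0]})

# Group rows by the "fund" column; this yields a lazy GroupBy object.
grouped = df.groupby("fund")

# An aggregation is needed to get concrete values out of the groups.
totals = grouped["value"].sum()
```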

list_to_df()

Concatenates a list of dataframes into a single dataframe.
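
Concatenation of a list of dataframes is usually done with `pandas.concat`. The sketch below assumes row-wise concatenation; `ignore_index=True` is an assumption and may not match what list_to_df() does:

```python
import pandas as pd

# Hypothetical list of dataframes sharing the same columns.
frames = [
    pd.DataFrame({"fund": ["A"], "value": [1]}),
    pd.DataFrame({"fund": ["B", "C"], "value": [2, 3]}),
]

# Stack the frames vertically and renumber the rows from 0.
combined = pd.concat(frames, ignore_index=True)
```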

order_columns(order_list)

Reorders the dataframe columns.

Parameters:

order_list (list) – Desired column order.
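
A common way to reorder columns in pandas is to index the dataframe with the desired list of names. This sketch assumes order_columns() does something equivalent; the sample columns are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2], "fund": ["A", "B"], "date": ["d1", "d2"]})

order_list = ["date", "fund", "value"]

# Selecting with a list of names returns the columns in that order.
reordered = df[order_list]
```

With this approach, a column left out of `order_list` is dropped from the result, and a name that does not exist raises a `KeyError`.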

query_date(start_date, end_date, column_name)

Filters rows between two dates based on a specified date column.

Parameters:
  • start_date (str or datetime) – Start date for filtering.

  • end_date (str or datetime) – End date for filtering.

  • column_name (str) – Name of the date column.
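
A minimal sketch of a date-range filter in plain pandas. The helper name `filter_between` is hypothetical, and treating both bounds as inclusive is an assumption that the reference does not confirm:

```python
import pandas as pd


def filter_between(df, start_date, end_date, column_name):
    """Keep rows whose date falls within [start_date, end_date]."""
    dates = pd.to_datetime(df[column_name])
    # Series.between is inclusive on both ends by default.
    mask = dates.between(pd.to_datetime(start_date), pd.to_datetime(end_date))
    return df[mask]


df = pd.DataFrame(
    {"date": ["2024-01-01", "2024-01-05", "2024-01-10"], "value": [1, 2, 3]}
)
in_range = filter_between(df, "2024-01-05", "2024-01-10", "date")
```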

query_date_and_element(start_date, end_date, date_column_name, investment_cnpj, investment_column_cnpj)

Filters rows that fall within a date range and whose investment identifier (CNPJ) column matches a given value.

Parameters:
  • start_date (str or datetime) – Start date for filtering.

  • end_date (str or datetime) – End date for filtering.

  • date_column_name (str) – Name of the date column.

  • investment_cnpj (str) – Value to filter in the investment column.

  • investment_column_cnpj (str) – Column name containing the investment identifier.

Returns:

Filtered dataframe.

Return type:

pandas.DataFrame
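
The combined filter can be sketched as two boolean masks joined with `&`. The helper name is hypothetical, the CNPJ strings are placeholders rather than real identifiers, and inclusive date bounds are an assumption:

```python
import pandas as pd


def filter_date_and_cnpj(df, start_date, end_date, date_column_name,
                         investment_cnpj, investment_column_cnpj):
    """Keep rows inside the date range that also match the given CNPJ."""
    dates = pd.to_datetime(df[date_column_name])
    in_range = (dates >= pd.to_datetime(start_date)) & (
        dates <= pd.to_datetime(end_date)
    )
    matches = df[investment_column_cnpj] == investment_cnpj
    return df[in_range & matches]


df = pd.DataFrame(
    {
        "date": ["2024-01-01", "2024-01-05", "2024-01-10"],
        # Placeholder identifiers, not real CNPJ numbers.
        "cnpj": ["00.000.000/0001-00", "11.111.111/0001-11", "00.000.000/0001-00"],
        "value": [1, 2, 3],
    }
)
result = filter_date_and_cnpj(
    df, "2024-01-02", "2024-01-31", "date", "00.000.000/0001-00", "cnpj"
)
```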

query_element_in(column, collection)

Filters rows where column values are within a given collection.

Parameters:
  • column (str) – Column name to filter.

  • collection (list or set) – Collection of values to match.
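
Membership filters like this are commonly written with `Series.isin`. The sketch assumes the method does the equivalent; the data and collection are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"fund": ["A", "B", "C", "A"], "value": [1, 2, 3, 4]})

collection = {"A", "C"}

# isin builds a boolean mask that is True where the value is in the collection.
selected = df[df["fund"].isin(collection)]
```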

reset_index()

Resets the dataframe index.

sort_elements_list(sort_list)

Sorts the dataframe by the specified columns.

Parameters:

sort_list (list) – List of column names to sort by.
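
Sorting by several columns and then resetting the index is a frequent pairing in pandas. The sketch below uses `DataFrame.sort_values` and `DataFrame.reset_index` directly; ascending order and `drop=True` are assumptions about how sort_elements_list() and reset_index() behave:

```python
import pandas as pd

df = pd.DataFrame({"fund": ["B", "A", "A"], "value": [1, 3, 2]})

sort_list = ["fund", "value"]

# Sort by "fund" first, then by "value" within each fund (ascending).
sorted_df = df.sort_values(sort_list)

# After sorting, the original labels are out of order; renumber them from 0.
# drop=True discards the old index instead of adding it as a column.
sorted_df = sorted_df.reset_index(drop=True)
```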