Slice well

In this post, I briefly review a few methods to select rows and/or columns of a DataFrame that satisfy one or more criteria. I then introduce two additional requirements that arise frequently in practice: slicing with previously unknown criteria, and managing serialization and deserialization to recover the desired data structure. Lever MultiIndexes: I often find pandas’ MultiIndex helpful, although I do not see it used very often. With a multi-indexed DataFrame, pandas’ ....

November 15, 2022 · Aaron Slowey
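
As a minimal sketch of the kind of slicing the "Slice well" excerpt describes, the example below selects rows from a multi-indexed DataFrame in three ways, including with criteria that are only known at run time. The data, the column names, and the `criteria` dict are hypothetical and chosen only for illustration; they are not taken from the post.

```python
import pandas as pd

# Hypothetical data: names and values are illustrative only.
df = (
    pd.DataFrame(
        {
            "segment": ["ex", "ex", "why", "why"],
            "year": [2021, 2022, 2021, 2022],
            "amount": [1.0, 2.0, 3.0, 4.0],
        }
    )
    .set_index(["segment", "year"])
    .sort_index()  # a sorted index is needed for label slicing
)

# Cross-section on one level: all rows for year 2022.
rows_2022 = df.xs(2022, level="year")

# Label-based slicing across levels with IndexSlice.
idx = pd.IndexSlice
ex_rows = df.loc[idx["ex", :], :]

# Criteria that are only known at run time, passed as a dict
# of index level name -> required value (an assumed convention).
criteria = {"segment": "why", "year": 2021}
mask = pd.Series(True, index=df.index)
for level, value in criteria.items():
    mask &= df.index.get_level_values(level) == value
selected = df[mask]
```

Building a boolean mask from the index levels is only one option; it keeps the selection logic independent of how many criteria arrive.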

That which is aggregated and its metadata

It’s impossible to include an associated field value alongside an aggregate of another variable. Unlike ndarrays, DataFrames are often heterogeneous. They are a more complete map of how we think of a data set as a whole. When we alter the structure of tabular data, often through aggregation of one field, we want to include values from other fields. This is an example of an issue that arises at the interface of pandas and scikit-learn, for which the ColumnTransformer was created....

August 12, 2022 · Aaron Slowey
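
To make the problem in the excerpt above concrete, here is a hedged sketch of one common workaround, not necessarily the approach the post takes: locate the row that produces the aggregate, then pull the associated fields from that row. The `sales` frame and its columns are hypothetical.

```python
import pandas as pd

# Hypothetical data: names and values are illustrative only.
sales = pd.DataFrame(
    {
        "segment": ["ex", "ex", "why", "why"],
        "dt": pd.to_datetime(
            ["2022-01-01", "2022-01-02", "2022-01-01", "2022-01-02"]
        ),
        "amount": [5.0, 9.0, 7.0, 3.0],
    }
)

# Row label of the maximum amount within each segment.
max_rows = sales.groupby("segment")["amount"].idxmax()

# Use those labels to pull whole rows, so 'dt' travels with the maximum.
max_with_date = sales.loc[max_rows, ["segment", "dt", "amount"]].reset_index(drop=True)
```

This only works for aggregates that correspond to a single source row, such as a maximum or minimum; a sum or mean has no single associated row.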

Aggregation: Implications of indexing

While there are multiple syntaxes and methods to produce the same aggregated data, those variations produce different indices. The format and contents of the index can impact other processes, such as serialization and deserialization. Consider the following artificial transactional data: txns = pd.concat([pd.DataFrame({'dt': pd.date_range("2022", freq="D", periods=10), 'amount': np.random.random(10), 'segment': ['ex'] * 10})] * 10, axis=0) dt amount segment (Timestamp('2022-01-01 00:00:00'), 0) 2022-01-01 00:00:00 0.992821 ex (Timestamp('2022-01-01 00:00:00'), 0) 2022-01-01 00:00:00 0....

July 22, 2022 · Aaron Slowey
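
The claim in the excerpt above, that different aggregation syntaxes yield different indices, can be illustrated with a small sketch. It assumes a simplified, single-copy version of the `txns` frame (the post concatenates ten copies) and a seeded random generator for reproducibility; the three aggregations chosen here are examples, not necessarily the ones the post compares.

```python
import numpy as np
import pandas as pd

# Simplified stand-in for the post's txns frame (assumption: one copy, seeded).
rng = np.random.default_rng(0)
txns = pd.DataFrame(
    {
        "dt": pd.date_range("2022", freq="D", periods=10),
        "amount": rng.random(10),
        "segment": ["ex"] * 10,
    }
)

# Grouping on two keys returns a Series indexed by a MultiIndex.
by_keys = txns.groupby(["segment", "dt"])["amount"].sum()

# as_index=False keeps the keys as columns and leaves a default RangeIndex.
flat = txns.groupby(["segment", "dt"], as_index=False)["amount"].sum()

# Grouping with a time-based Grouper returns a DatetimeIndex.
weekly = txns.groupby(pd.Grouper(key="dt", freq="W"))["amount"].sum()

print(type(by_keys.index).__name__)
print(type(flat.index).__name__)
print(type(weekly.index).__name__)
```

The totals agree across all three results, but a round trip through a flat format such as CSV treats each index differently, which is where the choice starts to matter.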