When log-transforming an array of values with NumPy, keep in mind the following:

  • Given negative numbers and zeroes, NumPy will output NaN and -inf, respectively, along with a RuntimeWarning. Such values can cause downstream processing to fail or behave unexpectedly.
  • numpy.log, like every NumPy ufunc, accepts a where argument for handling this situation.
  • How that argument affects numpy.log’s behavior depends on whether the output goes to a preexisting container (passed via out) or to one created on the fly.
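The first point is easy to reproduce. The minimal sketch below uses a made-up three-element array; warnings.catch_warnings is used only so the RuntimeWarnings can be inspected rather than printed to the console.

```python
import warnings

import numpy as np

# Made-up sample: one negative value, one zero, one positive value.
values = np.array([-1.0, 0.0, 1.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    result = np.log(values)

print(result)  # nan, -inf and 0.0, in that order
print([str(w.message) for w in caught])

# A single NaN is enough to poison an aggregate computed downstream.
print(np.mean(result))  # nan
```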

Consider this example, in which subject_data is a pandas DataFrame and column_name is the label of one of its columns (hence the pandas method chaining):

transforms = (np.log10(subject_data[column_name], where=subject_data[column_name] > 0)
              .replace([np.inf, -np.inf], np.nan)
              .dropna()
             )

The where argument makes np.log10 compute the logarithm only at positions where the condition is True. At positions where it is False, nothing is computed and the output keeps whatever value it already held; the original input value is not copied over. Any boolean array (or array-like) that broadcasts against the input can serve as the condition.
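To make the "keeps whatever value it already held" rule visible, the sketch below pre-fills the output with an arbitrary sentinel (-99.0, chosen only so untouched slots stand out) and uses made-up input values.

```python
import numpy as np

x = np.array([np.e, 0.0, -1.0])
out = np.full(x.shape, -99.0)  # arbitrary sentinel so untouched slots stand out

np.log(x, out=out, where=x > 0)

# Only the first slot satisfied the condition and was computed (log(e) is about 1.0).
# The other two kept the sentinel; the inputs 0.0 and -1.0 were not copied over.
print(out)
```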

If no output array is supplied (the default, out=None), NumPy allocates a new, uninitialized one, so positions where the condition is False contain whatever happened to be in that memory. If you try this out, you may see NaN, or a meaningless tiny number such as 6.952161e-310. The .replace and .dropna calls in the chain above will not catch such leftovers unless they happen to be infinite or NaN.

It is usually safer to create and initialize the output container yourself, filled with zeros, NaN, or whatever value facilitates downstream use, and pass it to the function via out.
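One way to apply this to the pipeline above is sketched here with a hypothetical Series standing in for one column of subject_data: pre-fill the output with NaN so skipped positions become missing values, then let pandas drop them. Converting the column to a float NumPy array first keeps the ufunc call in plain NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one column of subject_data.
column = pd.Series([100.0, 0.0, -5.0, 10.0])

values = column.to_numpy(dtype=float)
out = np.full(values.shape, np.nan)  # NaN marks "not transformed"

np.log10(values, out=out, where=values > 0)

# Re-attach the original index, then drop the skipped (NaN) rows.
transforms = pd.Series(out, index=column.index).dropna()
print(transforms)  # rows 0 and 3 remain, with values 2.0 and 1.0
```

Because NaN is already pandas' missing-value marker, no separate .replace step is needed here.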

Let’s say we have six values to transform, one of which is zero and another negative:

import numpy as np
import pandas as pd

summary = pd.DataFrame({'cat': ['a', 'a', 'b', 'c', 'c', 'c'], 'z': [33, 22, 44, 0, 11, -8]})

transforms = np.zeros(6)

Note that we no longer assign the result of np.log to transforms; instead we pass transforms as the out argument, and NumPy writes the results into it in place:

np.log(summary.z, out=transforms, where=summary.z > 0)

>>> transforms
array([3.49650756, 3.09104245, 3.78418963, 0.        , 2.39789527,
       0.        ])
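Two details of this example can be checked directly. A ufunc called with an out array returns that same array, so assigning the return value is optional; and calling .to_numpy() on the column first keeps the call in plain NumPy rather than routing it through pandas' own ufunc handling. A sketch rebuilding the same six values:

```python
import numpy as np
import pandas as pd

summary = pd.DataFrame({'cat': ['a', 'a', 'b', 'c', 'c', 'c'],
                        'z': [33, 22, 44, 0, 11, -8]})

z = summary['z'].to_numpy(dtype=float)
transforms = np.zeros(len(z))  # zeros fill the skipped positions

result = np.log(z, out=transforms, where=z > 0)

print(result is transforms)  # True: the return value is the out array itself
print(transforms)
```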