machine_learning.mfcc
=====================

.. py:module:: machine_learning.mfcc

.. autoapi-nested-parse::

   Mel Frequency Cepstral Coefficients (MFCC) Calculation

   MFCC is a feature-extraction technique that represents the short-term power
   spectrum of a sound signal in a compact and discriminative form. It is widely
   used in speech and audio processing tasks such as speech recognition and
   speaker identification.

   How Mel Frequency Cepstral Coefficients are calculated:

   1. Preprocessing:

      - Load an audio signal and normalize its amplitude so that the samples fall
        within a fixed range (e.g., between -1 and 1).
      - Split the signal into overlapping, fixed-length frames, and multiply each
        frame by a window function (e.g., Hann) to reduce spectral leakage.

   2. Fourier Transform:

      - Apply a Fast Fourier Transform (FFT) to each windowed frame to convert it
        from the time domain to the frequency domain, so that each frame is
        represented by its complex frequency components.

   3. Power Spectrum:

      - Compute the power spectrum by taking the squared magnitude of each
        frequency component obtained from the FFT. This gives the distribution of
        the frame's energy across frequencies.

   4. Mel Filterbank:

      - Apply a bank of triangular filters, equally spaced on the Mel scale, to
        the power spectrum. The Mel spacing approximates the frequency resolution
        of human hearing, which is finer at low frequencies than at high ones.
        Each filter computes a weighted sum of the power-spectrum values within
        its band.

   5. Logarithmic Compression:

      - Take the logarithm (typically base 10) of the filterbank energies to
        compress their dynamic range. This step mimics the roughly logarithmic
        response of the human ear to sound intensity.

   6. Discrete Cosine Transform (DCT):

      - Apply the Discrete Cosine Transform to the log filterbank energies to
        obtain the MFCCs. The DCT decorrelates the filterbank energies and
        concentrates most of the spectral-envelope information in the
        lower-order coefficients.

   7. Feature Extraction:

      - Keep a subset of the DCT coefficients as the feature vector; the first
        12-13 coefficients are typical for most applications.
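
   The seven steps above can be sketched end to end with NumPy alone. This is a
   minimal illustration under stated assumptions, not this module's
   implementation: the frame length (512), hop (256 samples), Hann window
   (``np.hanning``), 20 filters, 13 coefficients, the
   ``2595 * log10(1 + f / 700)`` mel formula, the Hz-to-bin mapping, the
   orthonormal DCT-II, and the name ``mfcc_sketch`` are all choices made for
   the sketch.

   .. code-block:: python

      import numpy as np


      def mfcc_sketch(signal, sample_rate, frame_len=512, hop=256, n_filters=20, n_coeffs=13):
          """Hypothetical MFCC pipeline following steps 1-7 (not this module's code)."""
          # 1. Preprocessing: peak-normalize, cut overlapping frames, apply a Hann window
          signal = signal / np.max(np.abs(signal))
          n_frames = 1 + (len(signal) - frame_len) // hop
          frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
          frames = frames * np.hanning(frame_len)

          # 2-3. One-sided FFT of each frame, then its power spectrum
          power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, frame_len // 2 + 1)

          # 4. Triangular filters whose edges are equally spaced on the (assumed) mel scale
          mel_top = 2595.0 * np.log10(1.0 + (sample_rate / 2) / 700.0)
          edges_hz = 700.0 * (10.0 ** (np.linspace(0.0, mel_top, n_filters + 2) / 2595.0) - 1.0)
          bins = np.floor((frame_len + 1) * edges_hz / sample_rate).astype(int)
          fbank = np.zeros((n_filters, frame_len // 2 + 1))
          for m in range(n_filters):
              left, centre, right = bins[m], bins[m + 1], bins[m + 2]
              fbank[m, left:centre] = (np.arange(left, centre) - left) / max(centre - left, 1)
              fbank[m, centre:right] = (right - np.arange(centre, right)) / max(right - centre, 1)
          energies = power @ fbank.T  # (n_frames, n_filters)

          # 5. Log compression; the floor avoids log10(0) for empty filters
          log_energies = 10.0 * np.log10(np.maximum(energies, 1e-10))

          # 6. Orthonormal DCT-II along the filter axis
          k = np.arange(n_filters)[:, None]
          n = np.arange(n_filters)
          dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
          dct[0] /= np.sqrt(2.0)
          cepstra = log_energies @ dct.T

          # 7. Keep the first n_coeffs coefficients of each frame
          return cepstra[:, :n_coeffs]

   For a one-second 440 Hz sine sampled at 16 kHz, the sketch returns a
   ``(61, 13)`` array: ``1 + (16000 - 512) // 256 = 61`` frames of 13
   coefficients. Note that it returns one row per frame, whereas the ``mfcc``
   doctest below shows one column per frame.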

   References:
   - Mel-Frequency Cepstral Coefficients (MFCCs):
     https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
   - Speech and Language Processing by Daniel Jurafsky & James H. Martin:
     https://web.stanford.edu/~jurafsky/slp3/
   - Mel Frequency Cepstral Coefficient (MFCC) tutorial:
     http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/

   Author: Amir Lavasani


Functions
---------

.. autoapisummary::

   machine_learning.mfcc.audio_frames
   machine_learning.mfcc.calculate_fft
   machine_learning.mfcc.calculate_signal_power
   machine_learning.mfcc.discrete_cosine_transform
   machine_learning.mfcc.example
   machine_learning.mfcc.freq_to_mel
   machine_learning.mfcc.get_filter_points
   machine_learning.mfcc.get_filters
   machine_learning.mfcc.mel_spaced_filterbank
   machine_learning.mfcc.mel_to_freq
   machine_learning.mfcc.mfcc
   machine_learning.mfcc.normalize


Module Contents
---------------

.. py:function:: audio_frames(audio: numpy.ndarray, sample_rate: int, hop_length: int = 20, ftt_size: int = 1024) -> numpy.ndarray

   Split an audio signal into overlapping frames.

   Args:
       audio: The input audio signal.
       sample_rate: The sample rate of the audio signal.
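       hop_length: The hop between the starts of consecutive frames, in
           milliseconds (default is 20 ms).
       ftt_size: The frame length in samples, which is also the FFT size
           (default is 1024).

   Returns:
       A 2-D array of overlapping frames with shape ``(n_frames, ftt_size)``.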

   Examples:
   >>> audio = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]*1000)
   >>> sample_rate = 8000
   >>> frames = audio_frames(audio, sample_rate, hop_length=10, ftt_size=512)
   >>> frames.shape
   (126, 512)
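
   The frame count in the doctest above can be reproduced with a short NumPy
   sketch. It assumes, without the source confirming it, that the hop is
   converted from milliseconds to samples and that the signal is reflect-padded
   by ``ftt_size // 2`` samples on each side. ``frame_signal`` is a hypothetical
   name, and ``sliding_window_view`` requires NumPy 1.20 or later.

   .. code-block:: python

      import numpy as np
      from numpy.lib.stride_tricks import sliding_window_view


      def frame_signal(audio, sample_rate, hop_length_ms=20, frame_size=1024):
          """Hypothetical framing helper (assumed padding and hop conversion)."""
          hop = int(round(sample_rate * hop_length_ms / 1000))  # hop in samples
          # Assumption: reflect-pad so the first frame is centred on sample 0
          padded = np.pad(audio, frame_size // 2, mode="reflect")
          # Every length-frame_size window as a strided view, then every hop-th one
          return sliding_window_view(padded, frame_size)[::hop]


      audio = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 1000)
      print(frame_signal(audio, 8000, hop_length_ms=10, frame_size=512).shape)  # (126, 512)

   The result is a read-only view into the padded signal, so no frame is
   copied; call ``.copy()`` before modifying frames in place (for example, when
   applying a window with ``*=``).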


.. py:function:: calculate_fft(audio_windowed: numpy.ndarray, ftt_size: int = 1024) -> numpy.ndarray

   Calculate the Fast Fourier Transform (FFT) of windowed audio data.

   Args:
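       audio_windowed: The windowed audio frames, one frame per row.
       ftt_size: The size of the FFT (default is 1024).

   Returns:
       The first ``1 + ftt_size // 2`` FFT bins of each frame, one frame per row.
       When the frame length equals ``ftt_size``, these are the non-negative
       frequency bins.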

   Examples:
   >>> audio_windowed = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
   >>> audio_fft = calculate_fft(audio_windowed, ftt_size=4)
   >>> bool(np.allclose(audio_fft[0], np.array([6.0+0.j, -1.5+0.8660254j,
   ...     -1.5-0.8660254j])))
   True
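
   In normal use each frame is ``ftt_size`` samples long. For real-valued
   frames of that length, the first ``1 + ftt_size // 2`` bins of the full FFT
   are exactly what NumPy's real-input FFT, ``np.fft.rfft``, returns. The check
   below on random frames illustrates this; it is independent of this module's
   code.

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(0)
      frames = rng.standard_normal((4, 1024))  # 4 real-valued frames of length 1024

      full = np.fft.fft(frames, axis=1)[:, : 1 + 1024 // 2]  # full FFT, keep non-negative bins
      one_sided = np.fft.rfft(frames, axis=1)  # real-input FFT returns exactly those bins
      print(one_sided.shape, np.allclose(full, one_sided))  # (4, 513) True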


.. py:function:: calculate_signal_power(audio_fft: numpy.ndarray) -> numpy.ndarray

   Calculate the power spectrum of an audio signal from its FFT.

   Args:
       audio_fft: The FFT of the audio signal.

   Returns:
       The power spectrum: the squared magnitude of each FFT value.

   Examples:
   >>> audio_fft = np.array([1+2j, 2+3j, 3+4j, 4+5j])
   >>> power = calculate_signal_power(audio_fft)
   >>> np.allclose(power, np.array([5, 13, 25, 41]))
   True
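
   The power in each bin is the squared magnitude ``|X_k|^2`` of the complex
   FFT value, which equals ``X_k * conj(X_k)``. This NumPy check (an
   illustration, not this module's code) confirms that identity and, for a full
   two-sided FFT, Parseval's theorem relating the summed power to the signal's
   energy.

   .. code-block:: python

      import numpy as np

      x = np.array([1.0, -2.0, 3.0, 0.5])  # a short real signal
      spectrum = np.fft.fft(x)  # full (two-sided) DFT
      power = np.abs(spectrum) ** 2  # squared magnitude of each bin

      # |X|^2 equals X * conj(X), whose imaginary part is zero
      print(np.allclose(power, (spectrum * np.conj(spectrum)).real))  # True
      # Parseval's theorem for NumPy's unnormalized DFT: sum |x|^2 == sum |X|^2 / N
      print(np.isclose(np.sum(x**2), np.sum(power) / len(x)))  # True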


.. py:function:: discrete_cosine_transform(dct_filter_num: int, filter_num: int) -> numpy.ndarray

   Compute the Discrete Cosine Transform (DCT) basis matrix.

   Args:
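       dct_filter_num: The number of DCT basis vectors (rows) to generate, i.e.
           the number of cepstral coefficients.
       filter_num: The number of filter-bank filters (columns).

   Returns:
       The DCT basis matrix of shape ``(dct_filter_num, filter_num)``.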

   Examples:
   >>> float(round(discrete_cosine_transform(3, 5)[0][0], 5))
   0.44721
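
   Step 6 of the overview multiplies the log filter-bank energies by a DCT
   basis matrix. The sketch below builds the orthonormal DCT-II basis, whose
   first row is the constant ``1 / sqrt(filter_num)``; that reproduces the
   doctest value ``0.44721`` (``1 / sqrt(5)``), but whether the remaining rows
   use the same scaling as this module is an assumption. ``dct_basis`` is a
   hypothetical name.

   .. code-block:: python

      import numpy as np


      def dct_basis(num_coeffs, num_filters):
          """Hypothetical orthonormal DCT-II basis: row k is a cosine of frequency k."""
          n = np.arange(num_filters)
          k = np.arange(num_coeffs)[:, None]
          basis = np.sqrt(2.0 / num_filters) * np.cos(np.pi * k * (2 * n + 1) / (2 * num_filters))
          basis[0] = 1.0 / np.sqrt(num_filters)  # the constant row gets the smaller scale
          return basis


      b = dct_basis(5, 5)
      print(round(float(b[0, 0]), 5))  # 0.44721
      print(np.allclose(b @ b.T, np.eye(5)))  # True: the rows are orthonormal

   Orthonormality means the transform preserves energy and, with all
   ``num_filters`` rows kept, is exactly invertible by its transpose.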


.. py:function:: example(wav_file_path: str = './path-to-file/sample.wav') -> numpy.ndarray

   Example function to calculate Mel Frequency Cepstral Coefficients
   (MFCCs) from an audio file.

   Args:
       wav_file_path: The path to the WAV audio file.

   Returns:
       np.ndarray: The computed MFCCs for the audio.
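
   Before computing MFCCs, a WAV file has to be read into a sample array and a
   sample rate. The sketch below shows one way to do that for 16-bit mono PCM
   files with the standard-library ``wave`` module; it is an alternative for
   illustration, not necessarily what ``example`` does, and
   ``read_wav_mono16`` is a hypothetical name. It first writes a short test
   tone so that it runs without an existing file.

   .. code-block:: python

      import tempfile
      import wave
      from pathlib import Path

      import numpy as np


      def read_wav_mono16(path):
          """Hypothetical reader for 16-bit PCM mono WAV files (standard library only)."""
          with wave.open(str(path), "rb") as wav:
              if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
                  raise ValueError("expected 16-bit mono PCM")
              sample_rate = wav.getframerate()
              raw = wav.readframes(wav.getnframes())
          return sample_rate, np.frombuffer(raw, dtype="<i2").astype(np.float64)


      # Write a one-second 440 Hz tone, then read it back
      rate = 16000
      tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * np.arange(rate) / rate)).astype("<i2")
      path = Path(tempfile.mkdtemp()) / "tone.wav"
      with wave.open(str(path), "wb") as out:
          out.setnchannels(1)
          out.setsampwidth(2)
          out.setframerate(rate)
          out.writeframes(tone.tobytes())

      sr, samples = read_wav_mono16(path)
      print(sr, samples.shape)  # 16000 (16000,)

   The returned ``samples`` and ``sr`` have the form that ``mfcc`` expects for
   its ``audio`` and ``sample_rate`` arguments.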


.. py:function:: freq_to_mel(freq: float) -> float

   Convert a frequency in Hertz to the mel scale.

   Args:
       freq: The frequency in Hertz.

   Returns:
       The frequency on the mel scale.

   Examples:
   >>> float(round(freq_to_mel(1000), 2))
   999.99
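
   The doctest value matches the widely used formula
   ``m = 2595 * log10(1 + f / 700)``, under which 1000 Hz maps to about
   999.99 mel. Assuming this module uses that formula (the documentation here
   does not state it), a standard-library version looks like this;
   ``hz_to_mel`` is a hypothetical name.

   .. code-block:: python

      import math


      def hz_to_mel(freq_hz: float) -> float:
          """Hypothetical helper; assumes the 2595 * log10(1 + f / 700) mel formula."""
          return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


      print(round(hz_to_mel(1000.0), 2))  # 999.99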


.. py:function:: get_filter_points(sample_rate: int, freq_min: int, freq_high: int, mel_filter_num: int = 10, ftt_size: int = 1024) -> tuple[numpy.ndarray, numpy.ndarray]

   Calculate the filter points and frequencies for mel frequency filters.

   Args:
       sample_rate: The sample rate of the audio.
       freq_min: The minimum frequency in Hertz.
       freq_high: The maximum frequency in Hertz.
       mel_filter_num: The number of mel filters (default is 10).
       ftt_size: The size of the FFT (default is 1024).

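   Returns:
       A tuple ``(filter_points, freqs)``: ``mel_filter_num + 2`` points equally
       spaced on the mel scale between ``freq_min`` and ``freq_high``, given as
       FFT-bin indices and as frequencies in Hertz.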

   Examples:
   >>> filter_points = get_filter_points(8000, 0, 4000, mel_filter_num=4, ftt_size=512)
   >>> filter_points[0]
   array([  0,  20,  51,  95, 161, 256])
   >>> filter_points[1]
   array([   0.        ,  324.46707094,  799.33254207, 1494.30973963,
          2511.42581671, 4000.        ])
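
   Assuming the ``2595 * log10(1 + f / 700)`` mel formula, and that each
   frequency ``f`` maps to FFT bin ``floor((ftt_size + 1) * f / sample_rate)``,
   the sketch below reproduces both arrays in the doctest above. The bin formula
   is inferred from those values rather than taken from the source, and
   ``filter_points_sketch`` is a hypothetical name.

   .. code-block:: python

      import numpy as np


      def hz_to_mel(freq_hz):
          return 2595.0 * np.log10(1.0 + freq_hz / 700.0)  # assumed mel formula


      def filter_points_sketch(sample_rate, freq_min, freq_high, mel_filter_num=10, fft_size=1024):
          """Hypothetical: mel_filter_num + 2 points equally spaced in mel, as FFT bins."""
          mels = np.linspace(hz_to_mel(freq_min), hz_to_mel(freq_high), mel_filter_num + 2)
          freqs = 700.0 * (10.0 ** (mels / 2595.0) - 1.0)  # back to Hertz
          bins = np.floor((fft_size + 1) * freqs / sample_rate).astype(int)  # assumed mapping
          return bins, freqs


      bins, freqs = filter_points_sketch(8000, 0, 4000, mel_filter_num=4, fft_size=512)
      print(bins)  # [  0  20  51  95 161 256]

   Equal steps in mel become progressively wider steps in Hertz, which is why the
   gaps between bin indices grow toward high frequencies.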


.. py:function:: get_filters(filter_points: numpy.ndarray, ftt_size: int) -> numpy.ndarray

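   Generate triangular mel filters from a sequence of filter points.

   Args:
       filter_points: FFT-bin indices of the filter edges and centres, as
           returned by ``get_filter_points``. Each consecutive triple of points
           defines one filter.
       ftt_size: The size of the FFT.

   Returns:
       A matrix with one filter per row, of shape
       ``(len(filter_points) - 2, ftt_size // 2 + 1)``.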

   Examples:
   >>> get_filters(np.array([0, 20, 51, 95, 161, 256], dtype=int), 512).shape
   (4, 257)
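
   A minimal sketch of triangular filters built from bin indices, assuming each
   filter rises linearly from its left point to its centre and falls back toward
   zero at its right point. It reproduces the doctest shape, but the exact edge
   values (here ``np.linspace`` with ``endpoint=False``) are a choice of this
   sketch and may differ from the module's. ``triangular_filters`` is a
   hypothetical name.

   .. code-block:: python

      import numpy as np


      def triangular_filters(points, fft_size):
          """Hypothetical: one triangle per consecutive (left, centre, right) triple of bins."""
          filters = np.zeros((len(points) - 2, fft_size // 2 + 1))
          for m in range(len(points) - 2):
              left, centre, right = points[m], points[m + 1], points[m + 2]
              # Rising edge 0 -> just below 1, then falling edge 1 -> just above 0
              filters[m, left:centre] = np.linspace(0.0, 1.0, centre - left, endpoint=False)
              filters[m, centre:right] = np.linspace(1.0, 0.0, right - centre, endpoint=False)
          return filters


      f = triangular_filters(np.array([0, 20, 51, 95, 161, 256]), 512)
      print(f.shape)  # (4, 257)
      print(f.max(axis=1))  # [1. 1. 1. 1.]: each triangle peaks at its centre bin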


.. py:function:: mel_spaced_filterbank(sample_rate: int, mel_filter_num: int = 10, ftt_size: int = 1024) -> numpy.ndarray

   Create a Mel-spaced filter bank for audio processing.

   Args:
       sample_rate: The sample rate of the audio.
       mel_filter_num: The number of mel filters (default is 10).
       ftt_size: The size of the FFT (default is 1024).

   Returns:
       The Mel-spaced filter bank, as a matrix with one filter per row.

   Examples:
   >>> float(round(mel_spaced_filterbank(8000, 10, 1024)[0][1], 10))
   0.0004603981
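
   Mel filter banks are often area-normalized so that wide high-frequency
   filters do not dominate the energies: each triangle is multiplied by
   ``2 / (f_right - f_left)`` (edges in Hz), which gives every filter unit area.
   This is the idea behind librosa's ``norm='slaney'`` option. The sketch below
   applies it to triangles evaluated at each FFT bin's frequency; it assumes the
   ``2595 * log10(1 + f / 700)`` mel formula, is not this module's code, and
   ``mel_filterbank_sketch`` is a hypothetical name.

   .. code-block:: python

      import numpy as np


      def hz_to_mel(freq_hz):
          return 2595.0 * np.log10(1.0 + freq_hz / 700.0)  # assumed mel formula


      def mel_filterbank_sketch(sample_rate, n_filters=10, fft_size=1024):
          """Hypothetical area-normalized mel filter bank, shape (n_filters, fft_size // 2 + 1)."""
          mels = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
          edges = 700.0 * (10.0 ** (mels / 2595.0) - 1.0)  # filter edges and centres in Hz
          bin_freqs = np.linspace(0.0, sample_rate / 2, fft_size // 2 + 1)  # frequency of each bin

          weights = np.zeros((n_filters, bin_freqs.size))
          for m in range(n_filters):
              left, centre, right = edges[m], edges[m + 1], edges[m + 2]
              rising = (bin_freqs - left) / (centre - left)
              falling = (right - bin_freqs) / (right - centre)
              weights[m] = np.maximum(0.0, np.minimum(rising, falling))  # zero outside [left, right]

          # Area normalization: scale each triangle by 2 / (right edge - left edge) in Hz
          return weights * (2.0 / (edges[2:] - edges[:-2]))[:, None]


      fb = mel_filterbank_sketch(8000)
      print(fb.shape)  # (10, 513)

   Summing each row and multiplying by the bin spacing (4000 / 512 Hz here)
   gives a value close to 1 for every filter, which is the point of the
   normalization.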


.. py:function:: mel_to_freq(mels: float) -> float

   Convert a frequency on the mel scale to Hertz.

   Args:
       mels: The frequency on the mel scale.

   Returns:
       The frequency in Hertz.

   Examples:
   >>> round(mel_to_freq(999.99), 2)
   1000.01
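
   Inverting ``m = 2595 * log10(1 + f / 700)`` gives
   ``f = 700 * (10 ** (m / 2595) - 1)``, which reproduces the doctest value
   above. Whether this module uses exactly this pair of formulas is an
   assumption; the round trip below checks that the two hypothetical helpers
   are inverses of each other.

   .. code-block:: python

      import math


      def mel_to_hz(mels: float) -> float:
          """Hypothetical inverse of the assumed 2595 * log10(1 + f / 700) formula."""
          return 700.0 * (10.0 ** (mels / 2595.0) - 1.0)


      def hz_to_mel(freq_hz: float) -> float:
          """Hypothetical forward conversion (same assumed formula)."""
          return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


      print(round(mel_to_hz(999.99), 2))  # 1000.01
      print(math.isclose(mel_to_hz(hz_to_mel(440.0)), 440.0))  # True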


.. py:function:: mfcc(audio: numpy.ndarray, sample_rate: int, ftt_size: int = 1024, hop_length: int = 20, mel_filter_num: int = 10, dct_filter_num: int = 40) -> numpy.ndarray

   Calculate Mel Frequency Cepstral Coefficients (MFCCs) from an audio signal.

   Args:
       audio: The input audio signal.
       sample_rate: The sample rate of the audio signal (in Hz).
       ftt_size: The size of the FFT window (default is 1024).
       hop_length: The hop between consecutive frames, in milliseconds
           (default is 20 ms).
       mel_filter_num: The number of Mel filters (default is 10).
       dct_filter_num: The number of cepstral coefficients (DCT basis vectors)
           to compute (default is 40).

   Returns:
       A matrix of MFCCs of shape ``(dct_filter_num, n_frames)``, one column per
       frame.

   Raises:
       ValueError: If the input audio is empty.

   Example:
   >>> sample_rate = 44100  # Sample rate of 44.1 kHz
   >>> duration = 2.0  # Duration of 2 seconds
   >>> t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
   >>> audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # Generate a 440 Hz sine wave
   >>> mfccs = mfcc(audio, sample_rate)
   >>> mfccs.shape
   (40, 101)
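
   The doctest shape can be checked by hand: the rows equal ``dct_filter_num``
   (40) and the columns equal the number of frames. Assuming the hop is
   converted from milliseconds to samples and the signal is reflect-padded by
   ``ftt_size // 2`` samples per side (an assumption that reproduces both this
   doctest and the ``audio_frames`` one), the frame count is computed by the
   hypothetical helper below.

   .. code-block:: python

      def expected_frame_count(num_samples, sample_rate, hop_length_ms=20, fft_size=1024):
          """Hypothetical helper: frame count under the assumed reflect-padding scheme."""
          hop = round(sample_rate * hop_length_ms / 1000)  # 882 samples at 44.1 kHz and 20 ms
          padded = num_samples + 2 * (fft_size // 2)  # fft_size // 2 samples added on each side
          return (padded - fft_size) // hop + 1


      print(expected_frame_count(int(44100 * 2.0), 44100))  # 101
      print(expected_frame_count(10000, 8000, hop_length_ms=10, fft_size=512))  # 126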


.. py:function:: normalize(audio: numpy.ndarray) -> numpy.ndarray

   Normalize an audio signal by dividing it by its peak absolute value, so that
   all samples lie between -1 and 1.

   Args:
       audio: The input audio signal.

   Returns:
       The normalized audio signal.

   Examples:
   >>> audio = np.array([1, 2, 3, 4, 5])
   >>> normalized_audio = normalize(audio)
   >>> float(np.max(normalized_audio))
   1.0
   >>> float(np.min(normalized_audio))
   0.2
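
   The doctest values are consistent with peak normalization: dividing by the
   largest absolute sample maps ``[1, 2, 3, 4, 5]`` to ``[0.2, ..., 1.0]``. The
   sketch below does this and, as an addition of its own that the doctest does
   not cover, returns all-zero (silent) input unchanged instead of dividing by
   zero. ``peak_normalize`` is a hypothetical name.

   .. code-block:: python

      import numpy as np


      def peak_normalize(audio):
          """Hypothetical helper: divide by the peak absolute value so samples lie in [-1, 1]."""
          peak = np.max(np.abs(audio))
          if peak == 0:
              return audio.astype(float)  # silent input: nothing to scale, avoid division by zero
          return audio / peak


      print(peak_normalize(np.array([1, 2, 3, 4, 5])))  # [0.2 0.4 0.6 0.8 1. ]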