FluidAudio/Sources/FastClusterWrapper/include/FastClusterWrapper.h
Brandon Weng 7fd5ac5446 pyannote community-1 model for offline speaker diarization pipeline (#150)
### Why is this change needed?

Keeping the streaming pipeline around, since VBx and AHC clustering get
pretty expensive after 30 minutes of audio and running them constantly is
costly. It's still possible to support clustering between files, but
that is saved for another PR.

Pyannote's benchmark DER is around 11%. I increased the step to 0.2s instead
of 0.1 to double the speed. Selective fp16 also lets more operations
run on the ANE, but it means that we lose some precision.

```
Average DER: 14.95% | Median DER: 10.89% | Average JER: 39.27% | Median JER: 40.74% (collar=0.25s, ignoreOverlap=True)
Average RTFx: 139.63 (from 232 clips)
Metrics summary saved to: /Users/brandonweng/FluidAudioDatasets/voxconverse/metrics/test_metrics_release.json
Completed. New results: 232, Skipped existing: 0, Total attempted: 232
```

See benchmark.md for more info. Compared to the PyTorch model, we are
100x faster than the CPU version and ~6x faster than the MPS
backend on mb pro 4.

---------

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: Brandon Weng <BrandonWeng@users.noreply.github.com>
Co-authored-by: Alex <36247722+Alex-Wengg@users.noreply.github.com>
Co-authored-by: Alex-Wengg <hanweng9@gmail.com>
2025-10-22 15:11:57 -04:00

47 lines
1.4 KiB
C

#ifndef FASTCLUSTER_WRAPPER_H
#define FASTCLUSTER_WRAPPER_H
#include <stddef.h>
#ifdef __cplusplus
extern "C" {
#endif
// Error codes returned by fastcluster wrapper.
typedef enum {
    FASTCLUSTER_WRAPPER_SUCCESS = 0,
    FASTCLUSTER_WRAPPER_INVALID_ARGUMENT = 1,
    FASTCLUSTER_WRAPPER_INDEX_OVERFLOW = 2,
    FASTCLUSTER_WRAPPER_OUTPUT_TOO_SMALL = 3,
    FASTCLUSTER_WRAPPER_ALLOCATION_FAILURE = 4,
    FASTCLUSTER_WRAPPER_RUNTIME_ERROR = 5,
    FASTCLUSTER_WRAPPER_UNKNOWN_ERROR = 255
} fastcluster_wrapper_status;
/// Compute centroid linkage dendrogram for the provided feature matrix.
///
/// - Parameters:
///   - data: Pointer to `pointCount * dimension` doubles laid out row-major.
///   - pointCount: Number of vectors (>= 1).
///   - dimension: Feature dimension (> 0).
///   - dendrogramOut: Output buffer receiving `(pointCount - 1) * 4` doubles in SciPy
///     linkage format (columns: left, right, distance, sample_count).
///   - dendrogramLength: Length of `dendrogramOut` in elements.
///
/// - Returns:
///   - `FASTCLUSTER_WRAPPER_SUCCESS` on success.
///   - One of the error codes above otherwise.
fastcluster_wrapper_status fastcluster_compute_centroid_linkage(
    const double *data,
    size_t pointCount,
    size_t dimension,
    double *dendrogramOut,
    size_t dendrogramLength
);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // FASTCLUSTER_WRAPPER_H
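
The header documents the output buffer as `(pointCount - 1) * 4` doubles, so a caller has to size it before calling in. Below is a minimal caller-side sketch of that arithmetic plus a status-to-text mapping. The helper names `dendrogram_length` and `status_description` are hypothetical and not part of the wrapper. The enum is copied from the header only so the block compiles on its own. The call itself appears as a comment because the implementation is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy of the status enum from FastClusterWrapper.h so this sketch builds
 * standalone; real code would #include "FastClusterWrapper.h" instead. */
typedef enum {
    FASTCLUSTER_WRAPPER_SUCCESS = 0,
    FASTCLUSTER_WRAPPER_INVALID_ARGUMENT = 1,
    FASTCLUSTER_WRAPPER_INDEX_OVERFLOW = 2,
    FASTCLUSTER_WRAPPER_OUTPUT_TOO_SMALL = 3,
    FASTCLUSTER_WRAPPER_ALLOCATION_FAILURE = 4,
    FASTCLUSTER_WRAPPER_RUNTIME_ERROR = 5,
    FASTCLUSTER_WRAPPER_UNKNOWN_ERROR = 255
} fastcluster_wrapper_status;

/* Hypothetical helper: number of doubles the documented output format needs,
 * i.e. (pointCount - 1) rows of 4 columns. Returns 0 on success, -1 if
 * pointCount is 0, lengthOut is NULL, or the multiplication would overflow. */
static int dendrogram_length(size_t pointCount, size_t *lengthOut) {
    if (pointCount == 0 || lengthOut == NULL) {
        return -1;
    }
    size_t merges = pointCount - 1;
    if (merges > SIZE_MAX / 4) {
        return -1;
    }
    *lengthOut = merges * 4;
    return 0;
}

/* Hypothetical helper: short text for each status code, for logging. */
static const char *status_description(fastcluster_wrapper_status status) {
    switch (status) {
    case FASTCLUSTER_WRAPPER_SUCCESS:            return "success";
    case FASTCLUSTER_WRAPPER_INVALID_ARGUMENT:   return "invalid argument";
    case FASTCLUSTER_WRAPPER_INDEX_OVERFLOW:     return "index overflow";
    case FASTCLUSTER_WRAPPER_OUTPUT_TOO_SMALL:   return "output too small";
    case FASTCLUSTER_WRAPPER_ALLOCATION_FAILURE: return "allocation failure";
    case FASTCLUSTER_WRAPPER_RUNTIME_ERROR:      return "runtime error";
    case FASTCLUSTER_WRAPPER_UNKNOWN_ERROR:      return "unknown error";
    }
    return "unrecognized status";
}

/* Intended call pattern (assumes the wrapper library is linked):
 *
 *   size_t length = 0;
 *   if (dendrogram_length(pointCount, &length) != 0) { ... bail out ... }
 *   double *out = malloc((length ? length : 1) * sizeof *out);
 *   fastcluster_wrapper_status st = fastcluster_compute_centroid_linkage(
 *       data, pointCount, dimension, out, length);
 *   if (st != FASTCLUSTER_WRAPPER_SUCCESS) { log(status_description(st)); }
 */
```

With `pointCount = 5` the helper yields 16 doubles (4 merges of 4 columns each). With a single point it yields 0, which matches the header allowing `pointCount >= 1`.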