

Ethical considerations
Deploying a Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

The validation ("dev") sets were cleaned and generated from Common Voice 9.0 Persian.

This model was trained on the following corpora: Common Voice 9.0 Persian (cleaned and with custom train/dev/test splits). In total, approximately 271 hours of data.

Model type

Approaches to uncertainty and variability
Confidence scores and multiple paths from the decoding beam can be used to measure model uncertainty and provide multiple, variable transcripts for any processed audio.

72

Model Size
For STT, you must always deploy an acoustic model, and you will often also want to deploy an application-specific language model.

Real-Time Factor (RTF) is defined as processing-time / length-of-audio. The exact real-time factor of an STT model depends on the hardware setup, so you may experience a different RTF.
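The RTF definition above can be illustrated with a minimal sketch. The function names `real_time_factor` and `measure_rtf`, and the `transcribe` callable, are hypothetical placeholders chosen for this example; they are not part of the model or of any STT library. The callable stands in for whatever inference call you actually use.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing-time / length-of-audio, as defined in the text."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, audio, audio_seconds: float) -> float:
    """Time one call to a (hypothetical) transcribe callable and return its RTF."""
    start = time.perf_counter()
    transcribe(audio)  # assumption: any callable that runs inference on `audio`
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

An RTF below 1.0 means the audio is processed faster than its own duration. Because the value depends on hardware, it is probably best to average several runs on the target machine rather than rely on a single measurement.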

STT models are usually evaluated in terms of their transcription accuracy, deployment Real-Time Factor, and model size on disk.

Performance Factors
Factors relevant to Speech-to-Text performance include but are not limited to speaker demographics, recording quality, and background noise. Read more about STT performance factors here.

Speech-to-Text for the Persian Language on 16 kHz, mono-channel audio.

Where to send questions or comments about the model: You can leave an issue on STT issues, open a new discussion on STT discussions, or chat with us on Gitter.
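Since the model is described above as targeting 16 kHz, mono-channel audio, it can help to check a recording before transcribing it. The following is a minimal sketch using only Python's standard `wave` module. The helper name `is_16khz_mono` is hypothetical, and the sketch assumes the input is an uncompressed WAV file.

```python
import wave


def is_16khz_mono(path: str) -> bool:
    """Return True if the WAV file at `path` is 16 kHz with a single channel."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1
```

Audio that fails this check would need to be resampled or downmixed with a tool of your choice before being passed to the model.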
License: GNU Lesser General Public License v3.0
Model language: Persian / Farsi / fa, fa-IR
Person or organization developing model: Maintained by oct4pie

Model card for Persian STT v0.1.0

Model details
