vortica.blogg.se - Github perian daata


Ethical considerations

Deploying a Speech-to-Text model into any production setting has ethical implications. You should consider these implications before use.

The validation ("dev") sets were cleaned and generated from Common Voice 9.0 Persian.

This model was trained on the following corpora: Common Voice 9.0 Persian (cleaned and with custom train/dev/test splits). In total, approximately 271 hours of data.

Model type

Approaches to uncertainty and variability

Confidence scores and multiple paths from the decoding beam can be used to measure model uncertainty and provide multiple, variable transcripts for any processed audio.

72

Model Size

For STT, you must always deploy an acoustic model, and you will often also want to deploy an application-specific language model.

Real-Time Factor (RTF) is defined as processing-time / length-of-audio. The exact real-time factor of an STT model will depend on the hardware setup, so you may experience a different RTF.
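The RTF definition can be illustrated with a minimal Python sketch. The function names `real_time_factor` and `measure_rtf` are hypothetical helpers introduced here, and `transcribe` stands in for whatever callable runs inference in your setup; none of them come from the STT package.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio length."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, audio, audio_seconds: float):
    """Time one call to `transcribe(audio)` and return (result, RTF).

    `transcribe` is an assumed placeholder for your own inference call.
    """
    start = time.perf_counter()
    result = transcribe(audio)
    elapsed = time.perf_counter() - start
    return result, real_time_factor(elapsed, audio_seconds)
```

An RTF below 1 means the audio is transcribed faster than it plays back. A single timed call is noisy, so averaging over several clips on the target hardware gives a more useful figure.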

STT models are usually evaluated in terms of their transcription accuracy, deployment Real-Time Factor, and model size on disk.

Performance Factors

Factors relevant to Speech-to-Text performance include, but are not limited to, speaker demographics, recording quality, and background noise. Read more about STT performance factors here.

Speech-to-Text for the Persian language on 16 kHz, mono-channel audio.

Where to send questions or comments about the model: You can leave an issue on STT issues, open a new discussion on STT discussions, or chat with us on Gitter.


License: GNU Lesser General Public License v3.0.

Model language: Persian / Farsi / fa, fa-IR.

Person or organization developing model: Maintained by oct4pie.

Model card for Persian STT v0.1.0

Model details

The acoustic and language models are available in releases.

Thankfully, the scorer is able to handle most of these cases, according to the context.

For example, the model is unable to distinguish between the following words by themselves:

It is worth noting that many of the errors are due to the language syntax.

Then, the dropout rate was decreased as the learning rate dropped on plateau, in order to reach the minimum loss values.

scorer_path = '/content/kenlm-persian.scorer',

The model was trained with the following configuration:

Transfer learning from the STT English model was enabled (with drop_source_layers=2).

Training was stopped at 1896648 steps after optimizing the model.

The learning_rate was increased for the second run from 0.00012 to 0.0004.

The model was trained 2 separate times with force_initialize_learning_rate.

Then, the corpus was normalized, tokenized, and cleaned using the persianify function and _HTMLStripper in the notebook.
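The notebook's `persianify` and `_HTMLStripper` are not reproduced on this page, so the following is only a rough sketch of what such a cleaning step could look like. `TagStripper`, `ARABIC_TO_PERSIAN`, and `normalize_line` are hypothetical names, and the specific rules (mapping Arabic yeh and kaf to their Persian forms, dropping characters outside the Arabic Unicode block) are assumptions rather than the author's actual choices.

```python
import re
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        return "".join(self.parts)


# Assumed mapping: Arabic yeh (U+064A) and kaf (U+0643) to Persian forms.
ARABIC_TO_PERSIAN = str.maketrans({"\u064a": "\u06cc", "\u0643": "\u06a9"})

# Assumed filter: keep the Arabic block, ZWNJ (U+200C), and whitespace.
NON_PERSIAN = re.compile(r"[^\u0600-\u06FF\u200c\s]")


def normalize_line(raw: str) -> str:
    """Strip tags, unify letter forms, drop other scripts, collapse spaces."""
    stripper = TagStripper()
    stripper.feed(raw)
    stripper.close()
    text = stripper.text().translate(ARABIC_TO_PERSIAN)
    text = NON_PERSIAN.sub(" ", text)
    return " ".join(text.split())
```

Note that the Arabic block also contains digits and punctuation, so a scorer corpus would probably need a stricter filter that matches the acoustic model's alphabet.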

The raw text corpus was obtained from here.

The generated validated.csv was passed to auto_input_dataset=metadata in the configuration step.

Using STT's bin/import_cv2.py script, the data was matched to the alphabet and converted to CSV files.
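As a rough illustration of what matching against an alphabet can mean, the sketch below drops any row whose transcript contains a character outside the alphabet and writes the rest in the three-column layout used by STT training CSVs (wav_filename, wav_filesize, transcript). This is not the logic of bin/import_cv2.py; `keep_row` and `write_stt_csv` are hypothetical helpers, and the drop-on-mismatch rule is an assumption.

```python
import csv
import io

FIELDS = ["wav_filename", "wav_filesize", "transcript"]


def keep_row(transcript: str, alphabet: set) -> bool:
    """True if the transcript is non-empty and uses only alphabet characters."""
    return bool(transcript) and all(ch in alphabet for ch in transcript)


def write_stt_csv(rows, alphabet: set, out) -> int:
    """Write rows that pass the alphabet check to `out`; return how many were kept.

    Each row is a dict with exactly the keys listed in FIELDS.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    kept = 0
    for row in rows:
        if keep_row(row["transcript"], alphabet):
            writer.writerow(row)
            kept += 1
    return kept


if __name__ == "__main__":
    demo_alphabet = set("abc ")  # toy alphabet, for illustration only
    demo_rows = [
        {"wav_filename": "a.wav", "wav_filesize": 10, "transcript": "ab c"},
        {"wav_filename": "b.wav", "wav_filesize": 20, "transcript": "abd"},
    ]
    buffer = io.StringIO()
    print(write_stt_csv(demo_rows, demo_alphabet, buffer), "row(s) kept")
    print(buffer.getvalue())
```

Filtering before training avoids label errors later, since the acoustic model can only emit characters that appear in its alphabet.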

The Common-Voice dataset was obtained from here.


persian_stt.ipynb contains a notebook that demonstrates how to preprocess the data.


This is a Persian DeepSpeech model that is currently in development.