
Items tagged with: CommonVoice

blub

1 year ago


By the way, the @openhomefoundation would be glad to receive speech snippets of your voice. You only need to record "OK NABU", so it is much easier than with, say, #commonvoice.
Ideally from as many different speakers as possible, so that this wake word can later be recognized reliably.
ohf-voice.github.io/wake-word-…

#stt #stimme #voice #nabu #crowdsourcing #commons #nlp #ml #homeassistant #hass #iot #smarthome


Kathy Reid

1 year ago


If you're a #language nerd like I am, then you won't have missed the @mozilla #CommonVoice v19 #speech #dataset release - which now features 131 languages! Here's my #dataviz, done in @observablehq of the v19 #metadata coverage.

I've updated the visualisation this time around with human-readable language names instead of their ISO-639 or BCP-47 language codes to make it easier to read.

There are some interesting observations:

▶ Catalan (ca) continues to be leader in terms of data - speaking volumes about the efforts to revitalise culture and language in Catalunya. It's also one of the few languages that has data for all age groups, particularly older speakers - this sort of data is missing for most other languages.

▶ Kiswahili (sw) is one of the languages where there is more data for female-identifying speakers than for male-identifying speakers ♀ - although Japanese (ja), Western Mari (mrj) and Luganda (lg) do pretty well here, too!

▶ Sentence domains can now be categorised, and although most new sentences are "general", Albanian (sq) has a lot of sentences related to law and government.

▶ Tsonga (ts), a Bantu language spoken in Southern Africa, has dethroned Icelandic (is) as the language with the highest average utterance duration. I don't know enough about Tsonga to speculate why - it's a somewhat agglutinative language, but many Tsonga words are generally short.

▶ Bengali / Bangla (bn) has a significant amount of data that is not yet validated, and therefore does not appear in training / dev / test splits. There is a similar case for many languages new to Common Voice - it takes time to validate.

▶ The language with the highest number of average contributions per speaker is Taita (dav), a Bantu language from Kenya.

What do you make of the data visualisation? Are there any other insights you can see?

Big thanks to the CV team for all their efforts - EM, Jessica Rose, Dmitrij Feller and Justin Grant.

#linguistics

observablehq.com/@kathyreid/mo…


Juan P.

2 years ago


📣 Exciting news! Mozilla's Common Voice invites us to an exciting project.

We are localizing Wayuunaiki in a public database so that developers can create speech recognition applications for low-resource languages. Join the wayuunaiki-tic community and let's make Wayuunaiki heard in the digital age!

🌐 More info: chat.whatsapp.com/FmAesbq8m8Q9…

#CommonVoice #Wayuunaiki #ReconocimientoDeVoz


devSJR

3 years ago


Heard about #commonvoice (commonvoice.mozilla.org/)? This is an easy way to contribute to #FLOSS. Such projects need contributors (e.g. people who give data samples or review data).
We used this for teaching statistics (#rstats). Students were asked to find out whether there are gender differences by counting male vs. female contributors. There were significantly more men in our sample. Bad for unbiased #AI! If more people use such projects, we can also grow the open datasets faster and more fairly. #genderinequality
