Items tagged with: linguistics





If you're a #language nerd like I am, then you won't have missed the @mozilla #CommonVoice v19 #speech #dataset release - which now features 131 languages! Here's my #dataviz, done in @observablehq, of the v19 #metadata coverage.

I've updated the visualisation this time around with human-readable language names instead of their ISO-639 or BCP-47 language codes to make it easier to read.

There are some interesting observations:

▶ Catalan (ca) continues to be the leader in terms of data - speaking volumes about the efforts to revitalise culture and language in Catalunya. It's also one of the few languages that has data for all age groups, particularly older speakers - this sort of data is missing for most other languages.

▶ Kiswahili (sw) is one of the languages where there is more data for female-identifying speakers than for male-identifying speakers ♀ - although Japanese (ja), Western Mari (mrj) and Luganda (lg) do pretty well here, too!

▶ Sentence domains can now be categorised, and although most new sentences are "general", Albanian (sq) has a lot of sentences related to law and government.

▶ Tsonga (ts), a Bantu language spoken in Southern Africa, has dethroned Icelandic (is) as the language with the highest average utterance duration. I don't know enough about Tsonga to speculate why - it's a somewhat agglutinative language, but many Tsonga words are generally short.

▶ Bengali / Bangla (bn) has a significant amount of data that is not yet validated, and therefore does not appear in training / dev / test splits. There is a similar case for many languages new to Common Voice - it takes time to validate.

▶ The language with the highest average number of contributions per speaker is Taita (dav), a Bantu language from Kenya.

What do you make of the data visualisation? Are there any other insights you can see?

Big thanks to the CV team for all their efforts - EM, Jessica Rose, Dmitrij Feller and Justin Grant.

#linguistics

observablehq.com/@kathyreid/mo…


Can #typos make lies seem less deceptive or make true statements seem less true?

“true statements with grammatical errors and unusual word choices were seen as more deceitful, and lie statements with the same language were seen as less deceptive” and “I discovered a new brain response that is sensitive to the difference between perceived truths and lies.”

Dissertation permalink: digitalcommons.usu.edu/etd2023…

DOI: doi.org/10.26076/068f-415c

#ethics #xPhi #decisionScience #neuroscience #psychology #business #communication #linguistics


A Language Log post about nonbinary honorifics and the etymology of Miss and Missus (from Latin Magistra "mistress", not Latin Magus and its Old Iranian relatives) led me to this essay on how the connotations of each term have changed since the 18th century: cam.ac.uk/research/news/mistre… #philology #linguistics




I am super excited about this mini-conference on #reproducibility in #linguistics that I am organising this evening: Four of my M.A. students will be reporting on their attempts to reproduce the results of four published quantitative linguistics papers for which the data is available, but not the code!

Colleagues, they have *a lot* of things to report! So, if you're in the area (Cologne), do come along! There will be #ReproducibiliTea and Christmas biscuits! 🍵 🍪 #OpenScience


This has been going around on Twitter, but I neglected my community here :) I'm sorry about this :)
Tomorrow at noon EST, I will give a #talk on the #accessibility of #language #learning and #linguistics in general for #screenReader users as part of the a11yTalks event. This will be a public event with no need to register, so if this is something any of you are interested in, here's the link :) a11ytalks.com/posts/2023-MAY/ #speaker #a11y



I have opened recruitment for an online experiment investigating cross-linguistic perceptions of iconicity! If you are, or know, a deaf signer of Norwegian Sign Language, please share this announcement on the Norwegian Deaf Association webpage! #linguistics #signlanguage doveforbundet.no/nyheter/2023/…


Fascinating article!

"Unearthing a Long Ignored African Writing System, One Researcher Finds African History, by Africans: BU anthropologist Fallou Ngom discovered Ajami, a modified Arabic script, in a box of his late father’s old papers" posted December 21, 2022, written by Molly Callahan

bu.edu/articles/2022/fallou-ng…

#linguistics #Ajami #Africa


I’m writing an article that expands my microblog entry on stylometric fingerprinting to give more comprehensive advice. I am partially walking back on my recommendation not to use machine translation and adding information about reading levels, among other things. Would anybody familiar with #stylometry, or with a #linguistics background experienced with close-reading, be up for reviewing a rough draft next week?

I’d also be interested in how people may describe my own stylometric fingerprint (signature phrases, grammar quirks, etc), to use as an example.

Boosts appreciated.


I am:
🔵a #book lover
🔵mildly #burnedout but trying to take it easy 😊
🔵a #cat lover (sharing my place with 1 fluffy terrorist)
🔵#choleric (actively working on my tantrums though) 😂
🔵a good #communicator
🔵#compassionate
🔵#curious
🔵#Czech
🔵an #effectivealtruist (I translated Toby Ord's The Precipice into Czech)
🔵#empathic
🔵a #feminist
🔵#friendly
🔵a #linguistics nerd
🔵a #liberal
🔵a language #teacher
🔵a literary #translator
🔵very much in love with @stepan
🔵a #woman