Дневник мелочей (Diary of Little Things)

Postcard from Korea: Hanja
[info]dnevnikmelochej
Life in Korea after graduating from Yale has been pretty good. Learning Korean can be a lot of fun, though I don't have much time for it since I am also teaching. Right now, during the busy Christmas season, I'm teaching mostly SAT classes (particularly the critical reading and writing parts) and TOEFL classes, while editing essays on the side. Not necessarily the most exciting job in the world, but it earns enough to keep it interesting, should I ever need to go to graduate school to live a similar life. When I'm not busy, unlike now, I have been reading some books on psychology. Might this help me in my quest to focus on linguistics/neurolinguistics/whatever? We'll see.

Anyway, I found this cool gadget while talking to my Korean language tutors here in Seoul. The site is hanja.naver.com. It turns your Sino-Korean words into Chinese characters. So, for example, 정저지와 -> 井底之蛙. Awesome! I wonder if the feature works in reverse. To all my readers, happy holidays!

Postcard from Taiwan: Onomatopoeia and Ideophones
I came across an interesting children's story here in Taiwan. It recounts the old Chinese story of Old Brother Cat (貓大哥) and Young Brother Mouse (鼠弟弟). In many traditions, a heavenly divinity (or Buddha) decides to make a calendar with 12 animals representing the 12-year cycle, and in order to determine which 12 animals shall be on the calendar, the heavenly divinity gathers the animals and announces a running competition to determine the top 12 winners. The mouse, being the clever little animal it is, tricks the cat out of its place in the Chinese Zodiac. And that, of course, is the reason why cats hate mice today.

The particular version of the Cat and Mouse story I have is interesting, not only from a cultural viewpoint but from a linguistic viewpoint as well. Like all good children's stories, this story provides a lot of onomatopoeia and ideophones, which are tough to translate into other languages. Here are some examples:

(ㄐㄧ) (ㄐㄧ) (ㄓㄚ) (ㄓㄚ)

Jijizhazha -- sound of chatter

(ㄆㄨ!)

Pu! -- popping sound (?)

(ㄆㄥ)

Peng -- sound of a shot, start of race

(ㄏㄨ) (ㄔ) (ㄏㄨ) (ㄔ)

Huchihuchi -- sound of panting, heavy breath

(ㄇㄧ) (ㄨ) (ㄇㄧ) (ㄨ)

Miwumiwu -- sound of a cat mewing

(ㄏㄨㄛˋ) (ㄌㄤˊ) (ㄏㄨㄛˋ) (ㄌㄤˊ)

Huolanghuolang -- sound of gritted teeth

(ㄗ) (ㄐㄧ) (ㄗ) (ㄐㄧ)

Zijiziji -- sound of mouse squeaking
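
For readers who don't read Zhuyin (bopomofo), here is a minimal sketch of how annotations like the ones above map onto Hanyu Pinyin syllables. The lookup table only covers the syllables in this list, the helper name `zhuyin_to_pinyin` is made up for illustration, and tones are written as numbers rather than diacritics.

```python
# Toy lookup: Zhuyin (bopomofo) syllables from the story -> Hanyu Pinyin.
# Covers only the syllables listed above; this is not a general converter.
SYLLABLES = {
    "ㄐㄧ": "ji", "ㄓㄚ": "zha", "ㄆㄨ": "pu", "ㄆㄥ": "peng",
    "ㄏㄨ": "hu", "ㄔ": "chi", "ㄇㄧ": "mi", "ㄨ": "wu",
    "ㄏㄨㄛ": "huo", "ㄌㄤ": "lang", "ㄗ": "zi",
}
# Tone marks that follow the syllable; an unmarked syllable is left without a number.
TONE_MARKS = {"ˊ": "2", "ˇ": "3", "ˋ": "4"}


def zhuyin_to_pinyin(syllable: str) -> str:
    """Convert one Zhuyin syllable to pinyin, adding a tone number if one is marked."""
    tone = ""
    if syllable and syllable[-1] in TONE_MARKS:
        tone = TONE_MARKS[syllable[-1]]
        syllable = syllable[:-1]
    return SYLLABLES[syllable] + tone


gritted_teeth = ["ㄏㄨㄛˋ", "ㄌㄤˊ", "ㄏㄨㄛˋ", "ㄌㄤˊ"]
print(" ".join(zhuyin_to_pinyin(s) for s in gritted_teeth))
# prints: huo4 lang2 huo4 lang2
```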

These are some interesting examples of onomatopoeia/ideophones, the variety of which is comparable to the sound effects you might see in Japanese manga. Heck, there's even one that doesn't have a Chinese character shorthand for it, and it's good old "Pu!" This closely parallels the Japanese practice of using the syllabary system (like Hiragana) instead of the ideographic system (Kanji) to emphasize a sound effect, perhaps even serving to stretch it out. I wonder if anyone has done a study on such parallels.

I'm already starting to miss New Haven, and can't wait to return in the fall!

Last Day in New Haven, part 2
So the truism of tempus fugit holds yet again: I feel like I was just starting to get the hang of summer life in New Haven when all of a sudden it came to a close. Yet looking back I realize something. Even though a lot remains in the "to do" pile in Room 306, a lot was nevertheless accomplished in just a few summer weeks: for example, breaking the 20,000-entry mark in the database and phonemicizing several languages.

One of the main tasks during the summer was semantic standardization (glosses, general semantic category, and part of speech data). There was even a logical development to this process, which took the following steps:
  • First, this task required getting a list of standardized glosses that we would append to each of the database entries.
  • So a preliminary list of standardized glosses came from a nice printed-out list from Sutton & Walsh (S&W). Meanwhile, Professor B and I discussed what categories and parts of speech there should be in the database-- this was an issue during the school year that was known for a while and needed to be addressed.
  • Manual input of standardized glosses was initially unwieldy, since the items were not easily accessible on the computer. There were also many terms that were near synonyms of another S&W entry, and I started a spreadsheet listing such near-synonymous glosses.
  • Finding the digital S&W files (as spreadsheets) and doing searches on the computer made things much more manageable, but there was still room for improvement.
  • We soon noticed that each entry in the S&W list could be easily tagged for semantic category and part of speech data as well, and I managed to do such tagging manually.
  • Professor B decided to incorporate the S&W list (tagged by category and part of speech) directly into our database, and gloss standardization became significantly easier, since entering any item from the S&W list automatically fills in the category/part of speech information. I particularly liked the fill-ins, because they are very intuitive, although they take a little while to get used to. After all, you need to know what the glosses look like. There are some general patterns and principles, like "bird" versus "bird spp.", "teeth" being glossed as the singular "tooth", or the plural "things" instead of "thing", but some other patterns are more arbitrary (like "light" as a noun versus "light (weight)" as an adjective). Another problem with automatic fill-ins occurred with cases like "dream" (which can be either a noun or a verb) and "drown" (either transitive or intransitive), where sometimes you would prefer that the part of speech information be left blank.
  • It soon became clear that S&W would not be the final word for many entries in the database that would remain unglossed because they didn't fall into any good standardized gloss in S&W. For instance, the first version of S&W lacked in its vocabulary many function words ("because"), discourse particles ("hello", "goodbye"), abstract concepts ("god", "dreamtime"), specific terms for organisms ("oyster"), and all the complex kinship relations ("mother or mother's sister", "uncle-cousin pair"). Fortunately, the database is very good at allowing new entries to be manually added, and this even applies to the S&W list which continued to grow day by day.  It quickly became very exciting when we could surpass S&W's established standardized glosses and find some other terms worthy of standardization.
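
The fill-in behaviour described in the steps above can be sketched in a few lines of Python. This is only an illustration, not how the actual database does it: the category labels, the tiny near-synonym table, and the function name `fill_in` are all invented for the example, and only the glosses themselves come from the discussion above.

```python
# Hypothetical miniature of the tagged S&W list: gloss -> (category, part of speech).
# The category names are invented for this example. A part of speech of None means
# "ambiguous, leave it blank so a human can decide".
TAGGED_GLOSSES = {
    "tooth": ("body part", "noun"),
    "bird": ("fauna", "noun"),
    "light (weight)": ("quality", "adjective"),
    "dream": ("mental state", None),  # noun or verb
    "drown": ("water", None),         # transitive or intransitive
}

# Stand-in for the near-synonym spreadsheet: raw gloss -> standardized gloss.
NEAR_SYNONYMS = {"teeth": "tooth", "birds": "bird"}


def fill_in(raw_gloss: str):
    """Return (standard gloss, category, part of speech), or None if the gloss is unlisted."""
    gloss = raw_gloss.strip().lower()
    gloss = NEAR_SYNONYMS.get(gloss, gloss)
    if gloss not in TAGGED_GLOSSES:
        return None  # candidate for manual addition to the list
    category, part_of_speech = TAGGED_GLOSSES[gloss]
    return gloss, category, part_of_speech


print(fill_in("Teeth"))   # ('tooth', 'body part', 'noun')
print(fill_in("dream"))   # ('dream', 'mental state', None)
print(fill_in("oyster"))  # None
```

Returning None for an unlisted gloss mirrors the situation with "oyster" or "because": the entry stays unglossed until someone adds a new standardized gloss by hand.
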
Another major accomplishment of course was the geo-coding, which resulted in a preliminary map of locations where various Aboriginal dialects/languages were spoken.  You've already seen the results of it, so I won't go into much detail.  It only took 3 days (perhaps 20 or so hours) for me to finish with Australia, and another couple days for the professor to make the changes necessary to make it version 1.0.  Version 1.0 was very exciting to make and was definitely the highlight of that work week.

Despite all these accomplishments that have happened in just a few weeks, there are still many things that remain to be done.  I hope to work with Professor B over the course of my senior year.  I just hope I can get through that thesis! We'll see...

  • I briefly talked to Professor B about phonemicization, and how hard it was to correct the inconsistencies in orthography of various sources during phonemicization with no knowledge of the phonemic inventory of differing languages.  It would be nice if in the database each language/dialect had a nice reference to an IPA chart with interesting features of the language noted (number of rhotics, palatal/apical distinction, etc).
  • Although we have spent this summer trying to rigorously delimit the "parts of speech", "generalized glosses", "phonemicized form", and "semantic field", there are still other fields in the database that, although perhaps non-essential for the purposes we are interested in, also need to be delimited with the same vigour.
    • "Variant" vs. "synonym" field. I've put orthographic variants (e.g. "wadly" instead of "wadli") under the "variant" field, while "synonyms" are more obviously different words with an approximately similar meaning.
    • "Source page" vs. "source note" field.
    • "General note" vs. "other note" field. Under "general notes" I would include parts of the extended definition of a word-- like if one is talking about a ritual, then the general note will go into more depth about how the ritual is done, who does it, etc.  The "other note" field would be for other observations, like "reduplicated form is X, meaning Y"
    • Also "PScopy" vs. "Part of Speech".  I know that "PScopy" is in fact a copy of the "Part of Speech" field... but when should we get rid of the copy?
    • "Form note" versus "paradigm note" field.

Last day in New Haven, part 1
It's the last day for me in New Haven, and as I come back all soaked from the rain I contemplate the extent of the work I've done for the last couple weeks.  After today, I will go to Taiwan for the rest of the summer for another language program there.

One of the major things I was working on over the past few weeks was, naturally, standardizing glosses. At the end of the week I looked back at the standardized glosses with the most hits. Here are the top 15 contenders for most frequently used standardized glosses so far, with Sutton & Walsh codes where applicable:

15. fish (K1) - 576 entries
14. mother (B53) - 581 entries
13. 1sg - 591 entries
12. mouth (A35) - 597 entries
11. head (A1) - 604 entries
10. two (N2) - 604 entries
9. fire (E26) - 619 entries
8. tongue (A41) - 626 entries
7. ear (A28) - 638 entries
6. foot (A127) - 638 entries
5. moon (G28) - 651 entries
4. hand (A59) - 656 entries
3. tooth  (A39) - 668 entries
2. water - 668 entries
1. eye (A14) - 672 entries

It's not surprising that about half of these top 15 contenders are body parts, reliably the most common sort of item that appears on wordlists, along with kinship relations-- although, surprisingly, there is only one kinship term, "mother", in the top 15. Of course, I have not finished tagging every last entry with a standardized gloss. The addition of a couple thousand entries to the 20,000 or so entries (either imported or manually added) over the last weekend or two made sure of that. Although I don't have exact statistics, I would nevertheless say that we're off to a good start on standardizing the things that we are probably most interested in (including body parts and kinship relations, conveniently enough).
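
The "about half" figure can be checked directly against the list. A minimal sketch, assuming that an "A" prefix in a Sutton & Walsh code marks a body part; that is an inference from the codes shown here (every A-coded gloss in the top 15 is a body part), not something taken from the S&W documentation.

```python
# Top 15 standardized glosses from the list above: (gloss, S&W code or None, entries).
TOP_15 = [
    ("eye", "A14", 672), ("water", None, 668), ("tooth", "A39", 668),
    ("hand", "A59", 656), ("moon", "G28", 651), ("foot", "A127", 638),
    ("ear", "A28", 638), ("tongue", "A41", 626), ("fire", "E26", 619),
    ("two", "N2", 604), ("head", "A1", 604), ("mouth", "A35", 597),
    ("1sg", None, 591), ("mother", "B53", 581), ("fish", "K1", 576),
]

# Assumption: an "A" prefix marks body parts (true of every A-code in this list).
body_parts = [gloss for gloss, code, _ in TOP_15 if code and code.startswith("A")]

print(len(body_parts), "of", len(TOP_15), "are body parts:", ", ".join(body_parts))
# prints: 8 of 15 are body parts: eye, tooth, hand, foot, ear, tongue, head, mouth
```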

Of the Moreton Bay vocabulary I was working this week to put into the database, we have the following dialects entered as of today (from the Bannister source):

1. Coobenpil
2. Guwar
3. Djandai
4. Yaggara
5. "Moreton Bay" vocabulary

This by no means exhausts the wordlist data in Bannister, and it is already too late to finish... but perhaps this can be one of the things we start with in the fall, along with putting in the Yinjilanji data.

Midweek updates
Database updates for midweek:

- Duungidjawu input done
- NSF database has about 280 additions to the Sutton & Walsh list
- Lots of entries in the "NewCommonUnglossed" file have been crossed out (i.e. done with)-- including the ones that Professor B already imported into the database

Still got some work to do on the standardization of glosses, among other things. Apparently we also got a comment about our Pama-Nyungan Google Earth map, so at least one person has looked at it besides Professor B and me-- yay blog publicity! With any luck, we can come out with another version with the help of commenters (kind of like a Wikipedia-style project of its own, now that I think of it!)

Start of my last week
So today was the start of my last week in New Haven. Professor B and I had a talk this morning about what we would like to get done before the beginning of the fall semester, when other Yalies will start helping out on the grants again.

We set several things in motion, such as tracking which languages should be "adopted" by various student researchers in the fall, getting sound files appropriate for my senior thesis, and seeing what work I could do in the meantime. Today I worked on putting in some more Duungidjawu entries that Barry had started.  I think I got around 95% finished with this language at the end of the day. 


Making sense of Yaggara?
So, today I finished putting the Coobenpil data into the database, and started on the related dialects/languages: Yaggara (Yakara) and Djandai (Stradbroke Island language). I quickly became puzzled trying to make sense of the Yaggara mini-grammar provided by some "Dr. Lauterer". In his first account of the phonetics of Yaggara, Dr. Lauterer writes: "The consonants are: B, C acute, D, F (which does not occur in any other Australian language), G, K, L, M, N, N acute, NG (sometimes pronounced like an H), P, R, S acute, SH, T, W (which is not used in other Australian languages), Y (as in the English word yet)."

I'm trying to make sense of what phonemic system Dr. Lauterer has in mind when I read about Dr. Lauterer's second account on the phonetics of Yaggara.  There are twelve consonants this time: "K, N, N', D, T, B, P, M, W, R, L, S' (TS' is by English authors given in a clumsy way by the consonant J).  The English S is utterly unknown in the Yaggara language."

In both accounts, Dr. Lauterer proposes a six vowel system as follows: "A as in father, E as in ten, I as in kiss, O as in go; U as in true, AE as the A in happy, OE as the U in but, besides the Y in yes."

That being said, try your luck at phonemicizing the following words that Dr. Lauterer then goes on to offer in his wordlist:

ts'unbul (araucarian tree)
tseruse (trousers)
dziundahl (girl)
ds'undahl (woman)
citne (foot)
adyoe (good-bye)
tshasin (sister)
wadly-wadly (very bad)
doehgehr (white man)
bingpas'agon (lip)

Truly a philologist's worst nightmare!
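
To see just how badly the wordlist fits the inventory, here is a rough sketch that splits each word into the graphemes of Dr. Lauterer's second account and collects whatever letters are left over. The greedy longest-match approach, the function name `segment`, and the decision to count Y along with the vowels are my own choices for illustration; none of this is an actual phonemicization.

```python
# Lauterer's second inventory as quoted above, lower-cased. Listing "y" with the
# vowels is one reading of "besides the Y in yes", not something the source settles.
CONSONANTS = ["k", "n", "n'", "d", "t", "b", "p", "m", "w", "r", "l", "s'"]
VOWELS = ["a", "e", "i", "o", "u", "ae", "oe", "y"]
# Try longer graphemes first so that "s'" and "oe" win over "s" and "o".
GRAPHEMES = sorted(CONSONANTS + VOWELS, key=len, reverse=True)


def segment(word):
    """Greedy longest-match split: returns (graphemes found, letters outside the inventory)."""
    known, unknown = [], []
    i = 0
    while i < len(word):
        if word[i] == "-":  # hyphen in reduplicated forms, not a sound
            i += 1
            continue
        for grapheme in GRAPHEMES:
            if word.startswith(grapheme, i):
                known.append(grapheme)
                i += len(grapheme)
                break
        else:
            unknown.append(word[i])  # letter the inventory does not account for
            i += 1
    return known, unknown


for word in ["ts'unbul", "tseruse", "dziundahl", "citne", "doehgehr"]:
    print(word, segment(word)[1])
# ts'unbul []
# tseruse ['s', 's']
# dziundahl ['z', 'h']
# citne ['c']
# doehgehr ['h', 'g', 'h']
```

So "tseruse" uses the plain S that is supposedly "utterly unknown" in the language, and "citne" starts with a letter that appears in neither vowel list nor the second consonant list.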

GIS updates and more!
Rather than repeating what progress we've made with the Pama-Nyungan grant, I'll just offer a link explaining what has been accomplished: http://pamanyungan.wordpress.com/2009/06/16/map-of-pama-nyungan-languages/

We're planning to consult several colleagues before we publish our geo-coordinates online (as version 1.0). In the meantime, today Professor B got some more maps of Aboriginal Australia (who knew there would be more than one map?) and I helped stitch together another Aboriginal Australia map crafted by Tindale. It's still not the best Tindale map, but until the professor can get ahold of the real map this will have to suffice. I guess the ultimate goal is to create a map on Google Earth that is an amalgamation of what Australianists have been working on for decades: a comprehensive indigenous language/dialect map of the whole continent-- and now, of course, it's digital!

Today I also managed to actually look at the NSF database again for perhaps the first time this week. And I continued to chip away at what still needed to be done: adding several more items to the S&W list, adding more Goinbal (Coobenpil) entries from the Bannister source, and updating the standardized glosses. One of the things that I've been finding increasingly hard to work with is the ASEDA/Javier-imported entries that have been entered multiple times. Karrwa, for instance, often had the same lexical entry in triplicate, leading me to accidentally gloss the entry three times and, as a result, end up only 1/3 as efficient as usual. It would be nice to just find a way to have FileMaker Pro automatically recognize the repeats and delete them-- but if that doesn't work, perhaps manual deletion would be best? It would definitely make the database look less cluttered than it does now.
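
One way to at least find the repeats, sketched outside FileMaker on an exported table. This makes no claims about what FileMaker Pro itself can do; the field names (`id`, `language`, `form`, `gloss`) and the sample records are placeholders invented for the example, not real Karrwa data.

```python
from collections import defaultdict


def find_duplicates(entries, key_fields=("language", "form", "gloss")):
    """Group records that share the same key fields; return only the groups with repeats."""
    groups = defaultdict(list)
    for entry in entries:
        # Ignore case and stray whitespace when comparing records.
        key = tuple(entry[field].strip().lower() for field in key_fields)
        groups[key].append(entry["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Placeholder records standing in for an export; the forms are not real words.
entries = [
    {"id": 1, "language": "Karrwa", "form": "form-a", "gloss": "water"},
    {"id": 2, "language": "Karrwa", "form": "form-a", "gloss": "water"},
    {"id": 3, "language": "Karrwa", "form": "Form-A ", "gloss": "water"},
    {"id": 4, "language": "Karrwa", "form": "form-b", "gloss": "fire"},
]

duplicates = find_duplicates(entries)
for key, ids in duplicates.items():
    # Keep the first record and flag the rest for review or deletion.
    print(key, "keep", ids[0], "flag", ids[1:])
# ('karrwa', 'form-a', 'water') keep 1 flag [2, 3]
```

Flagging rather than deleting outright leaves room for the cases where two imports of the "same" entry carry different notes.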

Done with GIS coding!
The GIS coding for the Pama-Nyungan languages (including Garrwan and Yolngu Matha) is done. I have attached a screenshot of what it currently looks like on Google Earth. As you may have noticed, I have started color-coding according to Obata's language family classifications-- but Google Earth currently does not supply enough differently colored pushpins to completely classify every Pama-Nyungan language according to family. So I just colored a fraction of the languages for easy reference points on the Australian map.

Pink = Western Desert (Wati)
Blue = South-West
Light blue = (dialect of another neighboring language)
Orange = Thura-Yura
Yellow = Karnic
Red = Maric
Purple = Yolngu Matha
Apple Green = Torres Strait languages
Light Grey = Non-Pama Nyungan languages
Green = everything else

I was thinking that in the final stage of things we could perhaps have green stand for small language families or language isolates, so that, for instance, Purple and Apple Green can be used for other things besides Yolngu Matha and Torres Strait languages.
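
For anyone curious what a color-coded pin looks like under the hood, here is a minimal sketch that writes placemarks as KML, the file format Google Earth reads. It is not the workflow actually used for the map: the function name `build_kml`, the sample language names, and the coordinates are all made up, and only three of the families from the legend are included.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# KML colours are aabbggrr hex (alpha, blue, green, red), not rrggbb.
FAMILY_COLOURS = {
    "Karnic": "ff00ffff",        # yellow
    "Maric": "ff0000ff",         # red
    "Yolngu Matha": "ff800080",  # purple
}
DEFAULT_COLOUR = "ff00ff00"      # green = everything else


def build_kml(languages):
    """languages: iterable of (name, family, longitude, latitude) tuples."""
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    for name, family, lon, lat in languages:
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = name
        style = ET.SubElement(placemark, "Style")
        icon_style = ET.SubElement(style, "IconStyle")
        ET.SubElement(icon_style, "color").text = FAMILY_COLOURS.get(family, DEFAULT_COLOUR)
        point = ET.SubElement(placemark, "Point")
        # KML writes coordinates as longitude,latitude (in that order).
        ET.SubElement(point, "coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Placeholder rows: names and coordinates are invented, not real geo-coding data.
SAMPLE = [
    ("Example language 1", "Karnic", 140.0, -27.0),
    ("Example language 2", "Maric", 146.0, -23.0),
    ("Example language 3", "Some small family", 135.0, -15.0),
]

kml_text = build_kml(SAMPLE)
print(kml_text)
```

Because the colour is just a hex string per placemark, this route is not limited to the handful of pushpin colours offered in the Google Earth interface.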

GIS coding, part 3
[info]dnevnikmelochej
I'm not quite finished mapping all of the Pama-Nyungan languages to geo-coordinates.  It apparently would take more than a day to cover all the little languages that are squished into various corners of the Australian coastline. That being said, the following language groups are finished today:

- Southwest (Nyungic)
- Desert languages (including Arandic, Warluwarric, Ngumpin-Yapa, and Karajarri)
- Yolngu-Matha cluster
- Thura-Yura languages
- Karnic languages

Still on the to-do list:
- Northern Cape York
- Southeast/Central New South Wales/Southern Queensland
- Karnic languages
- Tangkic family

I also came across several names of languages that are not identified in Obata's list of Australian languages-- I have noted them in the "GIS coding" subfolder of the shared Pama-Nyungan folder, in the file "GIScoding_problem_names.xls".
