Cukurova NLP Research Group
News and Notifications
At the CuNLP group, we have developed a word-sense vector model based on WordNet relationships, which we call SemSpace. We have obtained quite successful results with this method in several studies. We are now looking for collaborators to extend this work to non-English WordNets and possibly improve the method. If you are interested in the subject, we would be glad to meet.
Undergraduate and graduate students will be employed within the scope of the NLP-based flipped learning project. What we care about most is the student's interest in NLP, but candidates who have taken Machine Learning and Natural Language Processing courses at the undergraduate level and have some experience with .NET MAUI and/or Python are preferred.
What is Natural Language Processing (NLP)?
According to Wikipedia, NLP is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The result is a computer capable of ‘understanding’ the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.
What about Turkish NLP studies?
Although preliminary studies in Turkish NLP are promising, work on Turkish still lags far behind the English literature. The Turkish Language Association and the Computer Science and Turkish Language departments of universities have important responsibilities here. For us, however, this is an opportunity: there is a lot of work to be done by Turkish NLP researchers.
Our Funded Research Projects
- NLP-based Flipped Learning, CU, 2022-... (just started)
- Live Turkish Dictionary, TUBITAK, 2016-2019
The project is based on a system design that analyzes texts collected from various internet news sites and performs the following operations: (1) identifying new words entering Turkish, (2) determining a numerical relationship from the union of these new words with known words, and (3) using these new words and the detected relationships to improve the synset space calculated from a Turkish WordNet. The project resulted in 3 journal articles and 4 symposium proceedings papers, and 2 students were funded for the duration of their studies.
- Learning Word-Vector Quantization: A Study In Morphological Disambiguation of Turkish, CU-BAP, 2015-2020
This project introduces a new classifier named Learning Word-vector Quantization (LWQ) to resolve morphological
ambiguities in Turkish, which is an agglutinative language. A new morphologically annotated corpus and the
datasets derived from it were also prepared through a series of processing steps.
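The first two operations of the Live Turkish Dictionary project listed above (spotting new words and relating them numerically to known words) can be illustrated with a minimal sketch. This is not the project's implementation: it assumes a small hypothetical set of known words, simple regex tokenization, and sentence-level co-occurrence counts standing in for the "numerical relationship". All function and variable names are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical lexicon of already-known words (assumption: a real system
# would load these from a dictionary or a Turkish WordNet).
KNOWN_WORDS = {"yeni", "bir", "telefon", "ile", "fotoğraf", "çektim"}

def tokenize(sentence):
    """Lowercase a sentence and return its runs of Unicode letters.

    Note: str.lower() is locale-independent and does not apply Turkish
    casing rules for dotted/dotless I, so this is only a rough tokenizer.
    """
    return re.findall(r"[^\W\d_]+", sentence.lower())

def find_new_words(sentences, known_words):
    """Count tokens that do not appear in the known-word lexicon."""
    counts = Counter()
    for sentence in sentences:
        for token in tokenize(sentence):
            if token not in known_words:
                counts[token] += 1
    return counts

def cooccurrence_with_known(sentences, known_words):
    """Count how often each new word shares a sentence with each known word."""
    pairs = Counter()
    for sentence in sentences:
        tokens = set(tokenize(sentence))
        new_tokens = tokens - known_words
        known_tokens = tokens & known_words
        for new in new_tokens:
            for known in known_tokens:
                pairs[(new, known)] += 1
    return pairs

# Toy input standing in for text collected from news sites.
sentences = [
    "yeni telefon ile selfie çektim",
    "bir selfie çektim",
]
new_words = find_new_words(sentences, KNOWN_WORDS)
pairs = cooccurrence_with_known(sentences, KNOWN_WORDS)
print(new_words)
print(pairs.most_common(3))
```

Raw counts like these would need normalization before they could feed a vector space; how the project weights the relationships and updates the synset space is not specified here.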
Prospective Graduate Students
Candidates must have a bachelor's degree in computer science. Experience in Python programming and undergraduate coursework in Natural Language Processing and Machine Learning are preferred.
Resources and Tools
- CENG-bot (a chatbot prepared for Cukurova University Computer Engineering Department students)
It is still under development with new approaches, but the current version is usable.
- Turkish Corpus for Morphological Disambiguation
- If you use this corpus in a publication, please cite:
U Orhan, E Arslan. Learning Word-Vector Quantization: A Case Study in Morphological Disambiguation, Transactions on Asian and Low-Resource Language Information Processing, 19(5), 72, 2020.
- Learning Word-vector Quantization (in Matlab)
- If you use this code in a publication, please cite:
U Orhan, E Arslan. Learning Word-Vector Quantization: A Case Study in Morphological Disambiguation, Transactions on Asian and Low-Resource Language Information Processing, 19(5), 72, 2020.
- CU-NLP dataset for Automatic Short Answer Grading
- If you use this dataset in a publication, please cite:
CN Tulu, O Ozkaya, U Orhan. Automatic Short Answer Grading with SemSpace Sense Vectors and MaLSTM, IEEE Access,
9, 19270-19280, 2021.
- Synset vectors computed by Generalized SemSpace
- If you use these vectors in a publication, please cite:
U Orhan, EG Tosun, O Ozkaya. Intent Detection Using Contextualized Deep SemSpace, Arabian Journal for Science and Engineering, 2022.
PhD Theses
- E Turan (PhD), "Automatic Synset Detection from Turkish Dictionary using Confidence Indexing", April 2020.
- E Arslan (PhD), "Learning Word-Vector Quantization: A Study in Morphological Disambiguation of Turkish", January 2020.
- CN Tulu (PhD), "A Semantic Vector Space Model Using Euclidean Distance Based Relatedness", June 2019.
Publications
- U Orhan, EG Tosun, O Ozkaya. Intent Detection Using Contextualized Deep SemSpace, Arabian Journal for Science and Engineering, 2022.
- E Turan, U Orhan. Confidence Indexing of Automated Detected Synsets: A Case Study on Contemporary Turkish Dictionary,
Transactions on Asian and Low-Resource Language Information Processing, 21 (1), Article No.: 18, pp 1–19, 2021.
- U Orhan, CN Tulu. A Novel Embedding Approach to Learn Word Vectors by Weighting Semantic Relations: SemSpace, Expert
Systems with Applications, 180 (115146), 2021.
- CN Tulu, O Ozkaya, U Orhan. Automatic Short Answer Grading with SemSpace Sense Vectors and MaLSTM, IEEE Access,
9, 19270-19280, 2021.
- E Turan, E Arslan, CN Tulu, U Orhan. A Comparison of Graph Centrality Algorithms For Semantic Distance,
Lapseki Vocational School Applied Research Journal, 1(2), 61-70, 2020.
- U Orhan, E Arslan. Learning Word-Vector Quantization: A Case Study in Morphological Disambiguation, Transactions on
Asian and Low-Resource Language Information Processing, 19(5), 72, 2020.
- E Arslan, U Orhan. Identification of OOV words in Turkish texts, Gaziosmanpasa Scientific Research Journal, 8(2),
35-48, 2019.
- C Tulu, U Orhan, E Turan. Semantic Relation’s Weight Determination on a Graph Based WordNet, Gumushane University Fen
Bilimleri Enstitusu Dergisi, 92, pp.67-78, 2018.
- E Arslan, U Orhan, BT Tahiroglu. Morphological Disambiguation of Turkish with Free-order Co-occurrence Statistics,
Gumushane University Fen Bilimleri Enstitusu Dergisi, 92, pp.46-52, 2018.
- C Tulu, U Orhan, E Turan. Determination of Semantic Relations Weight’s on WordNet, 3rd International Conference on
Computational Mathematics and Engineering, 2018.
- E Turan, U Orhan, C Tulu. Using Graph Connectivity Measures for Distance in Semantic Networks, 3rd International
Conference on Computational Mathematics and Engineering, 2018.
- E Arslan, U Orhan, BT Tahiroglu. Morphological Disambiguation of Turkish with Free-Order Co-Occurrence Statistics, 3rd
International Conference on Computational Mathematics and Engineering, 2018.
- E Turan, U Orhan. Building a Turkish Semantic Network and Connecting Synonym Senses Bidirectionally, INnovations in
Intelligent SysTems and Applications (INISTA), 2018.
- C Tulu, U Orhan. PageRank based semantic similarity measure on a graph based Turkish WordNet, International Conference
on Computer Science and Engineering (UBMK), 2017.
- E Arslan, U Orhan. Using Graphs in Construction of a Lemmatization Model for Turkish, International Mediterranean
Science and Engineering Congress, 2017.
- E Arslan, U Orhan. Graph-based lemmatization of Turkish words by using morphological similarity, International
Symposium on INnovations in Intelligent SysTems and Applications (INISTA), 2016.