Google Summer of Code 2019 proposed ideas

From Ελεύθερο Λογισμικό / Λογισμικό ανοιχτού κώδικα (Free Software / Open Source Software)

Revision as of 12:44, 8 January 2019

GFOSS project proposals for GSOC 2019

Students interested in participating should check which of the following projects fit their interests and skills. For practical information for students visit this page. For additional information, please subscribe to this list and post your questions. The full list archives are available here.

Proposed Projects for GSOC 2019. The GSOC Projects implemented in 2018 & 2017 are available here.


Addition of Greek glyphs in Open Source Fonts

Brief Explanation

Many Open Source fonts (e.g., those available at https://fonts.google.com) do not include glyphs for Greek letters and are therefore unusable in a Greek environment.

The aim of this project is to improve this situation by adding the missing glyphs at the correct Unicode code points. The exact set of fonts to be completed will be determined in discussions between the student and the mentor(s).

This is not a typical programming project. If you have never designed fonts before, it is probably not for you.

Expected Results

Full support for Greek text in a number of Open Source fonts.

Knowledge Prerequisites

Type design, font technologies. Please note that this is a special project, where coding, in the traditional sense, will not be enough.

Mentors: Alexios Zavras, Irene Vlachou, Emilios Theofanous

Symplegma

Brief Explanation

"Symplegma" combines libraries for numerical computing, with a specialization in computational mechanics and an orientation toward education and research. Existing libraries, such as "Apache Commons Math" for standard mathematics and statistics components, "FuturEye", a Java-based Finite Element Method (FEM) toolkit, and "SymJava" for fast symbolic-numeric computation, among others, are combined with the in-house "Climax" library. "Climax" is a Java implementation of computational mechanics methods, e.g., the Boundary Element Method ("jbem" package) and the Finite Element Method ("jfem" package).

A simple IDE for working with the above-mentioned libraries, and possible extensions, has been developed in Java; it takes advantage of Apache Groovy, a powerful, optionally typed and dynamic language. The platform goes by the acronym SDE, which stands for Symplegma Development Environment.

Both educational and research activities are to be considered.

Expected Results

Development of toolboxes oriented to specific higher-education courses, an update of the graphical user environment, and an extension of the plotting capabilities.

Related  repositories

http://symplegma.org/ https://github.com/symplegma

Knowledge Prerequisites

Numerical methods, computational mechanics, Java, Groovy

Mentors: George Manolis (gdm@civil.auth.gr), Christos Panagiotopoulos (pchr76@gmail.com)

clio — Software Components and IP Management System

Brief Explanation

clio is a web-based system to manage data on software components and their relations. It started out as a GSoC 2018 project. For GSoC 2019, the main goals would be:

* improvement of the UI
* integration of SPDX data
* extension to cover file-level information (time permitting)

Expected Results

Improvements to clio along the goals listed above.

Related  repositories

* Code: https://github.com/eellak/clio
* Demo: https://clio.ellak.gr/

Knowledge Prerequisites

Python, web front-end

Mentors: Alexios Zavras

Replacement of LTSP

Brief Explanation

LTSP (Linux Terminal Server Project) allows diskless workstations to be netbooted from a single server image, with centralized authentication and home directories. But the project is showing its age: the initial thin-client-focused design is no longer suitable for the era of netbooted fat clients and Wayland, and it contains a lot of stale source code. This GSoC project is about designing and implementing a modern replacement for LTSP.

Expected Results

A modern replacement for LTSP should be implemented, as outlined in http://wiki.ltsp.org/wiki/Dev:GSoC. It should be ready for inclusion in Debian/Ubuntu, so that LTSP users can gradually migrate to it.

Related  repositories

http://www.ltsp.org/ http://wiki.ltsp.org/wiki/Dev:GSoC


Knowledge Prerequisites

Netbooting internals, shell, Python, Git, Debian packaging

Mentors: Yannis Siahos, Foteini Tsiami. Unofficial mentor (Debian & LTSP developer): Vagrant Cascadian

Port Qt Quick Controls Calendar widget to Qt Quick Controls 2 module

Brief Explanation

Qt is an open source, cross-platform framework for developing GUI applications on mobile, desktop and embedded devices. It is now widely used in applications across a variety of industries, such as automotive and medical. Although the framework is written in C++, it comes with a meta-language (or modelling language), QML, whose purpose is to make creating the visual parts of an application easy and fast, thanks to its flexibility and clarity. To accelerate UI development, QML provides the Qt Quick Controls module with ready-made widget types, each backed by a C++ class, such as Button or Switch, that can be styled and modified to fit a project's needs. The module is currently at version 2.4, but the latest version has no support for Calendar. More specifically, the Calendar was last provided in version 1.4 of the Qt Quick Controls module that was released with the Qt 5.3 version.

Expected Results

The Qt Calendar widget is updated, modified accordingly and ported to Qt 5.12 and the current version of Qt Quick Controls 2. Ideally it will be upstreamed to Qt, thereby contributing to the Qt ecosystem.

Related  repositories

https://github.com/extenly/qtqc2_calendar

Knowledge Prerequisites

* Qt, QML
* C++, JavaScript

Mentors: Alexandra Betouni (alexandra.betouni@extenly.com)

Development of a Tool for Extracting Quantitative Text Profiles

Brief Explanation

Quantitative text analysis is the basis of nearly every computational approach to text management and processing. All advanced Natural Language Processing (NLP) tasks, including information retrieval, sentiment analysis and computational stylistics, involve quantifying texts across a huge number of linguistic features and transforming them into vectors. In many programming languages, e.g. R, Python and Java, there are numerous open source scripts, tools, packages and libraries that can transform texts into vectors of word frequencies, character and word n-gram frequencies, stylometric features etc. However, each of these tools covers only a restricted subset of the possible linguistic features.

Moreover, the available tools are written in different languages and require considerable effort to combine so that the user can extract a unified file of results. Due to the fragmented nature of the programming environments and the highly technical skills required to operate the tools and combine their results, they cannot be used by the large communities of scientists with a background in the humanities or sociopolitical sciences.

For the above reasons, we envisage the development of a user-friendly tool with a Graphical User Interface (GUI) that provides integrated access to existing open NLP software. The new tool shall support the quantitative analysis of multilingual texts and produce quantitative text profiles that can be used as input for further analysis, visualization, machine learning and other advanced computational processing. Such a tool does not exist to date, and it will boost research in all scientific areas that require computational processing of large amounts of text.

Expected Results

The outcome of this project would be open-source software with the following specifications:

* User-friendly GUI that intuitively guides users in selecting the features they want to count in their text collections.
* Large set of linguistic features that includes at least:
** Most frequent words of the texts analyzed
** User-specified word lists
** Word and character n-grams of arbitrary length
** Different stylometric features such as vocabulary diversity indices, readability indices and quantitative linguistic indices
* UTF-8 support
* Corpus management features using text metadata
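To make the notion of a quantitative text profile more concrete, the following is a minimal sketch in Python of how a few of the listed features (most frequent words, word and character n-grams, and a simple vocabulary diversity index) might be computed for a single UTF-8 text. It is not part of the proposal: the function names (tokenize, word_ngrams, char_ngrams, text_profile), the regex-based tokenization and the choice of the type-token ratio as the diversity index are all illustrative assumptions, and a real tool would more likely delegate to the libraries listed under the related repositories.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a simple regex tokenizer; \\w is Unicode-aware for str
    patterns in Python 3, so Greek and other scripts are handled.
    """
    return re.findall(r"\w+", text.lower())


def word_ngrams(tokens, n):
    """Return all overlapping word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return all overlapping character n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def text_profile(text, top_k=3, n=2):
    """Build a small, hypothetical quantitative profile of one text."""
    tokens = tokenize(text)
    word_counts = Counter(tokens)
    return {
        "n_tokens": len(tokens),
        "n_types": len(word_counts),
        # Type-token ratio: one simple vocabulary diversity index.
        "type_token_ratio": len(word_counts) / len(tokens) if tokens else 0.0,
        "top_words": word_counts.most_common(top_k),
        "word_ngrams": Counter(word_ngrams(tokens, n)),
        "char_ngrams": Counter(char_ngrams(text.lower(), n)),
    }


# Example with a mixed Greek/English text to illustrate UTF-8 handling.
profile = text_profile("Το κείμενο και το μοντέλο. The text and the model.")
print(profile["n_tokens"], profile["n_types"], profile["top_words"])
```

Each profile is a plain dictionary of counts, so profiles of several texts could be combined into a single feature table for later analysis; how the actual tool would store and export its results is left open by the proposal.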

Related  repositories

https://github.com/gmikros/Author_Multilevel_Ngram_Profiles

https://github.com/quanteda/quanteda

https://github.com/unDocUMeantIt/koRpus

https://miroslavkubat.webnode.cz/software/

https://github.com/bnosac/udpipe

https://github.com/explosion/spaCy

Knowledge Prerequisites

Good knowledge of R, Java and Python, and skills in GUI development. Good understanding of NLP concepts and tools.

Mentors:

* George Mikros
* Fotis Fitsilis
* Sotiris Leventis
* Michael Fitsilis