Google Summer of Code 2018 Accepted projects

From Free Software / Open Source Software

Adding Greek language support to the NLP library spaCy

Description

We live in the era of data. Every minute, 3.8 billion internet users produce content: more than 120 million emails, 500,000 Facebook comments, and 3 million Google searches. To process that amount of data efficiently, we need to process natural language. Open source projects such as spaCy, TextBlob, and NLTK contribute significantly in that direction and therefore deserve to be reinforced.

This project is about improving the quality of natural language processing for the Greek language. The first step is to integrate Greek into spaCy. During that process, innovative approaches will be used. It is vitally important for the author and the mentors of the program to identify which of them are of practical use for spaCy and to share the results, in order to support any other open source enthusiast who is interested. If the integration succeeds, the Greek model will be trained and used to extract valuable information from Greek texts, such as emotions and named entities.

This project aims to achieve the following goals:

1. Integration of the Greek language into the spaCy platform

2. Natural language processing of Greek documents in order to extract valuable information such as named entities, sentiment, and tags.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-spacy

Student

Ioannis Daras

Mentors

Markos Gogoulos, Panos Louridas


Extraction of Responsibilities per unit in public sector organizations from the Government Gazette

Description

The objective of this project is to extend the existing Government Gazette (GG) text mining code with Named Entity Recognition features. These features will identify Government Directorates and Divisions together with the responsibilities assigned to them and the types of services they are required to provide under their legal framework, as published at http://www.et.gr/. This information will be extracted along with related metadata (decision number, date of the GG issue).

The aim is to link the management units (Directorates, Divisions and Sections) with the roles and services assigned to each unit, and to codify this information, which is hidden in the raw text of GG issues.
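To make the target output concrete, here is a minimal sketch of the kind of record such an extraction could produce. It is not the project's code: the input layout, the `IssueRecord` class and both regular expressions are assumptions made for illustration, and real GG issues are Greek free text that would need Named Entity Recognition rather than fixed patterns.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified input layout (an assumption for this sketch).
SAMPLE_ISSUE = """\
Decision No: 1234/2018
Issue date: 2018-05-14
Directorate of Finance: budget planning; payroll
Division of Procurement: tenders; supplier contracts
"""

# Metadata lines and unit lines, as laid out in the sample above.
HEADER_RE = re.compile(r"^(Decision No|Issue date):\s*(.+)$")
UNIT_RE = re.compile(r"^((?:Directorate|Division|Section) of [^:]+):\s*(.+)$")


@dataclass
class IssueRecord:
    decision_number: str = ""
    issue_date: str = ""
    # Maps a unit name to the list of responsibilities found for it.
    responsibilities: dict = field(default_factory=dict)


def extract_record(text: str) -> IssueRecord:
    """Collect metadata and per-unit responsibilities from one issue."""
    record = IssueRecord()
    for line in text.splitlines():
        line = line.strip()
        header = HEADER_RE.match(line)
        if header:
            key, value = header.groups()
            if key == "Decision No":
                record.decision_number = value
            else:
                record.issue_date = value
            continue
        unit = UNIT_RE.match(line)
        if unit:
            name, duties = unit.groups()
            record.responsibilities[name] = [
                duty.strip() for duty in duties.split(";") if duty.strip()
            ]
    return record
```

Calling `extract_record(SAMPLE_ISSUE)` yields one record that ties each unit to its responsibilities and carries the decision number and issue date alongside them.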

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-GG-extraction

Student

Chris Karageorg Kaneen

Mentors

Iraklis Varlamis, Sarantos Kapidakis, Dionysios Moschopoulos, Theodoros Karounos


Epoptes

Description

Epoptes (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows screen broadcasting and monitoring, remote command execution, message sending, imposing restrictions on the clients such as screen locking or sound muting, and much more. It can be installed in Ubuntu, Debian and openSUSE based labs that may contain any combination of the following: LTSP servers, thin and fat clients, non-LTSP servers, standalone workstations, NX or XDMCP clients, etc.

Epoptes has been undermaintained for the last couple of years. It's currently powered by Python 2 and GTK 2, while unfortunately a number of bugs have crept in due to major updates in Linux distribution packages (systemd, consolekit, VNC…).

This project aims at reviving Epoptes with Python 3 and GTK 3 support, while also addressing several outstanding issues.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-epoptes

Student

Alkis Georgopoulos

Mentors

Fotis Tsiamis, Avgoustos Tsinakos


Government Gazette text mining, cross linking, and codification

Description

In recent years, the analysis of public sector texts with text mining methods has attracted plenty of attention, enabled by modern libraries, algorithms and practices and brought to the forefront by open source projects such as TextBlob, spaCy, SciPy, TensorFlow and NLTK. These collaborative efforts mark a shift towards more efficient machine understanding of natural language, which can be applied to public documents to provide more robust organization and codification in the legal sector. This project aims to extend the existing Government Gazette (GG) text mining code with features that organize and cross-link GG texts with legal texts and detect the signatories via heuristic and machine learning methods. This will help eliminate bureaucratic processes and save considerable time for jurists who, for example, search for legal documents in the ISOKRATIS database of legal texts (an applicable case study).
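The heuristic side of cross-linking can be illustrated with a short sketch. It assumes that laws are cited in the form "ν. number/year" (with a lower-case or capital Greek nu); real GG text uses many more citation forms, and the issue identifiers, sample texts and function names below are hypothetical, not taken from the project.

```python
import re
from collections import defaultdict

NU_LOWER = "\u03bd"  # Greek small letter nu
NU_UPPER = "\u039d"  # Greek capital letter nu

# Assumed citation form: nu, a full stop, then "<number>/<year>".
LAW_CITATION_RE = re.compile(
    rf"(?<!\w)[{NU_LOWER}{NU_UPPER}]\.\s*(\d+)/(\d{{4}})"
)


def find_law_citations(text):
    """Return every cited law in the text as a 'number/year' string."""
    return [f"{number}/{year}" for number, year in LAW_CITATION_RE.findall(text)]


def build_cross_links(issues):
    """Map each cited law to the sorted ids of the issues that cite it."""
    links = defaultdict(set)
    for issue_id, text in issues.items():
        for law in find_law_citations(text):
            links[law].add(issue_id)
    return {law: sorted(ids) for law, ids in links.items()}


# Hypothetical issue ids and texts, for illustration only.
ISSUES = {
    "issue-1": f"Amends {NU_LOWER}. 4009/2011 and {NU_UPPER}. 4485/2017.",
    "issue-2": f"Issued under {NU_LOWER}. 4009/2011.",
}
```

With this index, `build_cross_links(ISSUES)` groups the issues by the law they cite, which is the basic lookup a jurist would need when tracing amendments. Machine learning methods would be needed for citations that do not follow a fixed pattern.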

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-3gm

Student

Marios Papachristou

Mentors

Diomidis Spinellis, Alexios Zavras, Sarantos Kapidakis, Dionysios Moschopoulos


LibreOffice customization and creation of legal templates for LibreOffice

Description

A set of modules and templates for the LibreOffice suite that ease the transition from Microsoft Office, together with ready-to-use templates that automate the creation of Greek legal documents. The templates aim to cut down time-consuming tasks by removing formatting and layout procedures from the employee workflow. Furthermore, an interface for accessing all the templates will be developed. All steps will be documented during the process and afterwards, for future reference and development.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-librecust

Student

Christos Arvanitis

Mentors

Kostas Papadimas, Theodoros Karounos, Diomidis Spinellis


Software components and IP management

Description

Clio is a web-based system for maintaining (meta-)information on software components.

Nowadays every piece of software includes and uses many other software components, each one coming with its own license.

The goal of this project is to build a simple web system in which this information can be (manually) entered and maintained.

This is a brand-new project; some analysis has been done but no code is available yet.

More details in the separate page Clio.
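Since no code is available yet, the following is only a sketch of the kind of information such a system might hold, not Clio's design. The `Component` and `Product` classes, their fields and the sample component names are all hypothetical; the license identifiers are assumed to follow the SPDX short-identifier style.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    license_id: str  # assumed to be an SPDX-style identifier, e.g. "MIT"


@dataclass
class Product:
    name: str
    components: list = field(default_factory=list)

    def add(self, component):
        """Record a component that this product includes."""
        self.components.append(component)

    def licenses(self):
        """Return the distinct licenses in use, sorted alphabetically."""
        return sorted({c.license_id for c in self.components})

    def components_under(self, license_id):
        """Return the names of the components under a given license."""
        return [c.name for c in self.components if c.license_id == license_id]


# Hypothetical product and components, entered manually.
product = Product("example-app")
product.add(Component("libfoo", "1.2.0", "MIT"))
product.add(Component("libbar", "0.9.1", "Apache-2.0"))
product.add(Component("libbaz", "3.0.0", "MIT"))
```

Even this small model answers the two questions the description implies: which licenses a product depends on (`product.licenses()`) and which components carry a given license (`product.components_under("MIT")`).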

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-clio

Student

Gopalakrishnan.V

Mentors

Alexios Zavras, Georgia Kapitsaki


WSO2 Identity Server Userstore using Web Services to get claims

Description

WSO2 Identity Server is a popular open source identity and access management server used throughout the world. It efficiently undertakes the complex task of identity management across enterprise applications, services, and APIs.

This project is based on WSO2 Identity Server version 5.4. Currently, WSO2 Identity Server consists of SOAP services; in the near future there will be REST APIs that support all functionality and are more effective. It currently supports different user stores, such as LDAP, JDBC, and MySQL, as primary and secondary user stores.

WSO2 Identity Server allows multiple user stores to be configured in the system to store users and roles. There are two types of user store: the primary user store (mandatory) and secondary user stores (optional). In this version, all user information is persisted in a single user store. This implementation will separate it into a credential user store and an attribute user store. The attribute user store simply stores claim details, which can be accessed by providing the user credentials and secret. With the ability to create a new user store, the data currently saved in the primary user store can be split across different user stores: one for user details and another for user attribute (claim) details.
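The separation described above can be sketched in a few lines. This is a conceptual illustration in Python, not WSO2 Identity Server code or its API: the `CredentialStore` and `AttributeStore` classes are hypothetical, and the in-memory attribute store merely stands in for the remote web service that would hold the claims.

```python
import hashlib
import hmac
import os


class CredentialStore:
    """Holds only usernames and salted password hashes."""

    def __init__(self):
        self._records = {}

    def add_user(self, username, password):
        salt = os.urandom(16)
        self._records[username] = (salt, self._hash(password, salt))

    def verify(self, username, password):
        record = self._records.get(username)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison of the stored and computed hashes.
        return hmac.compare_digest(expected, self._hash(password, salt))

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class AttributeStore:
    """Holds only claims; releases them after a credential check."""

    def __init__(self, credentials):
        self._credentials = credentials
        self._claims = {}

    def set_claims(self, username, claims):
        self._claims[username] = dict(claims)

    def get_claims(self, username, password):
        if not self._credentials.verify(username, password):
            raise PermissionError("invalid credentials")
        return dict(self._claims.get(username, {}))


# Hypothetical user and claim values.
credentials = CredentialStore()
credentials.add_user("alice", "s3cret")
attributes = AttributeStore(credentials)
attributes.set_claims("alice", {"email": "alice@example.org"})
```

The point of the split is that the credential store never sees claim data and the attribute store never stores passwords; `attributes.get_claims("alice", "s3cret")` returns the claims only after the credential check succeeds.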

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-wso2

Student

Isuri Anuradha

Mentors

Panagiotis Kranidiotis, Stamelos Ioannis


Python PenTest Library (PyPen)

A collection of tools supporting penetration testers.

Description

Development of a Python library for penetration testers. The library will include a set of tools for performing the basic tasks involved in attacking a remote host. It will include reconnaissance tools, such as modules that collect data about a specific target either through the web or through user input. Other tools will be developed to create custom dictionaries for username and password attacks. Supported attack techniques will also include DoS, brute-force and inclusion attacks. The library will also include various statistical functions for extracting additional information from a captured host.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-pypen

Student

Konstantinos Liosis

Mentors

Antonios Andreatos, Panagiotis Karampelas, Christos Pavlatos


Addition of Greek glyphs to the open source font Arima Madurai

Description

This project aims to extend the collection of fonts supporting Greek script in the Google Fonts catalog. Today, 19 serif fonts, 6 monospace fonts and 10 sans-serif fonts supporting Greek script are available. Moreover, only 2 fonts are explicitly intended for display text.

Arima Madurai is a font created by Natanael Gama and Joana Correia of NDISCOVER, a Portuguese type foundry. It is a multi-script display font with 8 weights, from thin to black, and has a strong calligraphic influence. It has a lot of personality, which makes it recognisable in headlines or brand names. I value the quality of the design, and its low contrast allows good legibility and rendering on screen.

Given the history of Greek script, designing a typeface with a calligraphic feel is interesting and challenging, in terms of both design and study. There are remarkable examples of Greek punch cutting from the most talented historical figures. The challenge will be to respect that history while keeping a well-anchored contemporary form.

Arima Madurai already supports Tamil, Malayalam and Latin scripts, and I would like to add Greek script to the glyph set. The fact that the font already supports multiple scripts is a real benefit to the project: Arima Madurai already works in a non-Latin typographic environment and therefore offers a large set of shapes that can be used to match the Greek glyphs with the others.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-arimamadurai

Student

Rosalie Wagner

Mentors

Alexios Zavras, Irene Vlachou, Emilios Theofanous


Addition of Greek glyphs to the open source font Cantarell

Description

Cantarell is a humanist sans serif typeface optimized for on-screen reading. It was originally developed by Dave Crossland in the MA Typeface Design class of 2009 at the University of Reading using free software. Subsequently, it was licensed under an SIL Open Font License and has been the standard UI typeface for the open-source desktop environment GNOME since version 3.0 in 2010.

The fonts were redesigned for the release of GNOME 3.28 in March 2018. PostScript outline quality improved significantly, spacing was reworked and new weights were added.

The family is currently growing to support additional writing systems. After initially applying to extend another typeface, I was invited to change my project and add Monotonic and Polytonic Greek to the three Roman masters of Cantarell during GSoC 2018.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-cantarell

Student

Florian Fecher

Mentors

Alexios Zavras, Irene Vlachou, Emilios Theofanous