Difference between revisions of "Google Summer of Code 2018 Accepted projects"

From Ελεύθερο Λογισμικό / Λογισμικό ανοιχτού κώδικα
 
== Adding Greek language on NLP library Spacy.io ==
  
=== Google Summer of Code 2018 Accepted Projects ===
+
=== Description ===
 +
We live in the era of data. Every minute, 3.8 billion internet users produce content: more than 120 million emails, 500,000 Facebook comments, and 3 million Google searches. If we want to process that amount of data efficiently, we need to process natural language. Open source projects such as spaCy, textblob, and NLTK contribute significantly in that direction and thus need to be reinforced.
  
 +
This project is about improving the quality of Natural Language Processing for the Greek language. The first step is to integrate the Greek language into spaCy. During that process, innovative approaches will be used. It is vitally important for the writer and the mentors of the program to identify which of them are of practical use for spaCy and to share the results in order to support any other interested open source enthusiast. If the Greek language is successfully integrated into spaCy, the Greek model will be trained and used to extract valuable information from Greek texts, such as emotion detection, entity extraction, etc.
  
 +
This project aims to achieve the following goals:
  
== Adding Greek language on NLP library Spacy.io ==
+
1. Integration of Greek language to spaCy.io platform
  
=== Brief Explanation: ===
+
2. Natural Language Processing of Greek documents in order to extract valuable information such as named entities, sentiment analysis, tags, etc.
spaCy is an open-source Python library for advanced Natural Language Processing. It's a very powerful and modern tool for applying NLP to real-world problems. Among other functionality, it provides Named Entity Recognition, deep learning integration, and part-of-speech tagging, and includes built-in visualizers for syntax and NER. spaCy supports more than 25 languages but not Greek. Adding the Greek language will greatly improve the application of NLP to Greek, and allow for tasks such as Named Entity Recognition and part-of-speech tagging.
 
  
The procedure is well specified at https://spacy.io/usage/adding-languages: custom language data (stop words, tokenizer exceptions, punctuation rules, etc.) needs to be added and tested.
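As a rough illustration of what such language data does, the sketch below uses plain Python rather than spaCy's own API. The stop-word set, the exception list, and the function names are assumptions made up for this example; real language data is far larger and is organized differently.

```python
import re
import unicodedata

# Hypothetical, tiny samples of Greek language data (not taken from spaCy).
STOP_WORDS = {"και", "το", "η", "ο", "να"}
TOKENIZER_EXCEPTIONS = {"π.χ.", "κ.λπ."}  # abbreviations that must stay whole

# A word is a run of letters/digits; any other non-space character is its own token.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into tokens, keeping listed abbreviations in one piece."""
    text = unicodedata.normalize("NFC", text)
    tokens = []
    for chunk in text.split():
        if chunk in TOKENIZER_EXCEPTIONS:
            tokens.append(chunk)
        else:
            tokens.extend(_TOKEN.findall(chunk))
    return tokens


def content_words(text):
    """Return tokens that are neither stop words nor pure punctuation."""
    return [
        token
        for token in tokenize(text)
        if token.lower() not in STOP_WORDS and any(ch.isalnum() for ch in token)
    ]


if __name__ == "__main__":
    sentence = "Η γάτα και ο σκύλος, π.χ. εδώ."
    print(tokenize(sentence))
    print(content_words(sentence))
```

Without the exception list, the abbreviation "π.χ." would be split into four tokens, which is the kind of error that tokenizer exceptions are meant to prevent.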
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-spacy
  
=== Expected Results ===
+
=== Student ===
The vocabulary, syntax, entities and word vectors for the Greek language. These will be produced with Spacy/gensim, after the language information is successfully added.
+
[https://github.com/giannisdaras Ioannis Daras]
  
The Greek language model will then be added to Spacy.io for usage as a supported language model.
+
=== Mentors ===
 +
[https://github.com/mgogoulos Markos Gogoulos],  [https://github.com/louridas Panos Louridas]
  
As a real-world scenario to test the language model, analysis of a large number of issues of the Official Greek Government Gazette (FEK, ΦΕΚ) is proposed, in order to extract entities and categorize these documents.
 
  
=== Related repositories ===
+
   
https://github.com/explosion/spaCy
 
 
 
=== Knowledge Prerequisites ===
 
Strong knowledge of the Greek language, Python language fluency and Regular Expressions knowledge are necessary for this.
 
 
 
=== Mentors: [https://github.com/mgogoulos Markos Gogoulos]  [https://github.com/louridas Panos Louridas] ===
 
  
 
== Extraction of Responsibilities per unit in public sector organizations from the Government Gazette ==
 
  
==== Brief Explanation: ====
+
=== Description ===
The objective of this project is to extend existing Government Gazette (GG) text mining code with Named Entity Recognition features that will allow the identification of Government Directorates and Divisions with the responsibilities assigned to them and the types of services they are required to provide according to their legal framework published in http://www.et.gr/, and the extraction of this information with related metadata (decision number, date of the GG issue). The aim is to link the management units with assigned roles and services per unit (Directorates, Divisions & Sections) and codify this specific information, which is hidden in the GG issue raw text. For this, the PDFs must be downloaded, converted into text and cleaned. Then, syntax-based heuristics and/or machine learning techniques must be applied to identify specific Named Entity types with references to assigned responsibilities and services per unit (Directorates, Divisions & Sections), and links between the two must be extracted. Metadata concerning the GG issue and the decision and/or law number will also be associated with each extracted association. The produced associations will be exported in a machine-usable, structured format (e.g. as RDF triples).
+
The objective of this project is to extend existing Government Gazette (GG) text mining code with Named Entity Recognition features that will allow the identification of Government Directorates and Divisions with the responsibilities assigned to them, the types of services they are required to provide according to their legal framework published in http://www.et.gr/ and the extraction of this information with related metadata (decision number, date of the GG issue).

The aim is to link the management units with assigned roles and services per unit (Directorates, Divisions & Sections) and codify this specific information, which is hidden in the GG issue raw text.

==== Expected Results ====
* A module for manually annotating related entities and responsibilities-services assignment sections in raw text
* A NER module, with trained models for detecting Governmental Directorates and Divisions in raw text
* A module that associates entities with responsibilities and extracts related metadata from the GG issue
 
  
==== Related  repositories ====
+
=== GSOC-2018 repositories ===
https://github.com/arisp8/gazette-analysis
+
https://github.com/eellak/gsoc2018-GG-extraction
  
==== Knowledge Prerequisites ====
+
=== Student ===
Python, Java, Machine Learning
+
[https://github.com/ckarageorgkaneen Chris Karageorg Kaneen]  
 
 
==== Mentors: [https://www.dit.hua.gr/~varlamis/ Iraklis Varlamis], [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis], [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos] [http://www.karounos.gr/blog/bio Theodoros Karounos]  ====
 
  
 +
=== Mentors ===
 +
[https://www.dit.hua.gr/~varlamis/ Iraklis Varlamis], [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis], [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos] [http://www.karounos.gr/blog/bio Theodoros Karounos]
 +
  
 
== Epoptes ==
 
  
==== Brief Explanation: ====
+
=== Description ===
Epoptes (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows for screen broadcasting and monitoring, remote command execution, message sending, imposing restrictions like screen locking or sound muting on the clients, and much more! It can be installed in Ubuntu, Debian and openSUSE based labs that may contain ''any'' combination of the following: LTSP servers, thin and fat clients, non LTSP servers, standalone workstations, NX or XDMCP clients etc.
 
  
==== Related GitHub repositories ====
+
Epoptes has been undermaintained for the last couple of years. It's currently powered by Python 2 and GTK 2, while unfortunately a number of bugs have crept in due to major updates in Linux distribution packages (systemd, consolekit, VNC…).
https://www.github.com/Epoptes/epoptes
 
  
==== Expected Results ====
+
This project aims at reviving Epoptes with Python 3 and GTK 3 support, while also addressing several outstanding issues.
Rewrite Epoptes with Python 3 support  
 
  
Gtk3 with GObject Introspection instead of pygtk2
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-epoptes
  
Improvements in the code structure (break existing code into Python modules/packages)
+
=== Student ===
 
+
[https://github.com/alkisg Alkis Georgopoulos]
==== Knowledge Prerequisites ====
 
Python
 
 
 
GTK
 
 
 
==== Mentors:  [https://github.com/ftsamis Fotis Tsiamis], [http://cde.athabascau.ca/ourpeople/instructors/tsinakos.php Avgoustos Tsinakos] ====
 
  
 +
=== Mentors ===
 +
[https://github.com/ftsamis Fotis Tsiamis], [http://cde.athabascau.ca/ourpeople/instructors/tsinakos.php Avgoustos Tsinakos]
 +
 +
 
== Government Gazette text mining, cross linking, and codification ==
 
  
==== Brief Explanation ====
+
=== Description ===
The objective of this project is to extend existing Government Gazette text mining code to cross-link legal texts and detect the ministers that sign them. For this the text PDFs need to be downloaded and converted into text. Then, heuristic rules must be applied to detect references to other legal texts, which will be converted into hypertext form. Similar techniques will be used to detect the competent ministers. Two possible extensions are proposed. First, detect amendments incorporated within another law. Second, implement a prototype for editing a law in its codified form (e.g. on GitHub) and automatically creating from the changes the text to be legislated (the differences from the original law).
+
In recent years, analyzing public sector texts via text mining methods has attracted plenty of attention, enabled by modern libraries, algorithms and practices and brought to the forefront by open source projects such as textblob, spaCy, SciPy, Tensorflow and NLTK. These collaborative efforts seem to mark a shift towards more efficient understanding of natural language by machines, which can be used in conjunction with public documents to provide more robust organization and codification in the legal sector.
 
+
This project aims to extend the existing Government Gazette (GG) text mining code by implementing features to organize and cross-link GG texts with legal texts and to detect the signatories via heuristic and machine learning methods. This will help eliminate bureaucratic processes and save a great deal of time for jurists who, for example, search for legal documents in the ISOKRATIS database of legal texts (which is an applicable case study).
==== Related GitHub repositories ====
 
https://github.com/arisp8/gazette-analysis
 
  
==== Expected Results ====
+
=== GSOC-2018 repositories ===
Detection of references to other laws; detection of competent ministers; codified legislation prototype
+
https://github.com/eellak/gsoc2018-3gm
  
==== Knowledge Prerequisites ====
+
=== Student ===
Python
+
[https://github.com/papachristoumarios Marios Papachristou]
  
==== Mentors: [https://www.spinellis.gr Diomidis Spinellis] [https://github.com/zvr Alexios Zavras]  [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis] [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos] ====
+
=== Mentors ===
 +
[https://www.spinellis.gr/ Diomidis Spinellis] [https://github.com/zvr Alexios Zavras]  [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis] [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos]
 +
  
 
== Libreoffice customization and creation of legal Templates for LibreOffice ==
 
  
==== Brief Explanation ====
+
=== Description ===
LibreOffice customization to achieve a "familiar" look and menus for users migrating from MS Office 2013, and creation of specific templates for the Greek legal system. The customization and templates should follow the development guidelines at https://wiki.documentfoundation.org/Development/GetInvolved.
+
A set of modules and templates for the LibreOffice suite that ease the transition from Microsoft Office, as well as ready-to-use templates that automate the creation of Greek legal documents. These templates aim to cut down on time-consuming tasks by removing formatting and layout procedures from the employee workflow. Furthermore, an interface to access all these templates will be developed. All steps will be documented during and after the process for future reference and development.
 
 
==== Expected results ====
 
* Development of specific menu customizations through the use of [https://api.libreoffice.org/ Libreoffice Software Development Kit 6.0] in various modules of Libreoffice (eg https://api.libreoffice.org/docs/idl/ref/namespacecom_1_1sun_1_1star_1_1ui.html) 
 
* Design and development of Templates and LibreOffice applications that request/get and fill specific information in the templates through the use of  APIs for the Greek legal system
 
  
Customization and Templates  should be accompanied with detailed documentation and instructions for developers and end users.
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-librecust
  
==== Knowledge Prerequisites ====
* C
* C++
* Java
* Python
* Bash
* Perl
* Libreoffice Software Development Kit 6.0

=== Student ===
[https://github.com/arvchristos Christos Arvanitis]
 
  
==== Mentors: [https://github.com/pkst-ellak Kostas Papadimas] [http://www.karounos.gr/blog/bio Theodoros Karounos] [https://www.spinellis.gr Diomidis Spinellis] ====
+
=== Mentors ===
 +
[https://github.com/pkst-ellak Kostas Papadimas] [http://www.karounos.gr/blog/bio Theodoros Karounos] [https://www.spinellis.gr/ Diomidis Spinellis]
 +
  
 
== Software components and IP management ==
 
  
More details in the separate page [https://ellak.gr/wiki/index.php?title=Clio Clio].
+
=== Description ===
 
+
Clio is a web based system for maintaining (meta-)information on software components.
==== Brief Explanation ====
 
  
A web-based system to manage data on software components and their relations.
+
Nowadays every piece of software includes and uses many other software components, each coming with its own license.
 
 
 
  
 
The goal of this project is to build a simple web system to be able to (manually) input and maintain this information!
 
 
This is a brand-new project; some analysis has been done but no code is available yet.
 
  
==== Expected Results ====
+
More details in the separate page [https://ellak.gr/wiki/index.php?title=Clio Clio].
  
A complete web-based system to manage the above-mentioned data.
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-clio
  
==== Knowledge Prerequisites ====
+
=== Student ===
 +
[https://github.com/gopuvenkat Gopalakrishnan.V]
  
Web (any technology welcome)
+
=== Mentors ===
 
+
[https://github.com/zvr Alexios Zavras], Georgia Kapitsaki
 
+
==== Mentors: [https://github.com/zvr Alexios Zavras] Georgia Kapitsaki ====
 
  
 
== WSO2 Identity Server Userstore using Web Services  to get claims ==
 
  
==== Brief Explanation ====
+
=== Description ===
 +
WSO2 Identity Server is a popular open source identity and access management server used throughout the world. It efficiently undertakes the complex task of identity management across enterprise applications, services, and APIs.
  
WSO2 Identity Server provides secure identity management for enterprise web applications, services, and APIs by managing identity and entitlements of the users securely and efficiently. The Identity Server enables enterprise architects and developers to reduce identity provisioning time, guarantee secure online interactions, and deliver a reduced single sign-on environment. WSO2 Identity Server is fully open source and is released under Apache Software License Version 2.0.  
+
This project is based on WSO2 Identity Server version 5.4. Currently, WSO2 Identity Server exposes SOAP services; in the near future there will be REST APIs that support all functionalities and are more effective. In the current environment it supports different user stores, such as LDAP, JDBC, and MySQL, as primary and secondary user stores.
  
The aim of this project is to create a new type of userstore where credentials are separated from attributes, and where attributes (claims) can be configured from the web UI as a SOAP or REST web service. The end user should be able to
* configure credentials for LDAP or JDBC
* configure web service authentication
* configure claims to consume the above web service

WSO2 Identity Server allows multiple user stores to be configured in the system to store users and roles. There are two types of user store: a primary user store (mandatory) and secondary user stores (optional). In this version, all user information is persisted in a single user store. This implementation will split it into a credential user store and an attribute user store. The attribute user store simply stores claim details, which can be accessed by providing the user credentials and secret. With the ability to create a new user store, the data currently saved in the primary user store can be split across separate user stores: one for user details and another for user attribute (claims) details, which can be accessed by providing the user credentials and secret.
 
'''Expected Results'''
 
  
A new userstore in which the end user can configure, through the existing web interface, user claims obtained via a web services client. The appropriate changes in the source code should be uploaded to the upstream branch of the latest version (5.4.0).
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-wso2
  
==== Related GitHub repository ====
+
=== Student ===
https://wso2.github.io/
+
[https://github.com/isuri97 Isuri Anuradha]
  
https://wso2.github.io/using-maven.html
+
=== Mentors ===
 
+
[https://www.linkedin.com/in/kranidiotis/ Panagiotis Kranidiotis] [http://www.csd.auth.gr/en/staff/faculty?view=user&ro=1&id=14 Stamelos Ioannis]
https://wso2.github.io/github-repositories.html#IS
+
 
 
https://wso2.github.io/github-repositories.html
 
 
 
==== Knowledge Prerequisites ====
 
 
 
* Java JSP
 
* JSTL
 
* Maven
 
* OSGI Framework
 
* A modern development framework for interactive web content
 
 
 
==== Mentors: [https://www.linkedin.com/in/kranidiotis/ Panagiotis Kranidiotis] [http://www.csd.auth.gr/en/staff/faculty?view=user&ro=1&id=14 Stamelos Ioannis] ====
 
  
 
== Python PenTest Library (PyPen) ==
 
 +
A collection of tools supporting penetration testers.
  
+
=== Description ===
 
 
==== Brief Explanation ====
 
 
 
 
Development of a Python library for penetration testers. The library will include a set of tools for performing the basic tasks for attacking a remote host. It will include reconnaissance tools such as modules that will be able to collect data for a specific target either through the web or through user input. Moreover, other tools will be developed to create custom dictionaries for username and password attacks. Other attack techniques that will be supported include DoS attack, BruteForce attack as well as Inclusion attack. The library will also include various statistical functions for extracting additional information from a captured host.
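One of the target reconnaissance tasks, checking which TCP ports on a host accept connections, can be sketched with the standard `socket` module. This is only an illustrative sketch, not the project's code: the function name `scan_ports` and its parameters are assumptions made for this example.

```python
import socket


def scan_ports(host, ports, timeout=0.5):
    """Return the ports on `host` that accept a TCP connection.

    Hypothetical helper for illustration; only run it against hosts
    you are authorized to test.
    """
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise,
            # instead of raising an exception for refused connections.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports


if __name__ == "__main__":
    print(scan_ports("127.0.0.1", range(8000, 8010)))
```

A sequential TCP connect scan like this is slow for large port ranges; a real tool would likely scan concurrently and report more detail per port.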
 
  
==== Related GitHub repositories ====
+
=== GSOC-2018 repositories ===
 
+
https://github.com/eellak/gsoc2018-pypen
https://github.com/jmortega/python-pentesting
 
 
 
==== Expected Results ====
 
 
 
Development of an independent Python library which will also integrate other existing and well consolidated tools such as CUPP (already in Kali Linux) for assisting in penetration testing.
 
 
 
Proposed tools
 
 
 
A. User Reconnaissance & Information gathering
 
 
 
Α.1/ PyFBSniff: Facebook scraper
 
 
 
Α.2/ PyGenUser: Username list creation
 
 
 
Α.3/ PyDic: Dictionary creation
 
 
 
Future extensions will include tools similar to PyFBSniff for other social media such as Twitter and Google+.
 
 
 
 
 
B. Target System Reconnaissance & Information gathering
 
 
 
A collection of supportive tools gathering and presenting information about the Operating System and its processes.

B.1/ PyPScanner: Port Scanner

B.2/ PyPidStat: Process statistics creation

B.3/ PySocketStat: Socket statistics creation

B.4/ PyPipeStat: Pipe statistics creation

B.5/ PyFileStat: File statistics creation

C. Attack PenTest tools

C.1/ PyDoS: DoS attack by flooding

C.2/ PyBruftp: Bruteforce attack to ftp server

C.3/ PyRansom: Ransomware script

The library will be expandable in order to incorporate more tools in the future.

==== Knowledge Prerequisites ====

Python fluency

OS basics

Networking basics

PenTest basics

==== Mentors: [https://haf.academia.edu/AntoniosAndreatos Antonios Andreatos], [https://www.linkedin.com/in/panagiotis-karampelas-5868002/ Panagiotis Karampelas], [http://www.cslab.ece.ntua.gr/~pavlatos/ Christos Pavlatos] ====

=== Student ===
[https://github.com/stikos Konstantinos Liosis]

=== Mentors ===
[https://www.researchgate.net/profile/Antonios_Andreatos Antonios Andreatos], [https://www.linkedin.com/in/panagiotis-karampelas-5868002/ Panagiotis Karampelas], [http://www.cslab.ece.ntua.gr/~pavlatos/ Christos Pavlatos]

== Addition of Greek glyphs in the Open Source Fonts ArimaMadurai ==

=== Description ===
This project aims to extend the collection of fonts supporting Greek script in the Google Fonts Catalog. Today, 19 serif fonts, 6 monospace fonts and 10 sans-serif fonts supporting Greek script are available, and only 2 fonts are explicitly intended for display text.

Arima Madurai is a font created by Natanael Gana and Joana Correia of NDISCOVER, a Portuguese type foundry. It is a multi-script display font with 8 weights, from thin to black, and has a strong calligraphic influence. It has a lot of personality, so it is recognisable in headlines or brand names. I value the quality of the design; thanks to its low contrast, it offers good legibility and rendering on screen.

Given the history of Greek script, it is interesting and challenging to design a typeface with a calligraphic feel, in terms of both design and study. There are remarkable examples of Greek punch cutting from the most talented historical figures. The challenge will be to respect that history while keeping a well anchored contemporary form.

Arima Madurai already supports Tamil, Malayalam and Latin scripts and I would like to add Greek script to the glyph set. The fact that the font already supports multiple scripts is a real benefit to the project: Arima Madurai already works in a non-Latin typographic environment and therefore offers a large set of shapes that can be used to match the Greek glyphs with the others.

=== GSOC-2018 repositories ===
https://github.com/eellak/gsoc2018-arimamadurai

=== Student ===
[https://github.com/RosaWagner Rosalie Wagner]

=== Mentors ===
[https://github.com/zvr Alexios Zavras], [https://github.com/irenevl Irene Vlachou], [https://github.com/thynem Emilios Theofanous]

== Addition of Greek glyphs in the Open Source Fonts Cantarell ==

=== Description ===
Cantarell is a humanist sans serif typeface optimized for on-screen reading. It was originally developed by Dave Crossland in the MA Typeface Design class of 2009 at the University of Reading using free software. Subsequently, it was licensed under an SIL Open Font License and has been the standard UI typeface for the open-source desktop environment GNOME since version 3.0 in 2010.

The fonts were redesigned for the release of GNOME 3.28 in March 2018. PostScript outline quality improved significantly, spacing was reworked and new weights were added.

The family is currently growing to support additional writing systems. After initially applying to extend another typeface, I was invited to change my project and add Monotonic and Polytonic Greek to the three Roman masters of Cantarell during GSoC 2018.

=== GSOC-2018 repositories ===
https://github.com/eellak/gsoc2018-cantarell

=== Student ===
[https://github.com/grautesk Florian Fecher]

=== Mentors ===
[https://github.com/zvr Alexios Zavras], [https://github.com/irenevl Irene Vlachou], [https://github.com/thynem Emilios Theofanous]
+
[[Κατηγορία:GSOC2018]] [[Κατηγορία:GSOC]]

Latest revision as of 15:13, 5 February 2020

Adding Greek language on NLP library Spacy.io

Description

We live in the era of data. Every minute, 3.8 billion internet users produce content: more than 120 million emails, 500,000 Facebook comments, and 3 million Google searches. If we want to process that amount of data efficiently, we need to process natural language. Open source projects such as spaCy, textblob, and NLTK contribute significantly in that direction and thus need to be reinforced.

This project is about improving the quality of Natural Language Processing for the Greek language. The first step is to integrate the Greek language into spaCy. During that process, innovative approaches will be used. It is vitally important for the writer and the mentors of the program to identify which of them are of practical use for spaCy and to share the results in order to support any other interested open source enthusiast. If the Greek language is successfully integrated into spaCy, the Greek model will be trained and used to extract valuable information from Greek texts, such as emotion detection, entity extraction, etc.

This project aims to achieve the following goals:

1. Integration of Greek language to spaCy.io platform

2. Natural Language Processing of Greek documents in order to extract valuable information such as named entities, sentiment analysis, tags, etc.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-spacy

Student

Ioannis Daras

Mentors

Markos Gogoulos, Panos Louridas



Extraction of Responsibilities per unit in public sector organizations from the Government Gazette

Description

The objective of this project is to extend existing Government Gazette (GG) text mining code with Named Entity Recognition features that will allow the identification of Government Directorates and Divisions with the responsibilities assigned to them, the types of services they are required to provide according to their legal framework published in http://www.et.gr/ and the extraction of this information with related metadata (decision number, date of the GG issue).

The aim is to link the management units with assigned roles and services per unit (Directorates, Divisions & Sections) and codify this specific information, which is hidden in the GG issue raw text.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-GG-extraction

Student

Chris Karageorg Kaneen

Mentors

Iraklis Varlamis, Sarantos Kapidakis, Dionysios Moschopoulos, Theodoros Karounos


Epoptes

Description

Epoptes (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows for screen broadcasting and monitoring, remote command execution, message sending, imposing restrictions like screen locking or sound muting on the clients, and much more! It can be installed in Ubuntu, Debian and openSUSE based labs that may contain any combination of the following: LTSP servers, thin and fat clients, non LTSP servers, standalone workstations, NX or XDMCP clients etc.

Epoptes has been undermaintained for the last couple of years. It's currently powered by Python 2 and GTK 2, while unfortunately a number of bugs have crept in due to major updates in Linux distribution packages (systemd, consolekit, VNC…).

This project aims at reviving Epoptes with Python 3 and GTK 3 support, while also addressing several outstanding issues.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-epoptes

Student

Alkis Georgopoulos

Mentors

Fotis Tsiamis, Avgoustos Tsinakos


Government Gazette text mining, cross linking, and codification

Description

In recent years, analyzing public sector texts via text mining methods has attracted plenty of attention, enabled by modern libraries, algorithms and practices and brought to the forefront by open source projects such as textblob, spaCy, SciPy, Tensorflow and NLTK. These collaborative efforts seem to mark a shift towards more efficient understanding of natural language by machines, which can be used in conjunction with public documents to provide more robust organization and codification in the legal sector. This project aims to extend the existing Government Gazette (GG) text mining code by implementing features to organize and cross-link GG texts with legal texts and to detect the signatories via heuristic and machine learning methods. This will help eliminate bureaucratic processes and save a great deal of time for jurists who, for example, search for legal documents in the ISOKRATIS database of legal texts (which is an applicable case study).
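A heuristic for cross-linking could start from a regular expression that spots citations of other legal texts and wraps them in links. The sketch below is a minimal illustration under stated assumptions: it only recognizes citations shaped like "ν. 4009/2011" (law) or "π.δ. 63/2005" (presidential decree), and the URL template, function names, and pattern are hypothetical rather than the project's actual rules.

```python
import re

# Assumed citation shapes (illustrative, not exhaustive):
#   "ν. 4009/2011", "Ν.4009/2011", "π.δ. 63/2005"
LAW_REF = re.compile(
    r"(?<!\w)(?P<kind>[νΝ]\.|[πΠ]\.\s?[δΔ]\.)\s?(?P<number>\d+)/(?P<year>\d{4})"
)


def find_law_references(text):
    """Return (kind, number, year) for every citation found in the text."""
    return [
        (m.group("kind"), int(m.group("number")), int(m.group("year")))
        for m in LAW_REF.finditer(text)
    ]


def link_law_references(text, url_template="https://example.org/law/{year}/{number}"):
    """Wrap each citation in a wiki-style external link.

    The URL template is a placeholder; a real system would point to
    wherever the cited texts are actually published.
    """
    def _to_link(match):
        url = url_template.format(
            year=match.group("year"), number=match.group("number")
        )
        return f"[{url} {match.group(0)}]"

    return LAW_REF.sub(_to_link, text)


if __name__ == "__main__":
    sample = "όπως τροποποιήθηκε με τον ν. 4009/2011 και το π.δ. 63/2005"
    print(find_law_references(sample))
    print(link_law_references(sample))
```

Real gazette text cites laws in many more forms (spelled-out names, ranges, article and paragraph references), which is presumably where the machine learning methods mentioned above come in.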

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-3gm

Student

Marios Papachristou

Mentors

Diomidis Spinellis, Alexios Zavras, Sarantos Kapidakis, Dionysios Moschopoulos


Libreoffice customization and creation of legal Templates for LibreOffice

Description

A set of modules and templates for the LibreOffice suite that ease the transition from Microsoft Office, as well as ready-to-use templates that automate the creation of Greek legal documents. These templates aim to cut down on time-consuming tasks by removing formatting and layout procedures from the employee workflow. Furthermore, an interface to access all these templates will be developed. All steps will be documented during and after the process for future reference and development.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-librecust

Student

Christos Arvanitis

Mentors

Kostas Papadimas, Theodoros Karounos, Diomidis Spinellis


Software components and IP management

Description

Clio is a web-based system for maintaining (meta-)information on software components.

Nowadays every piece of software includes and uses many other software components, each with its own license.

The goal of this project is to build a simple web system for (manually) entering and maintaining this information.

This is a brand-new project; some analysis has been done but no code is available yet.

More details in the separate page Clio.
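To make the kind of information involved concrete, here is a minimal sketch of a data model for components and their licenses. It is purely illustrative and not based on Clio's design: the names `Component`, `Registry`, and `licenses_for`, and the use of SPDX-style license identifiers, are assumptions for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Metadata for one software component (hypothetical schema)."""
    name: str
    version: str
    license: str  # assumed to be an SPDX-style identifier such as "MIT"
    uses: list = field(default_factory=list)  # names of the components it includes


class Registry:
    """In-memory store of manually entered component records."""

    def __init__(self):
        self._components = {}

    def add(self, component):
        self._components[component.name] = component

    def licenses_for(self, name):
        """Return the licenses of a component and of everything it includes, transitively."""
        seen, found, stack = set(), set(), [name]
        while stack:
            current = stack.pop()
            if current in seen or current not in self._components:
                continue  # skip already visited or unknown components
            seen.add(current)
            component = self._components[current]
            found.add(component.license)
            stack.extend(component.uses)
        return found
```

A web front end would sit on top of such a model; the sketch only shows why tracking which component uses which matters when each carries its own license.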

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-clio

Student

Gopalakrishnan.V

Mentors

Alexios Zavras, Georgia Kapitsaki


WSO2 Identity Server Userstore using Web Services to get claims

Description

WSO2 Identity Server is a popular open source identity and access management server used throughout the world. It efficiently undertakes the complex task of identity management across enterprise applications, services, and APIs.

This project is based on WSO2 Identity Server version 5.4. Currently, WSO2 Identity Server exposes its functionality through SOAP services; in the near future there will be REST APIs that cover all functionality and are more effective. It supports different user stores, such as LDAP, JDBC, and MySQL, as primary and secondary user stores.

WSO2 Identity Server allows multiple user stores to be configured for storing users and roles. There are two types of user store: a primary user store (mandatory) and secondary user stores (optional). In this version, all user information is persisted in a single user store. This implementation will split it into a credential user store and an attribute user store. The attribute user store simply holds claim details, which can be accessed by providing the user credentials and secret. With the ability to create a new user store, the data currently saved in the primary user store can be separated into two stores: one for user details and one for user attribute (claim) details.
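The separation described above can be sketched in a few lines. This is a language-neutral illustration in Python, not WSO2 code: the classes `CredentialStore` and `AttributeStore` and their methods are hypothetical, both stores are held in memory, and the real project would reach the attribute store through a web service rather than a direct call.

```python
import hashlib
import hmac
import os


class CredentialStore:
    """Holds only what is needed to authenticate a user (hypothetical)."""

    def __init__(self):
        self._records = {}  # username -> (salt, password hash)

    def add_user(self, username, password):
        salt = os.urandom(16)
        self._records[username] = (salt, self._hash(password, salt))

    def authenticate(self, username, password):
        record = self._records.get(username)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison of the stored and the computed hash.
        return hmac.compare_digest(expected, self._hash(password, salt))

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class AttributeStore:
    """Holds claims separately; reads require valid credentials (hypothetical)."""

    def __init__(self, credentials):
        self._credentials = credentials
        self._claims = {}  # username -> dict of claims

    def set_claims(self, username, claims):
        self._claims[username] = dict(claims)

    def get_claims(self, username, password):
        if not self._credentials.authenticate(username, password):
            raise PermissionError("invalid credentials")
        return dict(self._claims.get(username, {}))
```

Keeping the two stores behind separate interfaces means the claims can live in a different backend from the credentials, which is the point of the split.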

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-wso2

Student

Isuri Anuradha

Mentors

Panagiotis Kranidiotis, Ioannis Stamelos


Python PenTest Library (PyPen)

A collection of tools supporting penetration testers.

Description

Development of a Python library for penetration testers. The library will include a set of tools for performing the basic tasks involved in attacking a remote host. It will include reconnaissance tools, such as modules that collect data about a specific target either from the web or from user input. Other tools will create custom dictionaries for username and password attacks. Supported attack techniques will also include DoS, brute-force, and inclusion attacks. The library will also include various statistical functions for extracting additional information from a captured host.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-pypen

Student

Konstantinos Liosis

Mentors

Antonios Andreatos, Panagiotis Karampelas, Christos Pavlatos


Addition of Greek glyphs to the open source font Arima Madurai

Description

This project aims to extend the collection of fonts supporting Greek script in the Google Fonts catalog. Today only 19 serif fonts, 6 monospace fonts, and 10 sans-serif fonts supporting Greek script are available, and only 2 fonts are explicitly intended for display text.

Arima Madurai is a font created by Natanael Gama and Joana Correia of NDISCOVER, a Portuguese type foundry. It is a multi-script display font with 8 weights, from thin to black, and has a strong calligraphic influence. It has a lot of personality, which makes it recognisable in headlines or brand names. I value the quality of the design; thanks to its low contrast, it offers good legibility and rendering on screen.

Given the history of Greek script, designing a typeface with a calligraphic feel is interesting and challenging, both in terms of design and in terms of study. There are remarkable examples of Greek punch cutting from the most talented historical figures. The challenge will be to respect that history while keeping a well-anchored contemporary form.

Arima Madurai already supports Tamil, Malayalam, and Latin scripts, and I would like to add Greek script to the glyph set. The fact that the font already supports multiple scripts is a real benefit to the project: Arima Madurai already works in a non-Latin typographic environment and therefore offers a large set of shapes that can be used to match the Greek glyphs with the others.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-arimamadurai

Student

Rosalie Wagner

Mentors

Alexios Zavras, Irene Vlachou, Emilios Theofanous


Addition of Greek glyphs to the open source font Cantarell

Description

Cantarell is a humanist sans serif typeface optimized for on-screen reading. It was originally developed by Dave Crossland in the MA Typeface Design class of 2009 at the University of Reading using free software. Subsequently, it was licensed under an SIL Open Font License and has been the standard UI typeface for the open-source desktop environment GNOME since version 3.0 in 2010.

The fonts were redesigned for the release of GNOME 3.28 in March 2018. PostScript outline quality has improved significantly, spacing has been reworked, and new weights have been added.

The family is currently growing to support additional writing systems. After initially applying to extend another typeface, I was invited to change my project and add monotonic and polytonic Greek to the three Roman masters of Cantarell during GSoC 2018.
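For readers unfamiliar with the two orthographies: in Unicode, the letters needed for monotonic Greek sit in the Greek and Coptic block (U+0370 to U+03FF), while the precomposed polytonic letters sit in the Greek Extended block (U+1F00 to U+1FFF). The sketch below lists the Greek letters in each block and reports which ones a font does not yet cover. The function names and the idea of passing a set of supported code points are assumptions for this example; the glyph set actually planned for Cantarell is not specified here.

```python
import unicodedata

GREEK_AND_COPTIC = range(0x0370, 0x0400)  # includes the monotonic letters
GREEK_EXTENDED = range(0x1F00, 0x2000)    # precomposed polytonic letters


def greek_code_points(block):
    """Return assigned code points in the block whose Unicode name starts with 'GREEK'."""
    return [cp for cp in block if unicodedata.name(chr(cp), "").startswith("GREEK")]


def missing_code_points(supported, block):
    """Return Greek code points of the block that are absent from `supported`."""
    return [cp for cp in greek_code_points(block) if cp not in supported]
```

In practice the set of supported code points would be read from the font's character map with a font tooling library.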

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-cantarell

Student

Florian Fecher

Mentors

Alexios Zavras, Irene Vlachou, Emilios Theofanous