Difference between revisions of "Google Summer of Code 2018 Accepted projects"

From Ελεύθερο Λογισμικό / Λογισμικό ανοιχτού κώδικα
 
 
 
== Adding Greek language on NLP library Spacy.io ==
 
  
=== Brief Explanation: ===
+
=== Description ===
Spacy is an open-source Python library for advanced Natural Language Processing. It's a very powerful and modern tool for applying NLP to real-world problems. Among other functionality, it provides Named Entity Recognition, deep learning integration and part-of-speech tagging, and includes built-in visualizers for syntax and NER. Spacy supports more than 25 languages, but not Greek. Adding the Greek language will greatly improve the application of NLP to Greek and enable tasks such as named entity recognition and part-of-speech tagging.
+
We live in the era of data. Every minute, 3.8 billion internet users produce content: more than 120 million emails, 500,000 Facebook comments and 3 million Google searches. If we want to process that amount of data efficiently, we need to process natural language. Open source projects such as spaCy, textblob and NLTK contribute significantly in that direction, and thus they need to be reinforced.
  
The procedure is well specified on https://spacy.io/usage/adding-languages, custom language data (stop words, tokenizer exceptions, punctuation rules etc) need to be added and tested.
+
This project is about improving the quality of Natural Language Processing for the Greek language. The first step is to integrate Greek into spaCy. During that process, innovative approaches will be used. It is vital for the writer and the mentors of the program to identify which of them are of practical use for spaCy, and to share the results in order to support any other interested open source enthusiast. If Greek is successfully integrated into spaCy, the Greek model will be trained and used to extract valuable information, such as emotion detection in Greek texts, entity extraction, etc.
  
=== Expected Results ===
+
This project aims to achieve the following goals:
The vocabulary, syntax, entities and word vectors for the Greek language. These will be produced with Spacy/gensim, after the language information is successfully added.
+
  
The Greek language model will then be added to Spacy.io for usage as a supported language model.
+
1. Integration of the Greek language into the spaCy.io platform
  
As a real world scenario in order to test the language model, analysis on a large number of Official Greek Government's Gazette (FEK-ΦΕΚ) is proposed, in order to extract entities and categorize these documents.
+
2. Natural Language Processing of Greek documents in order to extract valuable information such as named entities, sentiment analysis, tags, etc.
  
=== Related  repositories ===
+
=== GSOC-2018 repositories ===
https://github.com/explosion/spaCy
+
https://github.com/eellak/gsoc2018-spacy
  
=== Knowledge Prerequisites ===
+
=== Student ===
Strong knowledge of the Greek language, Python language fluency and Regular Expressions knowledge are necessary for this.
+
[https://github.com/giannisdaras Ioannis Daras]
  
=== Mentors: [https://github.com/mgogoulos Markos Gogoulos]  [https://github.com/louridas Panos Louridas] ===
+
=== Mentors ===
 +
[https://github.com/mgogoulos Markos Gogoulos], [https://github.com/louridas Panos Louridas]
 +
  
 
== Extraction of Responsibilities per unit in public sector organizations from the Government Gazette ==
 
  
==== Brief Explanation: ====
+
=== Description ===
The objective of this project is to extend existing Government Gazette(GG) text mining code with Named Entity Recognition features that will allow the identification of Government Directorates and Divisions with the responsibilities assigned to them and the types of services they are required to provide according with their legal framework published in http://www.et.gr/ and the extraction of this information with related metadata (decision number, date of the GG issue). The aim is to link the management units with assigned roles and services per unit(Directorates, Divisions & Sections) and codify this specific information, which is hidden in the GG issue raw text. For this, the PDFs must be downloaded, converted into text and cleaned. Then, syntactic-based heuristics and/or machine learning techniques must be applied to identify specific Named Entities types with references to assigned responsibilities-services per unit(Directorates, Divisions & Sections) and links between the two must be extracted. Metadata concerning the GG issue and decision and/or law number will be also associated with the extracted association. The produced associations will be extracted in a machine usable/structured format (e.g. as RDF triples).
+
The objective of this project is to extend existing  
 +
Government Gazette (GG) text mining code with Named Entity Recognition  
 +
features that will allow the identification of Government Directorates  
 +
and Divisions with the responsibilities assigned to them, the types of  
 +
services they are required to provide according to their legal framework
 +
<nowiki> published in http://www.et.gr/</nowiki> and the extraction of this information with related metadata (decision number, date of the GG issue).
  
==== Expected Results ====
+
The aim is to link the management units with assigned roles and
* A module for manually annotating related entities and responsibilities-services assignment sections in raw text
+
services per unit (Directorates, Divisions & Sections) and codify
* A NER module, with trained models for detecting  Governmental Directions and Divisions in raw text
+
this specific information, which is hidden in the GG issue raw text.
* A module that associates entities with responsibilities and extracts related metadata from the GG issue
+
  
==== Related  repositories ====
+
=== GSOC-2018 repositories ===
https://github.com/arisp8/gazette-analysis
+
https://github.com/eellak/gsoc2018-GG-extraction
  
==== Knowledge Prerequisites ====
+
=== Student ===
Python, Java, Machine Learning
+
[https://github.com/ckarageorgkaneen Chris Karageorg Kaneen]  
 
+
==== Mentors: [https://www.dit.hua.gr/~varlamis/ Iraklis Varlamis], [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis], [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos] [http://www.karounos.gr/blog/bio Theodoros Karounos]  ====
+
  
 +
=== Mentors ===
 +
[https://www.dit.hua.gr/~varlamis/ Iraklis Varlamis], [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis], [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos], [http://www.karounos.gr/blog/bio Theodoros Karounos]
 +
  
 
== Epoptes ==
 
  
==== Brief Explanation: ====
+
=== Description ===
'''Epoptes ('''Επόπτης  - a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows for screen broadcasting  
+
Epoptes (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows for screen broadcasting and monitoring, remote command execution, message sending, imposing restrictions such as locking the screen or muting the sound on clients, and much more! It can be installed in Ubuntu, Debian and openSUSE based labs that may contain any combination of the following: LTSP servers, thin and fat clients, non-LTSP servers, standalone workstations, NX or XDMCP clients, etc.
and monitoring, remote command execution, message sending, imposing restrictions like screen locking or sound muting the clients and much more! It can be installed in Ubuntu, Debian and openSUSE based labs that may contain ''any'' combination of the following: LTSP servers, thin and fat clients, non LTSP servers, standalone workstations, NX or XDMCP clients etc.                                                  
+
  
==== Related GitHub repositories ====
+
Epoptes has been undermaintained for the last couple of years. It's currently powered by Python 2 and GTK 2, while unfortunately a number of bugs have crept in due to major updates in Linux distribution packages (systemd, consolekit, VNC…).
https://www.github.com/Epoptes/epoptes
+
  
==== Expected Results ====
+
This project aims at reviving Epoptes with Python 3 and GTK 3 support, while also addressing several outstanding issues.
Rewrite Epoptes with Python 3 support  
+
  
Gtk3 with GObject Introspection instead of pygtk2
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-epoptes
  
Improvements in the code structure ( Break existing code into python modules/packages)
+
=== Student ===
 
+
[https://github.com/alkisg Alkis Georgopoulos]
==== Knowledge Prerequisites ====
+
Python
+
 
+
GTK
+
 
+
==== Mentors:  [https://github.com/ftsamis Fotis Tsiamis], [http://cde.athabascau.ca/ourpeople/instructors/tsinakos.php Avgoustos Tsinakos] ====
+
  
 +
=== Mentors ===
 +
[https://github.com/ftsamis Fotis Tsiamis], [http://cde.athabascau.ca/ourpeople/instructors/tsinakos.php Avgoustos Tsinakos]
 +
 +
 
== Government Gazette text mining, cross linking, and codification ==
 
  
==== Brief Explanation ====
+
=== Description ===
The objective of this project is to extend existing Government Gazette text mining code to cross-link legal texts and detect the ministers that sign them. For this the text PDFs need to be downloaded and converted into text. Then, heuristic rules must be applied to detect references to other legal texts, which will be converted into hypertext form. Similar techniques will be used to detect the competent ministers. Two possible extensions are proposed. First, detect amendments incorporated within another law. Second, implement a prototype for editing a law in its codified form (e.g. on GitHub) and automatically creating from the changes the text to be legislated (the differences from the original law).
+
In recent years, much attention has gathered around analyzing public sector texts with text mining methods, enabled by modern libraries, algorithms and practices and brought to the forefront by open source projects such as textblob, spaCy, SciPy, TensorFlow and NLTK. These collaborative efforts seem to mark a shift towards more efficient machine understanding of natural language, which can be applied to public documents to provide more robust organization and codification in the legal sector.
 +
This project aims to extend the existing Government Gazette (GG) text mining code with features that organize and cross-link GG texts with legal texts and detect the signatories via heuristic and machine learning methods. This will eliminate bureaucratic processes and save considerable time for jurists who, for example, look for legal documents in the ISOKRATIS database of legal texts (which is an applicable case study).
  
==== Related GitHub repositories ====
+
=== GSOC-2018 repositories ===
https://github.com/arisp8/gazette-analysis
+
https://github.com/eellak/gsoc2018-3gm
  
==== Expected Results ====
+
=== Student ===
Detection of references to other laws; detection of competent ministers; codified legislation prototype
+
[https://github.com/papachristoumarios Marios Papachristou]
  
==== Knowledge Prerequisites ====
+
=== Mentors ===
Python
+
[https://www.spinellis.gr/ Diomidis Spinellis], [https://github.com/zvr Alexios Zavras], [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis], [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos]
 
+
==== Mentors: [https://www.spinellis.gr Diomidis Spinellis] [https://github.com/zvr Alexios Zavras]  [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis] [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos] ====
+
  
 
== Libreoffice customization and creation of legal Templates for LibreOffice ==
 
  
==== Brief Explanation ====
+
=== Description ===
LibreOffice customization in order to achieve a "familiar" look and menus for users that convert from MS Office 2013, and creation of specific templates for the Greek Legal system. The customization and templates should follow the development guidelines at https://wiki.documentfoundation.org/Development/GetInvolved .   
+
A set of modules and templates for the LibreOffice suite that ease the transition from Microsoft Office, as well as ready-to-use templates that automate the creation of Greek legal documents. The templates aim to cut time-consuming tasks by removing formatting and layout work from the employee workflow. Furthermore, an interface to access all the templates will be developed. All steps will be documented during the process and afterwards for future reference and development.
 
+
==== Expected results ====
+
* Development of specific menu customizations through the use of [https://api.libreoffice.org/ Libreoffice Software Development Kit 6.0] in various modules of Libreoffice (eg https://api.libreoffice.org/docs/idl/ref/namespacecom_1_1sun_1_1star_1_1ui.html) 
+
* Design and development of Templates and LibreOffice applications that request/get and fill specific information in the templates through the use of  APIs for the Greek legal system
+
  
Customization and Templates  should be accompanied with detailed documentation and instructions for developers and end users.
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-librecust
  
==== Knowledge Prerequisites ====
+
=== Student ===
* C
+
[https://github.com/arvchristos Christos Arvanitis]
* C++
+
* Java
+
* Python
+
* Bash
+
* Perl
+
* Libreoffice Software Development Kit 6.0
+
  
==== Mentors: [https://github.com/pkst-ellak Kostas Papadimas] [http://www.karounos.gr/blog/bio Theodoros Karounos] [https://www.spinellis.gr Diomidis Spinellis] ====
+
=== Mentors ===
 +
[https://github.com/pkst-ellak Kostas Papadimas], [http://www.karounos.gr/blog/bio Theodoros Karounos], [https://www.spinellis.gr/ Diomidis Spinellis]
 +
  
 
== Software components and IP management ==
 
  
More details in the separate page [https://ellak.gr/wiki/index.php?title=Clio Clio].
+
=== Description ===
 +
Clio is a web-based system for maintaining (meta-)information on software components.
  
==== Brief Explanation ====
+
Nowadays every piece of software includes and uses many other software components, each coming with its own license.
 
+
A web-based system to manage data on software components and their relations.
+
 
+
Nowadays every piece of software includes and uses many other software components, each coming with its own license.
+
  
 
The goal of this project is to build a simple web system to be able to (manually) input and maintain this information!
 
 
This is a brand-new project; some analysis has been done but no code is available yet.
 
  
==== Expected Results ====
+
More details in the separate page [https://ellak.gr/wiki/index.php?title=Clio Clio].
  
A complete web-based system to manage the above-mentioned data.
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-clio
  
==== Knowledge Prerequisites ====
+
=== Student ===
 +
[https://github.com/gopuvenkat Gopalakrishnan.V]
  
Web (any technology welcome)
+
=== Mentors ===
 
+
[https://github.com/zvr Alexios Zavras], Georgia Kapitsaki
 
+
==== Mentors: [https://github.com/zvr Alexios Zavras] Georgia Kapitsaki ====
+
  
 
== WSO2 Identity Server Userstore using Web Services  to get claims ==
 
  
==== Brief Explanation ====
+
=== Description ===
 +
WSO2 Identity Server is a popular open source identity and access management server used throughout the world. It efficiently handles the complex task of identity management across enterprise applications, services, and APIs.
  
WSO2 Identity Server provides secure identity management for enterprise web applications, services, and APIs by managing identity and entitlements of the users securely and efficiently. The Identity Server enables enterprise architects and developers to reduce identity provisioning time, guarantee secure online interactions, and deliver a reduced single sign-on environment. WSO2 Identity Server is fully open source and is released under Apache Software License Version 2.0.  
+
This project is based on WSO2 Identity Server version 5.4. Currently, WSO2 Identity Server consists of SOAP services; in the near future there will be REST APIs that support all functionality and are more effective. In the current environment, it supports different user stores, such as LDAP, JDBC, and MySQL, as primary and secondary user stores.
  
The aim of this project is to create a new type of userstore where credentials will be separated from attributes, and attributes (claims) can be configured from the web UI as a SOAP or REST web service. The end-user should be able to
+
WSO2 Identity Server allows multiple user stores to be configured for storing users and roles. There are two types of user store: a primary user store (mandatory) and secondary user stores (optional). In this version, all user information is persisted in a single user store. This implementation will split it into a credential user store and an attribute user store. The attribute user store is used only to store claim details, which can be accessed by providing the user credentials and a secret. With the ability to create this new user store, data currently saved in the primary user store can be separated into two stores: one for user details and one for user attributes (claims).
* configure credentials for LDAP or JDBC
* configure web service authentication
+
* configure claims to consume the above web service
+
'''Expected Results'''
+
  
A new userstore where end-user can configure using existing web interface, user claims through web services  client. The appropriate changes in the source code should be uploaded in the upstream branch of the latest version (5.4.0)
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-wso2
  
==== Related GitHub repository ====
+
=== Student ===
https://wso2.github.io/
+
[https://github.com/isuri97 Isuri Anuradha]
  
https://wso2.github.io/using-maven.html
+
=== Mentors ===
 
+
[https://www.linkedin.com/in/kranidiotis/ Panagiotis Kranidiotis], [http://www.csd.auth.gr/en/staff/faculty?view=user&ro=1&id=14 Stamelos Ioannis]
https://wso2.github.io/github-repositories.html#IS
+
 
+
https://wso2.github.io/github-repositories.html
+
 
+
==== Knowledge Prerequisites ====
+
 
+
* Java JSP
+
* JSTL
+
* Maven
+
* OSGI Framework
+
* A modern development framework for interactive web content
+
 
+
==== Mentors: [https://www.linkedin.com/in/kranidiotis/ Panagiotis Kranidiotis] [http://www.csd.auth.gr/en/staff/faculty?view=user&ro=1&id=14 Stamelos Ioannis] ====
+
  
 
== Python PenTest Library (PyPen) ==
 
 +
A collection of tools supporting penetration testers.
  
A collection of tools supporting penetration testers
+
=== Description ===
 
+
==== Brief Explanation ====
+
 
+
 
Development of a Python library for penetration testers. The library will include a set of tools for performing the basic tasks for attacking a remote host. It will include reconnaissance tools such as modules that will be able to collect data for a specific target either through the web or through user input. Moreover, other tools will be developed to create custom dictionaries for username and password attacks. Other attack techniques that will be supported include DoS attack, BruteForce attack as well as Inclusion attack. The library will also include various statistical functions for extracting additional information from a captured host.
 
  
==== Related GitHub repositories ====
+
=== GSOC-2018 repositories ===
 
+
https://github.com/eellak/gsoc2018-pypen
https://github.com/jmortega/python-pentesting
+
 
+
==== Expected Results ====
+
 
+
Development of an independent Python library which will also integrate other existing and well consolidated tools such as CUPP (already in Kali Linux) for assisting in penetration testing.
+
 
+
Proposed tools
+
 
+
A. User Reconnaissance & Information gathering
+
 
+
Α.1/ PyFBSniff: Facebook scraper
+
 
+
Α.2/ PyGenUser: Username list creation
+
 
+
Α.3/ PyDic: Dictionary creation
+
 
+
Future extensions will include tools similar to PyFBSniff for other social media such as Twitter and Google+.
+
 
+
 
+
B. Target System Reconnaissance & Information gathering
+
 
+
A collection of supportive tools gathering and presenting information about the Operating System and its processes.
+
 
+
 
+
Β.1/ PyPScanner: Port Scanner
+
 
+
Β.2/ PyPidStat: Process statistics creation
+
 
+
Β.3/ PySocketStat: Socket statistics creation
+
 
+
Β.4/ PyPipeStat: Pipe statistics creation
+
 
+
Β.5/ PyFileStat: File statistics creation
+
 
+
 
+
C. Attack PenTest tools
+
 
+
C.1/ PyDoS : DoS attack by flooding
+
 
+
C.2/ PyBruftp: Bruteforce attack to ftp server
+
 
+
C.3/ PyRansom: Ransomware script
+
 
+
The library will be expandable in order to incorporate more tools in the future.
+
 
+
 
+
==== Knowledge Prerequisites ====
+
 
+
Python fluency
+
 
+
OS basics
+
 
+
Networking basics
+
 
+
PenTest basics
+
  
 +
=== Student ===
 +
[https://github.com/stikos Konstantinos Liosis]
  
==== Mentors [https://www.researchgate.net/profile/Antonios_Andreatos Antonios Andreatos], [https://www.linkedin.com/in/panagiotis-karampelas-5868002/ Panagiotis Karampelas], [http://www.cslab.ece.ntua.gr/~pavlatos/ Christos Pavlatos] ====
+
=== Mentors ===
 +
[https://www.researchgate.net/profile/Antonios_Andreatos Antonios Andreatos], [https://www.linkedin.com/in/panagiotis-karampelas-5868002/ Panagiotis Karampelas], [http://www.cslab.ece.ntua.gr/~pavlatos/ Christos Pavlatos]
 +
  
 
== Addition of Greek glyphs in the Open Source Fonts ArimaMadurai ==
 
  
==== Brief Explanation ====
+
=== Description ===
Many of the Open Source fonts (e.g., available at https://fonts.google.com), do not include glyphs for Greek letters and are therefore useless for using in a Greek environment.
+
This project aims to extend the collection of fonts supporting Greek script in the Google Fonts Catalog. Today, 19 serif fonts, 6 monospace fonts and 10 sans-serif fonts supporting Greek script are available. Moreover, only 2 fonts are explicitly intended for display text.
  
The aim of this project is to improve this situation and add the missing glyphs in the correct Unicode codepoints. The exact set of fonts to be completed will be determined in discussions between the student and the mentor(s).
+
Arima Madurai is a font created by Natanael Gana and Joana Correia of NDISCOVER, a Portuguese type foundry. It is a multi-script display font with 8 weights, from thin to black, and has a strong calligraphic influence. It has a lot of personality, so it is recognisable in headlines or brand names. I value the quality of the design, and thanks to its low contrast it offers good legibility and rendering on screen.
  
 +
Regarding the history of Greek script, it is interesting and challenging to design a typeface with a calligraphic feel: in terms of design but also in terms of study. There are remarkable examples of Greek punch cutting from the most talented historical figures. The challenge will be to respect that history while keeping a well anchored contemporary form.
  
==== Expected Results ====
+
Arima Madurai already supports Tamil, Malayalam and Latin scripts, and I would like to add Greek script to the glyph set. The fact that the font already supports multiple scripts is a real benefit to the project: Arima Madurai already works in non-Latin typographic environments and therefore displays a large set of shapes that can be used to match the Greek glyphs with the other ones.
Full support for Greek text in a number of Open Source fonts.
+
  
==== Knowledge Prerequisites ====
+
=== GSOC-2018 repositories ===
Type design, font technologies. Please note that this is a special project, where coding, in the traditional sense, will not be enough.
+
https://github.com/eellak/gsoc2018-arimamadurai
  
==== Mentors: [https://github.com/zvr Alexios Zavras], [https://github.com/irenevl Irene Vlachou] [https://github.com/thynem Εmilios Τheofanous] ====
+
=== Student ===
 +
[https://github.com/RosaWagner Rosalie Wagner]
  
 +
=== Mentors ===
 +
[https://github.com/zvr Alexios Zavras], [https://github.com/irenevl Irene Vlachou], [https://github.com/thynem Εmilios Τheofanous]
 +
  
== Addition of Greek glyphs in the Open Source Fonts WorkSans ==
+
== Addition of Greek glyphs in the Open Source Fonts Cantarell ==
  
==== Brief Explanation ====
+
=== Description ===
Many of the Open Source fonts (e.g., available at https://fonts.google.com), do not include glyphs for Greek letters and are therefore useless for using in a Greek environment.
+
Cantarell is a humanist sans serif typeface optimized for on-screen reading. It was originally developed by Dave Crossland in the MA Typeface Design class of 2009 at the University of Reading using free software. Subsequently, it was licensed under an SIL Open Font License and has been the standard UI typeface for the open-source desktop environment GNOME since version 3.0 in 2010.
  
The aim of this project is to improve this situation and add the missing glyphs in the correct Unicode codepoints. The exact set of fonts to be completed will be determined in discussions between the student and the mentor(s).
+
The fonts were redesigned for the release of GNOME 3.28 in March 2018. PostScript outline quality improved significantly, spacing was reworked and new weights were added.
  
 +
The family is currently growing to support additional writing systems. After initially applying to extend another typeface, I was invited to change my project and add Monotonic and Polytonic Greek to the three Roman masters of Cantarell during GSoC 2018.
  
==== Expected Results ====
+
=== GSOC-2018 repositories ===
Full support for Greek text in a number of Open Source fonts.
+
https://github.com/eellak/gsoc2018-cantarell
  
==== Knowledge Prerequisites ====
+
=== Student ===
Type design, font technologies. Please note that this is a special project, where coding, in the traditional sense, will not be enough.
+
[https://github.com/grautesk Florian Fecher]
  
==== Mentors: [https://github.com/zvr Alexios Zavras], [https://github.com/irenevl Irene Vlachou] [https://github.com/thynem Εmilios Τheofanous] ====
+
=== Mentors ===
 +
[https://github.com/zvr Alexios Zavras], [https://github.com/irenevl Irene Vlachou], [https://github.com/thynem Εmilios Τheofanous]
  
[[Κατηγορία:GSOC2018]]
+
[[Κατηγορία:Google Summer of Code 2018]]

Revision as of 10:00, 13 July 2018

Adding Greek language on NLP library Spacy.io

Description

We live in the era of data. Every minute, 3.8 billion internet users produce content: more than 120 million emails, 500,000 Facebook comments and 3 million Google searches. If we want to process that amount of data efficiently, we need to process natural language. Open source projects such as spaCy, textblob and NLTK contribute significantly in that direction, and thus they need to be reinforced.

This project is about improving the quality of Natural Language Processing for the Greek language. The first step is to integrate Greek into spaCy. During that process, innovative approaches will be used. It is vital for the writer and the mentors of the program to identify which of them are of practical use for spaCy, and to share the results in order to support any other interested open source enthusiast. If Greek is successfully integrated into spaCy, the Greek model will be trained and used to extract valuable information, such as emotion detection in Greek texts, entity extraction, etc.

This project aims to achieve the following goals:

1. Integration of the Greek language into the spaCy.io platform

2. Natural Language Processing of Greek documents in order to extract valuable information such as named entities, sentiment analysis, tags, etc.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-spacy

Student

Ioannis Daras

Mentors

Markos Gogoulos, Panos Louridas


Extraction of Responsibilities per unit in public sector organizations from the Government Gazette

Description

The objective of this project is to extend existing Government Gazette (GG) text mining code with Named Entity Recognition features that will allow the identification of Government Directorates and Divisions with the responsibilities assigned to them, the types of services they are required to provide according to their legal framework published in http://www.et.gr/ and the extraction of this information with related metadata (decision number, date of the GG issue).

The aim is to link the management units with assigned roles and services per unit (Directorates, Divisions & Sections) and codify this specific information, which is hidden in the GG issue raw text.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-GG-extraction

Student

Chris Karageorg Kaneen

Mentors

Iraklis Varlamis, Sarantos Kapidakis, Dionysios Moschopoulos, Theodoros Karounos


Epoptes

Description

Epoptes (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows for screen broadcasting and monitoring, remote command execution, message sending, imposing restrictions such as locking the screen or muting the sound on clients, and much more! It can be installed in Ubuntu, Debian and openSUSE based labs that may contain any combination of the following: LTSP servers, thin and fat clients, non-LTSP servers, standalone workstations, NX or XDMCP clients, etc.

Epoptes has been undermaintained for the last couple of years. It's currently powered by Python 2 and GTK 2, while unfortunately a number of bugs have crept in due to major updates in Linux distribution packages (systemd, consolekit, VNC…).

This project aims at reviving Epoptes with Python 3 and GTK 3 support, while also addressing several outstanding issues.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-epoptes

Student

Alkis Georgopoulos

Mentors

Fotis Tsiamis, Avgoustos Tsinakos


Government Gazette text mining, cross linking, and codification

Description

In recent years, considerable attention has been directed at analyzing public sector texts with text mining methods, enabled by modern libraries, algorithms and practices and brought to the forefront by open source projects such as textblob, spaCy, SciPy, TensorFlow and NLTK. These collaborative efforts mark a shift towards more efficient machine understanding of natural language, which can be applied to public documents to provide more robust organization and codification in the legal sector. This project aims to extend the existing Government Gazette (GG) text mining code with features that organize and cross-link GG texts with legal texts and detect the signatories via heuristic and machine learning methods. This will help eliminate bureaucratic processes and save considerable time for jurists who, for example, search for legal documents in the ISOKRATIS database of legal texts (an applicable case study).

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-3gm

Student

Marios Papachristou

Mentors

Diomidis Spinellis, Alexios Zavras, Sarantos Kapidakis, Dionysios Moschopoulos


LibreOffice customization and creation of legal Templates for LibreOffice

Description

A set of modules and templates for the LibreOffice suite that ease the transition from Microsoft Office, together with ready-to-use templates that automate the creation of Greek legal documents. The templates aim to cut time-consuming tasks by removing formatting and layout procedures from the employee workflow. Furthermore, an interface for accessing all the templates will be developed. All steps will be documented during and after the process for future reference and development.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-librecust

Student

Christos Arvanitis

Mentors

Kostas Papadimas, Theodoros Karounos, Diomidis Spinellis


Software components and IP management

Description

Clio is a web based system for maintaining (meta-)information on software components.

Nowadays almost every piece of software includes and uses many other software components, each with its own license.

The goal of this project is to build a simple web system in which this information can be (manually) entered and maintained.

This is a brand-new project; some analysis has been done but no code is available yet.
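
Purely as an illustration of the kind of (meta-)information involved, the sketch below models components and their licenses. It is not Clio code: the `Component` class, its fields, the `licenses_in_use` helper and the sample component names are hypothetical, and the helper assumes that components do not include each other in a cycle.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    version: str
    license_id: str  # a license identifier such as "MIT"
    uses: list = field(default_factory=list)  # components this one includes


def licenses_in_use(component):
    """Collect the license ids of a component and everything it includes.

    Assumes the inclusion graph has no cycles.
    """
    found = {component.license_id}
    for dependency in component.uses:
        found |= licenses_in_use(dependency)
    return found


# Made-up components, for illustration only.
parser_lib = Component("parser-lib", "1.2", "MIT")
http_lib = Component("http-lib", "0.9", "Apache-2.0", uses=[parser_lib])
my_app = Component("my-app", "2.0", "GPL-3.0-or-later", uses=[http_lib])

app_licenses = licenses_in_use(my_app)
```

Walking the inclusion graph like this shows why per-component license records are worth maintaining: the licenses that matter for an application include those of components it only uses indirectly.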

More details in the separate page Clio.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-clio

Student

Gopalakrishnan.V

Mentors

Alexios Zavras, Georgia Kapitsaki


WSO2 Identity Server Userstore using Web Services to get claims

Description

WSO2 Identity Server is a popular open source identity and access management server, used throughout the world, that efficiently undertakes the complex task of identity management across enterprise applications, services, and APIs.

This project is based on WSO2 Identity Server version 5.4. Currently, WSO2 Identity Server consists of SOAP services; in the near future there will be REST APIs that support all functionality and are more effective. It supports different user stores such as LDAP, JDBC, and MySQL as primary and secondary user stores.

WSO2 Identity Server allows multiple user stores, which store users and roles, to be configured in the system. There are two types of user store: a primary user store (mandatory) and a secondary user store (optional). In this version, all user information is persisted in a single user store. This implementation will separate it into a credential user store and an attribute user store. The attribute user store simply stores claim details, which can be accessed by providing the user credentials and secret. With the ability to create a new user store, the data currently saved in the primary user store can thus be split across different user stores: one for user details and another for user attribute (claim) details.
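
The separation described above can be sketched as two cooperating stores. This is a minimal in-memory illustration under stated assumptions, not WSO2 Identity Server code or its API: the `CredentialStore` and `AttributeStore` classes, their methods and the sample user are hypothetical, and a real attribute store would be reached through a web service rather than a Python dictionary.

```python
import hashlib
import hmac
import os


class CredentialStore:
    """Holds only usernames and salted password hashes."""

    def __init__(self):
        self._users = {}

    def add_user(self, username, password):
        salt = os.urandom(16)
        self._users[username] = (salt, self._hash(password, salt))

    def verify(self, username, password):
        entry = self._users.get(username)
        if entry is None:
            return False
        salt, stored = entry
        # Constant-time comparison of the stored and recomputed hashes.
        return hmac.compare_digest(stored, self._hash(password, salt))

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class AttributeStore:
    """Holds claims; releases them only after the credential check passes."""

    def __init__(self, credentials):
        self._credentials = credentials
        self._claims = {}

    def set_claims(self, username, claims):
        self._claims[username] = dict(claims)

    def get_claims(self, username, password):
        if not self._credentials.verify(username, password):
            raise PermissionError("invalid credentials")
        return dict(self._claims.get(username, {}))


# Made-up user, for illustration only.
creds = CredentialStore()
creds.add_user("alice", "s3cret")
attrs = AttributeStore(creds)
attrs.set_claims("alice", {"email": "alice@example.org"})
claims = attrs.get_claims("alice", "s3cret")
```

Keeping the two stores apart means the attribute data can live in a different backend from the credentials, while claims remain unreadable without a successful credential check.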

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-wso2

Student

Isuri Anuradha

Mentors

Panagiotis Kranidiotis, Stamelos Ioannis


Python PenTest Library (PyPen)

A collection of tools supporting penetration testers.

Description

This project develops a Python library for penetration testers. The library will include a set of tools for performing the basic tasks involved in attacking a remote host. It will include reconnaissance tools, such as modules that collect data about a specific target either through the web or through user input. Other tools will create custom dictionaries for username and password attacks. Further supported attack techniques include DoS, brute-force and inclusion attacks. The library will also include various statistical functions for extracting additional information from a captured host.
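
As one illustration of the custom dictionary idea, the sketch below combines target-related keywords with common suffixes. It is not PyPen code: the `build_wordlist` function, its default suffixes and the sample keywords are hypothetical choices made for this example.

```python
from itertools import product


def build_wordlist(keywords, suffixes=("", "1", "123", "2018")):
    """Return candidate words: each keyword variant joined with each suffix."""
    candidates = []
    seen = set()
    for keyword, suffix in product(keywords, suffixes):
        for variant in (keyword.lower(), keyword.capitalize()):
            word = variant + suffix
            if word not in seen:  # keep the first occurrence, preserve order
                seen.add(word)
                candidates.append(word)
    return candidates


# Made-up keywords, for illustration only.
words = build_wordlist(["acme", "Admin"])
```

Deduplicating while preserving order keeps the list small and reproducible, which matters when the keywords come from reconnaissance data that often repeats the same terms.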

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-pypen

Student

Konstantinos Liosis

Mentors

Antonios Andreatos, Panagiotis Karampelas, Christos Pavlatos


Addition of Greek glyphs in the Open Source Fonts Arima Madurai

Description

This project aims to extend the collection of fonts supporting Greek script in the Google Fonts Catalog. Today, 19 serif fonts, 6 monospace fonts and 10 sans-serif fonts supporting Greek script are available, and only 2 fonts are explicitly intended for display text.

Arima Madurai is a font created by Natanael Gama and Joana Correia of NDISCOVER, a Portuguese type foundry. It is a multi-script display font with 8 weights from thin to black and has a strong calligraphic influence. It has a lot of personality, so it is recognisable in headlines or brand names. I value the quality of the design, and its low contrast allows good legibility and rendering on screen.

Given the history of Greek script, designing a typeface with a calligraphic feel is interesting and challenging, both as a design task and as a subject of study. There are remarkable examples of Greek punch cutting from the most talented historical figures. The challenge will be to respect that history while keeping a well-anchored contemporary form.

Arima Madurai already supports Tamil, Malayalam and Latin scripts, and I would like to add Greek script to the glyph set. The fact that the font already supports multiple scripts is a real benefit to the project: Arima Madurai already works in non-Latin typographic environments and therefore offers a large set of shapes that can be used to match the Greek glyphs with the existing ones.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-arimamadurai

Student

Rosalie Wagner

Mentors

Alexios Zavras, Irene Vlachou, Emilios Theofanous


Addition of Greek glyphs in the Open Source Fonts Cantarell

Description

Cantarell is a humanist sans serif typeface optimized for on-screen reading. It was originally developed by Dave Crossland in the MA Typeface Design class of 2009 at the University of Reading using free software. Subsequently, it was licensed under the SIL Open Font License and has been the standard UI typeface for the open-source desktop environment GNOME since version 3.0 in 2010.

The fonts were redesigned for the release of GNOME 3.28 in March 2018. PostScript outline quality has improved significantly, spacing has been reworked and new weights have been added.

The family is currently growing to support additional writing systems. After initially applying to extend another typeface, I was invited to change my project and add Monotonic and Polytonic Greek to the three Roman masters of Cantarell during GSoC 2018.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-cantarell

Student

Florian Fecher (https://github.com/grautesk)

Mentors

Alexios Zavras, Irene Vlachou, Emilios Theofanous