Difference between revisions of "Google Summer of Code 2018 Accepted projects"

From Ελεύθερο Λογισμικό / Λογισμικό ανοιχτού κώδικα (Free Software / Open Source Software)
 
(5 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
== Adding Greek language on NLP library Spacy.io ==
  
 +
=== Description ===
 +
We live in the era of data. Every minute, 3.8 billion internet users produce content: more than 120 million emails, 500,000 Facebook comments and 3 million Google searches. Processing that amount of data efficiently requires natural language processing. Open source projects such as spaCy, textblob and NLTK contribute significantly in that direction and therefore deserve to be reinforced.
  
== Adding Greek language on NLP library Spacy.io ==
+
This project is about improving the quality of Natural Language Processing for the Greek language. The first step is to integrate the Greek language into spaCy. Innovative approaches will be tried during that process, and it is of vital importance for the writer and the mentors of the program to identify which of them are of practical use for spaCy and to share the results with any other open source enthusiast who is interested. If Greek is successfully integrated into spaCy, the Greek model will be trained and used to extract valuable information from Greek texts, such as emotions, named entities, etc.
  
=== Brief Explanation: ===
+
This project aims to achieve the following goals:
spaCy is an open-source Python library for advanced Natural Language Processing. It is a powerful, modern tool for applying NLP to real-world problems. Among other functionality, it provides named entity recognition, deep learning integration and part-of-speech tagging, and it includes built-in visualizers for syntax and NER. spaCy supports more than 25 languages, but not Greek. Adding Greek would greatly improve the NLP tooling available for the Greek language and enable tasks such as named entity recognition and part-of-speech tagging.
 
  
The procedure is documented at https://spacy.io/usage/adding-languages: custom language data (stop words, tokenizer exceptions, punctuation rules, etc.) needs to be added and tested.
+
1. Integration of the Greek language into the spaCy.io platform
  
=== Expected Results ===
+
2. Natural Language Processing of Greek documents in order to extract valuable information such as named entities, sentiment, tags, etc.
Vocabulary, syntax, entity and word-vector data for the Greek language, produced with spaCy/gensim once the language data has been successfully added.
 
  
The Greek language model will then be added to spaCy.io as a supported language model.
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-spacy
  
As a real-world test of the language model, we propose analysing a large number of issues of the Greek Government Gazette (ΦΕΚ, FEK) in order to extract entities and categorize these documents.
+
=== Student ===
 +
[https://github.com/giannisdaras Ioannis Daras]
  
=== Related  repositories ===
+
=== Mentors ===
https://github.com/explosion/spaCy
+
[https://github.com/mgogoulos Markos Gogoulos],  [https://github.com/louridas Panos Louridas]
  
=== Knowledge Prerequisites ===
 
Strong knowledge of the Greek language, fluency in Python and knowledge of regular expressions are required.
 
  
=== Mentors: [https://github.com/mgogoulos Markos Gogoulos] [https://github.com/louridas Panos Louridas] ===
+
   
  
 
== Extraction of Responsibilities per unit in public sector organizations from the Government Gazette ==
 
== Extraction of Responsibilities per unit in public sector organizations from the Government Gazette ==
  
==== Brief Explanation: ====
+
=== Description ===
The objective of this project is to extend the existing Government Gazette (GG) text mining code with Named Entity Recognition features that identify Government Directorates and Divisions, the responsibilities assigned to them and the types of services they are required to provide according to their legal framework, as published at http://www.et.gr/, and that extract this information together with related metadata (decision number, date of the GG issue). The aim is to link the management units (Directorates, Divisions and Sections) with the roles and services assigned to each of them, and to codify this specific information, which is hidden in the raw text of the GG issues. To this end, the PDFs must be downloaded, converted into text and cleaned. Then, syntactic heuristics and/or machine learning techniques must be applied to identify specific named entity types that refer to the responsibilities and services assigned to each unit, and the links between the two must be extracted. Metadata concerning the GG issue and the decision and/or law number will also be attached to each extracted association. The resulting associations will be exported in a machine-usable, structured format (e.g. as RDF triples).
+
The objective of this project is to extend the existing

+
Government Gazette (GG) text mining code with Named Entity Recognition
==== Expected Results ====
+
features that identify Government Directorates and Divisions,
* A module for manually annotating related entities and responsibilities-services assignment sections in raw text
+
the responsibilities assigned to them and the types of
* A NER module, with trained models for detecting Government Directorates and Divisions in raw text
+
services they are required to provide according to their legal framework
* A module that associates entities with responsibilities and extracts related metadata from the GG issue
+
as published at http://www.et.gr/, and that extract this information together with related metadata (decision number, date of the GG issue).
  
==== Related  repositories ====
+
The aim is to link the management units with assigned roles and
https://github.com/arisp8/gazette-analysis
+
services per unit (Directorates, Divisions & Sections) and codify
 +
this specific information, which is hidden in the GG issue raw text.
  
==== Knowledge Prerequisites ====
+
=== GSOC-2018 repositories ===
Python, Java, Machine Learning
+
https://github.com/eellak/gsoc2018-GG-extraction
  
==== Mentors: [https://www.dit.hua.gr/~varlamis/ Iraklis Varlamis], [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis], [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos], [http://www.karounos.gr/blog/bio Theodoros Karounos] ====
+
=== Student ===
 +
[https://github.com/ckarageorgkaneen Chris Karageorg Kaneen]  
  
 +
=== Mentors ===
 +
[https://www.dit.hua.gr/~varlamis/ Iraklis Varlamis], [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis], [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos], [http://www.karounos.gr/blog/bio Theodoros Karounos]
 +
  
 
== Epoptes ==
 
== Epoptes ==
  
==== Brief Explanation: ====
+
=== Description ===
'''Epoptes''' (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows for screen broadcasting
+
Epoptes (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows for screen broadcasting and monitoring, remote command execution, message sending, imposing restrictions on the clients such as screen locking or sound muting, and much more! It can be installed in Ubuntu-, Debian- and openSUSE-based labs that may contain any combination of the following: LTSP servers, thin and fat clients, non-LTSP servers, standalone workstations, NX or XDMCP clients, etc.
and monitoring, remote command execution, message sending, imposing restrictions on the clients such as screen locking or sound muting, and much more! It can be installed in Ubuntu-, Debian- and openSUSE-based labs that may contain ''any'' combination of the following: LTSP servers, thin and fat clients, non-LTSP servers, standalone workstations, NX or XDMCP clients, etc.
 
  
==== Related GitHub repositories ====
+
Epoptes has been under-maintained for the last couple of years. It is currently based on Python 2 and GTK 2, and a number of bugs have crept in due to major updates of Linux distribution packages (systemd, ConsoleKit, VNC…).
https://www.github.com/Epoptes/epoptes
 
  
==== Expected Results ====
+
This project aims at reviving Epoptes with Python 3 and GTK 3 support, while also addressing several outstanding issues.
Rewrite Epoptes with Python 3 support  
 
  
GTK 3 with GObject Introspection instead of PyGTK 2
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-epoptes
  
Improvements in the code structure (breaking the existing code into Python modules/packages)
+
=== Student ===
 
+
[https://github.com/alkisg Alkis Georgopoulos]
==== Knowledge Prerequisites ====
 
Python
 
 
 
GTK
 
 
 
==== Mentors:  [https://github.com/ftsamis Fotis Tsiamis], [http://cde.athabascau.ca/ourpeople/instructors/tsinakos.php Avgoustos Tsinakos] ====
 
  
 +
=== Mentors ===
 +
[https://github.com/ftsamis Fotis Tsiamis], [http://cde.athabascau.ca/ourpeople/instructors/tsinakos.php Avgoustos Tsinakos]
 +
 +
 
== Government Gazette text mining, cross linking, and codification ==
 
== Government Gazette text mining, cross linking, and codification ==
  
==== Brief Explanation ====
+
=== Description ===
The objective of this project is to extend existing Government Gazette text mining code to cross-link legal texts and detect the ministers that sign them. For this the text PDFs need to be downloaded and converted into text. Then, heuristic rules must be applied to detect references to other legal texts, which will be converted into hypertext form. Similar techniques will be used to detect the competent ministers. Two possible extensions are proposed. First, detect amendments incorporated within another law. Second, implement a prototype for editing a law in its codified form (e.g. on GitHub) and automatically creating from the changes the text to be legislated (the differences from the original law).
+
In recent years, considerable attention has been paid to analyzing public sector texts with text mining methods, which are enabled by modern libraries, algorithms and practices and have been brought to the forefront by open source projects such as textblob, spaCy, SciPy, TensorFlow and NLTK. These collaborative efforts mark a shift towards more efficient machine understanding of natural language, which can be applied to public documents to provide more robust organization and codification in the legal sector.
 +
This project aims to extend the existing Government Gazette (GG) text mining code with features that organize and cross-link GG texts with legal texts and detect their signatories via heuristic and machine learning methods. This will help eliminate bureaucratic processes and save jurists a great deal of time when, for example, they search for legal documents in the ISOKRATIS database of legal texts (an applicable case study).
  
==== Related GitHub repositories ====
+
=== GSOC-2018 repositories ===
https://github.com/arisp8/gazette-analysis
+
https://github.com/eellak/gsoc2018-3gm
  
==== Expected Results ====
+
=== Student ===
Detection of references to other laws; detection of competent ministers; codified legislation prototype
+
[https://github.com/papachristoumarios Marios Papachristou]
  
==== Knowledge Prerequisites ====
+
=== Mentors ===
Python
+
[https://www.spinellis.gr/ Diomidis Spinellis], [https://github.com/zvr Alexios Zavras], [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis], [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos]
 
+
==== Mentors: [https://www.spinellis.gr Diomidis Spinellis], [https://github.com/zvr Alexios Zavras], [https://users.ionio.gr/~sarantos/en.html Sarantos Kapidakis], [http://thalassa.ionio.gr/staff/moschopoulos/cv.pdf Dionysios Moschopoulos] ====
 
  
 
== LibreOffice customization and creation of legal templates for LibreOffice ==

== LibreOffice customization and creation of legal templates for LibreOffice ==
  
==== Brief Explanation ====
+
=== Description ===
LibreOffice customization in order to achieve a "familiar" look and menus for users that convert from MS Office 2013, and creation of specific templates for the Greek Legal system. The customization and templates should follow the development guidelines at https://wiki.documentfoundation.org/Development/GetInvolved .  
+
A set of modules and templates for the LibreOffice suite that ease the transition from Microsoft Office, as well as ready-to-use templates that automate the creation of Greek legal documents. These templates aim to tackle time-consuming tasks by removing formatting and layout work from the employees' workflow. Furthermore, an interface for accessing all these templates will be developed. All steps will be documented, both during the process and afterwards, for future reference and development.
  
==== Expected results ====
+
=== GSOC-2018 repositories ===
* Development of specific menu customizations, using the [https://api.libreoffice.org/ LibreOffice Software Development Kit 6.0], in various LibreOffice modules (e.g. https://api.libreoffice.org/docs/idl/ref/namespacecom_1_1sun_1_1star_1_1ui.html)
+
https://github.com/eellak/gsoc2018-librecust
* Design and development of templates and LibreOffice applications that retrieve specific information through APIs for the Greek legal system and fill it into the templates
 
  
Customizations and templates should be accompanied by detailed documentation and instructions for developers and end users.
+
=== Student ===
 +
[https://github.com/arvchristos Christos Arvanitis]
  
==== Knowledge Prerequisites ====
+
=== Mentors ===
* C
+
[https://github.com/pkst-ellak Kostas Papadimas], [http://www.karounos.gr/blog/bio Theodoros Karounos], [https://www.spinellis.gr/ Diomidis Spinellis]
* C++
+
* Java
 
* Python
 
* Bash
 
* Perl
 
* LibreOffice Software Development Kit 6.0
 
 
 
==== Mentors: [https://github.com/pkst-ellak Kostas Papadimas], [http://www.karounos.gr/blog/bio Theodoros Karounos], [https://www.spinellis.gr Diomidis Spinellis] ====
 
  
 
== Software components and IP management ==
 
== Software components and IP management ==
  
More details in the separate page [https://ellak.gr/wiki/index.php?title=Clio Clio].
+
=== Description ===
 
+
Clio is a web-based system for maintaining (meta-)information on software components.
==== Brief Explanation ====
 
 
 
A web-based system to manage data on software components and their relations.
 
  
Nowadays, every piece of software includes and uses many other software components, each with its own license.
+
Nowadays, every piece of software includes and uses many other software components, each with its own license.
  
 
The goal of this project is to build a simple web system for (manually) entering and maintaining this information.

The goal of this project is to build a simple web system for (manually) entering and maintaining this information.
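The following minimal in-memory sketch in Python illustrates the kind of data such a system would manage. The `Component` model, its field names and the `licenses_in_use` helper are hypothetical and not part of Clio. The sketch only shows why recording "includes" relations between components matters: it lets one list every license that ends up in a product, including licenses that arrive through transitive dependencies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Component:
    """One software component and the components it includes (hypothetical model)."""
    name: str
    version: str
    license: str  # e.g. an SPDX identifier such as "MIT"
    includes: List["Component"] = field(default_factory=list)


def licenses_in_use(root: Component) -> Dict[str, Set[str]]:
    """Map each license to the components ("name@version") that carry it,
    following 'includes' relations transitively; shared components count once."""
    result: Dict[str, Set[str]] = {}
    seen: Set[int] = set()  # dataclasses with eq=True are unhashable, so track ids
    stack = [root]
    while stack:
        comp = stack.pop()
        if id(comp) in seen:
            continue
        seen.add(id(comp))
        result.setdefault(comp.license, set()).add(f"{comp.name}@{comp.version}")
        stack.extend(comp.includes)
    return result


if __name__ == "__main__":
    liba = Component("liba", "1.0", "MIT")
    libb = Component("libb", "2.0", "BSD-3-Clause", [liba])
    app = Component("app", "1.0", "Apache-2.0", [liba, libb])
    print(licenses_in_use(app))
```

Storing relations rather than a flat list makes the transitive question answerable. A web front end would let users enter the same records manually.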
Line 117: Line 109:
 
This is a brand-new project; some analysis has been done but no code is available yet.
 
This is a brand-new project; some analysis has been done but no code is available yet.
  
==== Expected Results ====
+
More details in the separate page [https://ellak.gr/wiki/index.php?title=Clio Clio].
 
 
A complete web-based system to manage the above-mentioned data.
 
 
 
==== Knowledge Prerequisites ====
 
  
Web (any technology welcome)
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-clio
  
 +
=== Student ===
 +
[https://github.com/gopuvenkat Gopalakrishnan.V]
  
==== Mentors: [https://github.com/zvr Alexios Zavras], Georgia Kapitsaki ====
+
=== Mentors ===
 +
[https://github.com/zvr Alexios Zavras], Georgia Kapitsaki
 +
  
 
== WSO2 Identity Server Userstore using Web Services  to get claims ==
 
== WSO2 Identity Server Userstore using Web Services  to get claims ==
  
==== Brief Explanation ====
+
=== Description ===
 
+
WSO2 Identity and Access Management Server is a popular open source identity and access management server used throughout the world; it efficiently handles the complex task of identity management across enterprise applications, services and APIs.
WSO2 Identity Server provides secure identity management for enterprise web applications, services, and APIs by managing identity and entitlements of the users securely and efficiently. The Identity Server enables enterprise architects and developers to reduce identity provisioning time, guarantee secure online interactions, and deliver a reduced single sign-on environment. WSO2 Identity Server is fully open source and is released under Apache Software License Version 2.0.
 
 
 
The aim of this project is to create a new type of userstore in which credentials are separated from attributes, and attributes (claims) can be configured from the web UI as a SOAP or REST web service. The end user should be able to
 
* configure credentials for LDAP or JDBC
 
* configure web service authentication
 
* configure claims to consume the above web service
 
'''Expected Results'''
 
  
A new userstore in which the end user can configure, through the existing web interface, user claims retrieved via a web services client. The corresponding changes to the source code should be submitted to the upstream branch of the latest version (5.4.0).
+
This project is based on WSO2 Identity Server version 5.4. Currently, WSO2 Identity Server consists of SOAP services; in the near future, REST APIs that support all functionality and are more effective will become available. It supports different user stores, such as LDAP, JDBC and MySQL, as primary and secondary user stores.
  
==== Related GitHub repository ====
+
WSO2 Identity Server allows multiple user stores, which hold users and roles, to be configured in the system. There are two types of user store: a primary user store (mandatory) and secondary user stores (optional). In this version, all user information is persisted in a single user store. This implementation will split it into a credential user store and an attribute user store. The attribute user store is simply used to store claim details, which can be accessed by providing the user's credentials and secret. With the ability to create this new type of user store, the data currently saved in the primary user store can be separated into two user stores, one for user details and one for user attribute (claims) details, which can be accessed by providing the user's credentials and
https://wso2.github.io/
+
secret.
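The split described above can be sketched in a few lines. One store only authenticates; a separate web service serves claims, and only after authentication succeeds. This is a hypothetical Python illustration, not WSO2 code: the real extension would be written in Java against Identity Server's user store APIs. The class names are invented for the sketch, and `claims_service` stands in for a SOAP or REST client.

```python
import hashlib
import hmac
import os
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for a SOAP/REST claims client: username -> claims.
ClaimsService = Callable[[str], Dict[str, str]]


class CredentialStore:
    """Stand-in for the LDAP/JDBC store: holds only usernames and password hashes."""

    def __init__(self) -> None:
        self._users: Dict[str, Tuple[bytes, bytes]] = {}

    def add_user(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        self._users[username] = (salt, digest)

    def authenticate(self, username: str, password: str) -> bool:
        entry = self._users.get(username)
        if entry is None:
            return False
        salt, digest = entry
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        return hmac.compare_digest(candidate, digest)  # constant-time comparison


class SplitUserStore:
    """Credentials are checked locally; claims come from a separate service."""

    def __init__(self, credentials: CredentialStore, claims_service: ClaimsService) -> None:
        self.credentials = credentials
        self.claims_service = claims_service

    def get_claims(self, username: str, password: str) -> Optional[Dict[str, str]]:
        if not self.credentials.authenticate(username, password):
            return None  # never query the claims service for unauthenticated users
        return self.claims_service(username)
```

For example, `SplitUserStore(creds, lambda user: remote_lookup(user))` would plug in a real web service client, where `remote_lookup` is whatever hypothetical function performs the SOAP or REST call.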
  
https://wso2.github.io/using-maven.html
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-wso2
  
https://wso2.github.io/github-repositories.html#IS
+
=== Student ===
 +
[https://github.com/isuri97 Isuri Anuradha]
  
https://wso2.github.io/github-repositories.html
+
=== Mentors ===
 
+
[https://www.linkedin.com/in/kranidiotis/ Panagiotis Kranidiotis], [http://www.csd.auth.gr/en/staff/faculty?view=user&ro=1&id=14 Stamelos Ioannis]
==== Knowledge Prerequisites ====
+
 
 
* Java JSP
 
* JSTL
 
* Maven
 
* OSGI Framework
 
* A modern development framework for interactive web content
 
 
 
==== Mentors: [https://www.linkedin.com/in/kranidiotis/ Panagiotis Kranidiotis], [http://www.csd.auth.gr/en/staff/faculty?view=user&ro=1&id=14 Stamelos Ioannis] ====
 
  
 
== Python PenTest Library (PyPen) ==
 
== Python PenTest Library (PyPen) ==
 +
A collection of tools supporting penetration testers.
  
A collection of tools supporting penetration testers
+
=== Description ===
 
 
==== Brief Explanation ====
 
 
 
 
Development of a Python library for penetration testers. The library will include a set of tools for performing the basic tasks of attacking a remote host. It will include reconnaissance tools, such as modules that collect data on a specific target either from the web or from user input. Moreover, other tools will be developed to create custom dictionaries for username and password attacks. Other supported attack techniques will include DoS, brute-force and inclusion attacks. The library will also include various statistical functions for extracting additional information from a captured host.

Development of a Python library for penetration testers. The library will include a set of tools for performing the basic tasks of attacking a remote host. It will include reconnaissance tools, such as modules that collect data on a specific target either from the web or from user input. Moreover, other tools will be developed to create custom dictionaries for username and password attacks. Other supported attack techniques will include DoS, brute-force and inclusion attacks. The library will also include various statistical functions for extracting additional information from a captured host.
  
==== Related GitHub repositories ====
+
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-pypen
  
https://github.com/jmortega/python-pentesting
+
=== Student ===
 +
[https://github.com/stikos Konstantinos Liosis]
  
==== Expected Results ====
+
=== Mentors ===
 +
[https://www.researchgate.net/profile/Antonios_Andreatos Antonios Andreatos], [https://www.linkedin.com/in/panagiotis-karampelas-5868002/ Panagiotis Karampelas], [http://www.cslab.ece.ntua.gr/~pavlatos/ Christos Pavlatos]
 +
  
Development of an independent Python library that will also integrate other existing, well-established tools, such as CUPP (already in Kali Linux), to assist in penetration testing.
+
== Addition of Greek glyphs in the Open Source Fonts ArimaMadurai ==
 
 
Proposed tools
 
 
 
A. User Reconnaissance & Information gathering
 
 
 
A.1/ PyFBSniff: Facebook scraper



A.2/ PyGenUser: Username list creation



A.3/ PyDic: Dictionary creation
 
 
 
Future extensions will include tools similar to PyFBSniff for other social media such as Twitter and Google+.
 
 
 
 
 
B. Target System Reconnaissance & Information gathering
 
 
 
A collection of supporting tools that gather and present information about the operating system and its processes.
 
 
 
 
 
B.1/ PyPScanner: Port scanner



B.2/ PyPidStat: Process statistics creation



B.3/ PySocketStat: Socket statistics creation



B.4/ PyPipeStat: Pipe statistics creation



B.5/ PyFileStat: File statistics creation
 
 
 
 
 
C. Attack PenTest tools
 
 
 
C.1/ PyDoS: DoS attack by flooding



C.2/ PyBruftp: Brute-force attack against an FTP server
 
 
 
C.3/ PyRansom: Ransomware script
 
  
The library will be expandable in order to incorporate more tools in the future.
+
=== Description ===
 
+
This project aims to extend the collection of fonts in the Google Fonts catalog that support Greek script. Currently, only 19 serif fonts, 6 monospace fonts and 10 sans-serif fonts with Greek support are available, and only 2 of them are explicitly intended for display text.
 
 
==== Knowledge Prerequisites ====
 
 
 
Python fluency
 
 
 
OS basics
 
 
 
Networking basics
 
 
 
PenTest basics
 
 
 
 
 
==== Mentors: [https://haf.academia.edu/AntoniosAndreatos Antonios Andreatos], [https://www.linkedin.com/in/panagiotis-karampelas-5868002/ Panagiotis Karampelas], [http://www.cslab.ece.ntua.gr/~pavlatos/ Christos Pavlatos] ====
 
 
 
== Addition of Greek glyphs in the Open Source Fonts ArimaMadurai ==
 
  
==== Brief Explanation ====
+
Arima Madurai is a font created by Natanael Gama and Joana Correia of NDISCOVER, a Portuguese type foundry. It is a multi-script display font with 8 weights, from Thin to Black, and has a strong calligraphic influence. It has a lot of personality, which makes it recognisable when used in headlines or brand names. I value the quality of its design, and thanks to its low contrast it offers good legibility and rendering on screen.
Many open source fonts (e.g., those available at https://fonts.google.com) do not include glyphs for Greek letters and are therefore unusable in a Greek environment.
 
  
The aim of this project is to improve this situation and add the missing glyphs at the correct Unicode codepoints. The exact set of fonts to be completed will be determined in discussions between the student and the mentor(s).
+
Given the history of Greek script, designing a typeface with a calligraphic feel is both interesting and challenging, in terms of design but also in terms of study. There are remarkable examples of Greek punchcutting by some of the most talented figures in its history. The challenge will be to respect that history while keeping a well-anchored contemporary form.
  
 +
Arima Madurai already supports Tamil, Malayalam and Latin scripts, and I would like to add Greek to its glyph set. The fact that the font already supports multiple scripts is a real benefit to the project: Arima Madurai already works in non-Latin typographic environments and therefore offers a large set of shapes that can be used to harmonize the Greek glyphs with the existing ones.
  
==== Expected Results ====
+
=== GSOC-2018 repositories ===
Full support for Greek text in a number of Open Source fonts.
+
https://github.com/eellak/gsoc2018-arimamadurai
  
==== Knowledge Prerequisites ====
+
=== Student ===
Type design, font technologies. Please note that this is a special project, where coding, in the traditional sense, will not be enough.
+
[https://github.com/RosaWagner Rosalie Wagner]
  
==== Mentors: [https://github.com/zvr Alexios Zavras], [https://github.com/irenevl Irene Vlachou], [https://github.com/thynem Emilios Theofanous] ====
+
=== Mentors ===
 +
[https://github.com/zvr Alexios Zavras], [https://github.com/irenevl Irene Vlachou], [https://github.com/thynem Emilios Theofanous]
 +
  
 +
== Addition of Greek glyphs in the Open Source Fonts Cantarell ==
  
== Addition of Greek glyphs in the Open Source Fonts WorkSans ==
+
=== Description ===
 +
Cantarell is a humanist sans serif typeface optimized for on-screen reading. It was originally developed by Dave Crossland in the MA Typeface Design class of 2009 at the University of Reading using free software. Subsequently, it was licensed under an SIL Open Font License and has been the standard UI typeface for the open-source desktop environment GNOME since version 3.0 in 2010.
  
==== Brief Explanation ====
+
The fonts were redesigned for the release of GNOME 3.28 in March 2018: PostScript outline quality improved significantly, spacing was reworked and new weights were added.
Many open source fonts (e.g., those available at https://fonts.google.com) do not include glyphs for Greek letters and are therefore unusable in a Greek environment.
 
  
The aim of this project is to improve this situation and add the missing glyphs at the correct Unicode codepoints. The exact set of fonts to be completed will be determined in discussions between the student and the mentor(s).
+
The family is currently growing to support additional writing systems. After initially applying to extend another typeface, I was invited to change my project and add monotonic and polytonic Greek to the three Roman masters of Cantarell during GSoC 2018.
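Checking that Greek has been added "at the correct Unicode codepoints" can be done mechanically. The sketch below is a hypothetical helper, not part of any font toolchain. It uses Python's `unicodedata` to build two letter sets: monotonic letters (U+0386 to U+03CE in the Greek and Coptic block) and precomposed polytonic letters (the Greek Extended block, U+1F00 to U+1FFF). It then reports which of them a font does not map yet. The font's codepoints are assumed to come from elsewhere, for example the keys of the dictionary returned by fontTools' `TTFont(path).getBestCmap()`.

```python
import unicodedata
from typing import Iterable, List, Set


def greek_letters(start: int, end: int) -> Set[str]:
    """Assigned Greek capital/small letters in the inclusive range [start, end]."""
    letters = set()
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned codepoints
        if name.startswith(("GREEK CAPITAL LETTER", "GREEK SMALL LETTER")):
            letters.add(chr(cp))
    return letters


# Modern (monotonic) letters, from U+0386 to U+03CE in "Greek and Coptic";
# the ano teleia (U+0387) and unassigned slots are filtered out by name.
MONOTONIC = greek_letters(0x0386, 0x03CE)
# Precomposed polytonic letters: the "Greek Extended" block.
POLYTONIC = greek_letters(0x1F00, 0x1FFF)


def missing_glyphs(font_codepoints: Iterable[int], required: Set[str]) -> List[str]:
    """Required characters the font has no mapping for, in codepoint order."""
    covered = set(font_codepoints)
    return sorted(ch for ch in required if ord(ch) not in covered)


if __name__ == "__main__":
    latin_only = range(0x20, 0x7F)  # a font that covers printable ASCII only
    print(len(missing_glyphs(latin_only, MONOTONIC)), "monotonic letters missing")
```

Filtering by character name keeps punctuation and archaic symbols out of the sets. A complete polytonic repertoire also needs combining marks and spacing accents, which this filter deliberately leaves out.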
  
 +
=== GSOC-2018 repositories ===
 +
https://github.com/eellak/gsoc2018-cantarell
  
==== Expected Results ====
+
=== Student ===
Full support for Greek text in a number of Open Source fonts.
+
[https://github.com/grautesk Florian Fecher]
  
==== Knowledge Prerequisites ====
+
=== Mentors ===
Type design, font technologies. Please note that this is a special project, where coding, in the traditional sense, will not be enough.
+
[https://github.com/zvr Alexios Zavras], [https://github.com/irenevl Irene Vlachou], [https://github.com/thynem Emilios Theofanous]

==== Mentors: [https://github.com/zvr Alexios Zavras], [https://github.com/irenevl Irene Vlachou], [https://github.com/thynem Emilios Theofanous] ====
+
[[Κατηγορία:GSOC2018]] [[Κατηγορία:GSOC]]

Latest revision as of 15:13, 5 February 2020

Adding Greek language on NLP library Spacy.io

Description

We live in the era of data. Every minute, 3.8 billion internet users produce content: more than 120 million emails, 500,000 Facebook comments and 3 million Google searches. Processing that amount of data efficiently requires natural language processing. Open source projects such as spaCy, textblob and NLTK contribute significantly in that direction and therefore deserve to be reinforced.

This project is about improving the quality of Natural Language Processing for the Greek language. The first step is to integrate the Greek language into spaCy. Innovative approaches will be tried during that process, and it is of vital importance for the writer and the mentors of the program to identify which of them are of practical use for spaCy and to share the results with any other open source enthusiast who is interested. If Greek is successfully integrated into spaCy, the Greek model will be trained and used to extract valuable information from Greek texts, such as emotions, named entities, etc.

This project aims to achieve the following goals:

1. Integration of the Greek language into the spaCy.io platform

2. Natural Language Processing of Greek documents in order to extract valuable information such as named entities, sentiment, tags, etc.
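Integrating a language into spaCy largely means supplying language data such as stop words, tokenizer exceptions and punctuation rules. The plain-Python sketch below does not use spaCy's API. It only illustrates how those three kinds of data interact when Greek text is tokenized. The word lists are tiny hypothetical samples. Greek writes its question mark as ";" and uses the ano teleia "·" as a semicolon, so both belong in the punctuation set.

```python
# Hypothetical, deliberately tiny samples of Greek language data; the real
# data shipped with spaCy is far larger and structured differently.
STOP_WORDS = {"ο", "η", "το", "και", "σε", "του", "της"}

# Abbreviations that must remain single tokens despite their internal periods.
TOKENIZER_EXCEPTIONS = {"π.χ.", "κ.λπ.", "δηλ."}

# Punctuation split off token edges, including the Greek question mark
# (U+037E, usually typed as ";") and the ano teleia (U+0387 / U+00B7).
PUNCT = set(".,;!«»()\u00b7\u0387\u037e")


def tokenize(text):
    """Split on whitespace, then peel punctuation off each chunk's edges."""
    tokens = []
    for chunk in text.split():
        if chunk in TOKENIZER_EXCEPTIONS:
            tokens.append(chunk)
            continue
        prefixes, suffixes = [], []
        while chunk and chunk[0] in PUNCT:
            prefixes.append(chunk[0])
            chunk = chunk[1:]
        while chunk and chunk[-1] in PUNCT:
            suffixes.insert(0, chunk[-1])
            chunk = chunk[:-1]
        tokens.extend(prefixes)
        if chunk:
            tokens.append(chunk)
        tokens.extend(suffixes)
    return tokens


def content_words(text):
    """Tokens that are neither stop words nor punctuation."""
    return [t for t in tokenize(text)
            if t.lower() not in STOP_WORDS and t not in PUNCT]


if __name__ == "__main__":
    print(tokenize("Η Αθήνα, η Θεσσαλονίκη κ.λπ. είναι πόλεις;"))
```

Without the exception list, "κ.λπ." would be broken apart at its final period. Without the Greek punctuation, "πόλεις;" would stay glued to its question mark.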

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-spacy

Student

Ioannis Daras

Mentors

Markos Gogoulos, Panos Louridas



Extraction of Responsibilities per unit in public sector organizations from the Government Gazette

Description

The objective of this project is to extend the existing Government Gazette (GG) text mining code with Named Entity Recognition features that identify Government Directorates and Divisions, the responsibilities assigned to them and the types of services they are required to provide according to their legal framework, as published at http://www.et.gr/, and that extract this information together with related metadata (decision number, date of the GG issue).

The aim is to link the management units with assigned roles and services per unit (Directorates, Divisions & Sections) and codify this specific information, which is hidden in the GG issue raw text.
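The heuristic route can be illustrated with a small sketch that turns already cleaned GG text into subject-predicate-object triples linking each unit to its responsibilities and to the issue metadata. The one-unit-per-line input convention, the `ex:` predicate names and the sample values are all hypothetical. Real GG decisions are far less regular, so a pattern like this would at best bootstrap annotations for a trained NER model. A production pipeline would more likely serialize such triples with an RDF library such as rdflib.

```python
import re
from typing import List, Tuple

# Hypothetical input convention: one unit per line, written as
# "<unit type> <unit name>: <responsibility>; <responsibility>; ..."
UNIT_RE = re.compile(
    r"(?P<type>Διεύθυνση|Τμήμα|Γραφείο)\s+(?P<name>[^:\n]+):\s*(?P<duties>[^\n]+)"
)

Triple = Tuple[str, str, str]


def extract_triples(text: str, issue: str, decision: str) -> List[Triple]:
    """Emit (subject, predicate, object) triples for every unit found in text."""
    triples: List[Triple] = []
    for m in UNIT_RE.finditer(text):
        unit = f"{m.group('type')} {m.group('name').strip()}"
        # Metadata of the GG issue is attached to every extracted unit.
        triples.append((unit, "ex:unitType", m.group("type")))
        triples.append((unit, "ex:publishedIn", issue))
        triples.append((unit, "ex:decision", decision))
        for duty in m.group("duties").split(";"):
            duty = duty.strip().rstrip(".")
            if duty:
                triples.append((unit, "ex:hasResponsibility", duty))
    return triples


if __name__ == "__main__":
    sample = ("Διεύθυνση Οικονομικών: κατάρτιση προϋπολογισμού; έλεγχος δαπανών.\n"
              "Τμήμα Προμηθειών: διενέργεια διαγωνισμών.")
    for triple in extract_triples(sample, "hypothetical-issue", "hypothetical-decision"):
        print(triple)
```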

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-GG-extraction

Student

Chris Karageorg Kaneen

Mentors

Iraklis Varlamis, Sarantos Kapidakis, Dionysios Moschopoulos, Theodoros Karounos


Epoptes

Description

Epoptes (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows for screen broadcasting and monitoring, remote command execution, message sending, imposing restrictions on the clients such as screen locking or sound muting, and much more! It can be installed in Ubuntu-, Debian- and openSUSE-based labs that may contain any combination of the following: LTSP servers, thin and fat clients, non-LTSP servers, standalone workstations, NX or XDMCP clients, etc.

Epoptes has been under-maintained for the last couple of years. It is currently based on Python 2 and GTK 2, and a number of bugs have crept in due to major updates of Linux distribution packages (systemd, ConsoleKit, VNC…).

This project aims at reviving Epoptes with Python 3 and GTK 3 support, while also addressing several outstanding issues.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-epoptes

Student

Alkis Georgopoulos

Mentors

Fotis Tsiamis, Avgoustos Tsinakos


Government Gazette text mining, cross linking, and codification

Description

In recent years, considerable attention has been paid to analyzing public sector texts with text mining methods, enabled by modern libraries, algorithms and practices and brought to the forefront by open source projects such as TextBlob, spaCy, SciPy, TensorFlow and NLTK. These collaborative efforts mark a shift towards more efficient understanding of natural language by machines, which can be applied to public documents to provide more robust organization and codification in the legal sector. This project aims to extend the existing Government Gazette (GG) text mining code with features that organize and cross-link GG texts with legal texts and detect the signatories via heuristic and machine learning methods. This will help eliminate bureaucratic processes and bring significant time savings for jurists who, for example, search for legal documents in the ISOKRATIS database of legal texts (an applicable case study).

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-3gm

Student

Marios Papachristou

Mentors

Diomidis Spinellis, Alexios Zavras, Sarantos Kapidakis, Dionysios Moschopoulos


LibreOffice customization and creation of legal templates for LibreOffice

Description

This project will produce a set of modules and templates for the LibreOffice suite that ease the transition from Microsoft Office, as well as ready-to-use templates that automate the creation of Greek legal documents. These templates aim to cut down on time-consuming tasks by removing formatting and layout work from the employees' workflow. Furthermore, an interface for accessing all these templates will be developed. All steps will be documented, both during the process and afterwards, for future reference and development.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-librecust

Student

Christos Arvanitis

Mentors

Kostas Papadimas, Theodoros Karounos, Diomidis Spinellis


Software components and IP management

Description

Clio is a web-based system for maintaining (meta-)information on software components.

Nowadays, every piece of software includes and uses many other software components, each with its own license.

The goal of this project is to build a simple web system for (manually) entering and maintaining this information!

This is a brand-new project; some analysis has been done but no code is available yet.

More details are available on the separate page Clio.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-clio

Student

Gopalakrishnan.V

Mentors

Alexios Zavras, Georgia Kapitsaki


WSO2 Identity Server Userstore using Web Services to get claims

Description

WSO2 Identity and Access Management Server is a popular open source identity and access management server used throughout the world. WSO2 Identity Server efficiently undertakes the complex task of identity management across enterprise applications, services, and APIs.

This project is based on WSO2 Identity Server version 5.4. Currently, WSO2 Identity Server consists of SOAP services; in the near future, REST APIs that support all of its functionality, and are expected to be more effective, will be available. In the current version, it supports different user stores, such as LDAP, JDBC and MySQL, as primary and secondary user stores.

WSO2 Identity Server allows multiple user stores to be configured in the system for storing users and roles. There are two types of user store: a primary user store (mandatory) and secondary user stores (optional). In this version, all user information is persisted in a single user store. This implementation will separate it into a credential user store and an attribute user store. The attribute user store is used only to store claim details, which can be accessed by providing the user's credentials and secret. With the ability to create this new user store, the data currently saved in the primary user store can be split across two user stores: one for user details and another for user attribute (claims) details.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-wso2

Student

Isuri Anuradha

Mentors

Panagiotis Kranidiotis, Ioannis Stamelos


Python PenTest Library (PyPen)

A collection of tools supporting penetration testers.

Description

Development of a Python library for penetration testers. The library will include a set of tools for performing the basic tasks involved in attacking a remote host. It will include reconnaissance tools, such as modules that collect data on a specific target either from the web or from user input. Moreover, further tools will be developed to create custom dictionaries for username and password attacks. Other supported attack techniques will include denial-of-service (DoS), brute-force and inclusion attacks. The library will also include various statistical functions for extracting additional information from a captured host.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-pypen

Student

Konstantinos Liosis

Mentors

Antonios Andreatos, Panagiotis Karampelas, Christos Pavlatos


Addition of Greek glyphs in the Open Source Font Arima Madurai

Description

This project aims to extend the collection of fonts supporting Greek script in the Google Fonts catalog. At present, only 19 serif fonts, 6 monospace fonts and 10 sans-serif fonts supporting Greek script are available, and only 2 of them are explicitly intended for display text.

Arima Madurai is a font created by Natanael Gama and Joana Correia of NDISCOVER, a Portuguese type foundry. It is a multi-script display font with 8 weights, from Thin to Black, and a strong calligraphic influence. It has a lot of personality, which makes it recognisable when used in headlines or brand names. I value the quality of its design, and thanks to its low contrast it offers good legibility and rendering on screen.

Given the history of Greek script, designing a typeface with a calligraphic feel is both interesting and challenging, in terms of design as well as research. Some of the most talented figures in history produced remarkable examples of Greek punchcutting. The challenge will be to respect that history while keeping a well-anchored contemporary form.

Arima Madurai already supports the Tamil, Malayalam and Latin scripts, and I would like to add Greek script to its glyph set. The font's existing multi-script support is a real benefit to the project: since Arima Madurai already works in non-Latin typographic environments, it offers a large set of shapes that can be used to harmonize the new Greek glyphs with the existing ones.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-arimamadurai

Student

Rosalie Wagner

Mentors

Alexios Zavras, Irene Vlachou, Emilios Theofanous


Addition of Greek glyphs in the Open Source Font Cantarell

Description

Cantarell is a humanist sans-serif typeface optimized for on-screen reading. It was originally developed by Dave Crossland in the MA Typeface Design class of 2009 at the University of Reading using free software. It was subsequently licensed under the SIL Open Font License and has been the standard UI typeface of the open source desktop environment GNOME since version 3.0 in 2011.

The fonts were redesigned for the release of GNOME 3.28 in March 2018: PostScript outline quality improved significantly, spacing was reworked and new weights were added.

The family is currently being extended to support additional writing systems. After initially applying to extend another typeface, I was invited to change my project and add monotonic and polytonic Greek to the three Roman masters of Cantarell during GSoC 2018.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-cantarell

Student

Florian Fecher

Mentors

Alexios Zavras, Irene Vlachou, Emilios Theofanous