Google Summer of Code 2018 Accepted projects

From Ελεύθερο Λογισμικό / Λογισμικό ανοιχτού κώδικα (Free Software / Open Source Software)
Revision as of 09:16, 24 April 2018 by Pkst-1 (Talk | contribs)


Adding Greek language on NLP library Spacy.io

Brief Explanation:

spaCy is an open-source Python library for advanced Natural Language Processing (NLP). It is a powerful, modern tool for applying NLP to real-world problems. Among other functionality, it provides named entity recognition (NER), deep learning integration and part-of-speech tagging, and includes built-in visualizers for syntax and NER. spaCy supports more than 25 languages, but not Greek. Adding Greek would make it much easier to apply NLP to Greek text and would enable tasks such as named entity recognition and part-of-speech tagging.

The procedure is well specified at https://spacy.io/usage/adding-languages: custom language data (stop words, tokenizer exceptions, punctuation rules, etc.) needs to be added and tested.
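As a rough illustration of what such language data looks like, the following minimal sketch uses plain Python and deliberately does not use spaCy's own API. The word lists, the exception table and the helper functions are hypothetical examples, not data or code from the project.

```python
import re

# Hypothetical sample of Greek stop words (not the project's real list).
STOP_WORDS = {"ο", "η", "το", "και", "να", "με", "για"}

# Hypothetical tokenizer exceptions: abbreviations that must stay whole
# even though they contain periods.
TOKENIZER_EXCEPTIONS = {"κ.", "π.χ.", "αρ."}

# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Tokenize text, keeping chunks that exactly match an exception intact."""
    tokens = []
    for chunk in text.split():
        if chunk.lower() in TOKENIZER_EXCEPTIONS:
            tokens.append(chunk)
        else:
            tokens.extend(TOKEN_RE.findall(chunk))
    return tokens


def content_words(tokens):
    """Drop stop words and tokens that contain no letters or digits."""
    return [t for t in tokens
            if t.lower() not in STOP_WORDS and any(c.isalnum() for c in t)]
```

Testing the language data then amounts to checking that abbreviations survive tokenization and that stop words are filtered as expected.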

Expected Results

The vocabulary, syntax, entities and word vectors for the Greek language. These will be produced with spaCy/gensim after the language data has been added successfully.

The Greek language model will then be added to spaCy for use as a supported language model.

As a real-world test of the language model, an analysis of a large number of issues of the official Greek Government Gazette (FEK, ΦΕΚ) is proposed, in order to extract entities and categorize these documents.

Related repositories

https://github.com/explosion/spaCy

Knowledge Prerequisites

Strong knowledge of the Greek language, fluency in Python and knowledge of regular expressions are required.

Mentors: Markos Gogoulos, Panos Louridas

Extraction of Responsibilities per unit in public sector organizations from the Government Gazette

Brief Explanation:

The objective of this project is to extend existing Government Gazette (GG) text mining code with Named Entity Recognition features. These features will identify Government Directorates and Divisions, together with the responsibilities assigned to them and the types of services they are required to provide according to their legal framework as published at http://www.et.gr/, and will extract this information along with related metadata (decision number, date of the GG issue). The aim is to link the management units (Directorates, Divisions and Sections) with the roles and services assigned to each unit, and to codify this information, which is currently buried in the raw text of GG issues. To do this, the PDFs must be downloaded, converted into text and cleaned. Then, syntax-based heuristics and/or machine learning techniques must be applied to identify the relevant Named Entity types and the references to the responsibilities and services assigned to each unit, and the links between the two must be extracted. Metadata about the GG issue and the decision and/or law number will also be attached to each extracted association. The associations will be exported in a structured, machine-readable format (e.g. as RDF triples).
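The heuristic step could look roughly like the sketch below. It is only an illustration under stated assumptions: the regular expression, the English sample phrasing (standing in for Greek GG text), the predicate names and the function name are all hypothetical, and the output is a list of plain tuples rather than real RDF.

```python
import re

# Hypothetical pattern: a unit name, the phrase "is responsible for", and a
# semicolon-separated list of responsibilities that ends with a period.
UNIT_RE = re.compile(
    r"The (?P<unit>(?:Directorate|Division|Section) of [A-Z][\w ]*?) "
    r"is responsible for (?P<tasks>[^.]+)\."
)


def extract_triples(text, issue, decision):
    """Return (subject, predicate, object) tuples found in cleaned GG text."""
    triples = []
    for match in UNIT_RE.finditer(text):
        unit = match.group("unit")
        # Attach the metadata of the GG issue and decision to the unit.
        triples.append((unit, "publishedIn", issue))
        triples.append((unit, "definedBy", decision))
        # One triple per responsibility assigned to the unit.
        for task in match.group("tasks").split(";"):
            triples.append((unit, "hasResponsibility", task.strip()))
    return triples
```

Tuples of this shape could later be serialized as RDF triples once a vocabulary for units, responsibilities and GG metadata has been chosen.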


Expected Results

  • A module for manually annotating related entities and responsibilities-services assignment sections in raw text
  • A NER module, with trained models for detecting Government Directorates and Divisions in raw text
  • A module that associates entities with responsibilities and extracts related metadata from the GG issue

Related repositories

https://github.com/arisp8/gazette-analysis


Knowledge Prerequisites

Python, Java, Machine Learning

Mentors: Iraklis Varlamis, Sarantos Kapidakis, Dionysios Moschopoulos, Theodoros Karounos

Epoptes

Brief Explanation:

Epoptes (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It supports screen broadcasting and monitoring, remote command execution, message sending, imposing restrictions on clients such as screen locking or sound muting, and much more. It can be installed in Ubuntu-, Debian- and openSUSE-based labs that contain any combination of the following: LTSP servers, thin and fat clients, non-LTSP servers, standalone workstations, NX or XDMCP clients, etc.

Related GitHub repositories

https://www.github.com/Epoptes/epoptes

Expected Results

Rewrite Epoptes with Python 3 support

Gtk3 with GObject Introspection instead of pygtk2

Improvements in the code structure (break existing code into Python modules/packages)

Knowledge Prerequisites

Python

GTK

Mentors: Fotis Tsiamis, Avgoustos Tsinakos

Government Gazette text mining, cross linking, and codification

Brief Explanation

The objective of this project is to extend existing Government Gazette text mining code to cross-link legal texts and detect the ministers who sign them. For this, the PDFs need to be downloaded and converted into text. Then, heuristic rules must be applied to detect references to other legal texts, which will be converted into hypertext links. Similar techniques will be used to detect the competent ministers. Two possible extensions are proposed. The first is to detect amendments incorporated within another law. The second is to implement a prototype for editing a law in its codified form (e.g. on GitHub) and automatically generating from the changes the text to be legislated (the differences from the original law).
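A heuristic rule for cross-linking could be sketched as follows. This is a minimal example, assuming that references follow a "Law NUMBER/YEAR" or "ν. NUMBER/YEAR" shape; the pattern, the URL template and the function name are hypothetical and are not taken from the existing code.

```python
import re

# Hypothetical citation pattern: "Law 4412/2016" or the Greek abbreviation
# "ν. 4412/2016" (law number, slash, four-digit year).
LAW_REF_RE = re.compile(r"\b(?:Law|ν\.)\s*(?P<number>\d+)/(?P<year>\d{4})\b")

# Hypothetical URL scheme for the target of each link.
URL_TEMPLATE = "https://example.org/laws/{year}/{number}"


def link_references(text):
    """Wrap every detected law reference in an HTML anchor."""
    def to_anchor(match):
        url = URL_TEMPLATE.format(**match.groupdict())
        return '<a href="{}">{}</a>'.format(url, match.group(0))

    return LAW_REF_RE.sub(to_anchor, text)
```

Real Gazette text uses many more citation forms (presidential decrees, ministerial decisions, article and paragraph references), so a rule set of this kind would need one pattern per form.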

Related GitHub repositories

https://github.com/arisp8/gazette-analysis

Expected Results

Detection of references to other laws; detection of competent ministers; codified legislation prototype
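For the codified legislation prototype, the differences between the original law and its edited form could be obtained with a standard text diff. The sketch below uses Python's difflib for this; the function name and the sample texts are hypothetical, and turning the diff into properly worded amending legislation is left open.

```python
import difflib


def amendment_diff(original, codified):
    """Return a unified diff between two versions of a law's text."""
    diff = difflib.unified_diff(
        original.splitlines(),
        codified.splitlines(),
        fromfile="original",
        tofile="codified",
        lineterm="",  # lines were split without newlines, so add none back
    )
    return list(diff)
```

Lines starting with "-" are those to be replaced or repealed, and lines starting with "+" are the new wording.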

Knowledge Prerequisites

Python

Mentors: Diomidis Spinellis, Alexios Zavras, Sarantos Kapidakis, Dionysios Moschopoulos

LibreOffice customization and creation of legal templates for LibreOffice

Brief Explanation

Customize LibreOffice to give a "familiar" look and menus to users migrating from MS Office 2013, and create specific templates for the Greek legal system. The customization and templates should follow the development guidelines at https://wiki.documentfoundation.org/Development/GetInvolved.

Expected results

The customization and templates should be accompanied by detailed documentation and instructions for developers and end users.

Knowledge Prerequisites

  • C
  • C++
  • Java
  • Python
  • Bash
  • Perl
  • LibreOffice Software Development Kit 6.0

Mentors: Kostas Papadimas, Theodoros Karounos, Diomidis Spinellis

Software components and IP management

More details in the separate page Clio.

Brief Explanation

A web-based system to manage data on software components and their relations.

Nowadays every piece of software includes and uses many other software components, each of which comes with its own license.

The goal of this project is to build a simple web system for (manually) entering and maintaining this information.

This is a brand-new project; some analysis has been done but no code is available yet.
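Since no code exists yet, the following is only a hypothetical sketch of the kind of data such a system might store: components, their licenses and the relations between them. The class, field and function names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A software component with its license and the components it uses."""
    name: str
    license: str
    depends_on: list = field(default_factory=list)


def collect_licenses(component, seen=None):
    """Return the licenses of a component and of everything it uses."""
    if seen is None:
        seen = set()
    licenses = set()
    if component.name in seen:  # already visited; also guards against cycles
        return licenses
    seen.add(component.name)
    licenses.add(component.license)
    for dependency in component.depends_on:
        licenses |= collect_licenses(dependency, seen)
    return licenses
```

A web front end would then mainly provide forms for creating and editing such records, plus views that answer questions like "which licenses does this product depend on?".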

Expected Results

A complete web-based system to manage the above-mentioned data.

Knowledge Prerequisites

Web (any technology welcome)


Mentors: Alexios Zavras, Georgia Kapitsaki

WSO2 Identity Server Userstore using Web Services to get claims

Brief Explanation

WSO2 Identity Server provides secure identity management for enterprise web applications, services, and APIs by managing identity and entitlements of the users securely and efficiently. The Identity Server enables enterprise architects and developers to reduce identity provisioning time, guarantee secure online interactions, and deliver a reduced single sign-on environment. WSO2 Identity Server is fully open source and is released under Apache Software License Version 2.0.

The aim of this project is to create a new type of userstore in which credentials are separated from attributes, and in which the attributes (claims) can be configured from the web UI to be retrieved through a SOAP or REST web service. The end user should be able to:

  • configure credentials for LDAP or JDBC
  • configure web service authentication
  • configure claims to consume the above web service

Expected Results

A new userstore in which the end user can configure, through the existing web interface, user claims obtained via a web services client. The corresponding source code changes should be submitted to the upstream branch of the latest version (5.4.0).

Related GitHub repository

https://wso2.github.io/

https://wso2.github.io/using-maven.html

https://wso2.github.io/github-repositories.html#IS

https://wso2.github.io/github-repositories.html

Knowledge Prerequisites

  • Java JSP
  • JSTL
  • Maven
  • OSGI Framework
  • A modern development framework for interactive web content

Mentors: Panagiotis Kranidiotis, Stamelos Ioannis

Python PenTest Library (PyPen)

A collection of tools supporting penetration testers

Brief Explanation

Development of a Python library for penetration testers. The library will include a set of tools for performing the basic tasks involved in attacking a remote host. It will include reconnaissance tools, such as modules that collect data about a specific target either from the web or from user input. Other tools will create custom dictionaries for username and password attacks. Supported attack techniques will also include DoS, brute-force and inclusion attacks. The library will also include various statistical functions for extracting additional information from a captured host.

Related GitHub repositories

https://github.com/jmortega/python-pentesting

Expected Results

Development of an independent Python library for assisting in penetration testing, which will also integrate other existing, well-established tools such as CUPP (already in Kali Linux).

Proposed tools

A. User Reconnaissance & Information gathering

A.1/ PyFBSniff: Facebook scraper

A.2/ PyGenUser: Username list creation

A.3/ PyDic: Dictionary creation

Future extensions will include tools similar to PyFBSniff for other social media such as Twitter and Google+.
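To give an idea of what a tool such as PyGenUser might do, the sketch below derives candidate usernames from a person's first and last name, for use in authorized tests only. The function name and the chosen naming patterns are hypothetical; the project description does not specify them.

```python
def generate_usernames(first, last):
    """Build common username candidates from a first and last name."""
    first, last = first.lower(), last.lower()
    candidates = [
        first + last,        # mariapapadopoulou
        first + "." + last,  # maria.papadopoulou
        first[0] + last,     # mpapadopoulou
        first + last[0],     # mariap
        last + first[0],     # papadopouloum
    ]
    # Remove duplicates while preserving the original order.
    return list(dict.fromkeys(candidates))
```

A fuller tool would probably read names from a file and make the patterns configurable.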


B. Target System Reconnaissance & Information gathering

A collection of supportive tools gathering and presenting information about the Operating System and its processes.


B.1/ PyPScanner: Port Scanner

B.2/ PyPidStat: Process statistics creation

B.3/ PySocketStat: Socket statistics creation

B.4/ PyPipeStat: Pipe statistics creation

B.5/ PyFileStat: File statistics creation


C. Attack PenTest tools

C.1/ PyDoS: DoS attack by flooding

C.2/ PyBruftp: Brute-force attack against an FTP server

C.3/ PyRansom: Ransomware script

The library will be expandable in order to incorporate more tools in the future.


Knowledge Prerequisites

Python fluency

OS basics

Networking basics

PenTest basics


Mentors: Antonios Andreatos, Panagiotis Karampelas, Christos Pavlatos