GSOC2018 Projects

From Ελεύθερο Λογισμικό / Λογισμικό ανοιχτού κώδικα
Revision as of 18:50, 19 March 2018 by Ellak-editor (Talk | contribs) (Python PenTest Library (PyPen))

GFOSS project proposals for GSOC 2018. Students interested in participating should check which of the following projects fit their interests and skills. Practical information for students. If additional information is required, please subscribe at and post your questions regarding GSOC projects there.

Proposed Projects for GSOC 2018


Using Weld for Cryptography in the Zeus e-voting application

Brief Explanation:

Zeus is an e-voting application, in use by thousands of people. It ensures anonymity and verifiability of the entire voting process by using strong cryptographic primitives. In particular, encrypted voter ballots are shuffled and anonymised in a verifiable way before decryption. This is computationally costly, so the Zeus team is constantly looking for ways to improve performance.

Expected Results

Cryptographic operations are very costly and efficient implementations are essential for real-world production use. For example, when using Python, the underlying cryptography is typically implemented in C/C++, with suitable bindings in Python, so that the actual computations happen at a much greater speed than that afforded by the Python interpreter. In this context, in last year’s GSoC we implemented a Zero-Knowledge Shuffle mechanism in C/C++ and then made it available in Python through Cython. Now we want to try a new optimisation, offered by the Weld framework. Weld offers speed improvements by minimising the number of calls, and thus memory copy costs, between languages; in our case, between Python and C. In this GSoC project we want to implement our shuffle operations using Weld, measure the performance benefits, and offer the result as a possible optimisation in Zeus.

Related GitHub repositories

Knowledge Prerequisites

  • Python
  • C, C++
  • Cryptography

Knowledge of Rust is a plus (Weld is implemented in Rust).

Mentors: Panos Louridas, Georgios Tsoukalas

Addition of Greek glyphs in Open Source Fonts

Brief Explanation

Many Open Source fonts do not include glyphs for Greek letters and are therefore unusable in a Greek environment.

The aim of this project is to improve this situation and add the missing glyphs at the correct Unicode code points. The exact set of fonts to be completed will be determined in discussions between the student and the mentor(s).

This is not a typical programming project. If you have never designed fonts before, it is probably not for you.

Expected Results

Full support for Greek text in a number of Open Source fonts.

Knowledge Prerequisites

Type design, font technologies. Please note that this is a special project, where coding, in the traditional sense, will not be enough.

Mentors: Alexios Zavras, Irene Vlachou, Emilios Theofanous

Adding Greek language on NLP library

Brief Explanation:

spaCy is an open-source Python library for advanced Natural Language Processing. It is a powerful and modern tool for applying NLP to real-world problems. Among other functionality, it provides Named Entity Recognition, deep learning integration and part-of-speech tagging, and includes built-in visualizers for syntax and NER. spaCy supports more than 25 languages, but not Greek. Adding the Greek language will greatly improve the application of NLP to Greek text, and enable tasks such as Named Entity Recognition and part-of-speech tagging.

The procedure is well specified: custom language data (stop words, tokenizer exceptions, punctuation rules, etc.) needs to be added and tested.
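As a rough illustration of what such language data looks like, the sketch below uses plain Python rather than the spaCy API. The names `STOP_WORDS`, `TOKENIZER_EXCEPTIONS`, `tokenize` and `content_words`, as well as the handful of Greek entries, are assumptions chosen for illustration; a real contribution would follow the structure that spaCy itself prescribes.

```python
import re

# Hypothetical sample data: a few common Greek function words.
STOP_WORDS = {"και", "να", "το", "η", "ο", "της", "του", "με", "για"}

# Hypothetical tokenizer exceptions: abbreviations that must stay whole
# even though they contain punctuation.
TOKENIZER_EXCEPTIONS = {"κ.λπ.", "π.χ.", "δηλ."}

# Punctuation rule (assumed): split off punctuation that ends a token.
TRAILING_PUNCT = re.compile(r"^(.*?)([.,;·!]+)$")


def tokenize(text):
    """Split on whitespace, keep exceptions whole, detach trailing punctuation."""
    tokens = []
    for chunk in text.split():
        if chunk.lower() in TOKENIZER_EXCEPTIONS:
            tokens.append(chunk)
            continue
        match = TRAILING_PUNCT.match(chunk)
        if match and match.group(1):
            tokens.extend([match.group(1), match.group(2)])
        else:
            tokens.append(chunk)
    return tokens


def content_words(text):
    """Return tokens that are neither stop words nor pure punctuation."""
    return [
        token
        for token in tokenize(text)
        if token.lower() not in STOP_WORDS and any(c.isalpha() for c in token)
    ]
```

Testing the language data then amounts to asserting that sentences containing abbreviations and punctuation are split as a Greek speaker would expect.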

Expected Results

The vocabulary, syntax, entities and word vectors for the Greek language. These will be produced with spaCy/gensim, after the language information is successfully added.

The Greek language model will then be contributed for use as a supported language model.

As a real-world scenario for testing the language model, we propose analysing a large number of issues of the official Greek Government Gazette (FEK, ΦΕΚ), in order to extract entities and categorize these documents.

Related repositories

Knowledge Prerequisites

Strong knowledge of the Greek language, Python language fluency and Regular Expressions knowledge are necessary for this.

Mentors: Markos Gogoulos, Panos Louridas

Extraction of Responsibilities per unit in public sector organizations from the Government Gazette

Brief Explanation:

The objective of this project is to extend the existing Government Gazette (GG) text mining code with Named Entity Recognition features. These will identify Government Directorates and Divisions together with the responsibilities assigned to them and the types of services they are required to provide under their published legal framework, and will extract this information along with related metadata (decision number, date of the GG issue). The aim is to link the management units (Directorates, Divisions and Sections) with their assigned roles and services, and to codify this information, which is hidden in the raw text of GG issues. For this, the PDFs must be downloaded, converted into text and cleaned. Then, syntax-based heuristics and/or machine learning techniques must be applied to identify specific Named Entity types with references to the responsibilities and services assigned per unit, and the links between the two must be extracted. Metadata concerning the GG issue and the decision and/or law number will also be attached to each extracted association. The resulting associations will be exported in a machine-usable, structured format (e.g. as RDF triples).

Expected Results

  • A module for manually annotating related entities and responsibilities-services assignment sections in raw text
  • A NER module, with trained models for detecting Government Directorates and Divisions in raw text
  • A module that associates entities with responsibilities and extracts related metadata from the GG issue

Related repositories

Knowledge Prerequisites

Python, Java, Machine Learning

Mentors: Iraklis Varlamis, Σαράντος Καπιδάκης, Διονύσης Μοσχόπουλος, Theodoros Karounos

Open source plugin for math and algebra in Moodle

Brief Description

Make the Moodle learning environment more functional for mathematics. Although many math editors are available for Moodle, a lot of functionality related to school algebra is still missing from open source plugins.

Expected Results

  • Develop a Moodle plugin with the following functionality.
  • Allow the teacher to design math assignments, which the student has to interactively solve. The plugin will provide real time feedback on each answer, whether it’s correct, partially correct or incorrect.
  • Give the teacher the ability to assign conversions of algebraic representations to students (i.e. the student must perform simplifications or factorizations), while the plugin monitors the process followed by the student and gives feedback on the correctness of each step.
  • The same functionality must be available for solving an equation.
  • Interoperate with the Geogebra plugin.
  • In addition to math-focused functionalities, it would be useful to create a page so that the teacher can write in the so-called “Student Progress Notebook” where the teacher comments on the performance of each student.

Related repositories

We will utilize the work implemented by CERN with CERNBOX

Knowledge Prerequisites

Knowledge of math and algebra

Mentors: Avgoustos Tsinakos, Diomidis Spinellis

Epoptes


Brief Explanation:

Epoptes (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows for screen broadcasting and monitoring, remote command execution, message sending, imposing restrictions such as locking the screen or muting the sound on clients, and much more. It can be installed in Ubuntu, Debian and openSUSE based labs that may contain any combination of the following: LTSP servers, thin and fat clients, non-LTSP servers, standalone workstations, NX or XDMCP clients, etc.

Related GitHub repositories

Expected Results

  • Rewrite Epoptes with Python 3 support
  • Gtk3 with GObject Introspection instead of pygtk2
  • Improvements in the code structure (break existing code into Python modules/packages)

Knowledge Prerequisites



Mentors: Fotis Tsiamis, Avgoustos Tsinakos

Government Gazette text mining, cross linking, and codification

Brief Explanation

The objective of this project is to extend the existing Government Gazette text mining code to cross-link legal texts and detect the ministers who sign them. For this, the PDFs need to be downloaded and converted into text. Then, heuristic rules must be applied to detect references to other legal texts, which will be converted into hypertext form. Similar techniques will be used to detect the competent ministers. Two possible extensions are proposed. First, detect amendments incorporated within another law. Second, implement a prototype for editing a law in its codified form (e.g. on GitHub) and automatically generating from the changes the text to be legislated (the differences from the original law).
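A heuristic rule of this kind could be sketched as a regular expression. The example below is a minimal illustration, not the existing project code: it assumes citations shaped like "ν. 4412/2016" (law) or "π.δ. 80/2016" (presidential decree), and the anchor scheme `#number-year` is a hypothetical placeholder for whatever link target the project adopts.

```python
import re

# Assumed citation shapes: "ν. 4412/2016" or "π.δ. 80/2016".
# The lookbehind prevents matching a "ν." that merely ends a longer word.
LAW_REFERENCE = re.compile(
    r"(?<!\w)(?P<kind>[νΝ]\.|[πΠ]\.\s?[δΔ]\.)\s*(?P<number>\d+)/(?P<year>\d{4})"
)


def find_references(text):
    """Return (number, year) pairs for every citation found in the text."""
    return [(m.group("number"), m.group("year")) for m in LAW_REFERENCE.finditer(text)]


def link_references(text):
    """Wrap every citation in an HTML anchor (hypothetical link scheme)."""

    def to_anchor(match):
        target = "#{}-{}".format(match.group("number"), match.group("year"))
        return '<a href="{}">{}</a>'.format(target, match.group(0))

    return LAW_REFERENCE.sub(to_anchor, text)
```

Real Gazette text contains many more citation variants (articles, paragraphs, ministerial decisions), so a production rule set would be considerably larger than this single pattern.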

Related GitHub repositories

Expected Results

Detection of references to other laws; detection of competent ministers; codified legislation prototype

Knowledge Prerequisites


Mentors: Diomidis Spinellis, Alexios Zavras, Σαράντος Καπιδάκης, Διονύσης Μοσχόπουλος

SCRIPTUM


Brief Explanation

SCRIPTUM is a Web based Open-Source application that will be used in order to eliminate bureaucracy and document loss, providing the Administration Office of the Greek Republic Vice President an extensible and integrated environment for document publishing, categorization and administration. The overall project involves two basic sub projects:

  1. e-Protocol: for handling incoming/outgoing mail messages and their attachments. Using this system, e-Protocol users can benefit from the advanced OpenKM properties in order to manipulate the messages. This subsystem provides an easy way to complete forms relevant to incoming/outgoing mail, generates form letters from templates and maintains a document repository equipped with document-based security.
  2. Case Management: this subsystem provides a well-established workflow for handling documentation relevant to specific operations of the organization. Assignment Operations provides a standard way of managing and directing specific actions to be taken by the employees of the Administration Office of the Greek Republic Vice President in order to complete the organization's operations.

These systems use the OpenKM document management platform as their document repository.

Related GitHub repositories

Expected Results

  1. Interoperability with e-government services in Greece and European Union ISA
  2. Containerization of SCRIPTUM

Knowledge Prerequisites

  • Java
  • Maven
  • JPA 2.1
  • ZK 8.x
  • Spring 4.x

Mentors: Nick Koskinas, Panagiotis Kranidiotis

Development of an open source Greek Spelling and Grammatical dictionary

Brief Explanation

Development of a spelling and grammar tool that can work both as a LibreOffice extension and as a stand-alone web service, reusing the AfterTheDeadline API so that it can be integrated into a wide range of packages and platforms (Firefox, Chrome, Thunderbird, TinyMCE / WordPress, jQuery, etc.).

Expected results

  • Extraction of Greek words from platforms with open licences (Wikipedia, Wikinews, wiki dictionary, Wikipedia revision history, etc.)
  • Creation of a morphological dictionary of Modern Greek, which will encode all the extracted verbs and adjectives as finite state transducers (for the implementation of a morphological analyzer and a morphological word generator with the Apertium and HFST tools).
  • Implementation of the tool in Python 3/C/C++, as a LibreOffice extension.


Knowledge Prerequisites

  • C
  • C++
  • Python
  • SQL

Mentors: Kostas Papadimas, Diomidis Spinellis

LibreOffice customization and creation of legal templates for LibreOffice

Brief Explanation

LibreOffice customization in order to achieve a "familiar" look and menus for users migrating from MS Office 2013, and creation of specific templates for the Greek legal system. The customization and templates should follow the development guidelines.

Expected results

The customization and templates should be accompanied by detailed documentation and instructions for developers and end users.

Knowledge Prerequisites

  • C
  • C++
  • Java
  • Python
  • Bash
  • Perl
  • LibreOffice Software Development Kit 6.0

Mentors: Kostas Papadimas, Theodoros Karounos, Diomidis Spinellis

Software components and IP management

More details in the separate page Clio.

Brief Explanation

A web-based system to manage data on software components and their relations.

Nowadays every piece of software includes and uses many other software components, each with its own license.

The goal of this project is to build a simple web system for (manually) entering and maintaining this information.

This is a brand-new project; some analysis has been done but no code is available yet.

Expected Results

A complete web-based system to manage the above-mentioned data.

Knowledge Prerequisites

Web (any technology welcome)

Mentors: Alexios Zavras, Georgia Kapitsaki

Files DB

More details in the separate page FilesDB.

Brief Explanation

A system to keep meta-data for a large number of files.

The files that are to be processed can be on the filesystem, in archives, or parts of a (git) repository. The metadata are mainly information about the contents (size, type, hashes, ...), so that afterwards a number of questions can be asked about the actual files: how many files, how many files larger than X, how similar are two archives/repositories, when was a file introduced in a repository, etc.
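Since no code exists yet, the following is only a minimal sketch of how such metadata might be collected and queried, assuming SHA-256 as the hash and SQLite as the store. The table layout and the function names (`file_metadata`, `index_tree`, `count_larger_than`, `duplicate_groups`) are hypothetical, and archives and git repositories are not handled.

```python
import hashlib
import os
import sqlite3


def file_metadata(path):
    """Return (path, size in bytes, SHA-256 hex digest) for one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in blocks so large files do not have to fit in memory.
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return (path, os.path.getsize(path), digest.hexdigest())


def index_tree(conn, root):
    """Walk a directory tree and store the metadata of every file in it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, size INTEGER, sha256 TEXT)"
    )
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                file_metadata(os.path.join(directory, name)),
            )
    conn.commit()


def count_files(conn):
    """Answer 'how many files'."""
    return conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]


def count_larger_than(conn, size):
    """Answer 'how many files larger than X bytes'."""
    query = "SELECT COUNT(*) FROM files WHERE size > ?"
    return conn.execute(query, (size,)).fetchone()[0]


def duplicate_groups(conn):
    """Return (hash, count) for every content hash shared by several files."""
    query = (
        "SELECT sha256, COUNT(*) FROM files "
        "GROUP BY sha256 HAVING COUNT(*) > 1 ORDER BY sha256"
    )
    return conn.execute(query).fetchall()
```

Comparing the sets of hashes stored for two trees would be one simple way to approach the "how similar are two archives/repositories" question.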

This is a brand-new project; some analysis has been done but no code is available yet.

Expected Results

Software to collect the data from the input files and a collection of commands to query the data. These can all be command-line utilities; it would be worthwhile to also provide a Web interface, although it is not necessary.

Knowledge Prerequisites

C and/or Python; SQL; Git (and other systems); web

Mentors: Alexios Zavras, Σαράντος Καπιδάκης

CScout AJAX-based Interface

Brief Explanation

CScout is a source code analyzer and refactoring browser for collections of C programs. It can process workspaces of multiple projects (a project is defined as a collection of C source files that are linked together) mapping the complexity introduced by the C preprocessor back into the original C source code files. CScout takes advantage of modern hardware (fast processors and large memory capacities) to analyze C source code beyond the level of detail and accuracy provided by current compilers and linkers. The analysis CScout performs takes into account the identifier scopes introduced by the C preprocessor and the C language proper scopes and namespaces. CScout has already been applied on projects of tens of thousands of lines to millions of lines, like the Linux, OpenSolaris, and FreeBSD kernels, and the Apache web server.

The aim of this project is to redesign the current web interface, which is based on HTML forms generated by C++ code, into a responsive AJAX-based one. Under this scheme the C++ code will provide JSON data through a RESTful interface, which the JavaScript front-end will use.

Expected Results

A modern responsive web interface offering the current capabilities of CScout. Ideally this would include in-line editing of identifiers.

Related GitHub repository

Knowledge Prerequisites

  • C++
  • JavaScript
  • A modern development framework for interactive web content

Mentors: Diomidis Spinellis, Stamelos Ioannis

WSO2 Identity Server Userstore using Web Services to get claims

Brief Explanation

WSO2 Identity Server provides secure identity management for enterprise web applications, services, and APIs by managing identity and entitlements of the users securely and efficiently. The Identity Server enables enterprise architects and developers to reduce identity provisioning time, guarantee secure online interactions, and deliver a reduced single sign-on environment. WSO2 Identity Server is fully open source and is released under Apache Software License Version 2.0.

The aim of this project is to create a new type of userstore in which credentials are separated from attributes, and attributes (claims) can be configured from the web UI as a SOAP or REST web service. The end user should be able to

  • configure credentials for LDAP or JDBC
  • configure web service authentication
  • configure claims to consume the above web service

Expected Results

A new userstore in which the end user can configure, through the existing web interface, user claims obtained through a web service client. The appropriate changes in the source code should be submitted to the upstream branch of the latest version (5.4.0).

Related GitHub repository

Knowledge Prerequisites

  • Java JSP
  • JSTL
  • Maven
  • OSGI Framework
  • A modern development framework for interactive web content

Mentors: Panagiotis Kranidiotis, Stamelos Ioannis

WSO2 Identity Server Greek i18n translation

Brief Explanation

WSO2 Identity Server provides secure identity management for enterprise web applications, services, and APIs by managing identity and entitlements of the users securely and efficiently. The Identity Server enables enterprise architects and developers to reduce identity provisioning time, guarantee secure online interactions, and deliver a reduced single sign-on environment. WSO2 Identity Server is fully open source and is released under Apache Software License Version 2.0.

The aim of this project is to translate both the Identity Server management UI and the user dashboard into Greek.

Expected Results

The appropriate locale files and the changes in the source uploaded in the upstream branch of the latest version (5.4.0).

Related GitHub repository:

Knowledge Prerequisites

  • Java JSP
  • JSTL
  • Maven
  • OSGI Framework
  • JavaScript
  • A modern development framework for interactive web content

Mentors: Panagiotis Kranidiotis, Kostas Papadimas

The Transparency Program initiative - Diavgeia

Brief Explanation

Diavgeia is an application with a MySQL backend that is used to upload administrative decisions to an online public repository. This project participated in GSOC 2017 as Diavgeia Redefined: an RDF schema covering almost every possible decision type was implemented, and the Notation3 (N3) syntax was adopted.

Related GitHub repositories

Expected Results

  1. Implementation of a query answering system which will translate natural language queries to SPARQL queries. This system will offer ordinary citizens a way to examine the legality and good administration of Diavgeia, without posing SPARQL queries by themselves.
  2. Find a way to ensure the integrity of the SPARQL endpoint. For the time being, decisions are stored both as compressed Notation3 files in the filesystem of Diavgeia and in Apache Jena’s triple store. The Stamper tool, which is responsible for storing Notation3 decisions on the Bitcoin blockchain, ignores the fact that these decisions are also stored in the triple store. That means that a modification or deletion of a decision in the triple store will go unnoticed. Students may have to implement a “Full Verification Procedure”. This procedure will extend the functionality of the Consistency Verifier tool, which will not only compute the Merkle Tree of Notation3 decisions, but will also check that, for each Notation3 decision, all related data is also offered through the SPARQL endpoint.
  3. The Consistency Verifier tool makes requests to the explorer in order to read the Merkle Root and compare it to the computed Merkle Root of Notation3 decisions. One may assume that the administration of Diavgeia somehow influences the functionality of the explorer, and thus “foul play actions” may go unnoticed. The Consistency Verifier may be extended to offer the option of reading the root directly from the blockchain (e.g. using the bcoin.js library).
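The Merkle Root comparison described above can be illustrated with a short sketch. It assumes SHA-256 hashing and a pairing rule that duplicates the last node of an odd-sized level; the actual Stamper and Consistency Verifier tools may hash and pair differently, and the function names here are hypothetical.

```python
import hashlib


def sha256(data):
    """Return the raw SHA-256 digest of a byte string."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Compute a Merkle root (hex) over a list of byte strings.

    Assumed rule: when a level has an odd number of nodes,
    the last node is paired with a copy of itself.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def is_consistent(decisions, published_root):
    """Compare the locally computed root with the one read from the chain."""
    return merkle_root(decisions) == published_root
```

With this construction, changing, removing or reordering any single decision changes the root, which is what allows tampering with the stored files to be detected. Detecting tampering in the triple store needs the additional per-decision check against the SPARQL endpoint described in item 2.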

Knowledge Prerequisites

  • PHP and/or Java
  • MySQL
  • Blockchain

Mentors: Nikos Tsiridis, Panagiotis Kranidiotis

OpenProject module to support the PM² methodology for project management

Brief Explanation:

PM² is a Project Management Methodology developed by the European Commission. It is built on Project Management best practices and is supported by the following four pillars:

1) a project governance model (Roles & Responsibilities)

2) a project lifecycle (Project Phases)

3) a set of processes (Project Management activities)

4) a set of project artefacts (templates and guidelines).

For full support of the PM² Project Management Methodology, new modules should be developed for OpenProject.

Expected Results:

Support for the PM² process itself in OpenProject: the tool should support defining and handling the roles, phases and activities (in terms of the PM² governance, life-cycle and processes pillars) for every project.

The module should support:

  • the PM² Governance Model (Roles & Responsibilities)
  • the PM² Phases
  • the PM² Artefacts
  • all PM² plans and logs, such as Change Log, Communications, Issue, Project, Quality, Requirements, Risk

Related GitHub repositories

Knowledge Prerequisites

  • ruby
  • scrum
  • angular
  • gantt-chart

Mentors: Diomidis Spinellis, Stamelos Ioannis

Sampling and volume approximation

Brief Explanation:

Volume computation is a fundamental problem in discrete and computational geometry with many applications in statistics, biology and economics. There is a variety of implemented solutions for this problem, but they scale only to low dimensions (typically less than 10). The first implementation that scales to high dimensions (i.e. a few hundred) is the C++ open-source software package VolEsti. The package contains algorithms for volume approximation as well as uniform sampling for polytopes. The purpose of the current project is two-fold: first, provide a non-C++ interface to the functionality of VolEsti and, second, extend and improve VolEsti's features.

The coding project could be divided in the following steps: 

  1. Understand the code structure and design of VolEsti as well as the implemented algorithms.
  2. Re-evaluate the dependency on some libraries. For example, VolEsti depends on CGAL for linear programming, but this is inefficient for high dimensions. Therefore, a goal is to rewrite parts of the code to remove the CGAL dependency and use an open-source library for convex optimization.
  3. Implement new features: new sampling and volume computation algorithms, support for polytopes given by the set of vertices of the convex hull, support for spectrahedra.
  4. Create a non-C++ user-friendly interface, e.g. in Python/SciPy/Jupyter.
  5. Write tests and documentation.

The steps are generic enough to be easily adapted to student's background and interests.

Expected Results:

Many users, such as practitioners or researchers from a variety of scientific fields ranging from biogeography to economics, need a high-level programming or scripting environment to test volume computation or sampling algorithms.

Related GitHub repositories

Knowledge Prerequisites

  • Must: C++, generic programming, linear algebra
  • Plus: optimization, computational geometry, statistics

Mentors: Vissarion Fisikopoulos, Panos Louridas, Zafeirakis Zafeirakopoulos

Cryptocurrencies WordPress plugin

Brief Explanation:

Combination of digital and physical donation channels in a single product/plugin that can be embedded in both WordPress and plain HTML webpages. The use of blockchain technology will enable users to donate without paying any amount of money. Charities will also be able to track users who donate via digital assets.

Expected Results

A functional WordPress plugin for Donation Box and a stand-alone module, which allow users to donate with fiat money or cryptocurrencies, or by lending their computer’s computational power (hashing power).

Related GitHub repositories

Mentors: Spiros Kapetanakis

netdata


Brief Explanation:

netdata is a system for distributed real-time performance and health monitoring. It provides unparalleled insights, in real-time, of everything happening on the system it runs (including applications such as web and database servers), using modern interactive web dashboards. netdata is ideal for monitoring physical servers, VMs, VPS, containers, IoT.

netdata is a highly active project, with hundreds of new users joining and hundreds of new installations completed every day. netdata was also featured as one of the top GitHub projects for 2016.

Key components of netdata:

1. netdata daemon, its core, written in C, an asynchronous I/O, time-series database and web server, optimized for performance.

2. web dashboards, written in JavaScript, HTML, CSS

3. external plugins, written in python, node.js, bash, C

netdata is the definition of full-stack development. It includes low-level C programming following embedded development principles, web-apps development for visualizing performance metrics, devops and sysadmin automation tasks for configuring health monitoring (alarms), statistical analysis of metrics, high level development for data collection from databases and third party apps, etc.

Related GitHub repositories:

Expected Results

These are the main development directions

1. add more data collection plugins in any computer language, for monitoring the performance of popular third party applications.

2. add more data visualizations and dashboards.

3. add more health monitoring alarms, for monitoring popular applications and operating systems.

4. port netdata to more operating systems.

5. improve the distribution of netdata by setting up and maintaining binary packages for popular distributions.

6. improve the technical documentation of netdata (setup guides, how-tos for monitoring applications or functions, etc).

7. help spread netdata, by improving its community and social media reach.

More ideas can be found at the netdata GitHub repo.

Knowledge Prerequisites

Depending on the task at hand, different skills are required.

For example:

1. low level C programming (POSIX compliant, including pthreads)

2. python programming

3. node.js programming

4. web-app development (javascript, html, css)

5. bash scripting

6. linux system administration

Mentors: Costa Tsaousis, Diomidis Spinellis

Python PenTest Library (PyPen)

A collection of tools supporting penetration testers

Brief Explanation

Development of a Python library for penetration testers. The library will include a set of tools for performing the basic tasks for attacking a remote host. It will include reconnaissance tools such as modules that will be able to collect data for a specific target either through the web or through user input. Moreover, other tools will be developed to create custom dictionaries for username and password attacks. Other attack techniques that will be supported include DoS attack, BruteForce attack as well as Inclusion attack. The library will also include various statistical functions for extracting additional information from a captured host.

Related GitHub repositories

Expected Results

Development of an independent Python library which will also integrate other existing and well-established tools, such as CUPP (already in Kali Linux), for assisting in penetration testing.

Proposed tools

A. User Reconnaissance & Information gathering

Α.1/ PyFBSniff: Facebook scraper

Α.2/ PyGenUser: Username list creation

Α.3/ PyDic: Dictionary creation

Future extensions will include tools similar to PyFBSniff for other social media such as Twitter and Google+.

B. Target System Reconnaissance & Information gathering

A collection of supportive tools gathering and presenting information about the Operating System and its processes.

Β.1/ PyPScanner: Port Scanner

Β.2/ PyPidStat: Process statistics creation

Β.3/ PySocketStat: Socket statistics creation

Β.4/ PyPipeStat: Pipe statistics creation

Β.5/ PyFileStat: File statistics creation

C. Attack PenTest tools

C.1/ PyDoS : DoS attack by flooding

C.2/ PyBruftp: Bruteforce attack to ftp server

C.3/ PyRansom: Ransomware script

The library will be expandable in order to incorporate more tools in the future.

Knowledge Prerequisites

Python fluency

OS basics

Networking basics

PenTest basics

Mentors: Antonios Andreatos, Panagiotis Karampelas, Christos Pavlatos