Google Summer of Code 2019 proposed ideas
Students interested in participating should check which of the following projects fit their interests and skills.
For practical information, students should visit this page.
- 1 Upgrade UMLGraph with Java's new doclet API
- 2 API Design Tool
- 3 Greek Government Gazette text mining, cross-linking, and codification - 3gm
- 4 Digital signing of grades - UniverSIS (open-source student info system)
- 5 Class and Room Scheduling - UniverSIS (open-source student info system)
- 6 Development of a DIY robot kit for educators
- 7 Addition of Greek glyphs in Open Source Fonts
- 8 Development of a Thesis Management System (TMS)
- 9 Extraction of Public Administration Organizations structure and assignment of Responsibilities from the Greek Government Gazette
- 10 Round-trip integration between GitHub/GitLab issues and git-issue
- 11 Symplegma
- 12 clio — Software Components and IP Management System
- 13 Replacement of LTSP
- 14 Port Qt Quick Controls Calendar widget to Qt Quick Controls 2 module
- 15 Development of a Tool for Extracting Quantitative Text Profiles
- 16 Anonymisation through data encryption of sensitive data in odt and text files in Greek Language
- 17 OpenProject Work-Package #1 to support modeling of the PM2 methodology for project management
- 18 OpenProject Work-Package #2 to support functionality of the PM2 methodology for project management
- 19 Real time Django monitoring and profiling
- 20 Moodle connection with multiple BigBlueButton servers
- 21 Creation of an online Greek mail dictation system, using Sphinx and personalized acoustic/language models training
- 22 Moodle ADaptable VIsualization for CommunitiEs (Moodle-ADVICE)
- 23 Development of an open source Greek Spelling and Grammatical dictionary
- 24 CScout AJAX-based Interface
- 25 NextCloudPi
- 26 eIDAS SAML functionality for Keycloak
- 27 Refactor open source remote systems and network management solution
- 28 Geometric sampling for volume computation and convex optimization
- 28.1 Brief Explanation
- 28.2 Expected Results
- 28.3 Expected impact
- 28.4 Related Repositories
- 28.5 Knowledge Prerequisites
- 28.6 Mentors:
- 28.7 Tests
Upgrade UMLGraph with Java's new doclet API
UMLGraph allows the declarative specification and drawing of UML class and sequence diagrams. One specifies a class diagram using Java syntax complemented by javadoc tags. Running the UMLGraph doclet on the specification generates a Graphviz diagram specification that can be automatically processed to create PostScript, PNG, SVG, JPEG, fig, or FrameMaker drawings. The objective of the proposed project is to upgrade the UMLGraph code so that it uses the jdk.javadoc.doclet Doclet API rather than the older com.sun.javadoc package it currently uses. The new API provides an environment which, in conjunction with the Language Model API and the Compiler Tree API, allows clients to inspect the source-level structures of programs and libraries, including API comments embedded in the source. Details on the mapping of old types to new types can be found in the Migration Guide: https://docs.oracle.com/javase/9/docs/api/jdk/javadoc/doclet/package-summary.html#migration. In addition, the project will add support for Java features such as lambdas and generics, add unit tests, and update the corresponding integration tests.
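For readers new to the pipeline, the following minimal Python sketch shows the kind of Graphviz DOT text a class-diagram generator emits. It is not UMLGraph code: `class_diagram_dot` and its dictionary input are hypothetical, and only record nodes and generalization edges are covered.

```python
def class_diagram_dot(classes):
    """Return Graphviz DOT text for a simple UML class diagram.

    `classes` maps a class name to a dict with optional keys
    "fields" and "methods" (lists of strings) and "extends" (a class name).
    """
    lines = ["digraph G {", "  node [shape=record, fontname=Helvetica];"]
    for name, info in classes.items():
        # "\l" ends a left-justified line inside a Graphviz record label.
        fields = "".join(f + "\\l" for f in info.get("fields", []))
        methods = "".join(m + "\\l" for m in info.get("methods", []))
        # {Name|fields|methods} stacks the three UML compartments vertically.
        lines.append(f'  {name} [label="{{{name}|{fields}|{methods}}}"];')
    for name, info in classes.items():
        parent = info.get("extends")
        if parent:
            # UML generalization: hollow arrowhead pointing at the superclass.
            lines.append(f"  {name} -> {parent} [arrowhead=empty];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(class_diagram_dot({
        "Animal": {"fields": ["name : String"], "methods": ["speak() : String"]},
        "Dog": {"methods": ["fetch()"], "extends": "Animal"},
    }))
```

Piping the printed text through `dot -Tpng -o diagram.png` should render the two classes with an inheritance arrow. UMLGraph derives the same kind of information from javadoc-annotated Java sources, which is where the doclet API migration comes in.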
It is expected that the project will deliver a well-tested version of UMLGraph built around the new jdk.javadoc.doclet Doclet API with support for Java features such as Lambdas and Generics.
API Design Tool
Mentor Organisation: National Bank Of Greece
In the new world of the “API Design First” approach, there is a need for a tool that both business people and developers can use to design APIs. There are tools and IDEs on the market that claim to fit this need, but none has all the features required for the job.
Related GitHub repositories: there is no project URL at the moment.
The API Design tool should fulfill the following requirements:
- Allow a person with no developer skills to design/create an API visually.
- Allow a developer to design the API manually by writing/editing the Swagger file.
- Synchronize changes to the API Swagger file bidirectionally between the tool and a GitHub repository.
- Download the API as a Swagger YAML or JSON file (resolved or unresolved format).
- Generate the API server stub in C#.
- Preview the documentation of the API.
- Download the documentation of the API.
- Mock the API functionality.
- Organize APIs in projects and teams.
- Invite other people to collaborate on the API design.
- Assign specific rights/roles to the people designing/viewing the API.
- Make an API public or private.
- Create OAS2 and OAS3 APIs, or convert between them.
- View at a glance the latest changes and who made them.
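The "resolved" download format amounts to inlining `$ref` pointers. A minimal Python sketch, assuming the Swagger/OpenAPI document has already been parsed into dictionaries and contains only local references such as `#/definitions/Pet`; external files and circular references are not handled, and `resolve_refs` is a hypothetical helper rather than part of any existing tool.

```python
import json


def resolve_refs(spec):
    """Return a copy of a Swagger/OpenAPI dict with local "$ref" pointers inlined."""

    def lookup(pointer):
        # "#/definitions/Pet" -> spec["definitions"]["Pet"]
        node = spec
        for part in pointer[2:].split("/"):
            # JSON Pointer escapes (RFC 6901): "~1" means "/", "~0" means "~".
            node = node[part.replace("~1", "/").replace("~0", "~")]
        return node

    def walk(node):
        # walk() builds new dicts and lists, so the result never aliases the input.
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/"):
                # Inline the target and keep resolving any references inside it.
                return walk(lookup(ref))
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(spec)


if __name__ == "__main__":
    spec = {
        "swagger": "2.0",
        "paths": {"/pets": {"get": {"responses": {"200": {
            "description": "A pet", "schema": {"$ref": "#/definitions/Pet"}}}}}},
        "definitions": {"Pet": {"type": "object",
                                "properties": {"name": {"type": "string"}}}},
    }
    print(json.dumps(resolve_refs(spec), indent=2))
```

The unresolved format is simply the original document, so both downloads can be served from the same stored spec.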
Knowledge Prerequisites: AngularJS or ReactJS (latest stable version), C#.
Greek Government Gazette text mining, cross-linking, and codification - 3gm
The Greek Government Gazette text mining, cross-linking, and codification project (3gm for short) applies Natural Language Processing methods and practices to Greek legislation.
The project primarily aims to provide the most recent version of each law, i.e. an automated codex (Code of Law), via NLP methods and practices.
With 3gm, Greek Government Gazette issues (FEKs) are automatically fetched, denoised and parsed in order to extract the amendments that newer laws make to existing ones. A versioning history of each law is kept in the database and continuously served to citizens via a web application, so anyone can access every version of each law at any time. Codification is currently done by hand; this project automates it. The Greek Government Gazette documents are also kept on the Internet Archive for easier retrieval and as part of the public domain. The project was initiated in Google Summer of Code 2018, and a first phase was successfully carried out as a result. The most recent versions of laws can be found at https://3gm.ellak.gr.
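The versioning history mentioned above can be pictured as an append-only list of consolidated snapshots. A minimal Python sketch, assuming amendments have already been extracted into `{article: new_text}` changes; `LawHistory` and the law identifiers used in the tests are hypothetical and do not reflect 3gm's actual MongoDB schema.

```python
from dataclasses import dataclass, field


@dataclass
class LawHistory:
    """Every consolidated version of one law, oldest first."""
    versions: list = field(default_factory=list)  # (source law id, {article: text})

    def add_original(self, law_id, articles):
        self.versions.append((law_id, dict(articles)))

    def apply_amendment(self, amending_law, changes):
        """Apply {article: new_text}; a value of None repeals the article."""
        _, current = self.versions[-1]
        updated = dict(current)  # copy, so earlier versions stay untouched
        for article, text in changes.items():
            if text is None:
                updated.pop(article, None)
            else:
                updated[article] = text
        self.versions.append((amending_law, updated))

    def latest(self):
        return self.versions[-1][1]

    def diff(self, i, j):
        """Articles whose text differs between versions i and j."""
        a, b = self.versions[i][1], self.versions[j][1]
        return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Keeping full snapshots trades storage for simple "show the law as of version N" queries; storing only the change sets would be the leaner alternative.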
The scope of this GSoC 2019 project is to expand the capabilities of the existing project by implementing NLP extensions (NER, dependency parsing, etc.) in order to assess automated codification processes.
Possible extensions of 3gm for this year's GSoC can be found at the project's issue page here: https://github.com/eellak/gsoc2018-3gm/issues.
The candidate shall implement features from the issue page or propose new ways and approaches to automated codification. The amount of work must be sufficient for the entire program. The issues have estimated durations, and candidates are strongly advised to combine them into a coherent proposal.
1. Projection of how a draft law would be incorporated into existing legislation. For example, incorporating a proposal from the consultation at opengov.gr and visualising the changes it would bring to existing legislation.
2. Interactive correction of the encoded text produced by the auto-coding algorithm. Simple users will be able to flag errors with a verbal description, while advanced users will be able to interactively process, delete, modify or insert the correct references between two laws.
3. Ability to see the full history of a codified version of a law. (e.g. a page with the ability to track the changes that all the amending laws have brought to the text.)
4. Use of ELI (https://publications.europa.eu/en/web/eu-vocabularies/eli) as metadata for the laws at 3gm.ellak.gr.
5. Use one of the above Core Vocabularies to represent the structure / competencies / staffing of public administrations.
6. Possibility of interactive corrections of the structure and responsibilities derived from the NER & Metadata Extraction of the Greek Government Gazette
Source Code: https://github.com/eellak/gsoc2018-3gm
Web application: http://3gm.ellak.gr/
An ideal candidate would have the following skills:
- Advanced knowledge of Python
- Experience with at least one machine learning framework (e.g. PyTorch, Keras, Tensorflow)
- Basic DevOps skills (setting up a server with a database and deploying the web application)
- Greek as native language
- Solid understanding of machine learning algorithms and neural networks (DNNs, RNNs) as well as fundamentals of NLP (POS tagging, DEP parsing, NER, rule-based approaches)
- Basic knowledge in compilers would be appreciated
- Knowledge of MongoDB
- Familiarity with version control systems (git) and GitHub workflows (e.g. pull-requests, project boards)
Digital signing of grades - UniverSIS (open-source student info system)
UniverSIS is a student information system under development by and for the Higher Education Institution (HEI) community in Greece. It is built on open schemas and well-defined APIs (see https://www.universis.io/api-docs/), with Node.js on the back end and Angular on the front end. It currently has two front-end applications, one for Students and one for Teachers, while a third, for the Registrar, is underway. We propose integrating digital signing of grades into the Teachers front end, allowing grades to be uploaded with a cryptographic signature produced by a hardware token device. A previous implementation is in production at sis.auth.gr; it supports only the Chrome browser, through a custom plugin that signs a checksum with a USB cryptotoken device.
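A signature only protects the grades if what the token signs can be recomputed exactly at verification time, so the checksum should be taken over a canonical serialization of the grade sheet. A minimal Python sketch, assuming a hypothetical grade-sheet layout (course, exam period, student id to grade); the signing step itself, performed by the hardware token (for example via a PKCS#11 interface), is out of scope here.

```python
import hashlib
import json


def grades_digest(course_id, exam_period, grades):
    """SHA-256 hex digest over a canonical JSON form of a grade sheet.

    `grades` maps student id -> grade. Sorting student ids and keys and fixing
    the separators makes the bytes independent of dict order, so a verifier
    can rebuild exactly the same input from the stored grades.
    """
    payload = {
        "course": course_id,
        "period": exam_period,
        "grades": [[sid, grades[sid]] for sid in sorted(grades)],
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Any change to a single grade changes the digest, so a signature over it no longer verifies.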
The work should produce a solution that protects the long-term storage of course grades and allows a posteriori verification that grades were issued by the specific instructor. Digital signing of grades can be made available to all HEIs in Greece, as all faculty members obtain a national academic ID (provided by GRNET) with a cryptotoken chip. Ideally, the UniverSIS open-source solution can be widely deployed and improve the security of grade/degree administration in Greek HEIs.
Class and Room Scheduling - UniverSIS (open-source student info system)
UniverSIS is an effort for/by HEIs in Greece to develop their own Student Information System. While the front-end applications are being developed, the focus is on basic functionality only. We propose extending the UniverSIS schema with new models (classrooms, reservations) and developing the relevant interface in the front-end applications (Registrar, Teachers, Students). Basic functionality should cover classroom reservations and course class scheduling by the Registrar. Teachers delivering the courses and the students enrolled should be able to view calendars of their scheduled classes. Advanced functionality, such as class cancellation and rescheduling, could be implemented as an extra.
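The core rule behind classroom reservations is that two bookings of the same room must not overlap. A minimal Python sketch of that check, assuming half-open time intervals and a hypothetical `Reservation` record; it is not part of the UniverSIS schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reservation:
    room: str
    start: datetime
    end: datetime
    course: str


def overlaps(a, b):
    """Same room and overlapping half-open intervals [start, end)."""
    return a.room == b.room and a.start < b.end and b.start < a.end


def can_book(existing, candidate):
    return not any(overlaps(candidate, r) for r in existing)


def find_conflicts(reservations):
    """Return every pair of reservations that double-books a room."""
    by_room = {}
    for r in reservations:
        by_room.setdefault(r.room, []).append(r)
    conflicts = []
    for bookings in by_room.values():
        bookings.sort(key=lambda r: r.start)
        for i, a in enumerate(bookings):
            for b in bookings[i + 1:]:
                if b.start >= a.end:
                    break  # sorted by start time, so no later booking can overlap `a`
                conflicts.append((a, b))
    return conflicts
```

With half-open intervals, a class ending at 10:00 and another starting at 10:00 in the same room are not a conflict.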
A previous implementation of such a system is in full production at the Aristotle University of Thessaloniki.
The code for this fully functional but older application is based on AngularJS and deeply integrated with custom backend services available at our University. We intend to have a new implementation in Angular 7 integrated into the UniverSIS Student Information System framework and made available as an open-source solution that will provide a realistic alternative to the current aging solution in Greek universities.
Expected Results: a university calendar system that any Higher Education Institution (in Greece and abroad) can easily adopt, integrating with an existing student information system (course and faculty assignments) for scheduling classes and venues.
Development of a DIY robot kit for educators
The aim of the project is to develop all the designs, guidelines and sample code for a starter DIY robot kit that can be 3D-printed, assembled and operated using basic electronics and sensors. This is expected to provide a low-cost alternative to commercial robot kits (e.g. Lego Mindstorms) that does not require staff with expertise in robotics, electronics or IoT programming (e.g. using Arduino/Raspberry Pi kits). The ability to 3D-print everything and combine it with low-cost basic electronics and sensors will allow regional open technology initiatives to provide schools with starter kits and a full 'Robotics 101' introductory course.
The kit to be developed and released must comprise 3D designs for all the necessary parts of a modular robot that can be printed and assembled following the assembly guidelines. The target audience of the project is educators (e.g. high school ICT teachers) with minimal expertise in robotics, electronics, and programming, so the printing and assembly guidelines must be detailed and simple. In addition, the project must have a modular structure that allows educators to guide their students through the step-by-step development of the robot and the implementation of simple navigation or sensing scenarios that require only basic programming skills.
Deliverables of the project, apart from the designs of the robot parts, include a detailed list of the necessary electronics and sensors and the specifications for a Raspberry Pi or similar single-board computer (SBC).
Detailed assembly instructions, images, and videos from the assembly process are desirable.
Open-source code that runs on the SBC and allows the robot to be controlled through a simple programming interface must be developed, along with installation guidelines.
The robot will be operated either manually using a browser that wirelessly connects with the robot, or automatically by uploading robot control scripts through the same environment.
Some sample control scripts and robot programming scenarios will also be developed.
In the three months of the project, the expected outcome is the basic robot designs, the libraries for controlling basic sensors and actuators (ultrasonic sensor, IR sensor, micro switches, optical odometer, servo/DC motor), the core operating software for controlling the robot and some simple robot programming assignments.
The three months plan of the project must define: a) The selection of electronics parts, SBC, and motors. b) The 3D designs of the printed parts of the robot. c) The libraries and software for controlling the robot. d) The development of assembly guidelines and the creation of demo scenarios for the class.
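As an example of the sensor libraries mentioned above, the optical odometer counts can be turned into a robot position with standard differential-drive dead reckoning. A minimal Python sketch; `DifferentialOdometry` is a hypothetical class, and the default wheel diameter, wheel base and encoder resolution are placeholders to be replaced with measurements of the printed chassis.

```python
import math


class DifferentialOdometry:
    """Dead-reckoning pose (x, y in cm, heading in radians) from wheel encoder ticks."""

    def __init__(self, wheel_diameter_cm=6.5, wheel_base_cm=12.0, ticks_per_rev=20):
        self.cm_per_tick = math.pi * wheel_diameter_cm / ticks_per_rev
        self.wheel_base = wheel_base_cm
        self.x = self.y = self.theta = 0.0

    def update(self, left_ticks, right_ticks):
        """Advance the pose by the ticks counted since the previous call."""
        d_left = left_ticks * self.cm_per_tick
        d_right = right_ticks * self.cm_per_tick
        d_center = (d_left + d_right) / 2
        d_theta = (d_right - d_left) / self.wheel_base
        # Midpoint approximation: move along the average heading of this step.
        mid = self.theta + d_theta / 2
        self.x += d_center * math.cos(mid)
        self.y += d_center * math.sin(mid)
        self.theta = (self.theta + d_theta) % (2 * math.pi)
        return self.x, self.y, self.theta
```

Dead reckoning drifts over time, which itself makes a good classroom discussion point when students compare the computed and the measured position.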
Knowledge Prerequisites: electronics, robotics, programming.
Addition of Greek glyphs in Open Source Fonts
Many open-source fonts (e.g. those available at https://fonts.google.com) do not include glyphs for Greek letters and are therefore unusable in a Greek environment.
The aim of this project is to improve this situation and add the missing glyphs in the correct Unicode codepoints. The exact set of fonts to be completed will be determined in discussions between the student and the mentor(s).
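A first step for each candidate font is checking which Greek code points it already covers. A minimal Python sketch, assuming monotonic Modern Greek (basic letters plus the tonos and dialytika forms) as the target set; polytonic Greek (the U+1F00 block) would need a larger list. A font's character map can be obtained with fontTools, e.g. `TTFont(path).getBestCmap()`, which returns a dict keyed by code point.

```python
import unicodedata


def modern_greek_codepoints():
    """Code points needed for monotonic Modern Greek."""
    capitals = set(range(0x0391, 0x03AA)) - {0x03A2}   # Α..Ω (U+03A2 is unassigned)
    lowercase = set(range(0x03B1, 0x03CA))              # α..ω, including final ς
    accented = {0x0386, 0x0388, 0x0389, 0x038A, 0x038C,  # Ά Έ Ή Ί Ό
                *range(0x038E, 0x0391),                  # Ύ Ώ ΐ
                *range(0x03AA, 0x03B1),                  # Ϊ Ϋ ά έ ή ί ΰ
                *range(0x03CA, 0x03CF)}                  # ϊ ϋ ό ύ ώ
    return capitals | lowercase | accented


def missing_greek(cmap):
    """List ("U+XXXX", character name) pairs that the font's cmap does not cover."""
    return [(f"U+{cp:04X}", unicodedata.name(chr(cp)))
            for cp in sorted(modern_greek_codepoints() - set(cmap))]
```

The resulting list doubles as a work checklist for the type designer.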
This is not a typical programming project. If you have never designed fonts before, it is probably not for you.
Expected Results: full support for Greek text in a number of open-source fonts.
Knowledge Prerequisites: type design, font technologies. Please note that this is a special project, where coding in the traditional sense will not be enough.
Development of a Thesis Management System (TMS)
The lifecycle of a final-year thesis involves a large amount of administrative work, from the initial phase of project assignment to the final stage of publishing the thesis in the University's library catalog. The main entities in this lifecycle are students, teachers, externals (e.g. companies or academics that cooperate with the university) and, of course, the thesis subject.
The cycle begins with professors announcing subjects, which can be their own or subjects suggested by externals, or even by students who have contacted professors beforehand.
It continues with students applying for subjects from the list of available subjects, and professors making the final assignment.
When a student does not get any of the subjects he/she applied for, the system raises a flag to the student advisor, who contacts the student and professors in order to find a subject.
Thesis subjects must fall under one or more topics from a list set by the department; the topics, along with the title, a description and a list of references, are stored with each subject.
When the professor finally decides on the student(s) who will carry out a project, he/she has to propose two more professors from the department, or externals, who will co-supervise the project.
When all the assignments have been fixed by the administrator of the TMS, they can be exported in a document which can be published on the department web site.
When the student completes the thesis, he/she submits a draft to the TMS, which automatically notifies the supervisor to provide feedback. This is repeated until the supervisor agrees that the thesis is ready to be shared with the two co-supervisors.
In the final step the supervisors comment on the thesis and the final document is submitted to the system.
The TMS must provide reports on ongoing and completed theses, alert on delayed theses, and provide related statistics.
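The lifecycle above behaves like a state machine, which makes invalid steps (e.g. a final submission before the supervisor has released the draft) easy to reject. A minimal Python sketch; the stage names and allowed transitions are a hypothetical reading of the description, not a specification.

```python
from enum import Enum, auto


class Stage(Enum):
    ANNOUNCED = auto()              # subject published by a professor
    APPLIED = auto()                # students have applied for the subject
    ASSIGNED = auto()               # student(s) and two co-supervisors fixed
    DRAFT_SUBMITTED = auto()        # draft waiting for supervisor feedback
    REVISION_REQUESTED = auto()     # supervisor asked for changes
    SHARED_WITH_COMMITTEE = auto()  # supervisor released it to the co-supervisors
    FINAL_SUBMITTED = auto()        # final document stored in the TMS


# Allowed transitions (a hypothetical reading of the lifecycle, not a spec).
TRANSITIONS = {
    Stage.ANNOUNCED: {Stage.APPLIED},
    Stage.APPLIED: {Stage.ASSIGNED},
    Stage.ASSIGNED: {Stage.DRAFT_SUBMITTED},
    Stage.DRAFT_SUBMITTED: {Stage.REVISION_REQUESTED, Stage.SHARED_WITH_COMMITTEE},
    Stage.REVISION_REQUESTED: {Stage.DRAFT_SUBMITTED},
    Stage.SHARED_WITH_COMMITTEE: {Stage.FINAL_SUBMITTED},
    Stage.FINAL_SUBMITTED: set(),
}


class Thesis:
    def __init__(self, title):
        self.title = title
        self.stage = Stage.ANNOUNCED
        self.history = [Stage.ANNOUNCED]  # basis for reports and delay statistics

    def can_advance(self, new_stage):
        return new_stage in TRANSITIONS[self.stage]

    def advance(self, new_stage):
        """Move to `new_stage`, rejecting steps the lifecycle does not allow."""
        if not self.can_advance(new_stage):
            raise ValueError(f"{self.stage.name} -> {new_stage.name} is not allowed")
        self.stage = new_stage
        self.history.append(new_stage)
```

Storing a timestamp with each history entry would make the "alert on delayed theses" requirement a simple query.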
In the three months of the project it is expected to have the whole functionality required to support the TMS lifecycle.
Extraction of Public Administration Organizations structure and assignment of Responsibilities from the Greek Government Gazette
This project will extend the project "NER & Metadata Extraction from the Greek Government Gazette" (https://github.com/eellak/gsoc2018-GG-extraction), developed during GSoC 2018, which resulted in software that parses PDF files of the Greek Government Gazette and automatically classifies their paragraphs into those that describe the organization structure, those that assign responsibilities, and those that describe the required staff positions. The project is expected to fix any issues the existing software has with PDF parsing, paragraph classification and human-provided feedback, but most importantly to provide new functionality that extends the existing automatic annotation with entity extraction and codification of the extracted knowledge in a triple-like format (e.g. RDF).
In the three months of the project, it is expected to have a system that takes as input a PDF, or a CSV annotated per paragraph, and produces a file that summarizes the semantics of the organization and its responsibilities (departments, positions and responsibilities).
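The triple-like output could look like the N-Triples produced by the minimal Python sketch below. The base IRI, the property names (`name`, `hasUnit`, `responsibleFor`) and the input layout are hypothetical placeholders; a real implementation would rather reuse an established vocabulary such as the W3C Organization Ontology, and an RDF library for serialization.

```python
def escape(literal):
    """Escape a string for use inside an N-Triples literal."""
    return (literal.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def org_to_ntriples(base, org_id, org_name, units):
    """Serialize an organization, its units and their responsibilities as N-Triples.

    `units` maps a unit id to (unit name, [responsibility texts]).
    """
    def iri(local):
        return f"<{base}{local}>"

    def greek(text):
        return f'"{escape(text)}"@el'  # language-tagged literal

    triples = [(iri(org_id), iri("name"), greek(org_name))]
    for unit_id, (unit_name, duties) in units.items():
        triples.append((iri(org_id), iri("hasUnit"), iri(unit_id)))
        triples.append((iri(unit_id), iri("name"), greek(unit_name)))
        for duty in duties:
            triples.append((iri(unit_id), iri("responsibleFor"), greek(duty)))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(org_to_ntriples("http://example.org/gg/", "ministry1", "Υπουργείο Παιδείας",
                          {"dept1": ("Τμήμα Αδειών", ["Έκδοση αδειών λειτουργίας"])}))
```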
Knowledge Prerequisites: Python, RDF, NLP.
Round-trip integration between GitHub/GitLab issues and git-issue
Git-issue is a minimalist decentralized issue management system based on Git. It has the following advantages over other systems.
No backend, no dependencies: You can install and use git issue with a single shell script. There's no need for a server or a database back-end, and the corresponding problems and requirements for their administration.
Decentralized asynchronous management: Anyone can add, comment, and edit issues without requiring online access to a centralized server. There's no need for online connectivity; you can pull and push issues when you're online.
Transparent text file format: Issues are stored as simple text files, which you can view, edit, share, and back up with any tool you like. There's no risk of losing access to your issues because a server has failed.
Git-based: Issues are changed and shared through Git. This provides git issue with a robust, efficient, portable, and widely available infrastructure. It allows you to reuse your Git credentials and infrastructure, allows the efficient merging of work, and also provides a solid audit trail regarding any changes. You can even use Git and command-line tools directly to make sophisticated changes to your issue database.
Git-issue can currently import issues using the GitHub API. The project's objective is to extend this functionality with a way to synchronize between GitHub/GitLab issues and the issues kept under git-issue.
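Round-trip synchronization needs, for each issue, a decision on which side wins. A minimal Python sketch of a timestamp-based rule, assuming issue ids have already been mapped between git-issue and GitHub/GitLab and that the time of the last successful sync is stored; the function names are hypothetical, and deletions are not handled (an issue missing on one side is treated as new). The project itself targets shell scripting, so this only illustrates the decision logic.

```python
from datetime import datetime


def sync_action(local_updated, remote_updated, last_sync):
    """Return 'none', 'push' (git-issue -> remote), 'pull' (remote -> git-issue)
    or 'conflict' when both copies changed since the last sync."""
    local_changed = local_updated > last_sync
    remote_changed = remote_updated > last_sync
    if local_changed and remote_changed:
        return "conflict"
    if local_changed:
        return "push"
    if remote_changed:
        return "pull"
    return "none"


def plan_sync(local, remote, last_sync):
    """Map issue id -> action, given {id: last modification datetime} for each side."""
    plan = {}
    for issue_id in local.keys() | remote.keys():
        if issue_id not in remote:
            plan[issue_id] = "push"   # create it on GitHub/GitLab
        elif issue_id not in local:
            plan[issue_id] = "pull"   # import it into git-issue
        else:
            plan[issue_id] = sync_action(local[issue_id], remote[issue_id], last_sync)
    return plan
```

Conflicts are reported rather than resolved automatically, leaving the choice to the user, much as Git does for conflicting merges.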
Expected Results: git-issue extended to export its issues to GitHub/GitLab.
Knowledge Prerequisites: Unix shell scripting.
Symplegma
"Symplegma" stands for the combination of appropriate libraries for numerical computing, specialized in computational mechanics and oriented to educational and research purposes. Existing libraries, like "Apache Commons Math" for standard mathematics and statistics components, "FuturEye", a Java-based Finite Element Method (FEM) toolkit, and "SymJava" for fast symbolic-numeric computation, among others, are combined with the in-house "Climax" library. "Climax" is a Java implementation of computational mechanics methods, e.g., the Boundary Element Method ("jbem" package) and the Finite Element Method ("jfem" package).
A simple IDE for manipulating the above-mentioned libraries, and possible extensions, has been developed in Java; it takes advantage of Apache Groovy, a powerful, optionally typed and dynamic language. The platform is known by the acronym SDE, standing for Symplegma Development Environment.
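To give a flavour of what such a toolbox computes, the sketch below solves the classic first finite element exercise: an axially loaded bar fixed at one end, discretized with linear two-node elements. It is a minimal Python illustration, not code from Climax's `jfem` package, and `solve_bar` is a hypothetical name; for this problem the nodal values should match the exact tip displacement PL/(EA).

```python
def solve_bar(length, E, A, load, n_elements):
    """Nodal displacements of a bar fixed at x=0 with an axial load at the free end.

    Each linear element contributes k * [[1, -1], [-1, 1]] with k = E*A/h.
    """
    h = length / n_elements
    k = E * A / h
    n = n_elements + 1
    K = [[0.0] * n for _ in range(n)]
    for e in range(n_elements):              # assemble the global stiffness matrix
        i, j = e, e + 1
        K[i][i] += k
        K[j][j] += k
        K[i][j] -= k
        K[j][i] -= k
    f = [0.0] * n
    f[-1] = load
    # Essential boundary condition u(0) = 0: drop the first row and column.
    K = [row[1:] for row in K[1:]]
    f = f[1:]
    m = len(f)
    # Gaussian elimination; K is symmetric positive definite, so no pivoting is needed.
    for p in range(m):
        for r in range(p + 1, m):
            factor = K[r][p] / K[p][p]
            for c in range(p, m):
                K[r][c] -= factor * K[p][c]
            f[r] -= factor * f[p]
    u = [0.0] * m
    for r in range(m - 1, -1, -1):           # back substitution
        s = sum(K[r][c] * u[c] for c in range(r + 1, m))
        u[r] = (f[r] - s) / K[r][r]
    return [0.0] + u
```

In a course toolbox the same steps (element matrices, assembly, boundary conditions, solve) generalize directly to 2D and 3D elements.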
Both educational and research activities are to be considered.
Expected Results: toolbox development oriented to specific courses of higher education, an update of the graphical user environment, and extension of the plotting capabilities.
Knowledge Prerequisites: numerical methods, computational mechanics, Java, Groovy.
clio — Software Components and IP Management System
clio is a web-based system to manage data on software components and their relations. It started out as a GSoC 2018 project. For GSoC 2019, the main goals would be:
- improvement of the UI
- integration of SPDX data
- extension to cover file information (time permitting)
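SPDX integration starts with reading SPDX documents. A minimal Python sketch for the tag-value format, assuming only single-line `Tag: Value` pairs; multi-line `<text>` values, relationships and snippets are ignored, and `parse_spdx_packages` is a hypothetical helper rather than part of the official SPDX tools.

```python
def parse_spdx_packages(text):
    """Collect package fields from an SPDX tag-value document."""
    packages = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values such as URLs stay intact.
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "PackageName":
            current = {"PackageName": value}  # each PackageName opens a package section
            packages.append(current)
        elif tag == "FileName":
            current = None                    # file section: stop attributing tags to the package
        elif current is not None and (tag.startswith("Package") or tag == "SPDXID"):
            current[tag] = value
    return packages
```

Each returned dict maps naturally onto a clio software component record.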
Expected Results: improvements to clio.
Knowledge Prerequisites: Python, web front-end.
Replacement of LTSP
LTSP (Linux Terminal Service Project) allows diskless workstations to be netbooted from a single server image, with centralized authentication and home directories. But the project shows its age: the initial thin-client-focused design is no longer suitable for the era of netbooted fat clients and Wayland, and it contains a lot of stale source code. This GSoC project is about designing and implementing a modern replacement for LTSP.
A modern replacement of LTSP should be implemented, as outlined in http://wiki.ltsp.org/wiki/Dev:GSoC. It should be ready for inclusion in Debian/Ubuntu, for LTSP users to be able to slowly migrate to it.
Knowledge Prerequisites: netbooting internals, shell, Python, git, Debian packaging.
Port Qt Quick Controls Calendar widget to Qt Quick Controls 2 module
Qt is an open-source, cross-platform framework for developing GUI applications for mobile, desktop and embedded devices. It is widely used in applications from a variety of industries, such as automotive and medical. Although the framework is written in C++, it comes with a meta-language (or modelling language), QML, whose purpose is to let developers create the visual parts of an application quickly and easily, thanks to its flexibility and clarity. To accelerate UI development, QML provides the Qt Quick Controls module with ready-made widget types, each backed by a C++ class, like Button or Switch, ready to be styled and modified according to the project's needs. The module is currently at version 2.4, but there is no Calendar in the latest version; more specifically, the Calendar was last provided in version 1.4 of the Qt Quick Controls module, which was released with Qt 5.3.
Expected Results: the Qt Calendar widget updated, modified accordingly and ported to Qt 5.12 and the current version of Qt Quick Controls 2. Ideally it will be upstreamed to Qt, thereby contributing to the Qt ecosystem.
Mentors: Alexandra Betouni, Amilcar Navarro
Development of a Tool for Extracting Quantitative Text Profiles
Quantitative text analysis is the basis of nearly every computational approach to text management and processing. All advanced Natural Language Processing (NLP) tasks including information retrieval, sentiment analysis, computational stylistics etc. involve the quantification of texts across a huge number of linguistic features and transform text into vectors. In many programming languages, e.g. R, Python, Java etc., there are numerous open source scripts, tools, packages and libraries that can transform texts to vectors of word frequencies, character and word n-gram frequencies, stylometric features etc. However, each of these tools covers only a restricted subset of the possible linguistic features.
Moreover, the available tools are written in different languages and require considerable effort to combine so that the user can extract a unified file of results. Due to the fragmented nature of the programming environments and the highly technical skills required to operate the tools and combine their results, they cannot be used by large communities of scientists with humanities and sociopolitical backgrounds.
For the above reasons, we envisage the development of a user-friendly tool with a Graphical User Interface (GUI) that provides integrated access to existing open NLP software. The new tool shall support the quantitative analysis of multilingual texts and produce quantitative text profiles that can be used as input for further analysis, visualization, machine learning and other advanced computational processing. Such a tool does not exist to date, and it would boost research in all scientific areas that require computational processing of large amounts of text.
The outcome of this project would be open-source software with the following specifications:
- A user-friendly GUI that intuitively guides users to select the features they want to count in their text collections.
- A large set of linguistic features, including at least:
  - the most frequent words of the texts analyzed
  - user-specified word lists
  - word and character n-grams of arbitrary length
  - stylometric features such as vocabulary diversity indices, readability indices and quantitative linguistic indices
- UTF-8 support.
- Corpus management features using text metadata.
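Several of the listed features reduce to counting. A minimal Python sketch of a text profile covering most frequent words, word and character n-grams and one diversity index (type-token ratio); `text_profile` and its output keys are hypothetical, and readability or other stylometric indices would be added on top.

```python
import re
from collections import Counter


def tokenize(text):
    # \w is Unicode-aware in Python 3, so Greek and other scripts are kept.
    return re.findall(r"\w+", text.lower())


def char_ngrams(text, n):
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def text_profile(text, top=10, n=2):
    """Return a small quantitative profile of one text."""
    tokens = tokenize(text)
    words = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(words),
        "type_token_ratio": len(words) / len(tokens) if tokens else 0.0,
        "most_frequent_words": words.most_common(top),
        "word_ngrams": word_ngrams(tokens, n).most_common(top),
        "char_ngrams": char_ngrams(text.lower(), n + 1).most_common(top),
    }
```

Applying `text_profile` to every text of a corpus and writing the dictionaries as rows yields the kind of unified results file described above.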
Knowledge Prerequisites: good knowledge of R, Java and Python, and skills in GUI development; good understanding of NLP concepts and tools.
Anonymisation through data encryption of sensitive data in odt and text files in Greek Language
Legal decisions that must be made publicly available contain a lot of sensitive information that has to be anonymized. The GDPR defines pseudonymisation in Article 4(5) as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.” For pseudonymised data, the “additional information” must be “kept separately and subject to technical and organisational measures to ensure non-attribution to an identified or identifiable person.”
Expected Results: a LibreOffice extension and a Linux application with a web GUI that anonymize information in legal documents (odt and txt). They must be able to mass-edit files and to recognize, through NLP, and anonymize entities such as names, addresses, ID numbers, VAT numbers, social security numbers or any other potentially sensitive information. The entities will be anonymized through strong data encryption, so that only people with access to a secret key or password can read them.
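One way to keep replaced entities recoverable only by key holders is keyed pseudonymisation: each detected value becomes a token derived with HMAC-SHA-256, and the token-to-value table is stored separately. The minimal Python sketch below illustrates that. It swaps in HMAC tokens for the requested encryption, so in a real system the table itself would have to be encrypted with a vetted library (e.g. AES-GCM). The 9-digit pattern for Greek VAT numbers (ΑΦΜ) is a hypothetical stand-in for spaCy-based entity recognition.

```python
import hashlib
import hmac
import re

# Hypothetical rule: 9-digit numbers such as a Greek VAT number (ΑΦΜ).
AFM_PATTERN = re.compile(r"\b\d{9}\b")


def pseudonymise(text, key, pattern=AFM_PATTERN):
    """Replace matches with keyed tokens; return (new text, token -> value table).

    The same value always maps to the same token under the same key, so
    entities stay linkable across documents. The table is the "additional
    information" and must be stored separately (and encrypted) from the output.
    """
    table = {}

    def replace(match):
        value = match.group(0)
        digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
        token = f"[ID-{digest}]"
        table[token] = value
        return token

    return pattern.sub(replace, text), table


def restore(text, table):
    """Undo pseudonymise() for someone who holds the table."""
    for token, value in table.items():
        text = text.replace(token, value)
    return text
```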
Knowledge Prerequisites: Python, spaCy, encryption algorithms.
OpenProject Work-Package #1 to support modeling of the PM2 methodology for project management
PM² is a Project Management Methodology developed by the European Commission. PM² is built on Project Management best practices and is supported by the following four (4) pillars:
1. a project governance model (Roles & Responsibilities)
2. a project lifecycle (Project Phases)
3. a set of processes (Project Management activities)
4. a set of project artifacts (templates and guidelines).
For full support of the PM² Project Management Methodology in OpenProject, new OpenProject modules should be developed that allow modeling of the four above-mentioned pillars of PM².
Expected Results: support for the PM² process itself in OpenProject, that is, development of OpenProject plug-ins that add support for defining and handling the roles, responsibilities, phases and activities (in terms of the PM² governance, lifecycle and processes pillars) of every new project.
The work of this Work-Package includes:
- the analysis of PM² requirements;
- the setup of the OpenProject development environment (development VM);
- the modeling of PM² requirements in the OpenProject environment via the development of one or more plug-ins. The plug-ins should follow the OpenProject plug-in guidelines and be consistent with the OpenProject API (http://docs.openproject.org/apiv3-doc/). They should add the required models, logic and DB tables in order to:
  - support the PM² Governance Model (Roles, Responsibilities);
  - support the PM² Phases;
  - support the PM² Artifacts per Phase;
  - support all PM² plans and logs, such as Change Log, Communications, Issue, Project, Quality, Requirements and Risk;
- the development of tests for all the requirements (e.g. initiate a PM² project, initiate phases, add artifacts, generate reports).
Knowledge Prerequisites:
- Ruby on Rails
- Problem analysis & modeling
OpenProject Work-Package #2 to support functionality of the PM2 methodology for project management
This work-package is based on and extends OpenProject Work-Package #1 (Support modeling of the PM2 methodology for project management) by adding visual elements that support PM² project artifacts through templates, wizards, tools and guidelines.
Integrate the outputs of Work-Package #1 into the OpenProject UI. For this, the plug-ins developed in Work-Package #1 will be extended and new plug-ins will be developed in order to provide a rich user interface. The plug-ins should follow the OpenProject plug-in guidelines and be consistent with the OpenProject API (http://docs.openproject.org/apiv3-doc/).
The developed functionality should:
- Provide wizards allowing the PM² user to:
  - create and initiate PM²-based project templates;
  - initiate project phases;
  - add PM² artifacts per phase;
  - create reporting templates with visual content like Gantt and PERT charts;
  - add PM² artifacts with textual content like Risk Logs.
- Support all PM² plans and logs, such as Change Log, Communications, Issue, Project, Quality, Requirements and Risk.
- Provide tools such as:
  - 3-point estimation with triangular and/or beta distribution;
  - Critical Path Method (CPM);
  - Monte Carlo analysis;
  - network diagrams (FS, SS, FF);
  - Work Breakdown Structure (WBS).
- Support Agile-specific artifacts and tools, e.g. iterations, Kanban boards, etc.
- Support visual representation of PM² using standards like BPMN, StratML, RDF, etc.
- Allow connection with external systems (e.g. OpenID, MediaWiki).
- Allow export to and import from other formats for data exchange (e.g. MS Project, PDF, XLS, etc.).
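The estimation tools listed above follow standard formulas: the triangular mean (O + M + P)/3, the PERT beta mean (O + 4M + P)/6 with standard deviation (P - O)/6, and the CPM forward and backward passes. A minimal Python sketch for finish-to-start dependencies only; the function names are hypothetical, and the Monte Carlo helper samples triangular distributions with `random.triangular`. The plug-ins themselves would be written in Ruby; this only illustrates the calculations.

```python
import random


def three_point(optimistic, most_likely, pessimistic):
    """Return (triangular mean, PERT beta mean, PERT standard deviation)."""
    triangular = (optimistic + most_likely + pessimistic) / 3
    pert = (optimistic + 4 * most_likely + pessimistic) / 6
    return triangular, pert, (pessimistic - optimistic) / 6


def critical_path(durations, predecessors):
    """CPM with finish-to-start links; `durations` must be in topological order.

    Returns (project duration, tasks with zero total float).
    """
    finish = {}
    for task, d in durations.items():                      # forward pass
        finish[task] = max((finish[p] for p in predecessors.get(task, [])), default=0) + d
    end = max(finish.values())
    successors = {t: [] for t in durations}
    for task, preds in predecessors.items():
        for p in preds:
            successors[p].append(task)
    latest = {}
    for task in reversed(list(durations)):                 # backward pass
        latest[task] = min((latest[s] - durations[s] for s in successors[task]), default=end)
    critical = [t for t in durations if abs(latest[t] - finish[t]) < 1e-9]
    return end, critical


def monte_carlo_duration(estimates, predecessors, runs=10_000, seed=1):
    """Mean project duration when each task ~ Triangular(low=O, high=P, mode=M)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        sample = {t: rng.triangular(o, p, m) for t, (o, m, p) in estimates.items()}
        total += critical_path(sample, predecessors)[0]
    return total / runs
```

For tasks A(3), B(2), C(4), D(1) with B and C after A and D after both, the forward pass gives a project length of 8 with A, C and D on the critical path.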
Knowledge Prerequisites:
- Ruby on Rails
- ES6
- statistical modeling methods for PM (Monte Carlo, CPM, etc.)
Real time Django monitoring and profiling
Modern Django web applications expose a plethora of URLs and API endpoints that are consumed by a number of clients (web browsers, API clients, Django management commands) under different authorization and authentication credentials. It is not always feasible or easy to replicate the action that resulted in a system crash, a 500 error or a heavily delayed response, but it is necessary to trace such incidents to facilitate debugging and fixing errors. There is also the obvious need to monitor application availability and uptime and to be alerted in case of malfunctions. While there are great tools such as Django Debug Toolbar and Silk that provide insights and application profiling for debugging, there is no open-source solution that unifies real-time monitoring and application profiling for both the application and database layers. This need is usually covered either by in-house solutions or by expensive proprietary Software-as-a-Service offerings, for those who can afford them.
As part of this project, we suggest developing a unified solution, either as an extension to an existing open-source tool (such as Django Silk) or by leveraging tools such as Elasticsearch/Kibana. The system should be able to monitor and log all sorts of requests, along with error stack traces and related information, and present them on a web dashboard with grouping capabilities (e.g. similar endpoints) and statistics about slow requests. Since the tool should run in production, some architectural concerns must be addressed, for example performing database writes not in real time but offloaded to a system such as Celery, or sending the data to a separate datastore (e.g. Elasticsearch).
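A minimal Python sketch of the collection side, written as Django-style middleware: it times each request, captures view exceptions through Django's `process_exception` hook and hands the record to a queue. The module-level `queue.Queue` is a stand-in assumption for Celery or an Elasticsearch bulk writer, and `RequestTimingMiddleware` is a hypothetical name; it would be enabled by adding its dotted path to the `MIDDLEWARE` setting.

```python
import queue
import time
import traceback

# Stand-in for the offloading step: in production this would be a Celery task
# or a bulk writer to Elasticsearch, so the request thread never waits on storage.
METRICS = queue.Queue()


class RequestTimingMiddleware:
    """Records method, path, status, duration and any view traceback per request."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.perf_counter()
        response = self.get_response(request)
        METRICS.put_nowait({
            "method": request.method,
            "path": request.path,
            "status": response.status_code,
            "duration_ms": (time.perf_counter() - start) * 1000,
            "traceback": getattr(request, "_monitor_traceback", None),
        })
        return response

    def process_exception(self, request, exception):
        # Django calls this when a view raises; returning None keeps the normal 500 handling.
        request._monitor_traceback = "".join(traceback.format_exception(
            type(exception), exception, exception.__traceback__))
        return None
```

A background consumer can then aggregate the records per endpoint and feed the dashboard's slow-request statistics.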
This is an opportunity to provide a really beneficial tool for the Django community.
An overview of similar solutions (mainly open source but also commercial offerings). A web dashboard exposing information about malfunctions and providing real time monitoring for Django applications.
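The dashboard's grouping and slow-request statistics could start from something as simple as the sketch below. It assumes records shaped like `{"method", "path", "duration_ms"}` and uses a hypothetical rule that collapses numeric path segments into `{id}`; a real implementation would more likely group by the URL pattern that Django resolved for each request.

```python
import re
import statistics
from collections import defaultdict

# Hypothetical grouping rule: treat numeric path segments as IDs.
_NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")


def normalize_path(path):
    """Collapse numeric segments so /users/42/ and /users/7/ fall into one group."""
    return _NUMERIC_SEGMENT.sub("/{id}", path)


def summarize(records, slow_ms=500.0):
    """Per-endpoint request count, mean/max latency and number of slow requests."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["method"], normalize_path(r["path"]))].append(r["duration_ms"])
    return {
        key: {
            "count": len(durations),
            "mean_ms": statistics.mean(durations),
            "max_ms": max(durations),
            "slow": sum(1 for d in durations if d >= slow_ms),
        }
        for key, durations in groups.items()
    }
```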
Django, Celery, django-debug-toolbar, optionally Elasticsearch/Kibana, PostgreSQL
Moodle connection with multiple BigBlueButton servers
Moodle is a free and open-source learning management system (the most widely used one), used by our Institute in training programmes for thousands of teachers.
BigBlueButton is an open-source web conferencing system, and BigBlueButtonBN is an open-source plugin that integrates BigBlueButton (BBB) into Moodle, providing online classrooms.
However, the plugin supports the configuration of only one BBB server per Moodle installation. So, the number of students that can participate concurrently in virtual classrooms is limited to the capacity of a specific BBB server.
This is why, for better load balancing, our Institute is looking for an upgrade of the plugin so that a different BBB server host (BigBlueButton Server URL and BigBlueButton Shared Secret) can be selected for each BBB virtual classroom that is created. We would also like all the remaining configuration parameters to be set per virtual classroom.
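Selecting a server per classroom mainly means that every API call must be signed with the secret of the server hosting that classroom. BigBlueButton API calls carry a `checksum` parameter equal to the SHA-1 hash of the call name, the query string and the shared secret concatenated. The sketch below assumes a hypothetical `RoomConfig` record that an upgraded plugin would store per classroom. It is written in Python for illustration, whereas the plugin itself is PHP.

```python
import hashlib
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class RoomConfig:
    """Hypothetical per-classroom settings stored by an upgraded plugin."""
    meeting_id: str
    server_url: str      # e.g. "https://bbb1.example.org/bigbluebutton/"
    shared_secret: str
    options: dict = field(default_factory=dict)  # remaining per-room parameters


def build_api_url(room, call_name, params):
    """Build a signed BigBlueButton API URL for the server assigned to `room`."""
    query = urlencode(params)
    # checksum = SHA-1(callName + queryString + sharedSecret)
    checksum = hashlib.sha1((call_name + query + room.shared_secret).encode("utf-8")).hexdigest()
    base = room.server_url.rstrip("/") + "/api/"
    separator = "&" if query else ""
    return f"{base}{call_name}?{query}{separator}checksum={checksum}"
```

Because the URL and secret come from the room rather than from a global setting, two classrooms can be served by different BBB hosts without any other change to the calling code.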
An upgraded BigBlueButtonBN plugin whose configuration parameters are set per virtual classroom rather than globally.
PHP, MySQL, HTML5, CSS3, jQuery
Creation of an online Greek mail dictation system, using Sphinx and personalized acoustic/language models training
CMUSphinx is a collection of systems and algorithms for automatic speech recognition (ASR) and one of the best-known open-source ASR toolkits. Its current version, Sphinx 4, is written in Java, while PocketSphinx is a lightweight version that can run on embedded systems. Sphinx includes libraries for acoustic and language model training, recognizers, and a number of ready-to-deploy statistical language models, including Greek (from 2017). In this project we aim to use Sphinx to create an online Greek mail dictation system consisting of several sequential steps. The first step is personalized acoustic model adaptation using the Sphinx tools, in which the user dictates a set of specific sentences. In the second step, the user grants access to some of their emails, which are used to train a statistical language model adapted to their writing style. Furthermore, the emails will be automatically classified by topic, so that separate statistical language models are created for heterogeneous mail corpora. Finally, the ASR output will be fed to a natural language processing (NLP) system that, based on the provided corpora, will automatically correct, or suggest corrections to, the generated text, which is usually erroneous. The system will be deployed as a web application, with the heavy processing performed in the cloud.
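The topic-specific language models described above can be illustrated with a toy example: train one smoothed bigram model per topic and route a new message to the topic whose model gives it the lowest perplexity. This is a sketch under stated assumptions: add-one smoothing, whitespace tokenization and the `BigramLM`/`pick_topic` names are hypothetical, and Sphinx itself consumes n-gram models in ARPA or binary format built with dedicated tools rather than this class.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model over whitespace tokens (a toy, not a Sphinx model)."""

    def __init__(self, sentences):
        self.context = Counter()  # how often each token appears as a left context
        self.pairs = Counter()    # bigram counts
        vocab = {"</s>"}
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            vocab.update(tokens[1:])
            self.context.update(tokens[:-1])
            self.pairs.update(zip(tokens, tokens[1:]))
        self.v = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def log_prob(self, prev, word):
        return math.log((self.pairs[(prev, word)] + 1) / (self.context[prev] + self.v))

    def perplexity(self, sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = sum(self.log_prob(a, b) for a, b in zip(tokens, tokens[1:]))
        return math.exp(-total / (len(tokens) - 1))


def pick_topic(models, sentence):
    """Route a message to the topic whose language model finds it least surprising."""
    return min(models, key=lambda topic: models[topic].perplexity(sentence))
```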
The expected outcome is a standalone web page offering automatic speech-to-text for personalized mail dictation. The code will be open source and hosted in GitHub repositories. Our approach will offer A) improvements in the speech-to-text procedure through acoustic model adaptation to individual users and statistical language model adaptation based on existing corpora (the user's emails), and B) a standalone tool that anyone can use.
The following skills are desirable but not mandatory:
- Programming languages: Java, Python
- Techniques: web protocols such as REST and WebSocket, natural language processing, and automatic speech recognition
Moodle ADaptable VIsualization for CommunitiEs ( Moodle - ADVICE)
Make forum communication more effective by using ADVICE, a learning analytics visualization tool. Through ADVICE, users can characterize every message posted on the forum based on the Community of Inquiry learning theory. At the same time, ADVICE provides learners and instructors with adaptable visualizations that reflect the development of the discussion, as well as qualitative data about each learner's contribution to it. In this way, students are encouraged to post messages with cognitive value, and the forum is transformed from a simple means of communication into a tool for learning and community building.
Develop a Moodle plugin with the following functionality:
- Allow users to characterize every message posted in the forum through a dropdown menu offering characterizations that follow the Community of Inquiry theory.
- Require students to characterize their own message before posting it to the forum.
- Capture the data derived from the above functionality.
- Statistically analyze and combine the data from students' interaction within the forum (number of posted messages, number of forum visits, etc.) with the data captured by ADVICE, in order to calculate quantitative and qualitative indicators that reflect the development of the discussion according to the Community of Inquiry theory.
- Provide an adaptable visualization of the progress of the discussion along various dimensions selected by the user (e.g. from the community's perspective or from the perspective of "high-participation students").
- Provide a star-chart visualization of each user's contribution, based on the analysis results in terms of the Community of Inquiry theory.
- Provide the instructor with a file containing the captured and analyzed data.
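The per-user star chart could be fed by a small aggregation step such as the sketch below. It assumes each characterization maps onto one of the three Community of Inquiry presences (cognitive, social, teaching); the dropdown labels in `PRESENCE_OF` and the choice to normalize each axis by the most active participant are hypothetical design choices, not part of ADVICE.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from dropdown labels to Community of Inquiry presences.
PRESENCE_OF = {
    "triggering event": "cognitive",
    "exploration": "cognitive",
    "integration": "cognitive",
    "resolution": "cognitive",
    "emotional expression": "social",
    "open communication": "social",
    "group cohesion": "social",
    "facilitating discourse": "teaching",
    "direct instruction": "teaching",
}
PRESENCES = ("cognitive", "social", "teaching")


def star_chart_axes(characterized_posts):
    """characterized_posts: iterable of (author, label).
    Returns author -> {presence: value in [0, 1]} for a radar/star chart."""
    counts = defaultdict(Counter)
    for author, label in characterized_posts:
        counts[author][PRESENCE_OF[label]] += 1
    # Normalize each axis by the highest count any participant reached on it.
    peak = {p: max((c[p] for c in counts.values()), default=0) for p in PRESENCES}
    return {
        author: {p: (c[p] / peak[p] if peak[p] else 0.0) for p in PRESENCES}
        for author, c in counts.items()
    }
```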
PHP, C++, visualization
Kyparisia Papanikolaou, Maria Tzelepi
Development of an open source Greek Spelling and Grammatical dictionary
Development of a spelling and grammar checking tool that can work both as a LibreOffice extension and as a stand-alone web service, by reusing the AfterTheDeadline API so that it can be integrated into a wide range of packages and platforms (Firefox, Chrome, Thunderbird, TinyMCE/WordPress, jQuery, etc.).
- Extraction of Greek words from platforms with open licences (Wikipedia, Wikinews, Wiktionary, Wikipedia revision history, etc.)
- Creation of a morphological dictionary of Modern Greek that compiles all the extracted verbs and adjectives into finite-state transducers (to implement a morphological analyzer and a morphological word generator with the Apertium and HFST tools).
- Implementation of the tool:
  - as a LibreOffice extension
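The analyzer/generator pair in the morphology item can be pictured with a toy paradigm lexicon: a lemma points to a stem and an inflection class, generation appends the ending for a tag, and analysis is the inverse mapping. HFST and Apertium compile such lexicons into finite-state transducers; the Python dictionaries below only mimic their input/output behaviour, and the paradigm name and tags are hypothetical.

```python
# Hypothetical inflection class: present tense of regular -ω verbs.
PARADIGMS = {
    "verb_pres_omega": [
        ("ω", "1sg"), ("εις", "2sg"), ("ει", "3sg"),
        ("ουμε", "1pl"), ("ετε", "2pl"), ("ουν", "3pl"),
    ],
}

# lemma -> (stem, inflection class)
LEXICON = {
    "γράφω": ("γράφ", "verb_pres_omega"),
    "τρέχω": ("τρέχ", "verb_pres_omega"),
}


def generate(lemma, tag):
    """Morphological generation: lemma + tag -> surface form."""
    stem, paradigm = LEXICON[lemma]
    for ending, t in PARADIGMS[paradigm]:
        if t == tag:
            return stem + ending
    raise KeyError(f"{lemma} has no form tagged {tag}")


def build_analyzer():
    """Morphological analysis table: surface form -> list of (lemma, tag) readings."""
    table = {}
    for lemma, (stem, paradigm) in LEXICON.items():
        for ending, tag in PARADIGMS[paradigm]:
            table.setdefault(stem + ending, []).append((lemma, tag))
    return table


def is_known_word(word, analyzer):
    """A spell checker's basic test: accept a word only if it has at least one analysis."""
    return word in analyzer
```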
CScout AJAX-based Interface
CScout is a source code analyzer and refactoring browser for collections of C programs. It can process workspaces of multiple projects (a project is defined as a collection of C source files that are linked together), mapping the complexity introduced by the C preprocessor back into the original C source code files. CScout takes advantage of modern hardware (fast processors and large memory capacities) to analyze C source code beyond the level of detail and accuracy provided by current compilers and linkers. The analysis CScout performs takes into account the identifier scopes introduced by the C preprocessor and the C language proper scopes and namespaces. CScout has already been applied to projects ranging from tens of thousands to millions of lines of code, such as the Linux, OpenSolaris, and FreeBSD kernels, and the Apache web server.
A modern responsive web interface offering the current capabilities of CScout. Ideally this would include in-line editing of identifiers.
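In-line identifier editing amounts to rewriting every occurrence of an identifier's equivalence class, across all files, in one operation. A minimal sketch, assuming the back end already supplies those occurrences as hypothetical `(filename, offset, length)` tuples (CScout's internal representation may differ): edits within a file are applied from the end backwards so that earlier offsets remain valid.

```python
from collections import defaultdict


def apply_rename(sources, occurrences, new_name):
    """Rename every occurrence of one identifier equivalence class.

    sources: filename -> file contents
    occurrences: iterable of (filename, offset, length) tuples (hypothetical format)
    Returns a new filename -> contents mapping; untouched files are copied as-is.
    """
    by_file = defaultdict(list)
    for filename, offset, length in occurrences:
        by_file[filename].append((offset, length))
    updated = dict(sources)
    for filename, spans in by_file.items():
        text = updated[filename]
        # Work from the end of the file backwards so earlier offsets stay valid
        # even when the new name has a different length.
        for offset, length in sorted(spans, reverse=True):
            text = text[:offset] + new_name + text[offset + length:]
        updated[filename] = text
    return updated
```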
Related GitHub repository
- A modern development framework for interactive web content
NextCloudPi
NextCloudPi is an open-source project that aims to simplify the installation of a Nextcloud server for amateurs as well as advanced users who want to avoid maintenance work.
At the moment it has these main features:
- Ready-to-use images for Raspberry Pi and other ARM boards running Debian-based distributions (Raspbian, Armbian)
- Docker Images for ARM and x86 architectures
- A single Bash script that installs everything on a clean Debian system (allows installation on ARM boards that do not yet have an image, or on systems that don't support or don't want to use Docker)
- State-of-the-art configuration of Apache, PHP-FPM, MariaDB, Redis and more
- Features like: Backup, Restore, SSL Certificates, DDNS Clients, NFS, Samba, UFW, Fail2ban, modsecurity, nc-report and many (many) more.
- Two ways to manage the system (both use the same back-end scripts):
  - TUI (`ncp-config` from a terminal)
  - WebPanel (`https://nextcloudpi.lan:4443` from a web browser)
It is written mostly in Bash, with some PHP, HTML, CSS and JavaScript.
- Develop: Nextcloud Native NCP App (webpanel)
- Make a nice UI/UX
- Create Backups-Restore/Import-Export/Snapshots UI
- Develop: Easy installation of ONLYOFFICE (not yet available for ARM architecture)
- Develop: Easy installation of Collabora Online (not yet available for ARM architecture)
- Develop: Vagrant installation
- Develop: Ansible role
- Develop: CI/CD to build releases on GitHub
- Design - Develop: High availability option for big installations
- Mobile app integration (Manage - Info - Users)
- Develop: Easy way to self-host email
- Write: Best documentation possible (https://docs.nextcloudpi.com)
- Write: Guides
- Write: Make a short, simple video for amateurs
- Make contacts and calendar encrypted
Related GitHub repositories
eIDAS SAML functionality to keycloak
Keycloak is an open source Identity and Access Management solution aimed at modern applications and services. It makes it easy to secure applications and services with little to no code.
We would like to add eIDAS SAML functionality to Keycloak. Keycloak is one of the open-source IAM solutions currently available. A comparison of open-source SSO implementations is available here, a video comparison between Keycloak and WSO2 was presented at IDM2018, and an opinionated comparison is available on Stack Overflow.
eIDAS (electronic IDentification, Authentication and trust Services) is an EU regulation on electronic identification and trust services for electronic transactions in the internal market. Cross-border connections between EU member states use an eIDAS network consisting of a series of eIDAS-nodes implemented at member-state level.
An eIDAS-node consists of an eIDAS connector and either an eIDAS proxy service or an eIDAS middleware service. An eIDAS-node can request a cross-border authentication through its eIDAS connector and provide a cross-border authentication through its eIDAS service (which can operate either as an eIDAS proxy service or as an eIDAS middleware service). In fact, eIDAS defines a specific profile of SAML 2.0. The following is an example of an eIDAS network with a proxy-to-proxy connection between two member states (MS).
What happens here is as follows:
- The user (citizen) of MS A requests access to a service provider in MS B.
- The service provider in MS B sends the request to its own connector.
- On receipt of the request, the connector asks the user for their country of origin (TLS protocol).
- When the user selects the country of origin, the connector forwards the SAML request to the eIDAS-node proxy service of the user's member state.
- The eIDAS-node proxy service sends the SAML request to the identity provider for authentication, and the user authenticates using their electronic identity. Once authenticated, this identity is returned to the eIDAS-node proxy service.
- The eIDAS-node proxy service sends a SAML assertion to the requesting connector, which forwards the response to the service provider.
- The service provider grants access to the user.
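To make "a specific profile of SAML" more concrete, the sketch below builds the kind of `AuthnRequest` a connector forwards in the flow above, using only the Python standard library. It is a simplified, unsigned illustration: the issuer and destination URLs are placeholders, and the eIDAS extension namespace, attribute names and level-of-assurance URI follow the eIDAS SAML message format as we understand it and should be checked against the current specifications. A real request must also be signed and sent over an appropriate SAML binding.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
EIDAS = "http://eidas.europa.eu/saml-extensions"
NATURAL_PERSON = "http://eidas.europa.eu/attributes/naturalperson/"
URI_FORMAT = "urn:oasis:names:tc:SAML:2.0:attrname-format:uri"

for prefix, uri in (("saml2p", SAMLP), ("saml2", SAML), ("eidas", EIDAS)):
    ET.register_namespace(prefix, uri)


def build_authn_request(issuer, destination, attributes, loa="substantial", sp_type="public"):
    """Unsigned eIDAS-style SAML 2.0 AuthnRequest; attributes is a list of (name, required)."""
    request = ET.Element(f"{{{SAMLP}}}AuthnRequest", {
        "ID": "_" + uuid.uuid4().hex,
        "Version": "2.0",
        "IssueInstant": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "Destination": destination,
        "ForceAuthn": "true",
        "IsPassive": "false",
    })
    ET.SubElement(request, f"{{{SAML}}}Issuer").text = issuer
    # eIDAS-specific part of the profile: SP type and requested attributes.
    extensions = ET.SubElement(request, f"{{{SAMLP}}}Extensions")
    ET.SubElement(extensions, f"{{{EIDAS}}}SPType").text = sp_type
    requested = ET.SubElement(extensions, f"{{{EIDAS}}}RequestedAttributes")
    for name, required in attributes:
        ET.SubElement(requested, f"{{{EIDAS}}}RequestedAttribute", {
            "Name": NATURAL_PERSON + name,
            "NameFormat": URI_FORMAT,
            "isRequired": "true" if required else "false",
        })
    # Minimum level of assurance requested from the member state's identity provider.
    context = ET.SubElement(request, f"{{{SAMLP}}}RequestedAuthnContext", {"Comparison": "minimum"})
    ET.SubElement(context, f"{{{SAML}}}AuthnContextClassRef").text = f"http://eidas.europa.eu/LoA/{loa}"
    return ET.tostring(request, encoding="unicode")
```

In a Keycloak integration this logic would live in Java inside an identity-provider extension; the Python version only shows the message structure.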
Similar functionality has been added to WSO2. We believe that Keycloak presents a lower barrier to entry, technologically speaking, thanks to its simpler administration dashboard. Furthermore, Keycloak has a community large enough to actively support its open-source code base.
Related GitHub repositories
Refactor open source remote systems and network management solution
The project is about refactoring the current version of OpenRSM, which was initially developed in 2011. It has the following main features:
- It addresses the needs of system and network admins
- Its philosophy is to be simple, fast and configurable, so that it can be combined with other solutions.
- The core system is capable of managing any workstation or server and monitoring the operation of active network elements.
- Extensions of the system cover the management of wireless sensor networks and embedded systems.
- The system has been tested in pilot installations and stressed for scalability in the lab.
- Port OpenRSM to Qt 5.12
- Port from NINO to OpenNMS
- Port from Winventory to OCS (OCSInventory-Server)
- Port to latest version of OPSI
- Port from UltraVNC to latest version of TigerVNC
- Move OpenRSM from SourceForge to GitHub
- Upgrade all subsystems of OpenRSM to their latest versions
- Develop a Docker Installation for OpenRSM
- Develop a Vagrant Installation for OpenRSM
- Develop an Ansible role for OpenRSM
- Update documentation
- Update Installation Guides
- Make video Guides for admins
- Qt (https://en.wikipedia.org/wiki/Qt_%28software%29)
- Object Pascal
- SQL (MySQL: InnoDB, MyISAM)
- Linux shell/bash
- NSIS (Nullsoft Scriptable Install System)
- TCP/IP protocol
- SNMP protocol
Geometric sampling for volume computation and convex optimization
Convex optimization and volume computation are fundamental problems in mathematics and computer science, with many applications spanning the whole spectrum of sciences and engineering. They appear, for example, in statistics, biology, and economics, to name a few concrete application areas.
VolEsti (https://github.com/GeomScale/volume_approximation) is a C++ package with an R interface that performs efficient high dimensional sampling and volume computation. It supports a variety of convex polytope representations and scales to high (i.e., a few hundred) dimensions. To our knowledge it is the only software that combines the above features.
The main purpose of this year's projects is to extend VolEsti's functionality and, as a consequence, to provide state-of-the-art algorithms for sampling and volume computation to the R-project. We propose the following projects:
Project 1. Sampling scalability
This project consists of an empirical study of random walks for convex polytopes (mainly given by a set of linear inequalities). Currently, variations of hit-and-run random walks are used, but there are methods in the literature with better mixing time, most notably the Hamiltonian walk https://arxiv.org/pdf/1710.06261.pdf. We expect an efficient implementation of such a method to have a dramatic effect on the scaling of the underlying algorithms (mainly sampling and volume computation, but there are also connections to convex optimization). We set the ultimate goal in one sentence: “scale from a few hundred dimensions to a few thousand”.
We can divide the coding project in the following steps:
- Understand the code structure and design of VolEsti.
- Create prototypes for new sampling algorithms (the main focus will be in Hamiltonian walk but we may investigate others too).
- Implement the best representatives from the previous step in VolEsti and create R interfaces.
- Write tests and documentation.
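For orientation, the hit-and-run walk that new samplers would be benchmarked against fits in a few lines for a polytope P = {x : Ax ≤ b}: pick a uniformly random direction, intersect the line through the current point with P, and jump to a uniform point on that chord. The sketch below is a plain-Python illustration with hypothetical names, not VolEsti's C++ implementation, and it assumes a bounded polytope and a strictly interior starting point.

```python
import math
import random


def hit_and_run(A, b, x0, n_samples, rng=None):
    """Hit-and-run walk in the polytope {x : A x <= b}; x0 must be strictly interior."""
    rng = rng or random.Random()
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        # Uniform random direction: a normalized standard Gaussian vector.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Chord of the polytope along x + t*d: each facet bounds t from one side.
        t_min, t_max = -math.inf, math.inf
        for row, bi in zip(A, b):
            ad = sum(a * v for a, v in zip(row, d))
            slack = bi - sum(a * xi for a, xi in zip(row, x))
            if ad > 1e-12:
                t_max = min(t_max, slack / ad)
            elif ad < -1e-12:
                t_min = max(t_min, slack / ad)
        # Jump to a uniformly random point on the chord.
        t = rng.uniform(t_min, t_max)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples
```

The quantity the project aims to improve is how many such steps are needed before the samples are close to uniform (the mixing time), which is where the Hamiltonian walk is expected to do better in high dimensions.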
Project 2. Sampling and volume of spectahedra
This is a non-linear extension for VolEsti. Spectahedra are feasible regions of semidefinite programs and are known to be convex. They play an important role in optimization since they are the “next more understandable” convex objects after polytopes. Offering algorithms for sampling and volume computation will shed more light on their study.
The coding project consists of the following steps:
- Understand the code structure and design of VolEsti. Learn the basics of spectahedra from the literature.
- Implement a new convex body type and boundary oracle for spectahedra.
- Work on extensions of the problem such as replacing spectahedra by a spectahedral shadow.
- Write tests and documentation.
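A spectahedron is the set of points x for which the linear matrix inequality A(x) = A0 + x1*A1 + ... + xn*An ⪰ 0 holds, with symmetric matrices Ai. The boundary oracle that random walks need (how far one can move from x along a direction d before leaving the body) can be sketched by bisection, since convexity makes the feasible part of the ray an interval. This is an illustration under assumptions: membership is tested with a Cholesky factorization, which checks strict positive definiteness and so treats the boundary itself as outside, and a production oracle would more likely solve a generalized eigenvalue problem for the exact crossing point. Function names are hypothetical.

```python
import math


def is_positive_definite(M, eps=1e-12):
    """Cholesky-based test: succeeds iff the symmetric matrix M is positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= eps:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def lmi(mats, x):
    """Evaluate A(x) = A0 + x1*A1 + ... + xn*An for mats = [A0, A1, ..., An]."""
    A0, rest = mats[0], mats[1:]
    n = len(A0)
    return [[A0[r][c] + sum(xi * Ai[r][c] for xi, Ai in zip(x, rest)) for c in range(n)]
            for r in range(n)]


def boundary_step(mats, x, d, t_hi=1e6, tol=1e-9):
    """Largest t in [0, t_hi] with x + t*d inside the spectahedron, found by bisection.
    Assumes A(x) is positive definite, i.e. x is an interior point."""
    def inside(t):
        return is_positive_definite(lmi(mats, [xi + t * di for xi, di in zip(x, d)]))

    if inside(t_hi):
        return t_hi  # treat the body as unbounded in this direction
    lo, hi = 0.0, t_hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if inside(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For example, the unit disk is the spectahedron [[1 + x, y], [y, 1 - x]] ⪰ 0, so from the origin in direction (1, 0) the oracle returns approximately 1.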
Project 3. Convex optimization with randomized algorithms
Convex optimization is closely related to volume approximation via sampling. This project proposes the design and implementation in VolEsti of optimization algorithms (from the relevant literature) that use sampling, already available in the library, as their main subroutine.
- Understand the code structure and design of VolEsti.
- Implement optimization algorithms. A good place to start is http://www.optimization-online.org/DB_FILE/2008/12/2161.pdf
- Test implementations with various random walks available in VolEsti
- Write tests and documentation
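One generic way to use sampling as the main subroutine is a randomized cutting-plane scheme for minimizing a linear objective c·x over a polytope: sample points from the current feasible region with a random walk, keep the best one, add the cut c·x ≤ c·x_best, and repeat. The sketch below uses coordinate-direction hit-and-run as the sampler. It is an illustrative pure-Python stand-in, not necessarily the algorithm of the paper linked above, and all names and parameter defaults are hypothetical.

```python
import random


def coordinate_hit_and_run(A, b, x, steps, rng):
    """Coordinate-direction hit-and-run inside {y : A y <= b}, starting from feasible x."""
    x = list(x)
    for _ in range(steps):
        k = rng.randrange(len(x))
        t_min, t_max = float("-inf"), float("inf")
        for row, bi in zip(A, b):
            slack = bi - sum(a * xi for a, xi in zip(row, x))
            if row[k] > 1e-12:
                t_max = min(t_max, slack / row[k])
            elif row[k] < -1e-12:
                t_min = max(t_min, slack / row[k])
        x[k] += rng.uniform(t_min, t_max)
    return x


def randomized_cutting_plane(c, A, b, x0, rounds=40, samples=30, walk_len=10, seed=0):
    """Approximately minimize c.x over the bounded polytope {x : A x <= b} by repeatedly
    sampling the current region and cutting it at the best sample's objective value."""
    rng = random.Random(seed)
    A, b = [list(r) for r in A], list(b)
    best = list(x0)
    for _ in range(rounds):
        x = best
        candidates = []
        for _ in range(samples):
            x = coordinate_hit_and_run(A, b, x, walk_len, rng)
            candidates.append(x)
        best = min(candidates, key=lambda p: sum(ci * pi for ci, pi in zip(c, p)))
        # Cut: keep only points at least as good as the best sample so far.
        A.append(list(c))
        b.append(sum(ci * pi for ci, pi in zip(c, best)))
    return best
```

Swapping `coordinate_hit_and_run` for another walk is the kind of comparison the "test implementations with various random walks" step refers to.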
Many users, such as practitioners and researchers from a variety of scientific fields ranging from biogeography to economics, need a high-level programming or scripting environment to test volume computation or sampling algorithms. With the successful completion of the proposed projects, the library, and thus the R-project, will benefit from (a) faster and more scalable sampling methods, (b) support for non-linear objects, and (c) support for optimization algorithms. These benefits will enhance the experience of current users but, more importantly, will attract new users, since we expect that the library will be used to solve problems that cannot be solved today with available software tools.
There are practical and theoretical consequences. From a theoretical point of view, we will be able to study the volume of the feasible regions of SDPs and experiment with theta bodies, thus providing a robust tool for experimentation in convex optimization. From a practical point of view, we expect that the library will find use in the computation of equilibria in thermodynamics, in biology for understanding the evolution of coding sequences, and in materials science; these applications require robust volume computations of convex bodies.
Students should have a solid background in C++, algorithms and linear algebra. Knowledge of computational geometry, optimization or statistical computing in R will be a plus.
Students, please do one or more of the following tests before contacting the mentors above.
- compile and run VolEsti.
- Solve this issue https://github.com/GeomScale/volume_approximation/issues/8
- Use the R extension to visualize sampling in a polytope.
- implement the Dikin walk http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.153.7868&rep=rep1&type=pdf
- extend hit-and-run to sample from the boundary of the polytope.
- add support to VolEsti to compute volumes for lower dimensional polytopes, e.g., a segment in the 3-dimensional space (for projects 1,2)
- implement the optimization algorithm from http://www.optimization-online.org/DB_FILE/2008/12/2161.pdf (for project 3)