
G-2394

Development of analytical software system to discover hazardous sources using data science methods

Project Status: 3 Approved without Funding
Duration in months: 24 months

Objective

The project aim. To minimize and manage potentially hazardous sources, information about them has to be discovered and organized. These sources can be natural, dumped old stock, or unregistered materials reused in production that contain heavy metals or biological or radioactive elements; chemicals, poisons, and other harmful substances left over from used household or manufacturing materials; or they can originate from various medical or related devices, prescriptions, or other channels.
One way to obtain this information is through patient medical records, which can document different forms of intoxication (specified or indeterminate), heavy use of radiological diagnostics or treatment, etc.
Within the framework of this project, we plan to construct an analytical system based on data science methods. Using the collected data and classification algorithms, it will analyze the data to identify potentially harmful sources and related vulnerabilities. Among data science methods, machine learning is the most suitable here, as it can uncover facts that are not limited by initial assumptions.

Current status. We have consulted specialists from many fields: reanimatology, radiology, oncology, biochemistry, and biophysics. They indicated that data management of hazardous sources in Georgia is both important and non-trivial, and that there is a risk that unknown sources exist. In addition, it is important to have a centralized registry of all kinds of radiological procedures performed on patients. Reanimatologists and anesthesiologists rated automatic classification mechanisms positively, especially for patients with central nervous system disorders or with an unknown type of intoxication.

In line with EU directives and IAEA recommendations, the implementation of centralized patient IDs is planned in Georgia, which can be one means of registering cases of radiological diagnostics or treatment. That project is still at a preliminary stage. Regardless, our system will be able to interface with this future database. Georgia has numerous small medical clinics. Their variety is considered an advantage for data collection because they are geographically decentralized. Besides the capital, we will use data from clinics in the Ajara, Kakheti, Imereti, Samegrelo, Guria, Svaneti, and Racha regions. In addition, medical tourism is popular in Georgia: patients with cancer or atypical diseases often travel to Turkey, Israel, or EU countries. Machine learning classifiers can be especially effective for data collected from all these patients.

Scientific publications regard systems based on machine learning as the future of medical research. Another hot topic is the globalization of patient data to extract more knowledge from it (see the reference in the “Supporting Information” section).

The project’s influence on progress in this area. When data analysis using machine learning methods reveals classes of relevant individuals, we will look for distinguishing attributes (such as place of birth, occupation, etc.) that can point us towards potential objects or places containing hazardous materials. These can be compiled into a list for further appropriate action.
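As an illustration of the search for distinguishing attributes, the sketch below ranks attribute values by how over-represented they are inside a group flagged by a classifier, compared with all records. It is a minimal example under stated assumptions, not the project's method: the function name `distinguishing_attributes`, the `min_lift` threshold, the field names, and the sample records are all hypothetical.

```python
from collections import Counter

def distinguishing_attributes(records, group_ids, min_lift=1.5):
    """Return (attribute, value, lift) triples over-represented in a group.

    lift = share of the value inside the group / share among all records.
    A lift well above 1 marks the value as a candidate lead.
    """
    group = [records[i] for i in group_ids]
    overall = Counter((k, v) for r in records for k, v in r.items())
    inside = Counter((k, v) for r in group for k, v in r.items())
    results = []
    for (attr, value), n_in in inside.items():
        share_in = n_in / len(group)
        share_all = overall[(attr, value)] / len(records)
        lift = share_in / share_all
        if lift >= min_lift:
            results.append((attr, value, round(lift, 2)))
    # Highest lift first
    return sorted(results, key=lambda t: -t[2])

# Hypothetical, fully synthetic records (no real patient data).
records = [
    {"birthplace": "region_a", "occupation": "welder"},
    {"birthplace": "region_a", "occupation": "farmer"},
    {"birthplace": "region_a", "occupation": "welder"},
    {"birthplace": "region_b", "occupation": "farmer"},
    {"birthplace": "region_c", "occupation": "teacher"},
    {"birthplace": "region_b", "occupation": "teacher"},
]
flagged = [0, 1, 2]  # indices a classifier is assumed to have grouped together
leads = distinguishing_attributes(records, flagged)
```

Here `leads` would contain the birthplace and occupation values that occur disproportionately often in the flagged group; such values are only candidates for follow-up, not evidence of a hazardous source by themselves.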

Initially, the data will be collected from the medical specialists who are project participants. At the first stage only test data will be entered, but it will be based on real facts with all identifying attributes stripped. This practice will avoid legal issues related to the handling of personal information.
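Stripping identifying attributes before data entry could look roughly like the sketch below. The field names in `DIRECT_IDENTIFIERS`, the `birth_date` format, and the function `deidentify` are assumptions made for illustration; the actual record schema is not described here.

```python
# Hypothetical field names; the real record schema is not specified.
DIRECT_IDENTIFIERS = {"name", "personal_id", "phone", "address"}

def deidentify(record):
    """Return a copy of the record without directly identifying fields.

    An exact birth date is coarsened to the year, assuming an ISO
    "YYYY-MM-DD" string, because exact dates can help re-identify a person.
    """
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in clean:
        clean["birth_year"] = clean.pop("birth_date")[:4]
    return clean

# Synthetic example record
raw = {
    "name": "Test Patient",
    "personal_id": "00000000000",
    "birth_date": "1980-05-17",
    "diagnosis": "intoxication, unspecified",
}
safe = deidentify(raw)
```

The original record is left unchanged, so the clinic can keep its own copy while only `safe` is entered into the system.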

Even though our system is focused on identifying possible sources of danger, it will also help to manage patient medical records. An unknown patient with missing data can still be identified as a member of one of the groups using the collected data and one of the classification algorithms. For example, it is hard to determine quickly what caused a poisoning. In emergency situations, a machine learning classifier could assign the patient to the appropriate group and help with the choice of treatment.
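To make the idea of classifying a patient with missing data concrete, the following sketch uses a nearest-centroid classifier that compares a patient to each group only on the features that were actually measured. This is one simple technique chosen for illustration, not the algorithm selected by the project; the feature names, group labels, and values are hypothetical, and features are assumed to be pre-scaled to comparable ranges.

```python
import math

def centroids(samples):
    """samples: list of (features_dict, label). Returns {label: {feature: mean}}.

    Missing values (None) are skipped when computing each mean.
    """
    sums, counts = {}, {}
    for features, label in samples:
        for name, value in features.items():
            if value is None:
                continue
            sums.setdefault(label, {}).setdefault(name, 0.0)
            counts.setdefault(label, {}).setdefault(name, 0)
            sums[label][name] += value
            counts[label][name] += 1
    return {lab: {n: sums[lab][n] / counts[lab][n] for n in sums[lab]}
            for lab in sums}

def classify(patient, cents):
    """Assign the label whose centroid is closest on the available features."""
    best_label, best_dist = None, math.inf
    for label, centre in cents.items():
        shared = [n for n, v in patient.items() if v is not None and n in centre]
        if not shared:
            continue  # nothing to compare on
        # Root mean squared difference, so the number of shared features
        # does not bias the comparison between groups.
        dist = math.sqrt(sum((patient[n] - centre[n]) ** 2 for n in shared)
                         / len(shared))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Synthetic training data with hypothetical, pre-scaled features.
train = [
    ({"f1": 0.9, "f2": 0.1, "f3": 0.8}, "group_a"),
    ({"f1": 0.8, "f2": 0.2, "f3": None}, "group_a"),
    ({"f1": 0.1, "f2": 0.9, "f3": 0.2}, "group_b"),
    ({"f1": 0.2, "f2": 0.8, "f3": 0.3}, "group_b"),
]
cents = centroids(train)
unknown = {"f1": None, "f2": 0.2, "f3": 0.7}  # f1 was not measured
group = classify(unknown, cents)
```

In this example the unknown patient is assigned to `group_a` even though one feature is missing. If no feature is available at all, `classify` returns `None` rather than guessing.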

Expected results and their application. As a result of the project, a prototype software system will be created that will enable the aggregation of data within the medical field and the extraction of direct and indirect information related to threats. Consequently, the current parameters of hazard sources will be identified, as well as dangerous objects, locations, and substances that are harmful to the population. Also, such material falling into the wrong hands and being used to build a so-called “dirty bomb” could be averted. The system will also have means to help diagnose patients in a somatic state with less invasive methods. In addition, it will be able to use a particular patient’s information in the diagnosis or medical treatment of another potential patient.

Scope of activities. For project planning and task tracking, the software development life cycle (SDLC) standard and an object-oriented design approach will be used. The following activities will be carried out during the project:


- System diagnostics and analysis, and formulation of general software requirements;
- Software design and modeling based on software engineering standards;
- Software prototype development;
- Data collection, aggregation and storage;
- Research and adjustment of machine learning algorithms for data analysis purposes;
- Software testing and assessment.

Participating Institutions

LEADING

Georgian Technical University (GTU)

COLLABORATOR

University of Debrecen

COLLABORATOR

Université Libre de Bruxelles (ULB)