DSB 2019

Data Structures in Bioinformatics

(Image by Dankersen, CC-by-SA 3.0)

Looking Forward to DSB 2020 ...

Thanks to everyone for excellent talks at DSB 2019!

DSB 2020 will take place in Rennes (France). Information will be sent out by email in due time.

Group photo

Welcome to DSB 2019 !

We are happy to announce the 5th workshop on "Data Structures in Bioinformatics" taking place

February 5th and 6th, 2019 (Tuesday + Wednesday)
in Dortmund, Ruhr Metropolitan Area, Germany
at the Dortmund U-Tower (Wikipedia article)

This meeting is supported by

Mercator Research Center Ruhr (MERCUR), grant Pe-2013-0012, UA Ruhr Professorship of Sven Rahmann
Netherlands Organisation for Scientific Research (NWO), Veni grant "Fully reproducible workflows scaling from workstations to the cloud" to Johannes Köster
Research group "Algorithmic Foundations and Education in Computer Science" headed by Johannes Fischer

The DSB workshop series is an incubator of ideas and facilitates exchange as well as collaborations on topics related to algorithms and data structures in bioinformatics. Speakers will present state of the art techniques and recent advances in this field. Traditionally, there is no peer review and there are no published proceedings at DSB. It is encouraged to talk about work in progress, partial results, open problem and even negative results.

In previous years, DSB took place in Helsinki (2018), Amsterdam (2017), Bielefeld (2015/16), Montpellier (2014), and Bielefeld (2013) (although it was not called "DSB" in 2013).

Dates 2019

any time before January 15: register and submit a talk (via EasyChair)
February 4 (Monday): arrival and evening welcome reception (no talks)
February 5 (Tuesday): first workshop day; dinner
February 6 (Wednesday): second workshop day; optional evening farewell reception for those who are still here

There is limited seating available in the U-Tower, so please submit your abstract and/or register early, ideally before the holiday season! Registration will be closed when the workshop is full.

After DSB, on February 7 and 8, there will probably be LSD & LAW in London. There are convenient flight options directly from Dortmund (DTM) to the London area.

Program

When	What	Who/Where
	Arrival day: Monday Feb 4
18:00	Welcome evening, Kumpel Erich (Kreuzstr. 87, 44137 Dortmund)	Pub "Kumpel Erich"

	Day 1: Tuesday Feb 5	U-Tower
9:00	Survey of k-mer data structures for querying large collections of sequencing datasets	Camille Marchet
9:30	Indexing de Bruijn graph with minimizers	Antoine Limasset
10:00	Coffee, Tea, Fruit
11:00	Alignment- and reference-free phylogenomics with colored de-Bruijn graphs	Roland Wittler
11:30	Pangenomic read mapping	Sandra Smit
12:00	Efficient querying of a pan-genome	Tizian Schulz
12:30	Lunch (Sandwiches, Soup)
13:30	Sequence to graph alignment	Mikko Rautiainen
14:00	Fully-sensitive Seed Finding in Sequence Graphs Using a Hybrid Index	Ali Ghaffaari
14:30	Tunneling on Wheeler graphs	Jarno Alanko
15:00	Coffee, Tea and Cake
15:30	Parallel decompression of gzip files	Maël Kerbiriou
16:00	Inducing the LCP array from the BWT in succinct space, with applications to reference-free SNP discovery	Nicola Prezza
16:30	Compact Bit-Sliced Signature Index (COBS)	Timo Bingmann
17:00	Time for Discussions / Sightseeing
19:00	Dinner

	Day 2: Wednesday Feb 6	U-Tower
9:00	Identifying Maximal Perfect Haplotype Blocks	Jens Stoye
9:30	CONSENT: Consensus-based Self-Correction of Third Generation Sequencing Data	Pierre Morisse
10:00	Coffee, Tea, Snacks
11:00	Fast and accurate correction of optical map data with spaced (el,k)-mers	Leena Salmela
11:30	Signal-based optical map alignment	Mehmet Akdel
12:00	Improved Representation of Sequence Bloom Trees	Paul Medvedev
12:30	Lunch (Sandwiches, Soup)
13:30	Lower and upper bounds for the Shortest Superstring	Eric Rivals
14:00	On the Complexity of Exact Pattern Matching in Labeled Graphs	Massimo Equi
14:30	Border dynamic RMQ and Maximum Segmentation problems in Linear Time in Column Stream Model	Bastien Cazaux
15:00	Coffee, Tea and Cake
	Farewell

(Updated: 03-Feb-2019. This program is final, but times and order of talks may change spontaneously during the meeting if desired.)

Abstracts

of accepted talks

Identifying Maximal Perfect Haplotype Blocks (Jens Stoye)

The concept of maximal perfect haplotype blocks is introduced as a simple pattern allowing to identify genomic regions that show signatures of natural selection. The model is formally defined and a simple algorithm is presented to find all perfect haplotype blocks in a set of phased chromosome sequences. Application to three whole chromosomes from the 1000 genomes project phase 3 data set shows the potential of the concept as an effective approach for quick detection of selection in large sets of thousands of genomes.

This is joint work with Yoan Diekmann, University College London.

Sequence to graph alignment (Miko Rautiainen)

Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction, and variant calling with respect to a variation graph.

We present an overview of our recent work in aligning sequences to graphs. Compared to linear sequence-to-sequence alignment, the graph structure introduces new difficulties. Our work includes theoretical insights on how to generalize banded alignment and Myers' bitvector parallelism to arbitrarily shaped node-labeled directed graphs, and an implementation which uses these insights in practice. Our implementation inputs commonly used file formats; GFA or vg for graphs and fasta or fastq for reads.

This is joint work with Veli Mäkinen and Tobias Marschall.

Alignment- and reference-free phylogenomics with colored de-Bruijn graphs (Roland Wittler)

The traditional way to analyze phylogenetic relationships is (i) identifying marker genes, (ii) computing alignments of the corresponding orthologs to obtain pairwise distances, and (iii) reconstructing a tree that fits these distances.

Problems in this approach, especially in the analysis of a huge number of genomes, are (i) selecting suitable marker genes and solving gene-tree species-tree reconciliation issues, (ii) computing a quadratic number of pairwise alignments in reasonable time, and (iii) squeezing the data into a tree although the given information, i.e., parts of the phylogeny or parts of the sequences, might contradict a strict representation as a unique tree.

Methods exist, which avoid the first two problems by comparing to a reference sequence and/or considering SNPs. To be less biased and less limited, we present a new approach that is (i) whole-genome based and reference-free, (ii) does not rely on alignments or any other pair-wise comparison, and (iii) reconstructs phylogenetic networks instead of trees. We build a colored de-Bruijn graph from all genomes (finished genomes, assemblies or read data) and extract common paths as phylogenetic splits.

Parallel decompression of gzip files (Mael Kerbiriou, Rayan Chikhi)

Decompressing a file created by the gzip program at an arbitrary location is in principle impossible, due to the nature of the DEFLATE compression algorithm. Consequently, there is no known technique for parallel decompression of large files (the 'pigz' program is only parallel for the compression step). This is an unsatisfactory bottleneck, especially for the analysis of large sequencing data experiments. In this talk we will present two on-going experiments: (i) parallel decompression of arbitrary gzip files and (ii) heuristic random access to sequences within a gzip-compressed FASTQ file. We show a technique for decompression that is an order of magnitude faster than 'gunzip', and 5x-8x faster than a highly-optimized sequential implementation.

Lower and upper bounds for the Shortest Superstring (Eric Rivals)

Given a set of words, the Shortest Linear Superstring (SLS) problem is an optimisation problem that asks for a superstring of of minimal length. SLS has applications in data compression, where a superstring is a compact representation of , and in bioinformatics where it models the first step of genome assembly. Unfortunately SLS is hard to solve (NP-hard) and to closely approximate (MAX-SNP-hard). If numerous polynomial time approximation algorithms have been devised, few articles report on their practical performance. We lack knowledge about how closely an approximate superstring can be from an optimal one in practice. Here, we exhibit a linear time algorithm that reports an upper and a lower bound on the length of an optimal superstring. The upper bound is the length of an approximate superstring. This algorithm can be used to evaluate beforehand whether one can get an approximate superstring whose length is close to the optimum for a given instance. Experimental results suggest that its approximation performance is orders of magnitude better than previously reported practical values. Moreover, the proposed algorithm remainso efficient even on large instances and can serve to explore in practice the approximability of SLS.

Fully-sensitive Seed Finding in Sequence Graphs Using a Hybrid Index (Ali Ghaffaari)

Sequence graphs are versatile data structures that are, for instance, able to represent the genetic variation found in a population and facilitate genome assembly. Read mapping to sequence graphs constitutes an important step for many applications and is usually done by finding exact seed matches in the graphs by using a path index. Existing methods for constructing a path index need to prune the graph in complex regions leading to loss of information especially in highly polymorphic regions of the genome. In addition, present methods suffer from poor locality of memory reference, entailing a reduced efficiency for large graphs. We present a fully-sensitive hybrid method for seed finding, with better locality of memory references. This enables our method to find all available seeds in the graph while eliminating the need to prune the graph. We demonstrate the performance of our method with different parameter settings on both simulated data set and on a whole human genome graph constructed from variants in the 1000 Genome Project data set.

Fast and accurate correction of optical map data with spaced (el,k)-mers (Leena Salmela)

Optical mapping is a system that produces high throughput genomic restriction map data. Raw optical mapping data, called Rmaps, are sequences of sizes of fragments between restriction cut sites. Rmaps are notoriously noisy. Sometimes the restriction enzyme misses a cut site causing two consecutive fragments to merge into one. The DNA molecule can also break on its own causing a fragment to split into two. Fragment sizes include sizing error when the size of a fragment is incorrectly estimated. Analysis of optical mapping data is challenging due to the errors. Thus it is attractive to correct the errors in the data prior further processing.

Previously one method, COMET, has been proposed for correcting optical mapping data. COMET indexes k-mers, i.e. sequences of k fragments, that occur in the Rmaps to find related Rmaps. The related Rmaps are then aligned to correct the data. COMET has a long runtime and the recall of the k-mer index for finding related Rmaps is rather low because the high error rate induces errors in most k-mers.

Instead of k-mers, we propose to use (el,k)-mers which are sequences of at least k fragments where the total length of the fragments is at least el. We further introduce spaced (el,k)-mers which are defined by minimum number of fragments k, the minimum total fragment length el and a spacing pattern which defines gaps within the (el,k)-mer. Any restriction cut site that falls within a gap is ignored when forming the spaced (el,k)-mer. Our experimental results show that spaced (el,k)-mer index has superior performance for retrieving related Rmaps as compared to the k-mer index. Furthermore our preliminary results on error correcting Rmaps show that a method based on the spaced (el,k)-mer index is both more accurate and faster than COMET.

This is joint work with Kingshuk Mukherjee, Christina Boucher, Simon J. Puglisi, and Martin Muggli.

Tunneling on Wheller graphs (Jarno Alanko)

The Burrows-Wheeler Transform (BWT) is an important technique both in data compression and in the design of compact indexing data structures. It has been generalized from single strings to collections of strings and some classes of labeled directed graphs, such as tries and de Bruijn graphs. The BWTs of repetitive datasets are often compressible using run-length compression, but recently Baier (CPM 2018) described how they could be even further compressed using an idea he called tunneling. In this paper we show that tunneled BWTs can still be used for indexing and extend tunneling to the BWTs of Wheeler graphs, a framework that includes all the generalizations mentioned above.

On the Complexity of Exact Pattern Matching in Labeled Graphs (Massimo Equi)

Exact pattern matching in labeled graphs is the problem of locating subpaths of a graph G=(V,E) that spell the same string as the pattern string P[1..m]. We give a conditional lower bound that, for any constant \epsilon>0, an O(|E|^{1 - \epsilon} m)-time or an O(|E| m^{1 - \epsilon})-time algorithm for exact pattern matching on graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) fails. The result holds even if restricted to undirected graphs of maximum degree three, directed acyclic graphs of maximum sum of indegree and outdegree three and, most importantly, for deterministic graphs. As approximate pattern matching on graphs can be solved in O(|E| m) time, exact and approximate matching are thus equally hard (quadratic time) on graphs under the SETH assumption. In comparison, the same problems restricted to strings have linear time vs quadratic time solutions, respectively, where the latter ones have a matching SETH lower bound on computing the edit distance of two strings (Backurs and Indyk, STOC'15).

This is joint work with Roberto Grossi, Veli Mäkinen and Alexandru Tomescu.

Pangenomic read mapping (Sandra Smit)

In recent years, large genome-sequencing projects have led to a wealth of genomic resources, typically several annotated reference genomes and resequencing data for many strains or accessions. These projects have illustrated that there is significant genetic variation within a species, which is poorly represented by a single reference genome. To incorporate this variation in genomic analyses and analyze multiple genomes simultaneously, the field of pangenomics is emerging.

To enable the efficient analysis of large sets of genomes, we developed PanTools. Here, multiple annotated genomes are represented in a single pangenome data structure, which is stored in a Neo4j graph database. However, comparative genomics requires more than just the sequences. Therefore, our solution includes annotations (from genes to phenotypes) and cross-links to connect the genomes and their annotated features.

Here we present pangenomic read mapping. Using the pangenomic data structure, we can efficiently map a set of genomic reads to all genomes in the pangenome, eliminating redundant mappings. We demonstrate speed and accuracy by a comparison to conventional read mappers. Finally, we show some applications for regular or competitive read mapping. This essential application brings pangenomes a step closer to replacing linear reference genomes as central entity in comparative genomics.

This is joint work with Siavash Sheikhizadeh Anari, Eric Schranz, Dick de Ridder

Inducing the LCP array from the BWT in succinct space, with applications to reference-free SNP discovery (Nicola Prezza)

In this talk I will present a new algorithm for computing the generalized LCP array of a string collection from its Burrows-Wheeler transform. The new algorithm uses negligible space on top of the input and output, and is very fast also in practice. I will show how this strategy can be integrated in a reference-free procedure for finding SNPs/indels between two DNA reads-sets by analyzing their Burrows-Wheeler transforms. Our tool uses just 1 Byte per nucleotide in RAM, processes the data with a throughput of 1 microsecond per nucleotide, and outperforms the sensitivity and precision of existing tools based on de Bruijn graphs.

Joint work with Nadia Pisanti, Giovanna Rosone, and Marinella Sciortino.

Efficient querying of a pan-genome (Tizian Schulz)

One of the most common tasks in computational genomics is the comparison of biological sequences, e.g. DNA sequences, to answer biological questions of any kind. Due to the lack of such sequences in former times, analyses have often been limited to inter-species comparisons on the gene level. Nowadays, modern high-throughput sequencing technologies provide a continuously growing wealth of DNA sequences that allows the genome-wide comparison of thousands of individual sequences of the same species or any other taxonomic unit. Such a set of highly redundant and similar sequences is called a pan-genome. Pan-genomes can be stored as sequence graphs which significantly reduces memory requirements and facilitates the comparison of the sequences involved. On the other hand, the analysis of a graph appears to be much more challenging than the comparison of a set of strings. In this talk, a new method is introduced that tries to query a pan-genome represented as a sequence graph in spirit of the famous alignment search algorithm BLAST.

This is joint work with Jens Stoye and Faraz Hach.

Signal-based optical map alignment (Mehmet Akdel)

Optical mapping provides long-range contiguity information to improve genome sequence assemblies and detect structural variation. Originally a laborious manual process, Bionano Genomics platforms now offer high-throughput, automated optical mapping based on chips packed with nanochannels through which unwound DNA is guided, recording the fluorescent DNA backbone and specific restriction sites. Although the raw image data obtained is of high quality, the processing and assembly software accompanying the platforms is closed source and seemingly inefficient, labeling approximately half of the measured signals as unusable.

We introduce three tools, completely independent of BNG software, to extract and process molecules from raw images (OptiScan), to align the detected molecules using a novel signal-based cross-correlation approach (OptiMap) and finally to assemble these molecules de novo (OptiAsm) using an overlap-layout-consensus approach. We find that the molecules detected by OptiScan yield better assemblies and that the approach taken by OptiMap-OptiAsm results in high-quality initial assemblies. Our approach lays the foundation for a suite of open source methods to process and analyze high-throughput optical mapping data.

Compact Bit-Sliced Signature Index (COBS) (Timo Bingmann)

We present ongoing work on our Compact Bit-Sliced Signature Index (COBS), a new implementation of a q-gram Bloom Filter index in C++ with significantly improved space and time requirements. The index allows approximate membership queries on terabytes of DNA documents to be answered in only a few seconds. The trade-off for speed are false positive results in the output set, which we keep small by adaptively sizing the Bloom Filters in the index. We also present our future plans on how to improve this data structure further.

Joint work with Florian Gauger, Simon Gog, and Peter Sanders.

Survey of k-mer data structures for querying large collections of sequencing datasets (Camille Marchet)

The storage, compression and indexing of collections of next-generation sequencing datasets is a fundamental computational problem with many important bioinformatics applications. A natural way to perform those tasks is to use full-text indexes such as the FM-index. However, full-text indexing requires significant computational resources and so far its applications have been limited. In particular, they do not allow the querying of thousands of sequencing experiments by sequence. In order to enable sequence searches on large databases, data structures improvements were made in pioneer works such as Sequence Bloom Trees (SBT, Solomon et al. 2015). Such works rely on the key idea of storing k-mer set of sets instead of full-text. K-mers sets of sets indexation is linked to the colored De Bruijn Graph intuition that many k-mers can be shared by a majority of screened datasets. One can notice that, since many k-mers are shared across datasets, the representation of sets of sets is not just the concatenation of representation of sets. Redundant information should be factorized and stored efficiently, such as initiated in VARI (Muggli et al. 2017).

This field is recently fast-expanding, and includes a range of contributions that goes from theoretical works to more applied research. In this talk we propose a survey of state-of-the-art methods for indexing and querying k-mers set of sets. We present two key features: the storage space of data structures, and the types of queries they support (e.g. search for a given k-mer within many datasets, enumeration of k-mers, dynamic insertions and deletion, etc.). When possible we draw underlying methodological links between them and we propose some common schemes, as an attempt to represent a map of current k-mer set of sets indexation methods.

This is joint work with Rayan Chikhi and Mikaël Salson.

Indexing de Bruijn graph with minimizers (Antoine Limasset)

A simple but fundamental need when dealing with genomic sequences is to be able to index very large sets of fixed length words (k-mers) and to associate information to these sequences (origin, abundance, strand, score etc. . . ). As trivial as this need may seem, computationally challenging instances are extremely common in Metagenomic, pangenomic and even for the study of single large genomes, where sets of dozens or hundreds billions k-mers need to be processed. We propose a novel data structure able to both test the membership and associate information to the k-mers of a De Bruijn graph in a very efficient and exact way. This data structure can be seen as an improvement of Pufferfish (Fatemeh Almodaresi, Hirak Sarkar, Avi Srivastava, and Rob Patro. A space and time-efficient index for the compacted colored de bruijn graph. Bioinformatics 2018.) We wrote a proof of concept dubbed Blight available at https://github.com/Malfoy/Blight to assess the performances of our proposed scheme. We were able to index all the k-mers of a human genome with less than 8GB of memory (≈ 24 bits per k-mer). Our proposed index is also designed to provide extremely fast and parallel operations, able to perform billions queries in minutes.

Third generation sequencing technologies such as Pacific Biosciences and Oxford Nanopore allow the sequencing of long reads of tens of kbs, that are expected to solve various problems, especially in the genome assembly field. However, they also reach high error rates of 10 to 30%, and thus require efficient error correction. As first long reads sequencing experiments produced reads displaying error rates higher than 15% on average, most methods relied on the complementary use of short reads data to perform correction, in a hybrid approach. However, these sequencing technologies evolve fast, and now manage to produce long reads displaying error rates around 10-12%. As a result, correcting the long reads solely based on the information they contain, in a self-correction approach, is now an efficient alternative. Various self-correction methods were thus recently developed, but most of them either face scalability issues or require deep long reads coverage.

We introduce CONSENT, a new method for the self-correction of long reads that combines different strategies from the state-of-the-art. CONSENT starts by overlapping the long reads via a fast mapping approach, thus allowing to avoid resource consuming computation of precise alignments between all the long reads. This way, only matched regions of the long reads need to be properly aligned in order to compute a consensus sequence. These alignments are performed using a multiple sequence alignment strategy based on partial order graphs. This multiple alignment strategy also benefits from an efficient heuristic, based on k-mers chaining, allowing to achieve both high quality and fast computation time. Finally, once the consensus of a given long read region has been computed, it is polished with the use of a local De Bruijn graph, in order to further reduce the error rate.

Our experiments show that CONSENT compares well to, or even outperforms, the latest state-of-the-art self-correction methods, in terms of runtime, throughput, and quality of the correction. We also show that CONSENT scales well to large genomes and novel Oxford Nanopore ultra-long reads, by presenting correction results on a human dataset containing such reads, reaching lengths up to 340kbp.

This is joint work with Camille Marchet, Antoine Limasset, Pierre Peterlongo, Arnaud Lefebre, and Thierry Lecroq.

Improved Representation of Sequence Bloom Trees (Paul Medvedev)

Algorithmic solutions to index and search biological databases are a fundamental part of bioinformatics, providing underlying components to many end-user tools. Inexpensive next generation sequencing has filled publicly available databases such as the Sequence Read Archive beyond the capacity of traditional indexing methods. Recently, the Sequence Bloom Tree (SBT) and its derivatives were proposed as a way to efficiently index such data for queries about transcript presence. We build on the SBT framework to construct the HowDe-SBT data structure, which uses a novel partitioning of information to reduce the construction and query time as well as the size of the index. We evaluate HowDe-SBT by both proving theoretical bounds on its performance and using real RNA-seq data. Compared to previous SBT methods, HowDe-SBT can construct the index in less than half the time, and with less than 32% of the space, and can answer small-batch queries nine times faster.

This is joint work with Robert S. Harris, and the full paper is available on bioRxiv.

Border dynamic RMQ and Maximum Segmentation problems in Linear Time in Column Stream Model (Bastien Cazaux)

The compression of a set of strings is a well-known and well-studied problem. In this article, we present a way to compress linked to a biological problem about founder sequences. The idea is to modify the alphabet of the set of strings by partitioning this set of strings in common blocks. The Maximum Segmentation problems are, given a threshold $M$ and a set $\mathcal{R} = [\mathcal{R}_1, \dots, \mathcal{R}_m ]$ of strings of the same length $n$, to find a shortest partition $P$ where for each segment $[i,j] \in P$, the number $| [ \mathcal{R}_k[i,j]: 1\leq k \leq m ]| $ is bounded from above by $M$. We give an optimal $O(mn)$ time algorithm to solve the problem for two different measures: the average length of the intervals of the partition and the length of the minimal interval of the partition. We can exploit these algorithms to find a representative set of founders that can be indexed for variant calling and read alignment.

This is joint work with Tuukka Norri, Dmitry Kosolobov and Veli Mäkinen.

Participants

(so far)

Name	Affiliation	Role
Mehmet Akdel	Wageningen University	Speaker
Jarno Alanko	University of Helsinki	Speaker
Hideo Bannai	Kyushu University	Participant
Timo Bingmann	KIT Karlsruhe	Speaker
Bastien Cazaux	University of Helsinki	Speaker
Rayan Chikhi	Institut Pasteur, Paris	Participant
Daniel Dörr	Bielefeld University	Participant
Johannes Dröge	Chalmers University	Participant
Jonas Ellert	TU Dortmund	Participant
Massimo Equi	University of Helsinki	Speaker
Johannes Fischer	TU Dortmund	Organizer
Jan R. Forster	University of Duisburg-Essen	Participant
Ali Ghaffaari	MPII Saarbrücken	Speaker
Till Hartmann	University of Duisburg-Essen	Participant
Hezha Hassan	Bielefeld University	Participant
Shunsuke Inenaga	Kyushu University	Participant
Johannes Köster	University of Duisburg-Essen	Organizer
Maël Kerbiriou	INRIA Lille Nord Europe	Speaker
Florian Kurpicz	TU Dortmund	Partcipant
Elias Kuthe	University of Duisburg-Essen	Participant
David Lähnemann	University of Duisburg-Essen	Participant
Thierry Lecroq	Université de Rouen Normandie	Participant
Téo Lemane	INRIA Rennes	Participant
Brice Letcher	EMBL EBI Hinxton	Patcicipant
Antoine Limasset	Université Lille	Speaker
Stephan Majda	University of Duisburg-Essen	Participant
Veli Mäkinen	University of Helsinki	Participant
Camille Marchet	CRIStAL Lille	Speaker
Tobias Marschall	University of the Saarland	Participant
Paul Medvedev	Pennsylvania State University	Speaker
Felix Mölder	University of Duisburg-Essen	Participant
Pierre Morisse	Université de Rouen Normandie	Speaker
Pierre Peterlongo	INRIA Rennes	Participant
Nicola Prezza	University of Pisa	Speaker
Simon Puglisi	University of Helsinki	Participant
Sven Rahmann	University of Duisburg-Essen	Organizer
Mikko Rautiainen	MPI for Informatics, Saarbrücken	Speaker
Dick de Ridder	Wageningen University	Participant
Eric Rivals	CNRS; LIRMM Montpellier	Speaker
Leena Salmela	University of Helsinki	Speaker
Peter Sanders	KIT Karlsruhe	Participant
Tizian Schulz	Bielefeld University	Speaker
Kévin Da Silva	INRIA Rennes	Participant
Sandra Smit	Wageningen University	Speaker
Jens Stoye	Bielefeld University	Speaker
Henning Timm	University of Duisburg-Essen	Participant
Uriel Elias Wiebelitz	TU Dortmund	Participant
Roland Wittler	Bielefeld University	Speaker
Jens Zentgraf	TU Dortmund	Participant

Travel and Venue

How to find it

The workshop will take place at the U-Tower in Dortmund city center, close to the main train station ("Dortmund Hbf"). Dortmund is a major train hub and a stop of the high-speed ICE trains, as well as many local trains. There is a small local airport (code DTM) with good connections to the London area and Eastern Europe. An express bus connects the city and main station with the airport. Otherwise, the major international airport of Düsseldorf (DUS) is a short train ride away.

We recommend to book a hotel or other accomodation close to the venue and/or city center, for example:

Novum Hotel Unique, very close to the venue
A&O Hostel Dortmund Hbf, budget hostel, still within walking distance
Boutique 102 Dortmund City, central middle class hotel

We ask you to use the hotels' websites directly to book your accomodation, or via the usual booking sites.

Please note that we cannot cover or re-imburse your travel or accomodation costs.

Please be aware that, while major credit cards are frequently accepted, especially smaller amounts are usually paid in cash (Euros).

Local Map

Local map

Call for Contributions

and Participation

How to participate

To participate (without giving a talk), please use the EasyChair website, pretend to submit an abstract, but write "None" into title and abstract field, and indicate "I only want to register and participate" by choosing the corresponding radio button.

How to contribute a talk

Please go to the workshop's EasyChair page, create an EasyChair account or log in with your existing account and submit a title and short abstract of your talk (one author: yourself, text only, no PDF papers). Additionally, please indicate in the choice field that you want to give a talk. Please only indicate yourself as an author in the author fields (so we don't get confused with registrations), and mention your co-authors in the abstract ("This is joint work with ..."). Thank you!

If the workshop is heavily over-subscribed, the organizers will make a selection among submitted abstracts to ensure a program that is diverse both in terms of topics and presenters.

Imprint

Disclaimers and Data Privacy

This website, https://dsb2019.gitlab.io, is an informational website of the "Data Structures in Bioinformatics" workshop in 2019. The website is published by the organizers, namely Johannes Fischer (TU Dortmund), Johannes Köster (University of Duisburg-Essen), and Sven Rahmann (University of Duisburg-Essen).

Disclaimer

Although the organizers take all possible care to ensure the correctness of published information, no warranty can be accepted regarding the correctness, accuracy, uptodateness, reliability and completeness of the content of this information.

The organizers expressly reserve the right to change, to delete or temporarily not to publish the contents wholly or partly at any time and without giving notice.

Liability claims against the organizers or their affiliated institutions, TU Dortmund or University of Duisburg-Essen, because of tangible or intangible damage arising from accessing, using or not using the published information, through misuse of the connection or as a result of technical breakdowns are excluded.

The organizers have not checked third party web sites, i.e. web sites not located on their servers or in their area of influence, that may be connected to this web site via links, and do not accept any responsibility for the contents or the services offered thereon.

Data Privacy

This website is informational only and does not ask for personal information. However, it as well as its hoster gitlab.io may use external scripts and cookies.

Contact

In case of questions or comments about the website, please contact us.

dsb2019@easychair.org