Welcome to BigSDM2018!
Invited Speakers



Cees Snoek

Real-Time Video Surveillance Search

Abstract: Video surveillance has become a crucial tool for securing airports, train stations, shopping malls, and many other meeting places for large audiences. While large numbers of cameras have been installed and ever larger volumes of video data are being generated, the interpretation of these data is becoming an insurmountable, error-prone and highly labor-intensive task. Automating the interpretation task through computers is the only way to survive this data deluge, but unfortunately it is also one of the hardest problems in today's computer science, both in terms of efficient hardware usage and artificial intelligence software. Although much progress has been made over the past decades in analyzing and classifying objects in still images and video, automatic and real-time interpretation of activities in realistic surveillance data is far beyond the current state of the art. This is why present-day video surveillance solutions still depend predominantly on manual inspection. The major research challenges in video surveillance are that activities of interest are rare, scenes are crowded, computing demands are humongous, and offloading to commercial label and compute services like YouTube, Amazon and Facebook is impossible due to privacy constraints. In this talk I will highlight recent computer-vision-by-learning work that addresses these research challenges.

Short Bio: Cees G.M. Snoek is a full professor of computer science at the University of Amsterdam, where he heads the Intelligent Sensory Information Systems Lab. He is also a director of the QUVA Lab, the joint research lab of Qualcomm and the University of Amsterdam on deep learning and computer vision. He received the M.Sc. degree in business information systems and the Ph.D. degree in computer science, both from the University of Amsterdam, The Netherlands. He was previously an assistant and associate professor at the University of Amsterdam, as well as a Visiting Scientist at Carnegie Mellon University, a Fulbright Junior Scholar at UC Berkeley, head of R&D at university spin-off Euvision Technologies, and managing principal engineer at Qualcomm Research Europe. His research interests focus on video and image recognition. For his research excellence, Cees received an NWO Veni award, a Fulbright Junior Scholarship, an NWO Vidi award, and the Netherlands Prize for ICT Research.



Di Li

FAST and the Dawn of EB Astronomy

Abstract: Radio telescopes sample the phase and amplitude of EM waves in time. The raw data are then down-sampled in size by about 2-3 orders of magnitude. Even after this down-sampling, it is becoming commonplace to accumulate ~PB of data. Increasing computational capability and a steady stream of major discoveries, such as the fast radio burst (FRB), have both enabled and necessitated an era of big-data astronomy. The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is a Chinese mega-science facility, currently under commissioning. We have designed an unprecedented survey mode, namely the Commensal Radio Astronomy FAST Survey (CRAFTS), aspiring to fully utilize the world-leading sensitivity provided by FAST, to discover ~1000 new pulsars and hundreds of thousands of galaxies, to obtain a >10 trillion voxel image of atomic hydrogen, and to explore the unknown unknowns in the universe. The data rate from CRAFTS is expected to be 10-30 PB per year. Such a survey is just one of many current efforts in radio astronomy. The era of EB astronomy is dawning on us.
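To give the quoted CRAFTS figure some intuition, a back-of-envelope conversion (assuming decimal petabytes and continuous, year-round operation) turns 10-30 PB per year into a sustained ingest bandwidth:

```python
# Back-of-envelope: sustained bandwidth implied by 10-30 PB/year.
# Assumptions: decimal PB (1e15 bytes), continuous 365-day operation.
SECONDS_PER_YEAR = 365 * 24 * 3600

for pb_per_year in (10, 30):
    gb_per_s = pb_per_year * 1e15 / SECONDS_PER_YEAR / 1e9
    print(f"{pb_per_year} PB/year ~ {gb_per_s:.2f} GB/s sustained")
```

That is roughly 0.3-1 GB/s that must be captured, processed, and archived around the clock, which is why a single survey of this kind already strains conventional data pipelines.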

Short Bio: Dr. Li is a radio astronomer, currently the chief scientist of the Five-hundred-meter Aperture Spherical radio Telescope (FAST). He pioneered several observing techniques, including the HI narrow self-absorption technique and a new inversion algorithm for solving the dust temperature distribution. These techniques facilitated important measurements of star forming regions, such as their formation time scale. He has published more than 100 international journal papers. He won the National Research Council (US) Resident Research Fellow award (2005) for his "outstanding ability as a result of national competition", the NASA outstanding team award as a member (2009), and the Chinese Academy of Sciences Distinguished Achievement Team Award as a key member (2017). He is now leading the science preparation efforts of FAST. He served on the Steering Committee of the Australia Telescope National Facility (ATNF), was a co-chair of the "Cradle of Life" science working group (SWG) of the SKA, and is a member of the Chinese Academy of Sciences Major-facilities Guidance Group and an adviser to the Breakthrough Listen initiative.



Peter Fox

Open-World, Integrative, Transparent, Collaborative Research Data Platforms: addressing the life-cycle of large-scale scientific data collections

Abstract: As collaborative, or network, science spreads into more science, engineering and medical fields, both the participants and their funders have expressed a very strong desire for highly functional data and information capabilities that are a) easy to use, b) integrated in a variety of ways, c) able to leverage prior investments and keep pace with rapid technical change, and d) not expensive or time-consuming to build or maintain. Inherently, many phases of the research and data life cycle are active in collaborative settings. In response, we have adapted, extended, and integrated several open source applications and frameworks that handle major portions of the functionality needed for these platforms. At a minimum, these functions include: an object-type repository, collaboration tools, the ability to identify and manage all key entities in the platform, and an integrated portal to manage diverse content and applications, with varied access levels, privacy options, and traceable provenance.

In this presentation, methods and results for information modeling and for adapting, integrating and evolving a networked data science and information architecture based on several open source technologies (e.g. Drupal, VIVO, the Comprehensive Knowledge Archive Network (CKAN), and the Global Handle System (GHS)) and many semantic technologies are discussed in the context of the Deep Carbon Observatory. The conclusion includes thoughts on how the smart mediation among system architecture components has been modeled and managed, and on its general applicability and efficacy for big-data scientific data management.

Short Bio: Peter Fox is Tetherless World Constellation Chair and Professor of Earth and Environmental Science, Computer Science and Cognitive Science, and Director of the Information Technology and Web Science Program at Rensselaer Polytechnic Institute. He holds a B.Sc. (Hons) and a Ph.D. in Applied Mathematics from Monash University. He spent 17 years at the High Altitude Observatory of the National Center for Atmospheric Research as Chief Computational Scientist. Fox's research specializes in the fields of data science and analytics, ocean and environmental informatics, computational logic, the semantic Web, cognitive bias, semantic data frameworks, and solar and solar-terrestrial physics. This research utilizes state-of-the-art modeling techniques and internet-based technologies, including the semantic web, and applies them to large-scale distributed scientific repositories, addressing the full life-cycle of data and information within specific science and engineering disciplines as well as among disciplines. Fox is currently PI for the Integrated Ecosystem Assessment, Deep Carbon Observatory Data Science, Global Change Information System, Marine Biodiversity Virtual Laboratory, and GeoData projects, and is co-PI on 4 others. Fox has spent over 30 years bridging science and distributed data and information systems to support community activities using use-case-driven design. Fox is past President of the Earth Science Information Partners, was chair of the International Union of Geodesy and Geophysics Union Commission on Data and Information from 2007 to 2015, and is past chair and co-founder of the AGU Special Focus Group on Earth and Space Science Informatics. Fox is an associate editor for the Earth Science Informatics journal and a member of the editorial boards of Computers in Geosciences, the Geoscience Data Journal, and Nature's Scientific Data journal.
Fox served on the International Council for Science's Strategic Coordinating Committee for Information and Data. Fox was awarded the 2012 Martha Maiden Lifetime Achievement Award for service to the Earth Science Information community and the 2012 European Geosciences Union Ian McHarg/Earth and Space Science Informatics Medal. In 2015, Fox was elected as the first Earth and Space Science Informatics fellow of the American Geophysical Union.



Ricardo Jimenez-Peris

LeanXcale: A database combining operational data with analytical processing

Abstract: This talk will first introduce the spectrum of database managers that exists in today's technology landscape, describing the capabilities of the databases at the different extremes of this spectrum. The spectrum ranges from pure operational databases with full ACID transactional guarantees to analytical databases, or data warehouses, with purely analytical capabilities.

Then, the main reasons why databases are either operational or analytical will be identified.

The talk will continue by characterizing an HTAP (Hybrid Transactional/Analytical Processing) database, which aims at combining both capabilities in a single database manager.

The rest of the talk will be devoted to how LeanXcale solved these specific challenges to deliver an HTAP database.

Short Bio: Dr. Ricardo Jimenez-Peris is CEO and Co-Founder of LeanXcale. He is an expert on scalable data management and transactional processing, co-author of a book on scalable database replication and of 100+ papers at international conferences and in journals, and co-inventor of several patents. He has been an invited speaker at the headquarters of Facebook, Twitter, Oracle, Salesforce and Microsoft, among others. He is a member of the expert group advising the European Commission on Cloud Computing.



Romulo Gonçalves

E-Science technology for remote sensing data exploration

Abstract: Earth observation has received a new boost from high-resolution monitoring programs and missions (e.g. the Copernicus program and the upcoming Landsat 9) that offer new opportunities for land cover, vegetation monitoring and phenology studies. At the same time, this data deluge brings new computational challenges that limit scientists' search space, especially when working at continental scales.

Often the approach is to use a single product, such as a Vegetation Index (VI), and one set of predefined parameters to derive a new product for the entire search space. This one-size-fits-all approach leads the researcher away from the truth and thus barely exploits the real potential of these high-resolution data sets.

High resolution is not always a sign of more information. Since the data are gathered by machines with finite precision, the information they contain is limited by the noise level. Besides noise, RS data sets often contain values that vary slowly, and neighboring values are not entirely independent of one another, in either space or time. In these cases there is a high level of autocorrelation, which means the data can be reduced in size without losing much relevant information.
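A minimal sketch of this idea (a synthetic toy signal, not the talk's actual data or method): when neighboring samples are highly autocorrelated, block averaging shrinks the data tenfold while a simple reconstruction stays close to the underlying signal, with an error comparable to the measurement noise itself.

```python
import numpy as np

# A slowly varying "truth" observed with finite-precision noise:
# neighboring samples are strongly autocorrelated, hence redundant.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 1000)
signal = np.sin(t)
observed = signal + 0.05 * rng.standard_normal(t.size)

# Reduce the data 10x by block averaging (simple aggregation).
reduced = observed.reshape(-1, 10).mean(axis=1)

# Reconstruct at full resolution by repetition and measure the loss.
reconstructed = np.repeat(reduced, 10)
rmse = np.sqrt(np.mean((reconstructed - signal) ** 2))
print(f"10x smaller, RMSE vs. truth: {rmse:.3f}")
```

The reconstruction error ends up well below the 0.05 noise level, illustrating why aggressive reduction of autocorrelated RS data can be nearly lossless in practice.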

During this talk the audience will hear about our findings and approaches to deal with these challenges when exploring Remote Sensing data.

Short Bio: Romulo obtained a degree in Systems and Computer Science Engineering at the University of Minho, Braga, Portugal. He received a PhD from the University of Amsterdam for his thesis "The DataCyclotron: Juggling data and queries for a data warehouse audience". The research was conducted in the CWI Database group, where he also worked on several other projects, e.g. SkyServer, DataCell and Recycler. After his PhD, Romulo joined IBM Almaden for his post-doctoral research. At IBM he worked on Big SQL 3.0, an SQL-on-Hadoop offering that leverages IBM's state-of-the-art relational database technology.

In 2014 Romulo joined the Netherlands eScience Center (NLeSC) as a core eScience engineer responsible for data management and databases. He works on strategic collaborations and alliances to research, design and implement data management solutions for spatiotemporal analysis on large banks of remote sensing data. Since June 1, 2018 he has been the Technology Lead for Data Management, responsible for foreseeing which technology directions in optimized data handling NLeSC should follow to enable the short- and long-term research agendas of the Space, Earth and Life Sciences.



Samarth Swarup

Sense-making for Large-scale Social Simulations

Abstract: The availability of large amounts of data about socio-technical systems has opened up the possibility of building large-scale, highly detailed simulations that can be used to aid in planning and decision-making. These Big Simulations put Big Data into motion, allowing us to investigate courses of action. In this talk, I will discuss two main sense-making challenges to this approach. First, while unprecedented amounts of information are being gathered about social systems, it isn't necessarily the right information for the problems that face us. Developing the appropriate simulation models therefore requires synthesizing the needed information by combining multiple sources of data. We call this synthetic information. Second, large-scale simulations generate even more data than goes into them. New methods are needed in order to understand the results of such simulations to facilitate decision-making and planning. We call this simulation analytics. I will discuss approaches to both these aspects of the design and analysis of large-scale social simulations, including several examples.



Stefano Cozzini

NFFA-EUROPE data repository: how to build a data infrastructure for a large-scale European scientific infrastructure

Short Bio: Stefano Cozzini, PhD in Physics from the University of Padova, is a development scientist at CNR/IOM with more than 20 years of experience in the area of scientific computing and HPC/data e-infrastructures. He is currently coordinating the data management research activities of two European projects (NFFA-EUROPE and EUSMI). He is actively involved in the Master in High Performance Computing promoted by SISSA and ICTP (www.mhpc.it) and in the recently launched master's degree in "Data Science and Scientific Computing" at the University of Trieste. He has considerable experience in leading HPC and data infrastructure projects at national and international levels. He served as a scientific consultant for international organizations: UNESCO-ICTP (2003-2012) and UNDP/UNOPS (2011-2012). At the end of 2011 he co-founded eXact lab srl (www.exact-lab.it), a spin-off company of the CNR/IOM institute. The company, qualified as an innovative start-up, provides advanced computation services by means of HPC and cloud infrastructure.



Ying Zhang

Finding pitfalls in query performance

Abstract: Despite their popularity, database benchmarks only highlight a small part of the capabilities of any given system. They do not necessarily highlight problematic components encountered in real life or provide hints for further research and engineering.

In this talk we introduce SQALPEL, a platform for "discriminative performance benchmarking". SQALPEL aids in exploring a larger search space to find performance outliers and their underlying causes. The approach is based on deriving a domain-specific language from a sample query to identify a query workload. SQALPEL subsequently explores the space using query morphing and simulated annealing to find performance outliers and the query components responsible for them. To speed up the exploration of often time-consuming experiments, SQALPEL has been designed to run asynchronously on a large cluster of machines.
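The morph-and-anneal loop can be sketched generically. The following is a textbook simulated-annealing skeleton, not SQALPEL's actual code: the `neighbor` function stands in for query morphing, and the `cost` function stands in for a measured query runtime (both are hypothetical toy stand-ins).

```python
import math
import random

def simulated_annealing(start, neighbor, cost, temp=1.0, cooling=0.95, steps=200):
    """Generic simulated annealing: always accept improvements, and
    accept worse candidates with a probability that shrinks as the
    temperature cools, so the search can escape local optima early on."""
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = neighbor(current)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost

# Toy stand-in for query morphing: perturb one numeric knob, with a
# pretend cost surface whose sweet spot sits at 42.
random.seed(1)
best, best_cost = simulated_annealing(
    start=0,
    neighbor=lambda x: x + random.choice([-3, -1, 1, 3]),
    cost=lambda x: (x - 42) ** 2,
)
print(best, best_cost)
```

In a SQALPEL-like setting the candidates would be morphed query variants, the cost would come from actual timed executions, and (since the goal is to surface outliers) the objective could just as well be maximized; the annealing schedule itself is unchanged.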