NeuroNex DataJoint Training Workshop

NeuroNex DataJoint Training Workshop, April 19-20, 2018


Building scientific data pipelines from Python and MATLAB 

In this hands-on workshop, participants will learn to build relational databases that support data pipelines for complex scientific projects using the open-source DataJoint framework. A data pipeline is a sequence of steps (or, more generally, a directed acyclic graph) with integrated data storage and computation at each step. DataJoint pipelines can be created and used from either MATLAB or Python. The workshop will be taught in Python, with brief discussions of the MATLAB equivalents. Although DataJoint is a general framework, the examples and worked problems will be based on common neuroscience experiments and recording modalities. We invite scientists involved in the acquisition, processing, and analysis of data in neuroscience and related fields, particularly those working on collaborative projects. Basic knowledge of Python is required to solve the exercises.

For questions, please contact Dimitri Yatsenko.

Day 1
9.00 am – 10.00 am Background and motivation. An overview of the challenges of building data pipelines in neuroscience and beyond. Hosting locally and in the cloud. Data models and database architectures.

10.10 – 10.40 am Group setup. Set up access to JupyterHub and to the workshop database used throughout the workshop.

10.50 am – 11.50 am Getting started with DataJoint. Defining a simple data pipeline from scratch in Python, inserting data into it, and performing basic queries to retrieve data.

11.50 am – 12.40 pm Lunch break

12.40 pm – 2.50 pm Defining computations in a DataJoint pipeline. Adding computed tables to the existing pipeline and populating them automatically. We will cover data integrity through transactions, distributed computations, and the use of Lookup tables to store computation and analysis parameters.

3.00 pm – 4.00 pm Design patterns and advanced queries. Exploring common design patterns and learning how dependencies (foreign keys) express and enforce relationships and data integrity. Visually exploring a pipeline by plotting its entity-relationship diagrams. Building more complex queries.

4.10 pm – 4.30 pm Day 1 recap. A brief summary of what was covered, an overview of the Day 2 material, and pointers to additional learning resources.

Day 2
9.00 am – 10.20 am Case study Part 1: Pipeline design exercises. Work with multiple real-world experimental and analysis requirements to practice designing and building pipelines.

10.30 am – 11.50 am Case study Part 2: Extending pipelines. Work in groups to explore, understand, and extend an existing pipeline with new computations.

11.50 am – 12.40 pm Lunch break

12.40 pm – 1.50 pm Best practices in scientific computation and data sharing. Explore tools for sharing code and environments. Discuss best practices for designing and maintaining pipelines in teams of any size.

2.00 pm – 2.30 pm DataJoint roadmap. Future developments and available resources.

2.40 – 3.00 pm Workshop recap and concluding remarks. A brief summary of the workshop, followed by open discussion.

BioScience Research Collaborative, 280

  • April 19, 2018 - April 20, 2018
    9:00 am - 3:00 pm