April 19-20, 2018
In this hands-on workshop, participants will learn to build relational databases to support data pipelines for complex scientific projects using the open-source DataJoint framework. A data pipeline is a sequence of steps (or, more generally, a directed acyclic graph) with integrated data storage and computations at each step. DataJoint pipelines can be created and used from either MATLAB or Python. The workshop will be taught in Python with brief discussions of the MATLAB equivalent. Although DataJoint is a general framework, the illustrative examples and worked problems will be based on common neuroscience experiments and recording modalities. We invite scientists who are involved in the acquisition, processing, and analysis of data in neuroscience and related fields, particularly those who are part of collaborative projects. Basic knowledge of Python is required to solve the exercise problems.
For questions, please contact Dimitri Yatsenko.
Session 1. 9.00 – 10.20 am “Background and Motivation”
Challenges and solutions for data pipelines in neuroscience and beyond. Hosting locally and in the cloud. Data models and database architectures.
Session 2. 10.30 – 11.50 am “Defining and querying a simple pipeline.”
Defining a data pipeline from scratch in Python. Definition of nodes and their dependencies. Data entry. Basic queries.
Session 3. 12.10 pm – 1.30 pm “Automated computations.”
Configuring and executing automated computations. Default and custom key source. Data integrity through transactions. Distributed computations.
Session 4. 1.40 pm – 3.00 pm “Design patterns.”
How dependencies (foreign keys) express and enforce entity relationships. Forming complex relationships through foreign key expressions. Plotting and interpreting entity diagrams.
Session 5. 3.10 pm – 4.30 pm “Precise queries.”
Combining relational operators to derive new results. Work in groups to generate queries to solve given problems.
Session 6. 9.00 am – 10.20 am “Case study Part 1.”
Work in groups to make sense of a given pipeline, then query it and plot the results.
Session 7. 10.30 am – 11.50 am “Case study Part 2.”
Work in groups to extend an existing pipeline with new computations.
Session 8. 12.10 pm – 1.30 pm “Utilities.”
Environment sharing, interfaces, integration.
Session 9. 1.40 pm – 3.00 pm “Roadmap.”
Future developments and available resources.
April 19, 2018 - April 20, 2018
9:00 am - 3:00 pm