CSC 371 – SPRING 2017 DATA MANAGEMENT AND VISUALIZATION ASSIGNMENT 3

UNIVERSITY OF VICTORIA
Due: Thursday, March 9th, 2017 before 11:59pm. Late assignments will not be accepted.
This assignment will be submitted electronically through conneX (as described in ‘Submission Instructions’ below). Do not submit a hard copy of your answers; pa- per submissions will not be marked. All code submissions must be your own work. However, you are permitted to use the code discussed in lectures (or code posted on conneX) as the basis for your submissions if proper attribution is given.
This assignment involves writing queries to analyze the BC Ferries departure and arrival data. Four datasets are posted: a3_7days.sql, a3_14days.sql, a3_21days.sql and a3_28days.sql. Your queries are expected to work correctly on all of the datasets, and to be substantially correct on any other datasets with the same schema. Note that the schema of the previously posted BC Ferries datasets is different from the schema used for the assignment data: specifically, the assignment data uses timestamps for departures and arrivals, rather than a collection of integer columns.
Create a single submission a3_queries.sql containing each of the queries below. Do not include any of the CREATE or INSERT statements for the data in your submission. You may include .read, .mode or .header statements in your submission if you want, but it is not necessary. Please put comments in your submission to identify each query.
In the queries below, the ‘duration’ of a sailing is assumed to be the time between the scheduled departure and the arrival (so if the ferry leaves 15 minutes late and arrives one hour later, the duration is 75 minutes).
You are encouraged to use CREATE VIEW statements to organize your subqueries. In particular, you may want to use the following view, which adds a column called ‘duration’ (containing the duration in minutes) to the columns of the sailings table.
create view sailings_and_durations as
select *,
(strftime(‘%s’,arrival)-strftime(‘%s’,scheduled_departure))/60
as duration
from sailings;
1. [2 marks] Route 1 is the Swartz Bay/Tsawwassen crossing. Construct a query to find the average duration (in minutes) for each vessel which served route 1. The result should have two columns (vessel name and average time) and one row for each vessel serving route 1 in the dataset.
2. [4 marks] Construct a query to find, for each vessel in the fleet, the total number of routes it has served. The result should be ranked in descending order by the number of routes (vessels with more routes should appear first). The result should have two columns (vessel name and
1
number of routes) and a row for each vessel in the fleet (even vessels which have not been associated with any routes).
3. [4 marks] Construct a query to list the names of all vessels that have ever visited any port (either as source or destination) which was (at any time) used by the vessel called ‘Queen of New Westminster’. The result should have only one column (vessel name) and be free of duplicates. The Queen of New Westminster itself should appear in the result.
4. [4 marks] For many routes, there are simultaneous sailings in both directions at the same time. For example, on a typical day at 9:00am, a ferry leaves Tsawwassen for Swartz Bay and a different ferry leaves Swartz Bay for Tsawwassen. Not all routes or sailings have this property. We will define two vessels as ‘paired up’ if they have both served the same route number at the same departure time/date. Construct a query to count the number of times each distinct pair of ferries have been paired up (your result does not need to contain counts for vessels which were never paired up). Each distinct pair of ferries should appear only once in your result. The result should contain three columns (the name of one vessel, the name of the other vessel, and the number of sailings when they’ve been paired).
5. [2 marks] The routes table contains the ‘nominal duration’ of each route, which is the expected crossing time. The nominal duration is determined by BC Ferries based on the average marine and traffic conditions, along with information like loading times and the speed of the vessels. We can test the accuracy of this calculation by computing the average time of each crossing. Construct a query to find, for each route number, the nominal duration and the average duration of a crossing based on all available data for that route. The result should have three columns (route number, nominal duration and average computed duration).
6. [4 marks] Suppose we define a sailing to be ‘late’ if the duration is at least five minutes longer1 than the nominal duration in the routes table. Construct a query to find, for each route number, all month/day pairs in which there was at least 1 late sailing. The result should have three columns (route number, month and day) and should contain a row (route, month, day) if there was at least one late sailing for that route on that month and day.
Hint: To get the month or day number (respectively) from a timestamp column in SQLite, use notation like strftime(‘%m’, column_name) or strftime(‘%d’, column_name).
7. [Bonus: 4 marks (optional)] Construct a query to find, for each vessel with any sailings, the total number of sailings it has made, the number of late sailings it has made and the average number of its sailings that were late (that is, the number of late sailings divided by the total number of sailings). Note that you should be careful with this query: a vessel may be involved in multiple routes, each with a different nominal duration.
8. [Thought Question] (not for marks): Ferry travellers that get stuck on an old ferry (like the Queen of New Westminster) might be quick to blame the ferry’s age for the lateness of their sailing. The data in the assignment dataset contains enough information on ferry’s ages and on-time performance to investigate this correlation. Can you adapt the last query to determine if the most late ferries are also the oldest ones?
1. A duration which is exactly five minutes longer is still considered late
2
Submission Instructions
All submissions for this assignment will be accepted electronically. You are permitted to delete and resubmit your assignment as many times as you want before the due date, but no submissions or resubmissions will be accepted after the due date has passed.
Ensure that each file contains a comment with your name and student number, and that the files for each question are named as dictated by the question. If you do not name your files correctly, or if you do not submit them electronically, it will not be possible to mark your submission and you will receive a mark of zero.
After submitting your assignment, conneX will automatically send you a confirmation email. If you do not receive such an email, your submission was not received. If you have problems with the submission process, send an email to the instructor before the due date.

Leave a Reply

Your email address will not be published. Required fields are marked *