代写mysql | 代做assignment | IT代写 | 代写sql | database代做 – Assignment 3: SQL Golf

Assignment 3: SQL Golf

代写mysql | 代做assignment | IT代写 | 代写sql | database代做 – 这个题目属于一个mysql的代写任务, 是有一定代表意义的mysql/IT/sql/database等代写方向, 这是值得参考的assignment代写的题目

数据库代写 代写数据库 database代做 sql代写

Hide  assignment Information
Instructions

Assignments Assignment 3: sql Golf

CSC 370 SQL Golf

Assignment Goals

In this assignment you will:

  • demonstrate data analysis skills with SQL write SQL queries with varied complexity to extract desired information from a relational database optimise SQL queries towards simplicity

Task

Code Golf is a sort of recreational programming activity in which one tries to implement functionality using as few characters as possible. The general goal is to think of alternative solutions to a problem and IT derives its name from the sport of golf, in which one tries to minimise the number of whacks with an iron shaft to put a tiny ball in a farflung hole. In this assignment, you will try not just to write SQL queries that are correct, but that also are "minimal."

As an example, imagine that you have two relations: Employee(employee_id, employee_name, dept_id) Department(dept_id, dept_name)

You would like to find the names of all employees in a department named "Shipping & Receiving". A simple solution would be:

SELECT employee_name FROM Employee NATURAL JOIN Department WHERE dept_name LIKE ‘Shipping & Receiving’;

Certainly, another "correct" solution would be:

SELECT employee_name FROM Employee WHERE dept_id IN ( SELECT dept_id FROM Department WHERE dept_name LIKE ‘Shipping & Receiving’ );

Both queries retrieve the same result, but the second query is unnecessarily complex, or at the very least nonidiomatic. I hope that you prefer the first solution. Even if not, this assignment is designed to encourage you to write the first query by rewarding you inversely to the number of times any of the following tokens appears in your SQL query:

  • SELECT (i.e., projection operator)
  • FROM (i.e., the table or indexscan operator)
  • , (i.e., the cross product operator, including other appearances such as in a SELECT clause)
  • JOIN (i.e., a theta, natural, or outer join or per mysql an intersection)
  • UNION (i.e., the bag union operator)
  • DISTINCT (i.e., the duplicate elimination operator)
  • GROUP (i.e., the groupby operator)
  • ORDER (i.e., the sort operator)
  • HAVING (i.e., the selection operator applied to groups)
  • WHERE (i.e., the selection operator applied to tuples)
  • LIMIT (i.e., the MySQL topk operator)

This gives us a metric by which to claim the first query is better: it only uses 4 instances of the above set of operators (SELECT, FROM, JOIN, and WHERE), whereas the second query uses 6 instances (2SELECT, 2FROM, 2WHERE). This is the metric that you should aim to minimise with the SQL queries that you submit. You would receive more marks for the first query than the second one.

It is important to remember that this is an exercise in code simplification and creative thinking, not in performance optimisation. Although you are trying to minimise the number of operator references, SQL is a declarative language and there is no specific reason to assume that the first example query will run faster than the second one. However, simple and idiomatic code is easier for compilers to optimise, so there could be tangential performance benefits to striving for simpleror at least shorter queries. The real intent here is to leverage an assumed correlation between this "golf score" metric and the quality of a SQL query to encourage you to write better SQL.

You are given instructions to create (optionally) a MySQL database or to connect to one on a remote server. Moreover, you are given fifteen .sql files that are unfortunately empty except for a comment indicating their intended query and their mapping between "SQL Golf" scores (i.e., total instances of the aforementioned operators/tokens) and grade. For example, the above problem would be represented by the following example.sql file:

— Find the names of all employees in a department named "Shipping & Receiving" — 1.1 marks: <4 operators — 1.0 marks: <6 operators — 0.8 marks: correct answer

— Replace this comment line with the actual query

Alongside the .sql file will be a .tsv file showing the expected result, which you can use for testing.

Submission

You should submit all fifteen of the .sql files without renaming them, but after replacing the final comment line with an actual SQL query that achieves the stated objective. Ordinarily, you should submit fifteen .sql files, though it is okay to submit fewer files if you do not have a solution for all fifteen tests.

Evaluation

Your grade on the assignment will be the sum of your scores on each query. This could be in excess of 15 (i.e., full marks), particularly if you minimise your queries more effectively than the teaching team has. However, to receive any marks on a

particular query, you must produce the correct result, including attribute names. We will ascertain this by performing a diff between the corresponding .tsv file and your query results on an uptodate MySQL instance prior to counting operators, similar to:

sudo mysql -u root counties < query01.sql > query01-your-solution.tsv diff query01-solution.tsv query01-your-solution.tsv

In the running example, the first query would score 1.0 marks and the second query would score 0.8 marks. If you can answer the query with fewer operator instances than the first query, you would score 1.1 marks. The following query would obtain 0.0 marks, even though the number of operators is small, because it does not produce the same result (namely, it doesn’t filter by department):

SELECT employee_name FROM Employee;

Note that we may modify our testing dataset to avoid hardcoded solutions, such as by changing the id values of or deleting tuples. You should not make assumptions (e.g., of uniqueness or nonnulls) that are not supported by the data model in the CREATE TABLE statements below. Moreover, if you create temporary files, you are responsible for monitoring their existence: there is no guarantee that we will run queries in numerical order.

Dataset

For this assignment, we will use a compilation of US countylevel census data that has been transformed from a documentoriented to a relational format for the purpose of this assignment. The data has been loaded into MySQL and exported into .sql format (counties.sql). You can import it into your local instance of MySQL from the command line as follows:

mysql -u [username] -p -e "CREATE DATABASE counties;" mysql -u [username] -p counties < counties.sql

(Or, you could simply copypaste the whole file into a query window.)

Finally, you can also access a readonly, prepopulated version of this database on port 3306 and server csc370db.csc.uvic.ca. The server is only available from within the firewall; so, you must first connect to the intranet via VPN or to linux.csc.uvic.ca via SSH.

Queries

The queries are available in two locations:

  • You can access them on the publicfacing GitHub repo for this course’s assignments. The advantages to this source are that it is the freshest (first place updates are pushed) and that you can directly check out the code with git, which might be an easier toolchain
  • You can access them by downloading the compressed tarball that is attached to this assignment description and unpack it locally.

Remember to edit and upload the .sql files and to use the .tsv files to check the correctness of your solutions.

Sources and Academic Integrity
Start Date

Jun 25, 2022 9:45 PM

Due Date

Jul 24, 2022 11:59 PM

Attachments
sql.tar.gz (954.46 KB)
Download All Files

Submit Assignment

Allowed File Extensions

sql

Files to submit

You are permitted to use sources that you find on the Internet, so long as the source is clearly dated with a last edit prior to 1January2022 and you provide a citation in your source code. For example, GitHub and StackOverflow content is permitted, so long as they are clearly dated prior to this year. If you do not include a citation in your source code, your work will be considered plagiarism.

You must otherwise complete the assignment independently, including the development of pseudocode. Submissions may be subjected to plagiarism detection software and evidence of collaboration will be reported as an Academic Integrity infringement. You are welcome to prepare for the assignment with peers in the class by working through the ungraded worksheets together, which are designed to prepare you well for this assignment.

Illness, Lateness, Technical Issues, and Personal
Circumstances

Submissions will be accepted until the end date of the assignment listed in Brightspace, which provides a three day buffer to address most challenges that are likely to arise. Note that support for the assignment will not be available after the deadline, however. Submissions will not be accepted after the end date; if you have not submitted code by then, whether by choice or circumstances, the weight for this assignment will be shifted to the corresponding midterm exam.

Summary

I hope that this assignment is a fun way to learn and/or practice the SQL query language. Good luck!

Change Log

[Tue 5 July 13:55] Several query descriptions were improved and query 9’s target result was corrected

Add a File Record Audio Record Video

(0) file(s) to submit

After uploading, you must click Submit to complete the submission.

Comments
Submit Cancel