CSC 370 – SQL Golf
mysql | 代写quiz | assignment | IT | 代做sql | database – 这道题目是利用sql进行的编程代写任务, 包括了mysql/IT/sql/database等方面, 这个项目是assignment代写的代写题目
assignment Goals
In this assignment you will:
demonstrate data analysis skills with SQL
o write sql queries with increasing complexity to extract
desired information from a relational database
o optimise SQL queries towards simplicity
Task
Code Golf is a sort of recreational programming activity in which one
tries to implement functionality using as few characters as possible. The
general goal is to think of alternative solutions to a problem and it
derives its name from the sport of golf , in which one tries to minimise
the number of whacks with an iron shaft to put a tiny ball in a far-flung
hole. In this assignment, you will try not just to write SQL queries that
are correct, but that also are "minimal."
As an example, imagine that you have two
relations: Employee(employee_id, employee_name,
dept_id) and Department(dept_id, dept_name) and you would like to find
the names of all employees in a department named "Shipping &
Receiving". A simple solution would be:
SELECT employee_name
FROM Employee
NATURAL JOIN Department
WHERE dept_name
LIKE ‘Shipping & Receiving’;
Certainly, another "correct" solution would be:
SELECT employee_name
FROM Employee
WHERE dept_id
IN ( SELECT dept_id
FROM Department
WHERE dept_name
LIKE ‘Shipping & Receiving’ );
Both queries retrieve the same result, but the second query is
unnecessarily complex, or at the very least non-idiomatic. I hope that
you prefer the first solution. Even if not, this assignment is designed to
encourage you to write the first query by rewarding you inversely to
the number of times any of the following tokens appears in your SQL
query:
* SELECT (i.e., projection operator)
* FROM (i.e., the table- or index-scan operator)
* , (i.e., the cross product operator, including other appearances such as
in a SELECT clause )
* JOIN (i.e., a theta-, natural, or outer join or per mysql an
intersection)
* UNION (i.e., the bag union operator)
* DISTINCT (i.e., the duplicate elimination operator)
* GROUP (i.e., the group-by operator)
* ORDER (i.e., the sort operator)
* HAVING (i.e., the selection operator applied to groups)
* WHERE (i.e., the selection operator applied to tuples)
* LIM IT (i.e., the MySQL top-k operator)
This gives us a metric by which to claim the first query is better: it only
uses 4 instances of the above set of operators (SELECT, FROM, JOIN,
and WHERE), whereas the second query uses 6 instances (2SELECT,
2FROM, 2WHERE). This is the metric that you should aim to
minimise with the SQL queries that you submit. You would receive
more marks for the first query than the second one.
It is important to remember that this is an exercise in code
simplification and creative thinking, not in performance optimisation.
Although you are trying to minimise the number of operator references,
SQL is a declarative language and there is no specific reason to assume
that the first example query will run faster than the second one.
However, simple and idiomatic code is easier for compilers to optimise,
so there could be tangential performance benefits to striving for
simpleror at least shorterqueries. The real intent here is to leverage an
assumed correlation between this "golf score" metric and the quality of a
SQL query to encourage you to write better SQL.
You are given instructions to create (optionally) a MySQL database.
Moreover, you are given twenty .sql files that are unfortunately
empty except for a comment indicating their intended query and their
mapping between "SQL Golf" scores (i.e., total instances of the
aforementioned operators/tokens) and grade. For example, the above
problem would be represented by the following example.sql file:
— Find the names of all employees in a department named "Shipping & Receiving" — 1.1 marks: <4 operators — 1.0 marks: <6 operators — 0.8 marks: correct answer
— Replace this comment line with the actual query
Alongside the .sql file will be a .tsv file showing the expected result,
which you can use for testing.
Submission
You should submit all of the .sql files without renaming them, but
after replacing the final comment line with an actual SQL query that
achieves the stated objective. Ordinarily, you should submit
twenty .sql files, though it is okay to submit fewer files if you do not
have a solution for all twenty tests.
Evaluation
Your grade on the assignment will be the sum of your scores on each
query, scaled up by a factor of 20/14. This could be in excess of 20 (i.e.,
full marks), particularly if you minimise your queries more effectively
than the teaching team has. However, to receive any marks on a
particular query, you must produce the correct result, including
attribute names. We will ascertain this by performing a diff between
the corresponding .tsv file and your query results on an up-to-date
MySQL instance prior to counting operators.
Above, the first query would score 1.0 marks and the second query
would score 0.8 marks. If you can answer the query with fewer
operator instances than the first query, you would score 1.1 marks. The
following query would obtain 0.0 marks, even though the number of
operators is small, because it does not produce the same result (namely,
it doesn’t filter by department):
SELECT employee_name
FROM Employee
;
Note that we have manipulated our testing dataset to avoid hard-coded
solutions. You should not make assumptions (e.g., of uniqueness or non-
nulls) that are not supported by the data model in the CREATE TABLE
statements above.
Dataset
For this assignment, we will use the data dump from one of the Stack
Exchanges: fittingly, the one for Code Golf. If you want to run the
queries before submitting them, you should follow the instrutions
below to create a local database.
First, download and extract the entire data dump from
here: https://archive.org/download/stackexchange/codegolf.stackexch
ange.com.7z.
To load the XML data into a MySQL database, you first need to create a
database with the same structure. Open an instance of MySQL and
execute the following DDL queries:
— Create and switch to a new database for this project
DROP DATABASE IF EXISTS code_golf
; CREATE DATABASE code_golf
; USE code_golf
;
— Construct tables into which the data will be imported — You may want to keep the order of these tables because of FK references CREATE TABLE User
( Id
INT , Reputation
INT , CreationDate
DATETIME , DisplayName
VARCHAR(60) , LastAccessDate
DATETIME , WebsiteUrl
VARCHAR(255) , Location
VARCHAR(120) , AboutMe
TEXT , Views
INT , Upvotes
INT , Downvotes
INT , AccountId
INT , PRIMARY KEY( Id
) );
CREATE TABLE Badge
( Id
INT
, UserId
INT , Name
VARCHAR(30) , Date
DATETIME , Class
SMALLINT , TagBased
ENUM(‘True’, ‘False’) , PRIMARY KEY( Id
) , FOREIGN KEY( UserId
) REFERENCES User
( Id
) );
CREATE TABLE Tag
( Id
INT , TagName
VARCHAR(30) , Count
INT , ExcerptPostId
INT , WikiPostId
INT , PRIMARY KEY( Id
) );
CREATE TABLE Post
( Id
INT , PostTypeId
INT , ParentId
INT , CreationDate
DATETIME , Score
INT , Body
LONGTEXT , OwnerUserId
INT , LastEditorUserId
INT , LastEditDate
DATETIME , LastActivityDate
DATETIME , CommentCount
INT , ContentLicense
VARCHAR(30) , PRIMARY KEY( Id
) , FOREIGN KEY( ParentId
) REFERENCES Post
( Id
) , FOREIGN KEY( OwnerUserId
) REFERENCES User
( Id
) , FOREIGN KEY( LastEditorUserId
) REFERENCES User
( Id
) );
CREATE TABLE Link
( Id
INT , PostId
INT , RelatedPostId
INT , CreationDate
DATETIME , LinkTypeId
INT , PRIMARY KEY( Id
) — , FOREIGN KEY( PostId
) REFERENCES Post
( Id
) — violated by actual data — , FOREIGN KEY( RelatedPostId
) REFERENCES Post
( Id
) — violated by actual data );
CREATE TABLE Vote
( Id
INT , PostId
INT , VoteTypeId
INT , CreationDate
DATETIME , PRIMARY KEY( Id
) — , FOREIGN KEY( PostId
) REFERENCES Post
( Id
) — violated by actual data );
CREATE TABLE Comment
( Id
INT , PostId
INT , Score
INT
, Text
TEXT , CreationDate
DATETIME , UserId
INT , ContentLicense
VARCHAR(30) , PRIMARY KEY( Id
) , FOREIGN KEY( PostId
) REFERENCES Post
( Id
) , FOREIGN KEY( UserId
) REFERENCES User
( Id
) );
Finally, you can load all the XML files directly into the database by
executing the following statements:
USE code_golf
;
— You will likely find it easiest to move the files to the following directory — and change ‘/var/lib/mysql-files/’ in the paths below to the result of this — SHOW VARIABLES query. — https://stackoverflow.com/q/32737478/ SHOW VARIABLES LIKE "secure_file_priv";
— Finally, load data into each of the tables — Again, ensure that you maintain this order — for the sake of referential integrity LOAD XML INFILE ‘/var/lib/mysql-files/Users.xml’ INTO TABLE User
ROWS IDENTIFIED BY ‘
LOAD XML INFILE ‘/var/lib/mysql-files/Badges.xml’ INTO TABLE Badge
ROWS IDENTIFIED BY ‘
LOAD XML INFILE ‘/var/lib/mysql-files/Tags.xml’ INTO TABLE Tag
ROWS IDENTIFIED BY ‘
LOAD XML INFILE ‘/var/lib/mysql-files/Posts.xml’ INTO TABLE Post
ROWS IDENTIFIED BY ‘
LOAD XML INFILE ‘/var/lib/mysql-files/Votes.xml’ INTO TABLE Vote
ROWS IDENTIFIED BY ‘
LOAD XML INFILE ‘/var/lib/mysql-files/Comments.xml’ INTO TABLE Comment
ROWS IDENTIFIED BY ‘
LOAD XML INFILE ‘/var/lib/mysql-files/PostLinks.xml’ INTO TABLE Link
ROWS IDENTIFIED BY ‘
Queries
The queries are available in two locations:
- You can access them on the public-facing GitHub repo for this
course’s assignments. The advantages to this source are that it is
the freshest (first place updates are pushed) and that you can
directly check out the code with git, which might be an easier
toolchain
- You can access them by downloading the compressed tarball (that