Home Page Optometrist
Objectives
You will learn how to:
- Create server-side web applications with Python and Flask.
- Scrape web pages to extract content, including images, using LXML.
- Detect faces using OpenCV.
- Programmatically edit images using Pillow and/or OpenCV.
Overview
You will create a web application that puts glasses on any profile photos found in another person’s home page.
[Figure: profile photo, Before / After]
Here’s how it works:
- User visits your web application and enters the URL of someone’s home page.
- Fetch the html for that web page and find the image that is most likely to be the face.
- Draw eyeglasses over the person’s eyes.
- Serve a copy of the original home page to the user, with glasses added to the profile photo.
Milestones
You have from Sun 11/11 to Sat 12/8 (last official day of the semester). To keep you on track, there will be three milestones, including the final turnin. (Milestone numbers do not correspond to the steps above.)
- Tue 11/20 Milestone 1: Fetch a home page and display it to the user as is.
- Sat 12/1 Milestone 2: Find the profile photo and draw glasses on the face.
- Sat 12/8 Milestone 3: Display the home page with the modified face.
Scoring
Unlike previous assignments in this course, projects will be mostly hand-graded. Scores will be calculated based on Milestone 1 (20%), Milestone 2 (20%), Completeness (30%), and Quality/Stability (30%). Milestones 1 and 2 are there primarily to help you make progress. Feedback will be given only at the end. Completeness means finishing all parts of the project. Quality/Stability includes smooth operation, portability, usability, HTML5 compliance, and code quality. Projects are subject to the base requirements in the syllabus.
Building the project
Milestone 1: Fetch a home page and display it to the user as is.
Before you begin writing code, read the section on Security below. If you run into trouble, don’t hesitate to post on Piazza. This project is new. Create a web application using Flask.
- Create a project directory called hpo in your home directory. (mkdir ~/hpo)
- In your project directory, create four subdirectories: templates, static, data, and adhoc. For this stage, you will only use the templates directory.
- In your templates directory, create a file called root.html with the code below. This is the boilerplate for a new HTML5 document.
    <!DOCTYPE html>
    <html lang="en">
    <head>
    <meta charset="utf-8">
    <title>Home Page Optometrist</title>
    </head>
    <body>
    <h1>Home Page Optometrist</h1>
    <p>TODO</p>
    </body>
    </html>
You do not need to give attribution for this snippet because it is generic HTML5 boilerplate, with nothing added other than the name of this assignment. (Update: lang="en" was added to silence a validator warning. There will be no penalty for those who didn’t catch that for Milestone 1.)
An HTML document starts with a tag that indicates what version of HTML is being used. In this case, we use <!DOCTYPE html>, which indicates HTML5. The rest of the document comes inside an <html> tag. Most tags must be closed with another tag that has the same name, prefixed with a slash (e.g., </html>).
Inside the <html> tag, there are two sections, <head> and <body>.
The <head> tag contains links to some supporting files, such as CSS style files. It also contains metadata and tags that direct the browser on how to interpret the rest. The <title> tag specifies what title the browser should show in the browser title (and tab title). The <meta charset="utf-8"> tag specifies that if there are any non-ASCII characters in this document, they will be encoded using the UTF-8 encoding. This is similar to how we open files in Python.
The <body> tag contains the visible content of the page. It may sometimes contain other things, such as links to JavaScript, that allow for scripting on the page, but mostly it is for the visible content.
<h1> is a first-level heading. It will normally show in a large font. <p> is a container for a paragraph of other text. It will normally have a little margin above and below it.
- Create a Python file called main.py in your project directory (i.e., ~/hpo/main.py) with the following code. You may copy this code. As with any code you use with permission, you must give credit. A format is listed under Code Reuse and in the syllabus for this course.
#!/usr/local/bin/python3.
import os
import flask

app = flask.Flask(__name__)

@app.route('/')
def root_page():
    return flask.render_template('root.html')

if __name__ == '__main__':
    app.run(host="127.0.0.1", port=int(os.environ.get("ECE364_HTTP_PORT", 8000)),
            use_reloader=True, use_evalex=False, debug=True, use_debugger=False)
# Each student has their own port, which is set in an environment variable.
# When not on ecegrid, the port defaults to 8000. Do not change the host,
# use_evalex, and use_debugger parameters. They are required for security.
#
# Credit: Alex Quinn. Used with permission. Preceding line only.
- Find your port. Type echo $ECE364_HTTP_PORT in bash. In this document, we will assume your port is 12345. Whenever you see 12345, please substitute your own port.
- If you are connected to ecegrid via SSH, you need an SSH tunnel. In PuTTY settings, go to Connection > SSH > Tunnels. Source port: 12345 Destination: localhost:12345
- Test your web application.
    python3 main.py
Open your web browser and go to http://localhost:12345. You should see a simple message.
- Add an input form to your template. An HTML form looks like this:
    <form action="" method="GET"> … </form>
The method should be "GET". That means any inputs entered by the user will be passed to the server as part of the URL. For now, leave the action blank.
- Inside your
there will be some variation. We are intentionally leaving a few details underspecified to get you used to thinking for yourself. Any reasonable approach will be given full credit. You may share ideas for social network sites and addresses with friends if you wish.
11. Add code to your view_page() function to fetch the requested page. Use urllib.request to fetch the URL. Read the relevant part of the urllib.request documentation and use the method under Adding HTTP headers. Set the User-Agent header to the following string (exactly), substituting your Purdue login for USERNAME. This is so anyone who finds us in their logs can see what we're up to and contact us in case of any concerns.
PurdueUniversityClassProject/1.0 (USERNAME@purdue.edu https://goo.gl/dk8u5S)
You may copy the code from the Adding HTTP headers section of the documentation. That code is explicitly licensed under the Python Software Foundation License, which allows reuse. However, you must give credit with the following comment (exactly).
Credit: Adapted from example in Python 3.4 Documentation, urllib.request
License: PSFL https://www.python.org/download/releases/3.4.1/license/
https://docs.python.org/3.4/library/urllib.request.html
Be clear what part of your code is derived from the borrowed code. At this point, your function should have a variable with the HTML (as a string) found at the given URL.
12. Try testing your application. For now, clicking Submit won’t do anything. That is because the action attribute is not filled in. Go ahead and try it anyway. It may be instructive to witness this.
13. Go back to your root.html and fill in the action. In theory, you could fill in "view/" since that is the relative URL of the endpoint that it should submit to. However, when developing web applications, you should always avoid duplicating strings, such as relative URLs. It is better to refer to something in the code. In Flask, this is done using the url_for(…) function. Calling url_for("view_page") returns "view/", the default URL to which the view_page(…) function is mapped. We need to dynamically insert that into the template when your program runs. Flask uses a template language called Jinja for its HTML templates. This allows you to fill in information at runtime (i.e., when your endpoint function is called). To fill in a value or the result of a function call, you use {{ expr }} (where expr is a Python expression). In your template, change action="" to action="{{ url_for("view_page") }}". (We are giving you the code, but you should understand what it does.)
14. Try testing your application again. From your root page, enter the URL to any home page. (You can use mine, if you wish.) It still won’t look right yet, but it should be close. The problem is that any supporting files (images, CSS style files, JavaScript, etc.) that the HTML document refers to using relative URLs will now be pointed to your site. For example, an image in HTML might be specified like <img src="headshot.jpg">. The image filename ("headshot.jpg") is actually a relative URL. When the page loads from a person’s home page (e.g., http://example.com/~js/), it expands and requests the image from http://example.com/~js/headshot.jpg.
15. Modify your view_page() function so that it loads the supporting files correctly.
The easiest way (to think about) is to expand relative URLs to absolute URLs (i.e., including the full
domain name). You could do this manually. Alternatively, the lxml library has a method for this, if you
want to dig into the documentation.
A fancier way to do this is to add a <base> tag to the document. That tells the browser to expand
relative URLs using a different URL from where the page was loaded. For more information and an
example, see the documentation of the <base> tag.
16. Test your application again. For reasonably simple pages, it should display the same from your site as
it did from the original location.
17. Edit your root page template to ensure that it is understandable and nominally presentable. Write a short explanation of what will happen. (1-2 sentences should suffice.) You may add styling if you wish, but that is optional. (This is text the user will see, not a comment in your code.) This will be checked by a TA. There are no additional points for beauty per se, but it should be understandable to someone not familiar with the project, and it should not look ridiculous. For example, there should be no code fragments showing, text should be legible, and inputs should be labelled. Do not use any site templates, or libraries that modify an entire site’s appearance (e.g., Bootstrap).
18. Check the output of your root page with the W3C validator service to ensure that it outputs valid HTML markup. It essentially checks for syntax errors. Browsers will tolerate malformed HTML, but different browsers may interpret it differently. Validation ensures that your app looks the same to everyone.
19. To submit, add your hpo directory (svn add hpo) and commit it (svn commit hpo). There will be no feedback given for Milestone 1 and Milestone 2 in the manner done for the labs and prelabs, but we will be happy to give feedback in office hours. Final grading will be done primarily by hand, but may employ some automated checks (e.g., for code quality).
Checklist
- Root page displays a form for entering a URL, with brief instructions.
- Root page HTML passes W3C validator as HTML5.
- Entering URL and clicking submit returns a page which looks the same as the original, but with the HTML served from your application.
- Follow base requirements, and code quality standards, as well as rules for code reuse, security, and ethics given in this document.
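Steps 11 and 15 can be sketched with the standard library alone. This is a minimal sketch, not the required implementation: the helper names build_request and absolutize are hypothetical, and only the User-Agent format is taken from step 11. Remember that code adapted from the urllib.request documentation still needs the credit comment described above.

```python
import urllib.parse
import urllib.request

# User-Agent format from step 11; {username} is your Purdue login.
USER_AGENT_FORMAT = ("PurdueUniversityClassProject/1.0 "
                     "({username}@purdue.edu https://goo.gl/dk8u5S)")


def build_request(url, username):
    """Return a Request for url that carries the course's User-Agent header.

    build_request is a hypothetical helper name, not part of the assignment.
    """
    headers = {"User-Agent": USER_AGENT_FORMAT.format(username=username)}
    return urllib.request.Request(url, headers=headers)


def absolutize(page_url, relative_url):
    """Expand a relative URL found in a page into an absolute URL.

    absolutize is also a hypothetical name; urljoin does the real work.
    """
    return urllib.parse.urljoin(page_url, relative_url)
```

Passing the result of build_request(…) to urllib.request.urlopen(…) performs the actual fetch. With the example from step 14, absolutize("http://example.com/~js/", "headshot.jpg") gives "http://example.com/~js/headshot.jpg".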
Milestone 2: Find the profile photo and display it to the user
You should have already read the section on Security. There are penalties for not following the guidelines. In this stage, you will get your application to a level where it can find the profile photo in a page, find the face within the photo, add glasses to the face, and display the image to the user.
- Write a helper function to create filenames for images, based on their URL.
Your function should be called make_filename(url, extension) and should return the SHA1 checksum of the given URL (in hex), concatenated with the given file extension.
>>> make_filename(b'http://people.eecs.berkeley.edu/~graham/SLG1jpg.jpeg', 'jpg')
'e67b1add53d3e079020b6ded39efc175b6553251.jpg'
This will be used later, to create the filenames with which you will store profile images. This function should require only one line of code inside the function.
While we’re here, let’s take a dive into checksums. The concept is widely used in web applications, distributed systems, system security, blockchain (e.g., bitcoin), cryptography, databases, and beyond.
Checksums
SHA1 is a checksum function. In general, checksum functions take a string of bytes (e.g., contents of a file) and return a shorter string of bytes that (almost) uniquely identifies the input. If you modify the input, even just one byte, the checksum will change completely.
Unlike compression (e.g., zip), with checksums, it is impossible to get the input from the checksum. Thus, it can be thought of as a one-way function.
For a given checksum function, the output (checksum) will always be the same length. For example, SHA1 always produces 160-bit (20-byte) checksums. The actual checksum is binary bytes, but we often display them in hex digits (0123456789abcdef). Recall from ECE 26400 that one hex digit can store 4 bits, and thus one byte (8 bits) requires two hex digits.
In bash, the sha1sum command produces a SHA1 checksum for files or whatever you pass on stdin.
$ sha1sum jellyfish.jpg
e379d8f44c3b0c2ea476494481a84701f3cacc53  jellyfish.jpg
$ echo -n "ECE 36400" | sha1sum
a56554a16804b49e9b8867b52b2fcbc495c4b0b9  -
In Python, we use the hashlib module to create checksums.
>>> import hashlib
>>> hashlib.sha1(b"ECE 36400").hexdigest()
'a56554a16804b49e9b8867b52b2fcbc495c4b0b9'
The hashlib functions require a byte string (b"…"). Passing a unicode string ("…") results in an error.
>>> hashlib.sha1("ECE 36400").hexdigest()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
You can convert a unicode string (str) to a byte string (bytes) with s.encode('utf8').
>>> s = "abc"
>>> s.encode('utf8')
b'abc'
>>> s = "≈32½°F"
>>> s.encode('utf8')
b'\xe2\x89\x8832\xc2\xbd\xc2\xb0F'
Why bother?
As you should remember from the section on Security, strings from the user should never be used as
part of filenames. The URL to the home page is a string from the user. The image filename (in that
home page) is also considered a string from the user. There is always a chance a user could try to
trick your application into overwriting some crucial files by passing a malicious URL or a URL to a
page with a maliciously named image.
As a trivial example, imagine if the image filename were "../.bashrc" and the contents of the image
were actually a new .bashrc file with commands they wanted you to execute. Next time you logged
in, you would execute whatever shell commands the attacker wanted you to execute. That
particular attack would most likely fail, but history is full of more sophisticated ploys of a similar nature.
Using a SHA1 checksum in hex format guarantees that only the characters "0123456789abcdef" can
be part of the filename. Other methods of creating a safe filename can be vulnerable to tricky attacks.
For security, you must store images using a filename that is guaranteed to be safe.
Using the checksum of the URL (nearly) guarantees that you will not conflict with an image file from
some other home page. The chance of two URLs (or other inputs) having the same SHA1 checksum
is less than 0.000000000000000000000000000000000000000000000007%. For high security
applications, even stronger checksum functions are available, with checksums of up to 512 bits (and
possibly more).
Note: You do not need to give attribution for the snippets in this box ("Checksums").
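The properties described in the box can be checked directly with hashlib. The snippet below is a small illustration, not part of the assignment: sha1_hex is a hypothetical helper name and the two URLs are made-up inputs that differ in a single byte.

```python
import hashlib


def sha1_hex(data):
    """Return the SHA1 checksum of a byte string as hex digits."""
    return hashlib.sha1(data).hexdigest()


first = sha1_hex(b"http://example.com/~js/headshot.jpg")
second = sha1_hex(b"http://example.com/~js/headshot.jpG")  # one byte differs

# 160 bits = 20 bytes = 40 hex digits, regardless of the input length.
assert len(first) == len(second) == 40
assert len(sha1_hex(b"")) == 40

# Only hex digits appear, so the result is safe to use in a filename.
assert set(first) <= set("0123456789abcdef")

# Changing a single input byte produces a different checksum.
assert first != second
```

Because the output alphabet is fixed, nothing in the URL (slashes, dots, "..") can leak into the filename.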
- Write a helper function to extract and save all images in the given home page. Your function should be called fetch_images(etree). It will search the given etree for images, fetch each one using urllib.request.urlopen(…) (with the special header, as before), and save each image into a temporary directory. You will build an OrderedDict that associates image filenames with the nodes in the ElementTree. There are a few requirements.
Requirement #1: Save images using a filename generated using your make_filename(…).
Requirement #2: It must be a context manager function. The with statement should return an OrderedDict where the keys are image filenames and the values are nodes in the ElementTree. (We will illustrate how to make a context manager function below.)
Requirement #3: It must automatically cd to the temporary directory, and then cd back to the original location when the with statement exits. In addition, the temporary directory must be deleted. (We give some starter code below to make this easy.)
Requirement #4: When creating temporary files, do not simply write them in your project directory. You need to create a separate directory for the temporary files, with a name that is guaranteed to be unique. Python provides a function called tempfile.mkdtemp(…) which does this. You must use it, directly or indirectly (i.e., through a helper function that calls it). (The starter code we provide makes this easy.)
Usage: You will call your fetch_images(etree) like this.
with fetch_images(etree) as filename_to_node:
    # Do stuff with images inside the temp directory.
# Back in original directory again.
pushd_temp_dir(…) helper function
To allow you to focus on other aspects of the project, we are providing this helper function. Using pushd_temp_dir(…) is optional, but it should make your life even easier. You may copy or adapt the code below. Attribution in the required format is required.

import os, sys, tempfile, shutil, contextlib

@contextlib.contextmanager
def pushd_temp_dir(base_dir=None, prefix="tmp.hpo."):
    '''
    Create a temporary directory starting with {prefix} within {base_dir}
    and cd to it.

    This is a context manager. That means it can (and must) be called using
    the with statement like this:

        with pushd_temp_dir():
            ....  # We are now in the temp directory
        # Back to original directory. Temp directory has been deleted.

    After the with statement, the temp directory and its contents are deleted.

    Putting the @contextlib.contextmanager decorator just above a function
    makes it a context manager. It must be a generator function with one yield.

    - base_dir: the new temp directory will be created inside {base_dir}.
      This defaults to {main_dir}/data ... where {main_dir} is the directory
      containing whatever .py file started the application (e.g., main.py).
    - prefix: prefix for the temp directory name. In case something happens
      that prevents …
    '''
    if base_dir is None:
        proj_dir = sys.path[0]
        # e.g., "/home/ecegridfs/a/ee364z15/hpo"
        base_dir = os.path.join(proj_dir, "data")
        # e.g., "/home/ecegridfs/a/ee364z15/hpo/data"

    # Create temp directory
    temp_dir_path = tempfile.mkdtemp(prefix=prefix, dir=base_dir)

    try:
        start_dir = os.getcwd()  # get current working directory
        os.chdir(temp_dir_path)  # change to the new temp directory
        try:
            yield
        finally:
            # No matter what, change back to where you started.
            os.chdir(start_dir)
    finally:
        # No matter what, remove temp dir and contents.
        shutil.rmtree(temp_dir_path, ignore_errors=True)

# EXAMPLE USAGE
assert not os.path.exists("a.txt")
with pushd_temp_dir():
    with open("a.txt", "w", encoding="utf-8") as outfile:
        outfile.write("a a a")
    assert os.path.isfile("a.txt")
    print(os.path.abspath("a.txt"))
    with open("a.txt", "r", encoding="utf-8") as infile:
        print(infile.read())
assert not os.path.exists("a.txt")

Here is a skeleton for your fetch_images(…). You may use this snippet (below) without attribution.

import collections

@contextlib.contextmanager
def fetch_images(etree):
    with pushd_temp_dir():
        filename_to_node = collections.OrderedDict()
        #
        # Extract the image files into the current directory
        #
        yield filename_to_node

This will involve traversing the HTML document using the methods you practiced last week. After saving each image, you will use code something like this:
filename_to_node[filename] = node
Tip: You should test this function in isolation. Do not attempt to do all of your testing via your web application. You will be wasting your time. (This was covered in lecture on 11/19.)
You cannot reliably infer the file extension from the URL. You need to use Content-Type, just as you
did in lab 12. (That was to get you ready for this.)
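One way to turn a Content-Type header value into a file extension is an explicit lookup table. The sketch below is only an illustration under stated assumptions: the function name extension_from_content_type and the three media types in the table are hypothetical choices, not requirements of the assignment.

```python
# Hypothetical mapping; extend it with whatever image types you decide to support.
EXTENSION_BY_CONTENT_TYPE = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/gif": "gif",
}


def extension_from_content_type(content_type):
    """Return a file extension for a Content-Type header value, or None.

    Parameters after ';' (e.g., a charset) are ignored, and the comparison
    is case-insensitive. None means the type is not a supported image type.
    """
    if content_type is None:
        return None
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSION_BY_CONTENT_TYPE.get(media_type)
```

With a response object returned by urllib.request.urlopen(…), the header value is available as response.headers.get("Content-Type"). Returning None for unknown types lets the caller skip anything that is not an image it can handle.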
- Write a helper function that returns the dimensions of an image. It should be called get_image_info(filename) and should return a dict like this:
img_info = get_image_info(filename)
print(img_info)
{"w":400, "h":600}
img_info["w"] and img_info["h"] are the dimensions of the entire image, in pixels.
In this project, we will use OpenCV, a very popular library for computer vision.
import cv2
To open a single image using OpenCV:
img = cv2.imread(path)
OpenCV can get the dimensions of the image. Use the OpenCV documentation to find out how.
Note: cv2.imread() cannot read GIF files. Here are some suggestions for how you might deal with this:
- Convert them to another format (e.g., JPG) in fetch_images(…) and then deal with them as JPG thereafter. You can use the Pillow library (aka PIL) for that.
- Convert to JPG temporarily within your get_image_info(filename) so that OpenCV can open them, and then delete the temporary JPG file when you are done with it.
It’s up to you. Just make sure your application can deal with profile photos, even if they are in GIF format. It is best to avoid keeping two copies of the same image (e.g., JPG and GIF of the same photo) around at the same time, but this is not a strict requirement.
- Add a comment in your code, within get_image_info(filename) with the URL of where you found how to get the width and height of the image. It should be in the official OpenCV documentation (i.e., not a Stackoverflow post).
- Modify your get_image_info(filename) helper function to return details about faces in the image. Your enhanced get_image_info(filename) should return a dict like this:
img_info = get_image_info(filename)
print(img_info)
{"w":400, "h":600, "faces":[{"w":200, "h":200, "x":100, "y":50}, …]}
img_info["w"] and img_info["h"] are the same as before. img_info["faces"] should be a list of dictionaries, each representing one face in the image.
img_info["faces"] must be sorted by the size of the face. Thus, img_info["faces"][0] should be
the largest face in the image. To sort by face size, use the key=... parameter to sort(...). See the
Python documentation for sort(...).
If there are no faces found in the image, then img_info["faces"] will be [].
To find faces in an image, we will use OpenCV, a very popular library for computer vision.
import cv2
The method we are using is called Haar cascades. First, you create a classifier object. It has methods
which can locate faces within an image.
Normally, to create a classifier from scratch, one would need a large collection of photos, along with
human-created data indicating the location and size of each face. That is called training data. Then,
the classifier generalizes from that data to find other faces in new images.
For this project, we are providing the underlying data for a pre-trained classifier. The data is here:
FACE_DATA_PATH = '/home/ecegridfs/a/ee364/site-packages/cv2/data/haarcascade_frontalface_default.xml'
Do not copy that file to your home directory. Your code should read it from that location when
running on ecegrid.
To create the classifier:
face_cascade = cv2.CascadeClassifier(FACE_DATA_PATH)
Before you can find the faces using your classifier, you will need to convert the image to grayscale.
You do not need to save it to disk. This is done in memory. To create a grayscale version of the
image:
img_grayscale = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Finally, fetch the face information.
faces = face_cascade.detectMultiScale(img_grayscale, 1.3, 5)
Each face will be given as a 4-tuple of (x, y, w, h).
Test and make sure the results are reasonable. Test in isolation, not as a web app.
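Converting the (x, y, w, h) tuples into the sorted list of dicts described above needs no OpenCV calls, so it can be tested on its own. This is a minimal sketch: rects_to_face_dicts is a hypothetical helper name, and "size" is assumed here to mean area (w times h).

```python
def rects_to_face_dicts(rects):
    """Convert (x, y, w, h) rectangles into face dicts, largest face first.

    int() turns NumPy integers (which OpenCV returns) into plain Python ints.
    An empty input (no faces found) yields an empty list.
    """
    faces = [{"x": int(x), "y": int(y), "w": int(w), "h": int(h)}
             for (x, y, w, h) in rects]
    # Assumption: "size of the face" is taken to be its area in pixels.
    faces.sort(key=lambda face: face["w"] * face["h"], reverse=True)
    return faces
```

For example, rects_to_face_dicts([(10, 10, 50, 50), (100, 50, 200, 200)]) puts the 200 by 200 face first. The result of face_cascade.detectMultiScale(…) can be passed in directly.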
- Write a helper function to find the image and node that is most likely to contain the main profile photo. It should be called find_profile_photo_filename(filename_to_etree) , and should return the filename of the image most likely to be the main profile photo. You have some flexibility on how you determine which is the profile photo. Be reasonable.
- Write a helper function to copy the most likely profile photo to the static directory within your application. It will be called copy_profile_photo_to_static(etree) and will return the absolute path to that file. To get the path to your static directory, use the method illustrated in the pushd_temp_dir(…) function. You may also use methods from the flask API. Do not hard code the absolute path.
This helper function will use (i.e., call) the helper functions you created above.
Reminder: Test in isolation, without running as a web server. Exactly how you do this is up to you.
- Modify your "/view" endpoint function so that it displays the most likely profile photo. When someone enters a URL and clicks submit from your root page, it should use the helper functions you created above to copy the profile image to your static directory. Then, it should redirect to a static URL that displays only the profile image.
To get a static URL:
static_url = flask.url_for('static', filename=…)
Note: The filename that you pass to flask.url_for('static', filename=…) should be a base filename, not an absolute path. To get the base filename, use os.path.basename(abs_path).
To redirect from within your endpoint function:
return flask.redirect(url)
Note: This time, instead of seeing a web page, the user will see only an image. This is temporary, just for Milestone 2.
- Submit using the same method used for Milestone 1.
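The path handling behind copying the profile photo to the static directory can be sketched with the standard library. The names get_static_dir and copy_to_static are hypothetical, and the static_dir parameter is an addition that makes the sketch testable outside the web application; the use of sys.path[0] follows the approach in pushd_temp_dir(…).

```python
import os
import shutil
import sys


def get_static_dir():
    """Path to the static directory next to the .py file that started the app."""
    return os.path.join(sys.path[0], "static")


def copy_to_static(src_path, static_dir=None):
    """Copy src_path into static_dir and return the copy's absolute path.

    static_dir defaults to get_static_dir(); passing it explicitly is only
    a convenience for testing in isolation.
    """
    if static_dir is None:
        static_dir = get_static_dir()
    dest_path = shutil.copy(src_path, static_dir)  # returns the new file's path
    return os.path.abspath(dest_path)
```

The value passed as filename to flask.url_for('static', …) would then be os.path.basename(…) of the returned path, not the absolute path itself.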
Milestone 3: Add glasses to the face and display in the modified page.
More details may be added. This is intended to be much lighter than the first two milestones.
- Write a function that modifies an image to add glasses to the face (if any). It should be called add_glasses(filename, face_info) where face_info is a dict like {"w":…, "h":…, "x":…, "y":…}. To find the eyes, as best as this classifier is able, you can use an eye cascade classifier.
EYE_DATA = "/home/ecegridfs/a/ee364/site-packages/cv2/data/haarcascade_eye.xml"
Create the classifier in the same way as you did for faces.
- Modify your "/view" endpoint to display the entire home page, with the modified image substituted for the original image.
- Submit using the same method used for Milestone 1.
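If the eye classifier is run on only the cropped face region (an assumption, not something this document requires), the eye rectangles it returns are relative to that region and must be shifted by the face's position before drawing on the full image. A minimal sketch of that coordinate step, with the hypothetical helper name eye_centers_in_image:

```python
def eye_centers_in_image(face_info, eyes):
    """Return eye centers in whole-image coordinates, sorted left to right.

    face_info -- dict like {"x": ..., "y": ..., "w": ..., "h": ...}
    eyes ------ iterable of (x, y, w, h) rectangles, assumed to be relative
                to the top-left corner of the face region
    """
    centers = [(face_info["x"] + ex + ew // 2, face_info["y"] + ey + eh // 2)
               for (ex, ey, ew, eh) in eyes]
    # Tuples sort by their first element (the x coordinate) first.
    return sorted(centers)
```

The resulting points can serve as lens centers, with the segment between them as the bridge. How the glasses are actually drawn (Pillow or OpenCV) is left open, as in the rest of this milestone.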
Bonus
You may receive up to 10 bonus points for extensions to the project. Each bonus point counts as 1% toward your final grade. For example, if you had an average of 50% on the prelabs but received the maximum bonus on the project, then you would recoup half the loss from the prelabs. We are providing a few examples, but we strongly encourage your proposals for bonus extensions.
Proposing an extension. To propose an extension idea, post to Piazza with a subject line like this:
Bonus proposal: …
In the message, use the following format:
Idea: I propose to …. (Describe your idea.)
Effort: I expect this will entail …. (Estimate effort in terms of steps, hours, and/or sloc.)
Proposed credit: … bonus points
We will respond with either an approval or an adjustment of your proposed points. If you don’t receive a reply within 24 hours of your post (or latest edit), your proposal is automatically approved.
Sharing ideas. Proposals may be private (note to instructor and TAs) or public (even if anonymous to your classmates). Public proposals may be attempted by others (separately). Public is nice but not required.
Fairness. If an extension is approved for one person, others may use it, as well. It is impossible to ensure perfect equivalence of points/effort between bonus extensions. Programming effort is hard to compare. Different people require different amounts of time and code to accomplish the same goal. I will do my best.
Adjustments. If your bonus ends up being much more involved than it first appears, we will consider increasing the points. If you find a shortcut that makes it trivial (e.g., 100 sloc → 5 sloc), we might ask you to either extend it or decrease the amount of credit. We consider this unlikely, and reserved for extreme cases.
Examples. The following are examples, and automatically approved. We encourage you to think of your own. Very creative extension ideas (i.e., things we wouldn’t have thought of ourselves) may receive additional credit.
- Add options to specify different colors or styles of glasses. 1 point
- Add mustache or hat. 2 points
- Scrape the Purdue ECE faculty directory and make your own directory with links to modified photos for all Purdue faculty. 4 bonus points
- Detect if person is already wearing glasses. 5 bonus points
Notification. If you attempt a bonus, make a note in a file called BONUS.txt with the Piazza message URL of your proposal or the example number (if you use one above).
Rules
Code reuse allowed with limitations
This is an individual project. You are expected to develop the project on your own. However, some code reuse from the web is not uncommon in some software development projects. It depends on the organization you are working for and/or the project rules and goals. Read this section carefully before copying any amount of code from anywhere. What is/isn’t allowed. For this project, some code reuse from the web is allowed. There are limitations. You may reuse example code from the official Python documentation, provided you give credit with the following comment (exactly).
Credit: Adapted from example in Python 3.4 Documentation, urllib.request
License: PSFL https://www.python.org/download/releases/3.4.1/license/
https://docs.python.org/3.4/library/urllib.request.html
You may copy example code from the official Flask, OpenCV, and lxml documentation, provided the license allows it and you give attribution in a comment very similar to the snippet above.
You may copy any example snippet from this document, unless otherwise prohibited in/near the snippet.
You may reuse up to 10 logical lines of code (lloc) from preexisting sources on the web, provided the source has an explicit license that allows reuse (e.g., MIT License, Creative Commons, etc.), and attribution is noted in your code, as specified below. Code copied from the official Python, Flask, OpenCV, and lxml documentation, and from this document, does not count toward the 10-line limit. StackOverflow uses a Creative Commons license for all posted snippets, unless otherwise mentioned.
You may not use code written by anyone at Purdue (including course staff), even if it is posted online, unless explicitly allowed.
Attribution. You must give explicit credit in a comment that begins with "Credit: " and includes the following information:
- author and/or project name (whichever is most important). For Stackoverflow (or similar) posts, click the profile and try to find the person’s name. If anonymous, then list the credit as ‘user123’ at Stackoverflow (for example).
- license (e.g., MIT, CC-BY-3.0). Give URL to license for any license other than MIT, BSD, and CC-*.
- URL where you found the code
Be clear about what is reused (e.g., above 3 lines, this function, etc.) and whether you Copied (verbatim), Copied with minor changes (near-verbatim), or Adapted (copied as starting point, but with major changes). Example:
Credit: Adapted from example in Python 3.4 Documentation, urllib.request
All other code reuse, including from any of the following, is strictly prohibited unless explicitly allowed.
- Code written by someone else in this class (even if it is posted online)
- Example code from ECE 36400 lecture slides or examples.
- Example code written or posted by anyone at Purdue.
- Python modules other than what is specified in this document. Allowed modules: Python standard library modules (e.g., os, urllib, shutil, etc.), lxml, flask, OpenCV (cv2), validator, Pillow/PIL. google-cloud-vision is allowed for those doing relevant bonuses.
- Any other source not explicitly allowed.
Suggesting other Python modules. Feel free to suggest Python modules that you would like to use. We will consider adding them. (We will not add BeautifulSoup, a popular scraping module, because its API design is contrary to what we teach in this course.)
Copyright and terms of service
Do not violate the copyright of any person or entity. Historically, copyright law has allowed brief storage of web content, in the course of providing some service (e.g., proxy servers). Do not violate the terms of service of any web site. Social network sites typically have specific policies on web scraping. Do not use this on such sites. Test only with home pages that do not have such a notification. If you are unsure about copyright or terms of service issues, check with the instructor.
Security
Universities are a popular target for malicious hackers due to the relatively open infrastructure, high bandwidth, personal information of prominent individuals, and access to technical secrets. Some servers are protected by firewalls that block access from outside the university. However, once an adversary gains access to a machine or account in a university, they can use that internal location to attack protected internal targets.
Your application matters. You might think a small class project application would be irrelevant to security concerns, but the opposite is sometimes true. Small, ad hoc applications (e.g., class projects) are often written hastily without systematic consideration of security, code review, and audits. However, you can greatly reduce the risk that your application will be misused by following four rules at all times.
Follow these security rules at all times (including development, debugging, testing, etc.). Failure to follow these rules at any time during development, debugging, and testing of the project will result in penalties of 10-20% (depending on severity) of the total possible project score, per occurrence.
- Always use the code below to start your server.
  app.run(host="127.0.0.1", port=os.environ.get("ECE364_HTTP_PORT", 8000), use_reloader=True, use_evalex=False, debug=True, use_debugger=False)
  Explanation: The required parameters to app.run(…) disable features that would increase your risk.
  host="127.0.0.1" ensures that your application is only accessible to people logged into ecegrid via SSH (i.e., browsing through a tunnel) or ThinLinc.
  use_evalex=False turns off a feature that lets you debug in the browser. While convenient, it would give an attacker permanent command line access to your account. The meager security mechanism it uses is not hardened enough to be trusted.
  use_debugger=False ensures that tracebacks are not shown in the browser (in case of errors). You can get the same information at the command line. Seeing the errors in the browser is slightly more elegant to look at, but it enables an attacker to examine your code and find vulnerabilities.
- Do not pass user input (e.g., request.args, request.form, etc.) to subprocess.(…) or os.system(…), or use it as part of filenames. If you’re unsure, the safest path is to just not use subprocess.(…) or os.system(…), and to name all files with strings given in your code.
  Explanation: Many attacks occur because an attacker sends inputs that the application was not expecting. They can easily send your application inputs even without using your HTML form. There are many tricks. Some people think they can sanitize the inputs to make them safe (e.g., by removing certain characters). That is risky because many attackers find clever ways to defeat your defenses.
- Do not use eval(…), exec(…), or execfile(…) in your code.
  Explanation: These were not covered in ECE 36400, so it is unlikely that you would use them. They execute code given as a string (or file). They are rarely needed. In web programming, if you accidentally pass user-provided text (e.g., from a URL parameter) to them, then an attacker can compromise your application and permanently control your account.
Note: These rules implement multi-layered security. We must always assume some of our defenses will fail.
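The rules on filenames and eval(…) both come down to treating user input as data only. The sketch below shows one way that might look using only the standard library. The helper names safe_image_path and parse_positive_int, the extension whitelist, and the "static" directory are all assumptions made for illustration; they are not part of the project specification.

```python
import hashlib
import os

# Assumption: the only extensions this hypothetical helper will ever write.
ALLOWED_EXTENSIONS = {".jpg", ".png", ".gif"}


def safe_image_path(image_bytes, extension, directory="static"):
    """Build a path whose name comes from a hash, not from user input."""
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("unsupported extension")
    # A hex digest contains only 0-9 and a-f, so it cannot carry path
    # separators or ".." regardless of what the user submitted.
    name = hashlib.sha1(image_bytes).hexdigest()
    return os.path.join(directory, name + extension)


def parse_positive_int(text, default=1):
    """Parse a user-supplied number with int(), never eval()."""
    try:
        value = int(text)
    except (TypeError, ValueError):
        return default
    return value if value > 0 else default
```

Because the extension is checked against a fixed set and the base name is a digest, no character typed by the user ever reaches the filesystem path. Likewise, int(…) can only yield a number or raise an exception, so hostile text is rejected rather than executed.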
Ethics
We are trying something new and (we hope) fun for the project this semester. However, there is potential for pranks or harassment in the form of defacing other people’s home pages.
Do not modify any photo or home page content in a way that may reasonably be expected to disparage, defame, provoke, or harass any person. That includes adding unkind textual annotations, body parts, body fluids, changing eye color, stretching to change shape, etc. Adding eyeglasses to a person’s eyes is allowed (unless you have reason to believe it would upset them).
Ethics education is required by ABET, the accrediting organization that certifies Purdue to grant engineering degrees. The IEEE, ACM, AAAI, and all other major professional organizations have ethical standards.
Misusing project resources (e.g., ecegrid, code snippets, etc.) to disparage, defame, provoke, or harass will result in penalties of 10-30% (depending on severity) of the total possible project score, per occurrence.
Q&A
This will be filled in later.
Updates
Updates will be logged here.
11/11/2018
- 1:30 PM Posted
- 9:35 PM Correction: Python file should be main.py, in your hpo/ directory (not in templates).
- 10:20 PM By request, the module validators is now installed and allowed. It may help with checking if a web address is valid. You could also use the standard module urlparse (urllib.parse in Python 3).
11/12/2018
- 10:52 AM Clarification: Highlighted the Python filename (main.py) in yellow; added a few words.
11/19/2018
- 6:00 PM Minor clarifications. lang="en" in HTML boilerplate
- 11:41 PM Submission instructions; more explanation about lang="en"
11/24/2018
- 6:51 PM Added instructions for Milestones 2 and 3.
11/30/2018
- 3:52 PM Added clarification about base filename to Milestone 2 step 8.
- 4:40 PM Attribution is not needed on hashlib.sha1(b"").hexdigest().
- 4:49 PM Underlined "This helper function will use the helper functions you created above."
- 5:05 PM Added warning about not trusting URL to get file extension. Added tip to Milestone 2 step 3 about GIF images.
12/1/2018
- 1:50 AM Changed variable name from gray to img_grayscale.
- 2:00 PM Clarified the list of modules that may be used.