
Coursera | Introduction to Data Science in Python(University of Michigan)| Assignment1

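   To be honest, the assignments in this course are fairly challenging, especially Assignment 4 (sigh), so I'm posting my solutions here for reference.
   If I have time (and there's demand), I'll put all the code on GitHub (though I'm worried it will get taken down).
   Related links: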
   Coursera | Introduction to Data Science in Python(University of Michigan)| Quiz
   Coursera | Introduction to Data Science in Python(University of Michigan)| Assignment1
   Coursera | Introduction to Data Science in Python(University of Michigan)| Assignment2
   Coursera | Introduction to Data Science in Python(University of Michigan)| Assignment3
   Coursera | Introduction to Data Science in Python(University of Michigan)| Assignment4
   CSDN links:
   Coursera | Introduction to Data Science in Python(University of Michigan)| Quiz answers
   Coursera | Introduction to Data Science in Python(University of Michigan)| Assignment1
   Coursera | Introduction to Data Science in Python(University of Michigan)| Assignment2
   Coursera | Introduction to Data Science in Python(University of Michigan)| Assignment3
   Coursera | Introduction to Data Science in Python(University of Michigan)| Assignment4

Assignment 1 is fairly simple; it's just an introduction.

Assignment 1

For this assignment you are welcome to use other regex resources, such as regex “cheat sheets” you find on the web.

Before you start working on the problems, here is a small example to help you understand how to write your own answers. In short, the solution should be written within the given function body, and the final result should be returned. The autograder will then call the function and validate the returned result.

def example_word_count():
    # This example question requires counting words in the example_string below.
    example_string = "Amy is 5 years old"

    # YOUR CODE HERE.
    # You should write your solution here, and return your result, you can comment out or delete the
    # NotImplementedError below.
    result = example_string.split(" ")
    return len(result)

    #raise NotImplementedError()
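
A side note on the example: `split(" ")` splits on every single space, so repeated spaces produce empty strings and inflate the count. Below is a minimal sketch of two alternatives. The function names and the messy sample string are made up for illustration and are not part of the assignment.

```python
import re

def word_count_split(text):
    # str.split() with no argument splits on any run of whitespace
    return len(text.split())

def word_count_regex(text):
    # \w+ matches each run of letters, digits, or underscores
    return len(re.findall(r"\w+", text))

example_string = "Amy is 5 years old"
messy_string = "Amy  is 5   years old"  # hypothetical input with repeated spaces

print(len(messy_string.split(" ")))    # counts the empty strings too
print(word_count_split(messy_string))  # counts only the words
print(word_count_regex(example_string))
```

For the assignment's own `example_string`, all three approaches agree; the difference only shows up on less tidy input.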

Part A

Find a list of all of the names in the following string using regex.

Code

import re
def names():
    simple_string = """Amy is 5 years old, and her sister Mary is 2 years old.
    Ruth and Peter, their parents, have 3 kids."""

    # YOUR CODE HERE
    # raise NotImplementedError()
    # One uppercase letter followed by zero or more lowercase letters
    pattern = "[A-Z][a-z]*"
    return re.findall(pattern, simple_string)

assert len(names()) == 4, "There are four names in the simple_string"
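
The pattern treats every capitalized word as a name, which works here only because no sentence in `simple_string` starts with a non-name. The sketch below adds word boundaries (`\b`) as an optional refinement and shows the limitation on a made-up sentence; that second sentence is purely illustrative and not part of the assignment.

```python
import re

simple_string = """Amy is 5 years old, and her sister Mary is 2 years old.
Ruth and Peter, their parents, have 3 kids."""

# Same idea as the solution, with word boundaries so each match is a whole word
pattern = r"\b[A-Z][a-z]*\b"
found = re.findall(pattern, simple_string)
print(found)  # ['Amy', 'Mary', 'Ruth', 'Peter']

# Hypothetical sentence: the pattern also picks up "The", which is not a name
other = re.findall(pattern, "The cat belongs to Amy")
print(other)  # ['The', 'Amy']
```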

Result



Part B

The dataset file in assets/grades.txt contains a line-separated list of people with their grade in
a class. Create a regex to generate a list of just those students who received a B in the course.

Code

import re
def grades():
    with open("assets/grades.txt", "r") as file:
        grades = file.read()

    # YOUR CODE HERE
    # raise NotImplementedError()
    # Raw string avoids invalid-escape warnings; each match is "<name>: B", grade included
    pattern = r"[\w ]*: B"
    return re.findall(pattern, grades)

   The following version also works. Either one is fine; the only difference is whether the grade is included in each match.

import re
def grades():
    with open("assets/grades.txt", "r") as file:
        grades = file.read()

    # YOUR CODE HERE
    # raise NotImplementedError()
    # The lookahead (?=: B) requires ": B" after the name without including it in the match
    pattern = r"\w* \w*(?=: B)"
    return re.findall(pattern, grades)

assert len(grades()) == 16
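
To see the difference between the two patterns without the course data, here is a small sketch run on a made-up sample. The names and the `First Last: Grade` line layout are assumptions inferred from the patterns; the real `assets/grades.txt` is not reproduced here.

```python
import re

# Hypothetical sample in the assumed "First Last: Grade" layout
sample = """Alice Smith: A
Bob Jones: B
Carol White: C
Dave Brown: B"""

# Pattern 1: the grade is part of the match
with_grade = re.findall(r"[\w ]*: B", sample)
print(with_grade)     # ['Bob Jones: B', 'Dave Brown: B']

# Pattern 2: the lookahead checks for ": B" but leaves it out of the match
without_grade = re.findall(r"\w* \w*(?=: B)", sample)
print(without_grade)  # ['Bob Jones', 'Dave Brown']
```

Both versions return the same number of matches, so either should satisfy a length-based check like the assertion above.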

Result

   With grades:

   Without grades:

Part C

Consider the standard web log file in assets/logdata.txt. This file records the access a user makes when visiting a web page (like this one!). Each line of the log has the following items:

  • a host (e.g., ‘146.204.224.152’)
  • a user_name (e.g., ‘feest6811’ note: sometimes the user name is missing! In this case, use ‘-’ as the value for the username.)
  • the time a request was made (e.g., ‘21/Jun/2019:15:45:24 -0700’)
  • the post request type (e.g., ‘POST /incentivize HTTP/1.1’ note: not everything is a POST!)

Your task is to convert this into a list of dictionaries, where each dictionary looks like the following:

example_dict = {"host":"146.204.224.152",
                "user_name":"feest6811",
                "time":"21/Jun/2019:15:45:24 -0700",
                "request":"POST /incentivize HTTP/1.1"}

Code

import re
def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()

    # YOUR CODE HERE
    # raise NotImplementedError()
    # Raw string so the regex escapes reach the re module unchanged.
    # re.VERBOSE ignores unescaped whitespace, so literal spaces are written as "\ ".
    pattern = r"""
    (?P<host>\d*\.\d*\.\d*\.\d*)   # four dot-separated groups of digits
    (\ -\ )
    (?P<user_name>[\w-]*)          # "-" when the user name is missing
    (\ \[)
    (?P<time>\w*/\w*/.*)
    (\]\ ")
    (?P<request>.*)
    (")
    """
    result = []
    for item in re.finditer(pattern, logdata, re.VERBOSE):
        # groupdict() keeps only the named groups
        result.append(item.groupdict())
    return result
assert len(logs()) == 979

one_item={'host': '146.204.224.152',
          'user_name': 'feest6811',
          'time': '21/Jun/2019:15:45:24 -0700',
          'request': 'POST /incentivize HTTP/1.1'}
assert one_item in logs(), "Sorry, this item should be in the log results, check your formatting"
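
As a way to experiment without the log file, the sketch below parses single lines with a variant of the pattern that uses negated character classes (`[^\]]+`, `[^"]+`) instead of greedy `.*`. This is an alternative I find easier to reason about, not the solution above. The first line is assembled from the example fields in the task; its trailing status code and byte count are assumptions about the log format, and the second line is entirely hypothetical.

```python
import re

# Assembled from the example fields; the trailing "302 4622" is an assumption
line = ('146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] '
        '"POST /incentivize HTTP/1.1" 302 4622')
# Hypothetical line with a missing user name, recorded as "-"
anonymous = ('10.0.0.1 - - [21/Jun/2019:15:45:25 -0700] '
             '"GET /index HTTP/1.1" 200 100')

pattern = re.compile(r"""
    (?P<host>\d+\.\d+\.\d+\.\d+)   # dotted IPv4 address
    \ -\                           # literal separator: space, dash, space
    (?P<user_name>\S+)             # user name, or a lone dash when missing
    \ \[(?P<time>[^\]]+)\]         # timestamp between square brackets
    \ "(?P<request>[^"]+)"         # request between double quotes
""", re.VERBOSE)

parsed = pattern.search(line).groupdict()
print(parsed)

missing_user = pattern.search(anonymous).group("user_name")
print(missing_user)  # -
```

Because the negated classes stop at the first `]` or `"`, the match cannot run past the end of a field, which makes the pattern less dependent on backtracking.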

Result

  Partial output:



   If there's anything else you need, leave a comment :) Discussion and sharing are welcome!
