MAS.500 - How to build software - taught by Rahul Bhargava
Homework for this class is being stored in this GitHub repository.
Lecture 1 - Intro and basics
- Assembly code is made up of roughly 200 instructions, depending on the processor architecture (e.g. ADD, LD, STR). More complex (SIMD) instructions can carry out parallel computations, an idea that GPUs take much further.
- Assembly code instructions can fetch values from memory and put them in registers.
- Different languages compile in different ways. For how Python and JavaScript differ, see https://blog.glyphobet.net/essay/2557/
- Imperative code (a flat sequence of statements that change the program's state) doesn't scale well as a program grows.
- Procedural code uses functions/procedures to break the code into separate, reusable pieces.
- Object oriented - encapsulates complexity via abstraction to prevent spaghetti code. It is enabled by classes. In Python, a class is defined with the class keyword, usually inside a module. The class describes the object, and each instance is an object created from it.
- In a Python class, set up the constructor using
def __init__(self, filename): self.filename = filename
so that every method that takes self can keep track of the file name.
- The most important audience of your code is your future self in 6 months time
- Event-driven code limits scope to reduce complexity. It almost always runs around and around a loop that waits for an event and hands it to a handler, e.g.
while True:
- Functional programming - an input goes through a function and produces an output without changing any shared state, which makes it easier to scale reliably. For example:
names = [line.split(",")[1] for line in all_lines]
- Scala is a functional-first language that encourages immutable values over holding onto mutable state.
- The packages required in a Python project are listed in requirements.txt, so install them with
pip install -r requirements.txt
- Use a virtual environment to pin the libraries (and versions) the project uses, and run the project inside it. To package the project together with its own operating-system environment (sharing the host's kernel), you use containers. To virtualize the hardware itself and run a completely separate OS, you use virtual machines.
- API - An Application Programming Interface lets your code talk to another service. Many APIs require a key (and sometimes a user ID), and some of them have a Python client. We'll be using Mediacloud as an example.
- Questions - Is object-oriented programming enabled by class structures? How does the functional example not hold onto state (what is the state in imperative code)?
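On the question about state: here is a minimal sketch contrasting the two styles. It assumes all_lines is a list of comma-separated strings whose second field is a name. The sample rows are made up for illustration.

The imperative version keeps a list that it mutates on every pass through the loop, and that list is its "state". The functional versions compute the result directly from the input, and none of the code we write reassigns or mutates anything.

```python
# Hypothetical sample data: "id,name,role" rows.
all_lines = ["1,Ada,engineer", "2,Grace,admiral", "3,Linus,developer"]

def names_imperative(lines):
    """Imperative style: a mutable accumulator is updated on each pass."""
    names = []                      # state that changes as the loop runs
    for line in lines:
        names.append(line.split(",")[1])
    return names

def names_functional(lines):
    """Functional style: the output is an expression of the input."""
    return [line.split(",")[1] for line in lines]

def names_map(lines):
    """The same transformation written with map and a lambda."""
    return list(map(lambda line: line.split(",")[1], lines))

print(names_functional(all_lines))  # ['Ada', 'Grace', 'Linus']
```

Because the functional versions depend only on their input, they can be split across many inputs or machines without coordinating shared state. That is the sense in which they "scale reliably".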
Lecture 2 - Building software strategically
- Unit tests - critical tests for the most-used components, covering all the edge cases they may encounter
- Data structures - Stack (Last In, First Out), queue (First In, First Out)
- Logging - a way to figure out when something breaks and what happened when it broke. You can use Sentry to collect and display errors and log messages created with Python's logging module.
- Data sent to services for logging, analytics and server storage usually goes over HTTP, encoded as JSON.
- Licenses - the MIT license is a permissive open source license that anyone can use; the project doesn't have to come from MIT. You apply a license by including a LICENSE file in the repository.
- Many IDEs these days are specialised versions of Eclipse. IDEs are good for debugging and linting (enforcing norms for good code).
- Some frameworks, like Rails and iOS, bake MVC (Model, View, Controller) into their file names and structure. You can have thick models (that contain all the business logic) and thin controllers, or thin models and thick controllers.
- When logging you can adjust the logger level; at the DEBUG level you get the most detail. Python's logging levels, from most to least verbose, include debug, info, warning and error (plus critical).
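The stack/queue distinction is easy to get backwards, so here is a minimal sketch using only the standard library. A plain list works as a stack. collections.deque works as a queue, because popping from the left of a list is O(n) while deque.popleft() is O(1). The items are arbitrary examples.

```python
from collections import deque

# Stack: Last In, First Out. append/pop at the end of a list are O(1).
stack = []
for item in ["a", "b", "c"]:
    stack.append(item)                   # push
stack_order = []
while stack:
    stack_order.append(stack.pop())      # removes the most recently pushed item

# Queue: First In, First Out.
queue = deque()
for item in ["a", "b", "c"]:
    queue.append(item)                   # enqueue on the right
queue_order = []
while queue:
    queue_order.append(queue.popleft())  # dequeue from the left (oldest item)

print(stack_order)  # ['c', 'b', 'a']
print(queue_order)  # ['a', 'b', 'c']
```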
Lecture 3 - Building for the web
- Approaches - Content Management Systems - CMS (WordPress, Drupal), micro-frameworks (Sinatra - Ruby, Flask - Python), macro-frameworks (Rails - Ruby, Django - Python, Meteor - JS)
- Traditionally the software stack of a web app is - server (Flask/backend) > HTML > CSS > JS. Over time, more functionality is moving to the front end into JS.
- React is a JavaScript library for building user interfaces. React lets you write HTML (as JSX) right where you're writing your JS (and sometimes the CSS too).
- You can get React apps up and running quickly by using create-react-app. Install it with
npm install -g create-react-app
You can then set up a single page application with
create-react-app my-app
cd my-app
npm start
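- Redux manages a React app's state. Requests to the server (e.g. made with fetch) are often triggered from Redux actions.
- You use an HTTP GET request to fetch things from a server. To send data to the server you use POST, PUT or another relevant verb, and the server replies with a response and a status code.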
- Use the throttling tool in Chrome DevTools (Inspect > Network > Slow 3G) to experience your app over a simulated slow network. This doesn't reproduce slowness caused by outdated hardware. Also, apps that use WebSockets (such as Google Docs) don't get slowed down by this tool.
- There's a big backlash against excessively gimmicky web designs that throw as much clickbait at you as possible. This reaction is demonstrated at http://brutalistwebsites.com/ and http://motherfuckingwebsite.com/.
- Progressive web apps cache files previously loaded from the server, so when you revisit the site you only need to make API calls for fresh data.
- For example, Facebook has a list of templates (profile_template, newsfeed_template) stored on a server, and the user gets a template that is seeded with data from the Facebook database. Increasingly, the templates are stored as raw files in the browser cache, so the user only has to make a single API call to a database on a separate server.
- All of this affects metrics such as the time to load a website, which depends heavily on the data calls made through an API to a database on a server.
- When you want to spin up a quick server to host a web app, Heroku is a good choice (it is built on top of AWS). Amazon Web Services (AWS) is a collection of cloud-computing services; the AWS service that best matches Heroku is Elastic Beanstalk.
- Flask uses decorators such as
@app.route("/")
to specify the code to run and the template to use for different URLs. Templates in Flask are written using Jinja2, a full-featured template engine for Python. In React, URL routing is usually handled by a library such as React Router, often configured in a routes.js file.
- Redux provides a predictable state container for JavaScript applications, which helps your applications behave consistently.
- Questions - I don't understand the file structure for React. How do you take the value of the HTML date field and convert it to a datetime without having to pull characters out of the string? How are you meant to interpret the links in the API documentation?
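On the date-field question: an HTML `<input type="date">` submits its value as a YYYY-MM-DD string, whatever format the browser displays. A Python back end can therefore parse it with the standard library instead of slicing characters. This is a minimal sketch in which the helper names are hypothetical. In a Flask view the string would come from something like request.form, with the field name being whatever your form uses.

```python
from datetime import date, datetime

def parse_date_field(value: str) -> date:
    """Parse the value of an HTML date input (always 'YYYY-MM-DD').
    date.fromisoformat needs Python 3.7+; it raises ValueError on bad input."""
    return date.fromisoformat(value)

def parse_date_field_strptime(value: str) -> datetime:
    """Equivalent parse with an explicit format; returns a datetime at midnight."""
    return datetime.strptime(value, "%Y-%m-%d")

d = parse_date_field("2018-03-14")
print(d.year, d.month, d.day)                     # 2018 3 14
print(parse_date_field_strptime("2018-03-14"))    # 2018-03-14 00:00:00
```

Use the first helper when you only need a calendar date. Use the second when you need a full datetime, e.g. to compare against timestamps stored in a database.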
Intro to WebGL and shader programs to use the GPU for computation
- In WebGL, the GPU is normally used to render images for the user. Shader programs can repurpose the GPU to carry out arbitrary mathematical calculations.
- A WebGL context is instantiated within a canvas.
- Shader code is usually kept in script tags in the head of an HTML file. Amanda puts most of her code in the script tag with the id 2d-fragment-shader.
- This reference card has lots of information about how to use WebGL.
- Information for each pixel is stored in the r, g, b and/or a (red, green, blue, alpha) values for that pixel.
- Amanda's most advanced projects that use the GPU separate the calculations for rendering from the mathematical functions.
- Nice web design kits include Bootstrap and Flat UI. Bootstrap modals are good for help boxes, and jQuery is helpful too.
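This is a CPU-side Python sketch of the per-pixel model above. It is not WebGL code and not Amanda's code. A fragment shader is a function run independently for every pixel, and it produces r, g, b, a values between 0 and 1. The results are stored as four bytes (0-255) per pixel, row by row, similar to the layout gl.readPixels fills with RGBA/UNSIGNED_BYTE (in WebGL, row 0 is the bottom row). The shade function and its red gradient are made-up examples.

```python
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]

def shade(x: int, y: int, width: int, height: int) -> RGBA:
    """Hypothetical 'fragment shader': a left-to-right red gradient, fully opaque.
    It only sees its own pixel's coordinates, so pixels could run in parallel."""
    r = x / (width - 1) if width > 1 else 0.0
    return (r, 0.0, 0.0, 1.0)

def render(width: int, height: int) -> List[int]:
    """Run shade for every pixel and pack results into a flat RGBA byte list:
    4 values per pixel, row by row, each channel scaled from 0.0-1.0 to 0-255."""
    pixels: List[int] = []
    for y in range(height):
        for x in range(width):
            pixels.extend(round(c * 255) for c in shade(x, y, width, height))
    return pixels

def pixel_at(pixels: List[int], x: int, y: int, width: int) -> List[int]:
    """Read one pixel's [r, g, b, a] back out of the flat list."""
    i = (y * width + x) * 4
    return pixels[i:i + 4]

img = render(3, 2)
print(len(img))                # 24  (3 * 2 pixels * 4 channels)
print(pixel_at(img, 2, 1, 3))  # [255, 0, 0, 255]
```

On a GPU the two nested loops disappear, because the hardware runs shade for many pixels at once. This is also why shaders can be repurposed for arbitrary maths: any calculation that can be split into independent per-pixel pieces fits the model.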
Lecture 4 - Building mobile applications
- Current state of mobile apps is that they minimise the number of tasks per app. Facebook divides messaging into a separate app.
- Be aware of the types of phones, connectivity and environments in which people will be using your app.
- There are useful design guides for the size of features like buttons and gestures that are required to make apps usable.
- Apps are constantly stopped and started based on events like losing connectivity or moving between different apps. Mobile apps have constrained resources.
- React Native is used inside many major mobile apps. Create React Native App is a command-line tool that helps speed up the development cycle. Expo and Ignite are other tools for building React Native apps, while Ionic is a separate cross-platform mobile development framework.
- There are several platforms to choose from: native Android, native iOS, or a cross-platform wrapper such as Cordova. Cordova wraps your HTML/JavaScript app into a native container which can access the device functions of several platforms.
- Writing a requirements list covering expected connectivity, target platforms and the phone hardware you need is really important and useful for choosing which technology you're going to use to make the app.
- To override UI styles and manage complexity in Ionic you need to master Angular, which can be a headache.
Lecture 5 - Managing data
- Relational databases are most commonly used. Each table stores one kind of object with various attributes, and tables relate to one another through one-to-one, many-to-one or many-to-many relationships.
- Link (join) tables are used for many-to-many relationships.
- The point of this type of data storage is to ensure that data isn't replicated, so it doesn't have to be updated in multiple places
- Keys - an ID column is typically the primary key. Foreign keys reference the primary key of another table.
- SQL (Structured Query Language) - the standard method to extract data from a relational database. SQL is often generated by a data abstraction layer such as an ORM.
- NoSQL
- Document databases - When data isn't relational or you want a flexible way to quickly store data e.g. logging, document storage, low latency
- MongoDB lets you save JSON documents very quickly. It also supports map-reduce, which splits up your data so its processing can be parallelized.
- Transactions - in a relational database you often need several changes to happen together. A transaction groups a list of changes; if any of them fails, the commit isn't made and everything is rolled back.
- B-trees - speed up searching for data (database indexes are typically B-trees). As rows are added the tree grows and has to be kept balanced. EXPLAIN shows how the database plans to run a query, which helps with query optimization.
- Version control in databases is managed using migrations
- Stick to conventions, e.g. include created_at and updated_at columns.
- SQLAlchemy is the Python SQL toolkit and Object Relational Mapper. Using a code layer like this makes your SQL more secure, e.g. by using parameterized queries that protect against SQL injection.
- SQLite is not a client–server database engine. Rather, it is embedded into the end program, which makes it faster to get up and running with.
- Relational databases can be stored using a range of cloud services and accessed using a connection string. That's overkill for short term, quick projects.
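Several ideas from this lecture can be tried out with SQLite through Python's built-in sqlite3 module:

- primary and foreign keys
- a link table for a many-to-many relationship
- a created_at column
- transactions that roll back on failure
- parameterized queries
- inspecting a query plan

This is a minimal sketch, and the students/courses schema is a made-up example. SQLite only enforces foreign keys after PRAGMA foreign_keys is switched on, and its command is EXPLAIN QUERY PLAN (PostgreSQL and MySQL use EXPLAIN).

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # embedded database: no server needed
conn.execute("PRAGMA foreign_keys = ON")    # enforcement is off by default in SQLite

conn.executescript("""
CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE courses (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
-- Link table: one row per (student, course) pair models many-to-many.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students(id),
    course_id  INTEGER NOT NULL REFERENCES courses(id),
    PRIMARY KEY (student_id, course_id)
);
CREATE INDEX idx_enrollments_course ON enrollments(course_id);
""")

# A successful transaction: the connection context manager commits on success.
with conn:
    conn.execute("INSERT INTO students (id, name) VALUES (?, ?)", (1, "Ada"))
    conn.execute("INSERT INTO courses (id, title) VALUES (?, ?)", (1, "MAS.500"))
    conn.execute("INSERT INTO enrollments VALUES (?, ?)", (1, 1))

# A failing transaction: course 99 doesn't exist, so the foreign key check fails
# and the context manager rolls back the earlier insert of Grace too.
try:
    with conn:
        conn.execute("INSERT INTO students (id, name) VALUES (?, ?)", (2, "Grace"))
        conn.execute("INSERT INTO enrollments VALUES (?, ?)", (2, 99))
except sqlite3.IntegrityError:
    pass

# "?" placeholders pass values separately from the SQL text (no injection).
rows = conn.execute("""
    SELECT s.name, c.title
    FROM students s
    JOIN enrollments e ON e.student_id = s.id
    JOIN courses c ON c.id = e.course_id
    WHERE c.title = ?
""", ("MAS.500",)).fetchall()
print(rows)  # [('Ada', 'MAS.500')]

student_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(student_count)  # 1, because Grace was rolled back

# Shows whether SQLite will use the B-tree index for this lookup.
for step in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM enrollments WHERE course_id = ?", (1,)):
    print(step)
```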
Lecture 5 cont. - Machine learning
- Further notes and documentation at https://colcarroll.github.io/working_ml/
- All machine learning algorithms minimize something (a loss or cost function). You should know what it is and why you picked it.
- Machine learning is useful for certain tasks, not ALL!
- Regression and classification - supervised methods for labelled data
- Clustering - Unsupervised learning, can be dangerous in that it's more opaque.
- Analysis is normally incorporated into a data science role, relies heavily on statistics. Tools include Excel, Jupyter, Tableau
- Products can involve some kind of machine learning, tools involve Python, R and Scala
- There can be a trade-off between explainable and accurate algorithms. Interactive models can help bridge the gap between the two.
- Algorithms can either stream results live as data comes in, or run in batches, which are easier to plan and manage (Spotify Discover Weekly is most probably a batch algorithm).
- sklearn (scikit-learn) is a Python library that applies a range of machine learning algorithms. Use tab completion (e.g. in Jupyter) where you initialise an algorithm to see the parameters it takes.
- seaborn is a library built on top of matplotlib to make it a bit easier to use. sns.pairplot creates a grid of plots comparing every pair of variables.
- spaCy is an open source library for NLP (Natural Language Processing). Its named entity recognition extracts important parts of a text, such as people, places and organisations.
- Keras is a deep learning Python library which wraps around TensorFlow.
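To make "every algorithm minimizes something" concrete, this sketch fits a straight line by gradient descent on the mean squared error, written in plain Python so the quantity being minimized is explicit. It is an illustration of the idea, not how sklearn works internally: sklearn's LinearRegression solves the same least-squares problem directly. The learning rate, step count and toy data are arbitrary choices.

```python
from typing import List, Tuple

def mse(w: float, b: float, xs: List[float], ys: List[float]) -> float:
    """Mean squared error: the quantity this model minimizes."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs: List[float], ys: List[float],
             lr: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w   # step downhill
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # approximately 2.0 1.0
```

Swapping mse for a different loss, such as mean absolute error, changes which line counts as "best". That is why it matters to know what your algorithm minimizes.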
Lecture 6 - Prototype to product
- A well-built app treats the software, database and hardware as separate entities.
- Software - Heroku, Joyent and Engine Yard can host the back end of your product in containers.
- Data hosting - MongoHQ, MongoLab and ObjectRocket offer this as a service.
- Local environment, development copy, staging copy and production are often used as the different stages of a deployment pipeline.
- CDN - Content Delivery Networks are operated by companies like Akamai and Cloudflare. They host copies of the parts of your app that don't change on servers around the world, to minimise latency for all users. You could point the CDN to your Heroku container.
- To host a web app on Heroku, set up Heroku as a git remote that you can push to.
- Containers don't keep state (their filesystem is wiped when they restart), so don't store photos in them; put photos in the database.
- Cron jobs can be set up for recurring tasks if you're operating something like Dokku on your own server. On Heroku you can use Scheduler, which is a free add-on for running jobs on your app at scheduled time intervals.
- Questions on Heroku deployment - difference between requirements.pip and requirements.txt?, what is the role of gunicorn and what if I don't have a flask/bin/pip folder to install it in, should I have a better file system for things like Procfiles?
- Gunicorn - Flask's built-in server speaks HTTP, but it isn't built to handle loads beyond running a local server on your computer. Gunicorn is a WSGI server that sits between the web server (e.g. NGINX or Apache) and your app's code. The web server passes incoming requests to Gunicorn, which runs the app in several worker processes. The app I'm building is configured to run its code with Gunicorn, which is set up to scale in a way that Flask's server isn't.
- Heroku - To troubleshoot when deploying to Heroku, use the deployment logs to see what's happening when you're actually pushing (check for errors). Otherwise look at the logs from your app on the Heroku UI, or go to Sentry if you've got that set up.
- Cloud9 (recently purchased by AWS) lets you write, run and debug code in the browser, which means you don't need to spend time setting up software and dependencies on your computer. You can also use it for Google Docs-style collaborative coding.
- Programming for hardware is typically referred to as embedded programming. To do this you'd typically use a Real Time Operating System (RTOS), which is lower level and gives predictable, deterministic timing. You're normally resource constrained, so you need to be careful about memory usage. You'd typically use interrupts: signals, often arriving on a pin of the microcontroller, that pause the main code and immediately run a handler. RTOS projects tend to have more rigorous testing suites.
- Don't think about programming skills in terms of individual languages, it's more about learning a full workflow and the syntax, conventions, libraries, tools and services needed to deploy a web app or an iOS app.
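Gunicorn's role is easier to see once you know that it is a WSGI server: it imports your app object and calls it once per request. A Flask app object is such a callable. The hand-written WSGI app below is an illustrative stand-in, not Flask, and shows the interface Gunicorn relies on; the request is simulated locally with the standard library's wsgiref helpers.

Suppose, hypothetically, that this code lived in app.py. A server would then be started with `gunicorn app:app` (module:variable), and on Heroku that command typically goes in the Procfile as `web: gunicorn app:app`.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """A minimal WSGI application: called once per request with the request
    data (environ) and a callback for sending the status line and headers."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # an iterable of byte strings

def call_app(path: str):
    """Call the app directly, roughly as a WSGI server would, and collect the response."""
    environ = {}
    setup_testing_defaults(environ)   # fills in a minimal, valid GET request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None      # legacy write() callable; unused here

    body = b"".join(app(environ, start_response))
    return captured["status"], body

print(call_app("/about"))  # ('200 OK', b'Hello from /about')
```

A production server like Gunicorn adds what this sketch lacks: listening on a socket, parsing HTTP, and running many workers in parallel so the app can handle concurrent requests.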
Lecture 7 - Building visualizations
- Approaches include configuration, transformation and render
- In the configuration approach you set up some metadata (attributes of the graph/visualization), you pass that metadata to the library and the library renders it for you - E.g. Highcharts is a JS library with a big community and lots of examples
- The transformation approach applies a series of operations to transform the data before you present it - E.g. d3
- In the render approach, you think about what's going on the screen at a very low level, perhaps at the pixel level using WebGL and shaders, and you respond to events that happen in the browser - E.g. p5.js
- Desktop-based interactive visualization - Processing is the de facto standard. Flash used to be, but its following has diminished. Game design tools like Unity are gaining in usage; they have a game engine and game physics built in.
- Complicated visuals with simpler maths work best in frame-based tools like Processing, while simpler visuals with more complex maths work better in persistent tools like d3.
- Using p5.js you can create interactive visuals that pull in information in real time from APIs and databases.
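A small example of the transformation approach: before anything is drawn, data values are mapped from their domain onto pixel positions, which is the job of scale functions in d3. Below is a hypothetical Python re-creation of the idea behind d3.scaleLinear (not d3's code), applied to made-up data.

```python
from typing import Callable, Sequence

def scale_linear(domain: Sequence[float],
                 range_: Sequence[float]) -> Callable[[float], float]:
    """Return a function mapping values in `domain` linearly onto `range_`,
    in the spirit of d3.scaleLinear().domain([...]).range([...])."""
    d0, d1 = domain
    r0, r1 = range_
    if d1 == d0:
        raise ValueError("domain must contain two distinct values")

    def scale(value: float) -> float:
        t = (value - d0) / (d1 - d0)   # relative position within the domain
        return r0 + t * (r1 - r0)      # same relative position within the range
    return scale

# Made-up data: daily counts drawn as bars in a 200-pixel-tall chart.
counts = [5, 20, 12, 40]
y = scale_linear((0, max(counts)), (0, 200))
bar_heights = [y(c) for c in counts]
print(bar_heights)  # [25.0, 100.0, 60.0, 200.0]
```

Because the range can be reversed, e.g. scale_linear((0, 40), (200, 0)), the same function also handles screen coordinates where y grows downwards. The data transformation is thus kept separate from the drawing code.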