Building Enterprise AI Applications
By Timothy Chou
It’s no secret that over the past four years there have been dramatic improvements in the use of AI technology to recognize images, translate text, win the game of Go or talk to us in the kitchen. Whether it’s Google Translate, Facebook facial recognition or Amazon’s Alexa, these innovations have largely been focused on the consumer.
On the enterprise side progress has been much slower. We’ve all been focused on building data lakes (whatever those are) and trying to hire data scientists and machine learning experts. While this is fine, we need to get started building enterprise AI applications. Enterprise AI applications serve the worker, not the software developer or business analyst. The worker might be a fraud detection specialist, a pediatric cardiologist or a construction site manager. Enterprise AI applications leverage the amazing amount of software that has been developed for the consumer world. These applications have millennial UIs and are built for mobile devices, augmented reality and voice interaction. Enterprise AI applications use many heterogeneous data sources inside and outside the enterprise to discover deeper insights, make predictions, or generate recommendations. A good example from the consumer world is Google Search. It’s an application focused on the worker, not the developer, with a millennial UI, and it uses many heterogeneous data sources. Open up the hood and you’ll see a ton of software technology inside.
With the advent of cloud computing and the continued development of open source software, building application software has changed dramatically in the past five years. It might be as dramatic as moving from ancient mud brick to modern prefab construction. As you’ll see, a ton of software technology has become available. Whether you’re an enterprise building a custom application or a new venture building a packaged application, you’ll need to do three things.
- Define the use case. Define the application. Who is the worker? Is it an HR professional, a reliability engineer or a pediatric cardiologist?
- The Internet is the platform. Choose wisely. We’ll discuss this in more depth in this article.
- Hire the right team. The team will need a range of expertise, including business analysts, domain experts, data scientists, data engineers, devops specialists and programmers.
For enterprises that are considering building scalable, enterprise-grade AI applications, there has never been a better time — there are hundreds of choices, many inspired by innovations in the consumer Internet. To give a sense of the breadth, I’ve arbitrarily created sixteen categories, each with a brief description and some example products. We’ll mix open source software, which can run on any compute and storage cloud service, with managed cloud services.
- Compute & Storage Cloud Services provide compute and storage resources on demand, managed by the provider of the service. While you could build your application using on-premises compute & storage, it would both increase the number of technology decisions and raise the overall upfront cost both in capital equipment and people to manage the resources. Furthermore the ability to put a 1000 servers to work for 48 hours for less than a $1000 is an economic model unachievable in the on-premises world. Choices include but are not limited to AWS, Google Cloud, Microsoft Azure, Rackspace, IBM Cloud, AliCloud.
- Container Orchestration. VMware pioneered the ability to create virtual hardware machines, but VMs are heavyweight and non-portable. Modern AI applications use containers based on OS-level virtualization rather than hardware virtualization. Containers are easier to build than VMs, and because they are decoupled from the underlying infrastructure and from the host file system, they are portable across clouds and OS distributions. Container orchestration software manages computing, networking, and storage infrastructure on behalf of user workloads. Choices include but are not limited to Kubernetes, Mesos, Swarm, Rancher and Nomad.
- Batch Data Processing. As data sets get larger, an application needs a way to process them efficiently. Instead of using one big computer to process and store the data, modern batch data processing software clusters commodity hardware together to analyze large data sets in parallel (a minimal PySpark sketch follows this list). Choices include but are not limited to Spark, Databricks, Cloudera, Hortonworks, AWS EMR and MapR.
- Stream Data Processing. An AI application designed to interact with near real-time data will need stream data processing software. Streaming data processing software has three key capabilities: publish and subscribe to streams of records, store streams of records in a fault-tolerant, durable way, and process streams of records as they occur (see the streaming sketch after this list). Choices include but are not limited to Spark Streaming, Storm, Flink, Apex, Samza and IBM Streams.
- Software Provisioning. From traditional bare metal to serverless, automating the provisioning of infrastructure is the first step in automating the operational life cycle of your application. Software provisioning frameworks are designed to provision the latest cloud platforms, virtualized hosts and hypervisors, network devices and bare-metal servers, and they provide the connecting tool in any of your process pipelines (a Troposphere sketch follows the list). Choices include but are not limited to Ansible, Salt, Puppet, Chef, Terraform, Troposphere, AWS CloudFormation, Docker Suite, Serverless and Vagrant.
- IT Data Collect. Historically, many IT applications were built on SQL databases, so any analytic application will need the ability to collect data from a variety of SQL data sources (a short sketch follows the list). Choices include but are not limited to Teradata, Postgres, MongoDB, Microsoft SQL Server and Oracle.
- OT Data Collect. For analytic applications involving sensor data, there will be a need to collect and process time-series data (sketch below). Products include traditional historians such as AspenTech InfoPlus.21, OSIsoft’s PI and Schneider’s Wonderware, and traditional database technologies extended for time-series such as Oracle. For newer applications, product choices include but are not limited to InfluxDB, Cassandra, PostgreSQL, TimescaleDB and OpenTSDB.
- Message Broker. A message broker is a program that translates a message from the messaging protocol of the sender to the messaging protocol of the receiver. When you have messages coming from hundreds of thousands to millions of endpoints, you’ll need a message broker to act as a centralized store and processor for those messages (a Kafka sketch follows the list). Choices include but are not limited to Kafka, Kinesis, RabbitMQ, Celery, Redis and MQTT.
- Data Pipeline Orchestration. Data engineers create data pipelines to orchestrate the movement, transformation, validation, and loading of data from source to final destination. Data pipeline orchestration software lets you define the collection of tasks you want to run, organized in a way that reflects their relationships and dependencies (an Airflow sketch follows the list). Choices include but are not limited to Airflow, Luigi, Oozie, Conductor and NiFi.
- Performance Monitoring. Any application, including an analytic application, requires real-time performance monitoring to find bottlenecks and, ultimately, to predict performance (a Prometheus sketch follows the list). Choices include but are not limited to Datadog, AWS CloudWatch, Prometheus, New Relic and Yotascale.
- CI/CD. Continuous integration (CI) and continuous delivery (CD) software embodies a set of operating principles and practices that enable analytic application development teams to deliver code changes more frequently and reliably. The implementation is also known as the CI/CD pipeline and is one of the best practices for devops teams. Choices include but are not limited to Jenkins, CircleCI, Bamboo, Semaphore CI and Travis.
- Backend Framework. Backend frameworks consist of the languages and tools used in server-side programming of an analytic application. A backend framework speeds development by providing a higher-level programming interface for designing data models, handling web requests and other commonly required features (a Flask sketch follows the list). Choices include but are not limited to Flask, Django, Pyramid, Dropwizard, Elixir and Rails.
- Front-end Frameworks. Applications need a user interface, and there are numerous front-end frameworks for building one. These frameworks serve as the base for single-page or mobile applications. Choices include, but are not limited to, Vue, Meteor, React, Angular, jQuery, Ember, Polymer, Aurelia, Bootstrap, Material UI and Semantic UI.
- Data Visualization. An analytic application needs plotting software that can produce publication-quality figures in a variety of hard-copy formats and interactive environments across platforms. Data visualization software lets you generate plots, histograms, power spectra, bar charts, error charts, scatter plots and more with just a few lines of code (a Matplotlib sketch follows the list). Choices include, but are not limited to, Tableau, PowerBI, Matplotlib, d3, VX, react-timeseries-chart, Bokeh, seaborn, plotly, Kibana and Grafana.
- Data Science. Data science tools allow you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, and support for large, multi-dimensional arrays and matrices (a pandas sketch follows the list). Choices include, but are not limited to, Python, R, SciPy, NumPy, Pandas, NetworkX, Numba, SymPy, Jupyter Notebook and JupyterLab.
- Machine Learning. Machine learning frameworks provide useful abstractions that reduce the amount of boilerplate code and speed up model development. ML frameworks are useful for building feed-forward networks, convolutional networks and recurrent neural networks (a Keras sketch follows the list). Choices include, but are not limited to, Python, R, TensorFlow, scikit-learn, PyTorch, Spark MLlib, Spark ML, Keras, CNTK, DyNet, Amazon Machine Learning, Caffe, Azure ML Studio, Apache MXNet and MLflow.
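A few of these categories are easier to grasp with a little code, so here are some minimal sketches. They are illustrative only; every host name, path, topic, table and column in them is an assumption made for the example, not something any of these products require. First, batch data processing: a PySpark job that filters and counts error lines across a set of log files, with the work spread across the cluster in parallel.

```python
# Minimal PySpark batch sketch: count error lines across many log files.
# The input path and the "ERROR" filter are made-up placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-example").getOrCreate()

# Spark splits the files into partitions and processes them in parallel
# across the cluster instead of on one big machine.
logs = spark.read.text("data/logs/*.log")          # hypothetical path
errors = logs.filter(logs.value.contains("ERROR"))
print("error lines:", errors.count())

spark.stop()
```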
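For stream data processing, a Spark Structured Streaming sketch that keeps a running word count over a socket stream. The host and port are placeholders; a production job would more likely read from Kafka or Kinesis.

```python
# Minimal Spark Structured Streaming sketch: running word count over a socket.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("stream-example").getOrCreate()

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")   # hypothetical source
         .option("port", 9999)
         .load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Print the continuously updated counts to the console as records arrive.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```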
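For software provisioning, one of the products above, Troposphere, lets you describe AWS CloudFormation resources in Python. A sketch that emits a template for a single EC2 instance; the AMI ID and instance type are placeholders.

```python
# Minimal Troposphere sketch: generate a CloudFormation template in Python.
from troposphere import Template
from troposphere.ec2 import Instance

t = Template()
t.add_resource(Instance(
    "ExampleInstance",
    ImageId="ami-0123456789abcdef0",   # hypothetical AMI id
    InstanceType="t3.micro",
))

# The JSON output can be handed to CloudFormation to provision the instance.
print(t.to_json())
```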
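For IT data collection, a common pattern is to pull rows out of a SQL source with SQLAlchemy and pandas. The connection string, table and columns below are invented for the example.

```python
# Minimal sketch of collecting IT data from a SQL source.
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical Postgres connection string and table.
engine = create_engine("postgresql://user:password@db-host:5432/erp")
orders = pd.read_sql("SELECT order_id, customer_id, total FROM orders", engine)

print(orders.head())
```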
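For OT data collection, a sketch that writes a sensor reading into InfluxDB with the v1 Python client; the host, database, tags and measurement are made up.

```python
# Minimal sketch of writing time-series sensor data into InfluxDB (v1 client).
from datetime import datetime, timezone
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="sensors")
client.create_database("sensors")   # no-op if it already exists

point = {
    "measurement": "temperature",                      # hypothetical measurement
    "tags": {"site": "plant-7", "unit": "compressor-3"},
    "time": datetime.now(timezone.utc).isoformat(),
    "fields": {"celsius": 71.4},
}
client.write_points([point])
```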
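For the message broker, a sketch using the kafka-python package: one producer publishing JSON events and one consumer reading them back. The broker address and topic name are assumptions.

```python
# Minimal Kafka sketch: publish and consume JSON events.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",               # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("device-events", {"device_id": "pump-42", "temp_c": 81.2})
producer.flush()

consumer = KafkaConsumer(
    "device-events",                                  # hypothetical topic
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)   # e.g. {'device_id': 'pump-42', 'temp_c': 81.2}
    break
```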
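For data pipeline orchestration, an Airflow sketch (using the Airflow 2 API): a two-task DAG in which the transform task depends on the extract task. The task bodies and schedule are placeholders.

```python
# Minimal Airflow sketch: a two-task DAG with an explicit dependency.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from source systems")

def transform():
    print("clean and reshape the extracted data")

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # This dependency is what the orchestrator uses to order the tasks.
    extract_task >> transform_task
```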
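For performance monitoring, a sketch that instruments a function with the prometheus_client package and exposes the metrics for a Prometheus server to scrape; the metric names and the simulated work are illustrative.

```python
# Minimal Prometheus instrumentation sketch: a request counter and a latency
# histogram exposed on port 8000 for scraping.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled")
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

@LATENCY.time()
def handle_request():
    time.sleep(random.uniform(0.01, 0.1))   # stand-in for real work
    REQUESTS.inc()

if __name__ == "__main__":
    start_http_server(8000)   # Prometheus scrapes http://localhost:8000/metrics
    while True:
        handle_request()
```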
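For the backend framework, a minimal Flask sketch exposing a couple of JSON endpoints a front end could call; the routes and payloads are invented.

```python
# Minimal Flask sketch: JSON endpoints for a hypothetical analytic application.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/health")
def health():
    return jsonify(status="ok")

@app.route("/api/anomalies/<device_id>")
def anomalies(device_id):
    # In a real application this would query a model or a database.
    return jsonify(device_id=device_id, anomaly_score=0.07)

if __name__ == "__main__":
    app.run(debug=True)
```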
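For data visualization, a Matplotlib sketch that produces a histogram and a scatter plot from synthetic data in a few lines of code.

```python
# Minimal Matplotlib sketch: histogram and scatter plot from synthetic data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(scale=0.5, size=1000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(x, bins=30)
ax1.set_title("Histogram")
ax2.scatter(x, y, s=5)
ax2.set_title("Scatter plot")

fig.savefig("example.png", dpi=150)   # publication-quality output file
```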
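For data science, a pandas sketch of a typical cleaning-and-transformation step; the file name and columns are made up.

```python
# Minimal data-cleaning sketch with pandas: load, fix types, handle missing
# values, and compute a simple aggregate.
import pandas as pd

# Hypothetical CSV of sensor readings with "timestamp" and "temperature" columns.
df = pd.read_csv("sensor_readings.csv", parse_dates=["timestamp"])

df = df.drop_duplicates()
df["temperature"] = pd.to_numeric(df["temperature"], errors="coerce")
df["temperature"] = df["temperature"].fillna(df["temperature"].median())

# A simple transformation: average temperature per day.
daily = df.groupby(df["timestamp"].dt.date)["temperature"].mean()
print(daily.head())
```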
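Finally, for machine learning, a Keras sketch of a small feed-forward network trained on synthetic data; the architecture and data are illustrative, not a recommendation.

```python
# Minimal Keras sketch: a small feed-forward network on synthetic data.
import numpy as np
from tensorflow import keras

# Synthetic binary-classification data: 1,000 samples, 20 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```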
If you’re curious, check out some of the product choices Uber made.
We need to begin the next era of enterprise software and start to build custom or packaged enterprise AI applications: applications that serve workers, not developers; have millennial UIs; and use the oceans of data coming from both the Internet of People and the Internet of Things. Luckily many of the infrastructure building blocks are now here, so stop using those mud bricks.
Timothy Chou was one of only six people to ever hold the President title at Oracle. He is now in his 12th year teaching cloud computing at Stanford and recently launched another book, Precision: Principles, Practices and Solutions for the Internet of Things. Invite Timothy to keynote your next event!