April 12, 2017

How to Allow MySQL Remote Access in Ubuntu Server 16.04 on AWS

Step 1. Enable MySQL Server Remote Connection in Ubuntu
By default, MySQL Server on Ubuntu listens only on the local interface (127.0.0.1), which means remote access to the server is not allowed. To enable remote connections, we need to change the value of bind-address in the MySQL configuration file.
Locate the mysqld.cnf file:
$ mlocate mysqld.cnf
Output: /etc/mysql/mysql.conf.d/mysqld.cnf

Edit mysqld.cnf and change bind-address from 127.0.0.1 to 0.0.0.0:

sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf
bind-address            = 0.0.0.0
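
To double-check the edit, a small Python 3 sketch like the one below can scan the configuration text for the bind-address value. The function name find_bind_address and the SAMPLE text are made up for illustration. The sketch ignores section headers and include directives, so treat it as a rough check rather than a full option-file parser.

```python
def find_bind_address(cnf_text):
    """Return the last uncommented bind-address value in cnf_text, or None."""
    value = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        # MySQL accepts both bind-address and bind_address as the option name
        if sep and key.strip().replace("_", "-") == "bind-address":
            value = val.strip()
    return value


# Hypothetical file contents, used only to demonstrate the function
SAMPLE = """
[mysqld]
# bind-address = 127.0.0.1
bind-address            = 0.0.0.0
"""

print(find_bind_address(SAMPLE))  # prints 0.0.0.0
```

To run it against the real file, pass in the text of /etc/mysql/mysql.conf.d/mysqld.cnf instead of SAMPLE.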



Step 2. Modify PRIVILEGES
Create a root account that can connect from any host ('%') and grant it privileges:
mysql> GRANT ALL ON *.* TO 'root'@'localhost';
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE USER 'root'@'%' IDENTIFIED BY 'yourpassword';
mysql> GRANT ALL ON *.* TO 'root'@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)


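Step 3. Allow remote computers to reach port 3306
On AWS, add an inbound rule for TCP port 3306 to the EC2 instance's security group, restricting the source to the IP addresses that need access.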



Step 4. Restart Server

$ sudo /etc/init.d/mysql restart
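
Once the server is back up, one way to confirm that port 3306 is reachable from another machine is a plain TCP connection test. This is a minimal Python 3 sketch using only the standard library. The function name is_port_open is my own, and the check only shows that something accepts connections on the port, not that MySQL will accept your login.

```python
import socket


def is_port_open(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_port_open('203.0.113.10') from your own machine, with that placeholder address replaced by the instance's public IP, should return True once the four steps are done.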


April 05, 2017

Python to Access Web Data



What is web scraping?

Web sites are written using HTML, which means that each web page is a structured document. Web sites don’t always provide their data in convenient formats such as CSV or JSON.
This is where web scraping comes in. Web scraping is the practice of using a computer program to sift through a web page and gather the data you need in the format most useful to you, while preserving the structure of the data.

Understanding HTML Basics

Scraping is all about HTML tags. I will be using two Python modules for scraping data:
  • urllib
  • BeautifulSoup

Parsing HTML using Urllib
Using urllib, you can treat a web page much like a file. You simply indicate which web page you would like to retrieve, and urllib handles all of the HTTP protocol and header details. We can construct a well-formed regular expression to match and extract the link values from the retrieved HTML as follows:

href="http://.+?"
The question mark in “.+?” makes the match non-greedy instead of greedy. A non-greedy match tries to find the smallest possible matching string, whereas a greedy match tries to find the largest possible matching string.
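
To see the difference between the two kinds of match, here is a short Python 3 sketch that runs a greedy and a non-greedy pattern over the same line. The sample HTML string and the example.com URLs are invented for illustration.

```python
import re

# Two links on one line, so a greedy match can run past the first closing quote
html = '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>'

greedy = re.findall(r'href="http://.+"', html)
non_greedy = re.findall(r'href="http://.+?"', html)

print(len(greedy))   # prints 1: a single match spanning both links
print(non_greedy)    # prints ['href="http://example.com/a"', 'href="http://example.com/b"']
```

The greedy pattern swallows everything up to the last double quote on the line, while the non-greedy one stops at the first quote after each URL.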



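import re
import urllib.request  # Python 3: urlopen lives in urllib.request

url = input('Enter - ')
# read() returns bytes, so decode before applying a text regex
html = urllib.request.urlopen(url).read().decode('utf-8', errors='replace')
# The parentheses capture only the URL inside href="..."
links = re.findall(r'href="(http://.+?)"', html)
for link in links:
    print(link)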
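
The script above only picks up links that start with http://. As a variation, the Python 3 sketch below also accepts https:// and keeps downloading separate from link extraction, so the extraction can be tried on a plain string. The names LINK_PATTERN, extract_links, and fetch are my own, and the UTF-8 decoding is an assumption about the page.

```python
import re
import urllib.request

# Matches http:// and https:// links; the capture group keeps only the URL
LINK_PATTERN = re.compile(r'href="(https?://.+?)"')


def extract_links(html):
    """Return every absolute link found in href="..." attributes of html."""
    return LINK_PATTERN.findall(html)


def fetch(url):
    """Download url and return its body as text (UTF-8 is assumed)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Invented sample markup, so the example runs without network access
sample = '<a href="https://example.org/">one</a> <a href="http://example.com/x">two</a>'
print(extract_links(sample))  # prints ['https://example.org/', 'http://example.com/x']
```

For a live page, extract_links(fetch(url)) combines the two steps.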


Parsing HTML using BeautifulSoup

The BeautifulSoup library parses HTML input and lets you easily extract the data you need.
You can install BeautifulSoup with pip (the package name is beautifulsoup4); the code below imports it from the bs4 module.
We will use urllib to read the page and then use BeautifulSoup to extract the href attributes from the anchor (a) tags.

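import urllib.request  # Python 3: urlopen lives in urllib.request
from bs4 import BeautifulSoup

url = input('Enter - ')
html = urllib.request.urlopen(url).read()
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))

# Other parts of each tag can be pulled out in the same way
for tag in tags:
    print('TAG:', tag)
    print('URL:', tag.get('href', None))
    # contents is empty for tags such as <a href="..."></a>
    print('Content:', tag.contents[0] if tag.contents else None)
    print('Attrs:', tag.attrs)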
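
The href values pulled from anchor tags are often relative (for example page2.html or /about), and tags without an href give None. The Python 3 sketch below resolves such values against the page URL with urljoin from the standard library. The function name absolute_links and the example URLs are made up for illustration.

```python
from urllib.parse import urljoin


def absolute_links(page_url, hrefs):
    """Resolve each href against page_url, skipping tags that had no href."""
    return [urljoin(page_url, href) for href in hrefs if href]


# Hypothetical page address and href values, such as tag.get('href') might return
page = "http://example.com/docs/index.html"
hrefs = ["page2.html", "/about", "https://other.org/x", None]

print(absolute_links(page, hrefs))
# prints ['http://example.com/docs/page2.html', 'http://example.com/about', 'https://other.org/x']
```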

Creating DataFrames from CSV in Apache Spark

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("CSV Example").getOrCreate()
sc = spark.sparkContext
Sp...