Google BigQuery Analytics

Jordan Tigani, Siddartha Naidu

Language: English

Pages: 528

ISBN: 1118824822

Format: PDF / Kindle (mobi) / ePub


How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets

Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques. It also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute Engine, App Engine Datastore integration, and using GViz with Tableau to generate charts of query results. In addition to the mechanics of BigQuery, the book covers the architecture of the underlying Dremel query engine, providing a thorough understanding that leads to better query results.

  • Features a companion website that includes all code and data sets from the book
  • Uses real-world examples to explain everything analysts need to know to effectively use BigQuery
  • Includes web application examples coded in Python

Responsive Web Design with HTML5 and CSS3

Professional Python Frameworks: Web 2.0 Programming with Django and Turbogears (Programmer to Programmer)

Django Design Patterns and Best Practices

Twitter API: Up and Running: Learn How to Build Applications with the Twitter API

Learning SQL Server Reporting Services 2012


technologies: Are BigQuery tables Bigtables, for example? Is user data stored in GFS? This section attempts to answer, at a high level, how BigQuery relates to the Google infrastructure stack. Chapter 9 goes into more detail about the architecture; if you’re interested in how these systems work, you may want to skip ahead. If Chapter 9 isn’t enough detail for you, it provides references to the research papers that Google has published on the underlying technologies.

Metadata Storage

BigQuery

store; you can think of it as key value storage. Although it does have some support for queries and indexes, it is not well suited to ad-hoc queries over your data. It is, however, fast for point lookups and indexed queries, and it scales virtually infinitely. Cloud Datastore is backed by Google’s Megastore distributed consistent storage system. If you want to run analytics queries over your Cloud Datastore storage, you can export it to BigQuery. Chapter 11, “Managing Data Stored in BigQuery,”

"bigquery-e2e" }, { "id": "420824040427" } ], "totalItems": 3 }

ETags and the If-None-Match Header

One of these advanced HTTP features that can come in handy with BigQuery is the combination of ETags and the If-None-Match HTTP header. Sometimes, you want to know if a resource or list of resources has changed since the last time you read it. ETags are a convenient mechanism to do this; they are fingerprint values that are returned in the API call. If you read the same object twice and it has the
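As an illustration of the ETag and If-None-Match pattern the excerpt introduces, here is a minimal Python sketch of a conditional read. It does not call the BigQuery API: `fake_get` is a hypothetical in-memory stand-in for an HTTP server, and `make_etag` and `CachingClient` are names invented for this example. The only behavior assumed is standard HTTP: the server returns an `ETag` header, and a later request that sends the same value in `If-None-Match` gets a 304 (Not Modified) response with no body.

```python
import hashlib
import json


def make_etag(resource):
    """Fingerprint a resource. Real servers choose their own ETag scheme;
    a truncated SHA-256 of the JSON body is only an assumption here."""
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def fake_get(resource, headers=None):
    """Hypothetical stand-in for an HTTP GET handler that honors If-None-Match.

    Returns a (status, response_headers, body) tuple.
    """
    headers = headers or {}
    etag = make_etag(resource)
    if headers.get("If-None-Match") == etag:
        # Fingerprints match: nothing changed, so no body is sent.
        return 304, {"ETag": etag}, None
    return 200, {"ETag": etag}, resource


class CachingClient:
    """Remembers the last ETag and body so unchanged reads skip the payload."""

    def __init__(self):
        self.etag = None
        self.cached = None

    def read(self, resource):
        """Return (body, changed) for the current state of the resource."""
        headers = {"If-None-Match": self.etag} if self.etag else {}
        status, response_headers, body = fake_get(resource, headers)
        if status == 304:
            return self.cached, False
        self.etag = response_headers["ETag"]
        self.cached = body
        return body, True
```

A client that polls a resource list this way can tell that nothing changed from the status code alone, without downloading and comparing the full response.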

responses from Java (ResultReader.java)

import com.google.api.client.util.Data;
import com.google.api.services.bigquery.model.GetQueryResultsResponse;
import com.google.api.services.bigquery.model.QueryResponse;
import com.google.api.services.bigquery.model.TableCell;
import com.google.api.services.bigquery.model.TableDataList;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;

to load jobs. Access control works as you would expect; the creator of the job must have reader access for all the files you enumerate in the sourceUris list. If you include a glob in the list, you must also have reader access on the bucket, which grants permission to list the contents of a bucket. Because GCS and BigQuery are both a part of the Google Cloud Platform, it is easy to forget that loading data from GCS into BigQuery creates an additional copy of your data. The data stored in GCS is
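The `sourceUris` list and the glob rule described in the excerpt can be sketched in Python as follows. This is a rough illustration, not the book's code: `build_load_job` and `buckets_needing_list_access` are hypothetical helpers, and the project, dataset, table, and bucket names in the usage example are placeholders. The dictionary mirrors the `configuration.load` section of a BigQuery job resource (`sourceUris` and `destinationTable`); the helpers only build and inspect that structure and do not submit anything.

```python
from urllib.parse import urlparse


def build_load_job(project_id, dataset_id, table_id, source_uris):
    """Build the body of a load job that reads from Google Cloud Storage."""
    for uri in source_uris:
        if not uri.startswith("gs://"):
            raise ValueError("expected a gs:// URI, got: " + uri)
    return {
        "configuration": {
            "load": {
                # Every file listed here must be readable by the job creator.
                "sourceUris": list(source_uris),
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


def buckets_needing_list_access(source_uris):
    """Return the buckets whose URIs contain a glob ('*').

    Expanding a glob requires listing the bucket contents, so the job
    creator needs reader access on these buckets as well as on the files.
    """
    return sorted({urlparse(uri).netloc for uri in source_uris if "*" in uri})
```

For example, with `["gs://example-bucket/data/part-*.csv", "gs://other-bucket/one.csv"]` as the source list, only `example-bucket` is reported as needing bucket-level read access, because only its URI contains a glob.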
